[ https://issues.apache.org/jira/browse/LUCENE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4485:
---------------------------------------

    Attachment: LUCENE-4485.patch

Simple patch ...

> CheckIndex's term stats should not include deleted docs
> -------------------------------------------------------
>
>                 Key: LUCENE-4485
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4485
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4485.patch
>
>
> I was looking at the CheckIndex output on an index that has deletions, e.g.:
> {noformat}
>   4 of 30: name=_90 docCount=588408
>     codec=Lucene41
>     compound=false
>     numFiles=14
>     size (MB)=265.318
>     diagnostics = {os=Linux, os.version=3.2.0-23-generic, mergeFactor=10, source=merge, lucene.version=5.0-SNAPSHOT, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_07, java.vendor=Oracle Corporation}
>     has deletions [delGen=1]
>     test: open reader.........OK [39351 deleted docs]
>     test: fields..............OK [8 fields]
>     test: field norms.........OK [2 fields]
>     test: terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 65597188 tokens]
>     test (ignoring deletes): terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 70293065 tokens]
>     test: stored fields.......OK [1647171 total field count; avg 3 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>     test: docvalues...........OK [0 total doc count; 1 docvalues fields]
> {noformat}
> If you compare the {{test: terms, freq, prox}} line (which accounts for deletions) with the next line (which ignores deletions), it's confusing because only the 3rd number (tokens) reflects deletions. I think the first two numbers should also reflect deletions? This way an app could get a sense of how much "deadweight" is in the index due to un-reclaimed deletions...
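To make the proposal concrete, here is a minimal sketch of what "all three numbers reflect deletions" could mean. It is NOT Lucene's API and NOT the attached patch: the in-memory postings model (`Posting`, `Stats`, `collect`, `samplePostings`) is hypothetical, and it assumes a term should only be counted if at least one live doc still contains it. It uses Java records, so it needs Java 16+.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of one segment's postings, used only to
 * illustrate the proposal. This is NOT Lucene's API and NOT the attached patch.
 */
public class TermStatsSketch {

    /** One posting: a doc that contains the term, and the term's frequency in it. */
    record Posting(int docId, int freq) {}

    /** The three numbers CheckIndex prints on its "terms, freq, prox" line. */
    record Stats(long terms, long termDocPairs, long tokens) {}

    /**
     * Collects term stats. When liveDocs is non-null, postings in deleted docs
     * (liveDocs[docId] == false) are skipped for all three numbers, and a term
     * is counted only if at least one live doc still contains it.
     * Pass null to ignore deletions entirely.
     */
    static Stats collect(Map<String, List<Posting>> postings, boolean[] liveDocs) {
        long terms = 0;
        long pairs = 0;
        long tokens = 0;
        for (List<Posting> termPostings : postings.values()) {
            long livePairs = 0;
            for (Posting p : termPostings) {
                if (liveDocs != null && !liveDocs[p.docId()]) {
                    continue; // deleted doc: excluded from every count
                }
                livePairs++;
                tokens += p.freq();
            }
            if (livePairs > 0) {
                terms++; // term survives only if some live doc still has it
            }
            pairs += livePairs;
        }
        return new Stats(terms, pairs, tokens);
    }

    /** Tiny made-up segment with three docs (0, 1, 2). */
    static Map<String, List<Posting>> samplePostings() {
        Map<String, List<Posting>> postings = new TreeMap<>();
        postings.put("apache", List.of(new Posting(0, 2), new Posting(2, 1)));
        postings.put("lucene", List.of(new Posting(0, 1), new Posting(1, 3), new Posting(2, 1)));
        postings.put("solr", List.of(new Posting(1, 4)));
        return postings;
    }

    public static void main(String[] args) {
        boolean[] liveDocs = {true, false, true}; // doc 1 is deleted
        Map<String, List<Posting>> postings = samplePostings();
        System.out.println("with deletes:     " + collect(postings, liveDocs));
        System.out.println("ignoring deletes: " + collect(postings, null));
    }
}
```

With doc 1 deleted, this prints `Stats[terms=2, termDocPairs=4, tokens=5]` for the deletion-aware pass and `Stats[terms=3, termDocPairs=6, tokens=12]` when deletions are ignored, so every number shows the deadweight, not just tokens. On the real segment above, only the tokens figure currently shows it: 70293065 - 65597188 = 4695877 tokens, roughly 6.7% of the segment's tokens, which lines up with the 39351 / 588408 (about 6.7%) of docs that are deleted.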