[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769980#action_12769980 ]
Michael McCandless commented on LUCENE-2010:
--------------------------------------------

bq. If you delete all documents from the whole index, no segments would keep alive if automatically removed.

IW now has a dedicated method to [efficiently] delete all docs, but yeah we should also short-circuit this, in case someone didn't use that method and instead actually deleted every doc separately. I'd think that our solution here would automatically handle this case (drop all segments) as well.

On materializing deletes (IndexWriter.applyDeletes) we should simply sweep the segmentInfos, and drop any fully deleted segments. Should be a simple change.

> Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2010
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2010
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>
> I do not know if this is a bug in 2.9.0, but it seems that segments with all
> documents deleted are not automatically removed:
> {noformat}
> 4 of 14: name=_dlo docCount=5
>   compound=true
>   hasProx=true
>   numFiles=2
>   size (MB)=0.059
>   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 2009-09-21 10:25:09, os=SunOS, os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, source=flush}
>   has deletions [delFileName=_dlo_1.del]
>   test: open reader.........OK [5 deleted docs]
>   test: fields..............OK [136 fields]
>   test: field norms.........OK [136 fields]
>   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
>   test: stored fields.......OK [0 total field count; avg ? fields per doc]
>   test: term vectors........OK [0 total vector count; avg ? term/freq vector fields per doc]
> {noformat}
> Shouldn't such segments not be removed automatically during the next
> commit/close of IndexWriter?
>
> *Mike McCandless:*
> Lucene doesn't actually short-circuit this case, ie, if every single doc in a
> given segment has been deleted, it will still merge it [away] like normal,
> rather than simply dropping it immediately from the index, which I agree
> would be a simple optimization. Can you open a new issue? I would think IW
> can drop such a segment immediately (ie not wait for a merge or optimize) on
> flushing new deletes.
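
For illustration, the sweep proposed in the comment (walk the segment list when deletes are materialized and drop any segment whose docs are all deleted) could be sketched roughly as below. This is not Lucene code: `SegmentSketch`, `isFullyDeleted` and `dropFullyDeleted` are hypothetical stand-ins made up for this example, and the real change would live inside IndexWriter and work on its own segment metadata. Only the `_dlo` segment (docCount=5, 5 deleted docs) comes from the CheckIndex output in the issue; the other segments are invented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Illustrative sketch only. These are NOT Lucene's SegmentInfo/SegmentInfos
 * classes; every name here is a hypothetical stand-in.
 */
public class DropDeletedSegments {

    /** Hypothetical per-segment metadata: name, doc count, deleted-doc count. */
    public static final class SegmentSketch {
        public final String name;
        public final int docCount;
        public final int delCount;

        public SegmentSketch(String name, int docCount, int delCount) {
            this.name = name;
            this.docCount = docCount;
            this.delCount = delCount;
        }

        /** True when the segment holds docs and every one of them is deleted. */
        public boolean isFullyDeleted() {
            return docCount > 0 && delCount == docCount;
        }
    }

    /**
     * Sweeps the list in place, removing fully deleted segments.
     * Returns the names of the dropped segments so a caller could
     * clean up their files afterwards.
     */
    public static List<String> dropFullyDeleted(List<SegmentSketch> segments) {
        List<String> dropped = new ArrayList<>();
        Iterator<SegmentSketch> it = segments.iterator();
        while (it.hasNext()) {
            SegmentSketch segment = it.next();
            if (segment.isFullyDeleted()) {
                dropped.add(segment.name);
                it.remove(); // safe removal while iterating
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        List<SegmentSketch> segments = new ArrayList<>();
        segments.add(new SegmentSketch("_dln", 12, 3)); // invented: partly deleted
        segments.add(new SegmentSketch("_dlo", 5, 5));  // from the issue: all 5 docs deleted
        segments.add(new SegmentSketch("_dlp", 8, 0));  // invented: no deletions

        List<String> dropped = dropFullyDeleted(segments);
        System.out.println("dropped=" + dropped + " remaining=" + segments.size());
    }
}
```

Because the check is per segment, deleting every doc one by one across the whole index would make every segment qualify, so the same sweep would also cover the "drop all segments" case mentioned above.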