[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986602#action_12986602 ] Uwe Schindler commented on LUCENE-2010: --- Thanks for taking care! > Remove segments with all documents deleted in commit/flush/close of > IndexWriter instead of waiting until a merge occurs. > > > Key: LUCENE-2010 > URL: https://issues.apache.org/jira/browse/LUCENE-2010 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 3.1, 4.0 > > Attachments: LUCENE-2010.patch > > > I do not know if this is a bug in 2.9.0, but it seems that segments with all > documents deleted are not automatically removed: > {noformat} > 4 of 14: name=_dlo docCount=5 > compound=true > hasProx=true > numFiles=2 > size (MB)=0.059 > diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - > 2009-09-21 10:25:09, os=SunOS, > os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, > source=flush} > has deletions [delFileName=_dlo_1.del] > test: open reader.OK [5 deleted docs] > test: fields..OK [136 fields] > test: field norms.OK [136 fields] > test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens] > test: stored fields...OK [0 total field count; avg ? fields per doc] > test: term vectorsOK [0 total vector count; avg ? term/freq vector > fields per doc] > {noformat} > Shouldn't such segments not be removed automatically during the next > commit/close of IndexWriter? > *Mike McCandless:* > Lucene doesn't actually short-circuit this case, ie, if every single doc in a > given segment has been deleted, it will still merge it [away] like normal, > rather than simply dropping it immediately from the index, which I agree > would be a simple optimization. Can you open a new issue? I would think IW > can drop such a segment immediately (ie not wait for a merge or optimize) on > flushing new deletes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986346#action_12986346 ] Michael McCandless commented on LUCENE-2010: bq. Do you want to fix the rest of the tests and remove the text-only keepAllSegments method? It's actually only the QueryUtils test class that uses this... it makes an "empty" index by adding N docs and then deleting them all. So the test-only API needs to be public (QueryUtils is in oal.search). I'll mark it as lucene.internal... > Remove segments with all documents deleted in commit/flush/close of > IndexWriter instead of waiting until a merge occurs. > > > Key: LUCENE-2010 > URL: https://issues.apache.org/jira/browse/LUCENE-2010 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 3.1, 4.0 > > Attachments: LUCENE-2010.patch > > > I do not know if this is a bug in 2.9.0, but it seems that segments with all > documents deleted are not automatically removed: > {noformat} > 4 of 14: name=_dlo docCount=5 > compound=true > hasProx=true > numFiles=2 > size (MB)=0.059 > diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - > 2009-09-21 10:25:09, os=SunOS, > os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, > source=flush} > has deletions [delFileName=_dlo_1.del] > test: open reader.OK [5 deleted docs] > test: fields..OK [136 fields] > test: field norms.OK [136 fields] > test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens] > test: stored fields...OK [0 total field count; avg ? fields per doc] > test: term vectorsOK [0 total vector count; avg ? term/freq vector > fields per doc] > {noformat} > Shouldn't such segments not be removed automatically during the next > commit/close of IndexWriter? > *Mike McCandless:* > Lucene doesn't actually short-circuit this case, ie, if every single doc in a > given segment has been deleted, it will still merge it [away] like normal, > rather than simply dropping it immediately from the index, which I agree > would be a simple optimization. Can you open a new issue? I would think IW > can drop such a segment immediately (ie not wait for a merge or optimize) on > flushing new deletes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986267#action_12986267 ] Uwe Schindler commented on LUCENE-2010: --- Look fine to me! Its indeed quite simple. Will test this later. Do you want to fix the rest of the tests and remove the text-only keepAllSegments method? At least this method should be hidden by a package-private accessor or, if not possible, @lucene.internal. > Remove segments with all documents deleted in commit/flush/close of > IndexWriter instead of waiting until a merge occurs. > > > Key: LUCENE-2010 > URL: https://issues.apache.org/jira/browse/LUCENE-2010 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 3.1, 4.0 > > Attachments: LUCENE-2010.patch > > > I do not know if this is a bug in 2.9.0, but it seems that segments with all > documents deleted are not automatically removed: > {noformat} > 4 of 14: name=_dlo docCount=5 > compound=true > hasProx=true > numFiles=2 > size (MB)=0.059 > diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - > 2009-09-21 10:25:09, os=SunOS, > os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, > source=flush} > has deletions [delFileName=_dlo_1.del] > test: open reader.OK [5 deleted docs] > test: fields..OK [136 fields] > test: field norms.OK [136 fields] > test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens] > test: stored fields...OK [0 total field count; avg ? fields per doc] > test: term vectorsOK [0 total vector count; avg ? term/freq vector > fields per doc] > {noformat} > Shouldn't such segments not be removed automatically during the next > commit/close of IndexWriter? > *Mike McCandless:* > Lucene doesn't actually short-circuit this case, ie, if every single doc in a > given segment has been deleted, it will still merge it [away] like normal, > rather than simply dropping it immediately from the index, which I agree > would be a simple optimization. Can you open a new issue? I would think IW > can drop such a segment immediately (ie not wait for a merge or optimize) on > flushing new deletes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org