[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986346#action_12986346 ]

Michael McCandless commented on LUCENE-2010:

bq. Do you want to fix the rest of the tests and remove the test-only keepAllSegments method?

It's actually only the QueryUtils test class that uses this... it makes an empty index by adding N docs and then deleting them all. So the test-only API needs to be public (QueryUtils is in oal.search). I'll mark it as lucene.internal...

Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

Key: LUCENE-2010
URL: https://issues.apache.org/jira/browse/LUCENE-2010
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Michael McCandless
Fix For: 3.1, 4.0
Attachments: LUCENE-2010.patch

I do not know if this is a bug in 2.9.0, but it seems that segments with all documents deleted are not automatically removed:

{noformat}
4 of 14: name=_dlo docCount=5
compound=true
hasProx=true
numFiles=2
size (MB)=0.059
diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 2009-09-21 10:25:09, os=SunOS, os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, source=flush}
has deletions [delFileName=_dlo_1.del]
test: open reader.OK [5 deleted docs]
test: fields..OK [136 fields]
test: field norms.OK [136 fields]
test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
test: stored fields...OK [0 total field count; avg ? fields per doc]
test: term vectorsOK [0 total vector count; avg ? term/freq vector fields per doc]
{noformat}

Shouldn't such segments be removed automatically during the next commit/close of IndexWriter?

*Mike McCandless:* Lucene doesn't actually short-circuit this case, ie, if every single doc in a given segment has been deleted, it will still merge it [away] like normal, rather than simply dropping it immediately from the index, which I agree would be a simple optimization. Can you open a new issue? I would think IW can drop such a segment immediately (ie not wait for a merge or optimize) on flushing new deletes.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986267#action_12986267 ]

Uwe Schindler commented on LUCENE-2010:
---

Looks fine to me! It's indeed quite simple. Will test this later. Do you want to fix the rest of the tests and remove the test-only keepAllSegments method? At the least, this method should be hidden behind a package-private accessor or, if that is not possible, marked @lucene.internal.
[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770917#action_12770917 ]

Michael McCandless commented on LUCENE-2010:

Note: IndexReader must also do this.
[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769970#action_12769970 ]

Uwe Schindler commented on LUCENE-2010:
---

There is one special case: if you delete *all* documents from the whole index, no segments would stay alive if they were removed automatically. Can we handle that? An empty segments_xxx file should remain.
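The special case raised in this comment can be illustrated with a toy model. This is only a sketch, not Lucene code: `EmptyCommitSketch`, `CommitSketch` and their members are hypothetical names, and the base-36 generation suffix is an assumption about how the segments_xxx file is named. It shows the intended outcome: a commit that lists zero segments is still a commit, so the segments_xxx file remains even after every segment has been dropped.

```java
import java.util.ArrayList;
import java.util.List;

final class EmptyCommitSketch {

    /** Hypothetical stand-in for a commit point: a generation plus the live segment names. */
    static final class CommitSketch {
        final long generation;
        final List<String> segmentNames;

        CommitSketch(long generation, List<String> segmentNames) {
            this.generation = generation;
            // Copy so later changes to the caller's list do not alter this commit.
            this.segmentNames = new ArrayList<>(segmentNames);
        }

        /** Assumed naming scheme: "segments_" followed by the generation in base 36. */
        String fileName() {
            return "segments_" + Long.toString(generation, Character.MAX_RADIX);
        }

        int segmentCount() {
            return segmentNames.size();
        }
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>();
        live.add("_dlo"); // segment name taken from the report above
        CommitSketch before = new CommitSketch(1, live);

        // Every document deleted: all segments are dropped, but a new commit is still written.
        live.clear();
        CommitSketch after = new CommitSketch(2, live);

        System.out.println(before.fileName() + " lists " + before.segmentCount() + " segment(s)");
        System.out.println(after.fileName() + " lists " + after.segmentCount() + " segment(s)");
    }
}
```

In this model, dropping segments only changes the contents of the next commit and never whether a commit exists.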
[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
[ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769980#action_12769980 ]

Michael McCandless commented on LUCENE-2010:

bq. If you delete all documents from the whole index, no segments would stay alive if they were removed automatically.

IW now has a dedicated method to [efficiently] delete all docs, but yeah we should also short-circuit this, in case someone didn't use that method and instead actually deleted every doc separately. I'd think that our solution here would automatically handle this case (drop all segments) as well.

On materializing deletes (IndexWriter.applyDeletes) we should simply sweep the segmentInfos, and drop any fully deleted segments. Should be a simple change.
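The sweep proposed in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `DropFullyDeletedSketch`, `SegmentSketch` and `dropFullyDeleted` are hypothetical names, the segment names other than `_dlo` are made up, and real segment bookkeeping in IndexWriter carries far more state. The sketch only shows the idea of walking the segment list once after deletes are applied and removing every segment whose deleted-document count equals its document count.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class DropFullyDeletedSketch {

    /** Hypothetical per-segment bookkeeping; not Lucene's SegmentInfo. */
    static final class SegmentSketch {
        final String name;
        final int docCount; // documents written to the segment
        final int delCount; // documents marked as deleted

        SegmentSketch(String name, int docCount, int delCount) {
            this.name = name;
            this.docCount = docCount;
            this.delCount = delCount;
        }

        boolean isFullyDeleted() {
            return delCount == docCount;
        }
    }

    /**
     * Sweeps the list once and removes every fully deleted segment in place.
     * Returns the names of the dropped segments so a caller could clean up their files.
     */
    static List<String> dropFullyDeleted(List<SegmentSketch> segments) {
        List<String> dropped = new ArrayList<>();
        Iterator<SegmentSketch> it = segments.iterator();
        while (it.hasNext()) {
            SegmentSketch segment = it.next();
            if (segment.isFullyDeleted()) {
                it.remove();
                dropped.add(segment.name);
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        List<SegmentSketch> segments = new ArrayList<>();
        segments.add(new SegmentSketch("_dln", 12, 3)); // made-up neighbour segment
        segments.add(new SegmentSketch("_dlo", 5, 5));  // the segment from the report
        segments.add(new SegmentSketch("_dlp", 8, 0));  // made-up neighbour segment

        List<String> dropped = dropFullyDeleted(segments);
        System.out.println("dropped=" + dropped + " remaining=" + segments.size());
    }
}
```

If every segment happens to be fully deleted, the same sweep leaves the list empty, which corresponds to the "drop all segments" case mentioned above.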