[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2011-01-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986346#action_12986346
 ] 

Michael McCandless commented on LUCENE-2010:


bq. Do you want to fix the rest of the tests and remove the test-only 
keepAllSegments method?

It's actually only the QueryUtils test class that uses this: it makes an 
empty index by adding N docs and then deleting them all.  So the test-only 
API needs to be public (QueryUtils is in oal.search, a different package 
from IndexWriter).  I'll mark it as @lucene.internal.

 Remove segments with all documents deleted in commit/flush/close of 
 IndexWriter instead of waiting until a merge occurs.
 

              Key: LUCENE-2010
              URL: https://issues.apache.org/jira/browse/LUCENE-2010
          Project: Lucene - Java
       Issue Type: Improvement
       Components: Index
 Affects Versions: 2.9
         Reporter: Uwe Schindler
         Assignee: Michael McCandless
          Fix For: 3.1, 4.0

      Attachments: LUCENE-2010.patch


 I do not know if this is a bug in 2.9.0, but it seems that segments with all 
 documents deleted are not automatically removed:
 {noformat}
 4 of 14: name=_dlo docCount=5
   compound=true
   hasProx=true
   numFiles=2
   size (MB)=0.059
   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 
 2009-09-21 10:25:09, os=SunOS,
  os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, 
 source=flush}
   has deletions [delFileName=_dlo_1.del]
   test: open reader...OK [5 deleted docs]
   test: fields...OK [136 fields]
   test: field norms...OK [136 fields]
   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
   test: stored fields...OK [0 total field count; avg ? fields per doc]
   test: term vectors...OK [0 total vector count; avg ? term/freq vector fields per doc]
 {noformat}
 Shouldn't such segments be removed automatically during the next 
 commit/close of IndexWriter?
 *Mike McCandless:*
 Lucene doesn't actually short-circuit this case, ie, if every single doc in a 
 given segment has been deleted, it will still merge it [away] like normal, 
 rather than simply dropping it immediately from the index, which I agree 
 would be a simple optimization. Can you open a new issue? I would think IW 
 can drop such a segment immediately (ie not wait for a merge or optimize) on 
 flushing new deletes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2011-01-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986267#action_12986267
 ] 

Uwe Schindler commented on LUCENE-2010:
---

Looks fine to me! It's indeed quite simple. I will test this later.

Do you want to fix the rest of the tests and remove the test-only 
keepAllSegments method? At the least, this method should be hidden behind a 
package-private accessor or, if that is not possible, marked @lucene.internal.




[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2009-10-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770917#action_12770917
 ] 

Michael McCandless commented on LUCENE-2010:


Note: IndexReader must also do this.




[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2009-10-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769970#action_12769970
 ] 

Uwe Schindler commented on LUCENE-2010:
---

There is one special case:
If you delete *all* documents from the whole index, no segments would remain 
alive if they were removed automatically.
Can we handle that? An empty segments_xxx file should remain.




[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2009-10-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769980#action_12769980
 ] 

Michael McCandless commented on LUCENE-2010:


bq. If you delete all documents from the whole index, no segments would remain 
alive if they were removed automatically.

IW now has a dedicated method to [efficiently] delete all docs, but yeah we 
should also short-circuit this, in case someone didn't use that method and 
instead actually deleted every doc separately.

I'd think that our solution here would automatically handle this case (drop all 
segments) as well.

On materializing deletes (IndexWriter.applyDeletes) we should simply sweep the 
segmentInfos, and drop any fully deleted segments.  Should be a simple change.
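
The sweep described above might look roughly like the following. This is only 
a minimal sketch, not the actual patch: SegmentStub and SegmentSweep are 
hypothetical stand-ins for Lucene's real segment metadata classes, and the 
segment name _dlp is made up for illustration. A segment counts as fully 
deleted when its deleted-document count equals its document count.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for per-segment metadata; NOT Lucene's real SegmentInfo.
final class SegmentStub {
    final String name;
    final int docCount; // total documents written to the segment
    final int delCount; // documents currently marked as deleted

    SegmentStub(String name, int docCount, int delCount) {
        this.name = name;
        this.docCount = docCount;
        this.delCount = delCount;
    }

    // True when every document in the segment has been deleted.
    boolean isFullyDeleted() {
        return delCount == docCount;
    }
}

// Hypothetical helper illustrating the sweep; the name is an assumption.
final class SegmentSweep {
    // Removes every fully deleted segment from the list in place
    // and returns the names of the segments that were dropped.
    static List<String> dropFullyDeleted(List<SegmentStub> segmentInfos) {
        List<String> dropped = new ArrayList<>();
        Iterator<SegmentStub> it = segmentInfos.iterator();
        while (it.hasNext()) {
            SegmentStub info = it.next();
            if (info.isFullyDeleted()) {
                it.remove(); // safe removal while iterating
                dropped.add(info.name);
            }
        }
        return dropped;
    }
}
```

If every segment is fully deleted, the list simply ends up empty, which would 
correspond to the special case raised earlier where only an empty 
segments_xxx file remains.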
