[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

Marvin Humphrey (JIRA) Thu, 29 Jan 2009 20:25:25 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668740#action_12668740
 ]


Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

> Numbers with Marvin's latest patch:

Presumably you spliced the improved nextSetBit into Jason's patch, correct?  I 
wonder how my patch on its own would do, since there's less abstraction. Didn't 
you have a close-to-ideal patch using sorted ints that performed well up to 10% 
deletions?  What did that look like?

> I think we should try list-of-sorted-ints?

That should help with the situation where deletes are sparse, particularly when 
the term is rare (such as those searches for "147"), since it will remove the 
cost of scanning through a bunch of empty bits.

I'm also curious what happens if we do without the null-check here:

{code}
+      if (deletedDocsIt != null) {
+        if (doc > nextDeletion) {
+          if (deletedDocsIt.skipTo(doc)) 
+            nextDeletion = deletedDocsIt.doc();
+        } 
+        if (doc == nextDeletion)
+          continue;
       }
{code}

When there are no deletions, nextDeletion is set to Integer.MAX_VALUE and left 
there. We'd then get a comparison that's always false for the life of the TermDocs, 
in place of a null check that always finds null. Possibly we'd slow down the 
no-deletions case while speeding up all the others, but the processor may do a 
good job of predicting the result of the comparison.
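Here is a minimal sketch of the loop without the per-document null check. It assumes deletions arrive through an iterator with Lucene 2.4-style doc()/next()/skipTo(int) methods. DeletionsIterator, ArrayDeletions and SentinelLoop are hypothetical names, and the postings are a plain int[] rather than a real TermDocs. The only null check happens once, before the loop. After that, Integer.MAX_VALUE stands in for "no more deletions", on the assumption that no real document id ever equals it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a deleted-docs iterator (illustration only). */
interface DeletionsIterator {
  int doc();
  boolean next();
  boolean skipTo(int target);
}

/** Array-backed deletions with a simple linear skipTo. */
final class ArrayDeletions implements DeletionsIterator {
  private final int[] docs; // ascending deleted doc ids
  private int pos = -1;

  ArrayDeletions(int... docs) { this.docs = docs; }

  public int doc() { return docs[pos]; }

  public boolean next() {
    if (pos < docs.length) pos++;
    return pos < docs.length;
  }

  public boolean skipTo(int target) {
    // Advance at least once, then until the current deletion is >= target.
    do {
      if (!next()) return false;
    } while (docs[pos] < target);
    return true;
  }
}

final class SentinelLoop {
  /** Returns the postings that are not deleted; deletedDocsIt may be null. */
  static List<Integer> liveDocs(int[] postings, DeletionsIterator deletedDocsIt) {
    // Sentinel: with no deletions, "doc > nextDeletion" is never true,
    // so deletedDocsIt is never dereferenced inside the loop.
    int nextDeletion = Integer.MAX_VALUE;
    if (deletedDocsIt != null && deletedDocsIt.next()) { // the only null check
      nextDeletion = deletedDocsIt.doc();
    }
    List<Integer> live = new ArrayList<>();
    for (int doc : postings) {
      if (doc > nextDeletion) {
        // Exhausted iterator: fall back to the sentinel instead of re-querying.
        nextDeletion = deletedDocsIt.skipTo(doc) ? deletedDocsIt.doc() : Integer.MAX_VALUE;
      }
      if (doc == nextDeletion) continue; // deleted, skip it
      live.add(doc);
    }
    return live;
  }
}
```

In this sketch, nextDeletion is also reset to Integer.MAX_VALUE once skipTo reports that the iterator is exhausted. Documents after the last deletion therefore cost one always-false comparison each, with no further skipTo calls.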

I also suspect that when there are many deletions, the sheer number of method 
calls needed to iterate over the deletions is a burden. The iterator has to 
compete with an inlinable method from a final class (BitVector).
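To make the contrast concrete, here is a sketch of the two access patterns over a packed bit set. It assumes a layout of eight docs per byte in the spirit of BitVector. PackedBits, nextSetBit and countLive are hypothetical names for illustration, not Lucene API. get(int) is a single shift-and-mask on a final class, which the JIT is likely to inline into the loop. nextSetBit shows the byte-by-byte scan whose cost grows with the number of empty bytes between deletions.

```java
/**
 * Hypothetical final bit set, one bit per document packed into a byte[]
 * (illustration only, not Lucene's BitVector).
 */
final class PackedBits {
  private final byte[] bits;
  private final int size;

  PackedBits(int size) {
    this.size = size;
    this.bits = new byte[(size >> 3) + 1];
  }

  void set(int bit) {
    bits[bit >> 3] |= (byte) (1 << (bit & 7));
  }

  /** Random access: one shift, one mask, no iterator state. */
  boolean get(int bit) {
    return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
  }

  /** First set bit >= from, or -1; zero bytes are skipped eight bits at a time. */
  int nextSetBit(int from) {
    if (from < 0) from = 0;
    if (from >= size) return -1;
    int i = from >> 3;
    // Drop the bits below 'from' in the first byte.
    int b = (bits[i] & 0xFF) >>> (from & 7);
    if (b != 0) return from + Integer.numberOfTrailingZeros(b);
    for (i++; i < bits.length; i++) {
      if (bits[i] != 0) {
        int bit = (i << 3) + Integer.numberOfTrailingZeros(bits[i] & 0xFF);
        return bit < size ? bit : -1;
      }
    }
    return -1;
  }

  /** Counts undeleted postings with one small, inlinable call per posting. */
  static int countLive(int[] postings, PackedBits deleted) {
    int live = 0;
    for (int doc : postings) {
      if (!deleted.get(doc)) live++;
    }
    return live;
  }
}
```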



> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, 
> LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff, 
> quasi_iterator_deletions_r2.diff, quasi_iterator_deletions_r3.diff, 
> searchdeletes.alg, sortBench2.py, sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


