[
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668583#action_12668583
]
Michael McCandless commented on LUCENE-1476:
--------------------------------------------
Actually, I made one mistake running your standalone test: I had
allowed "createIndex" to run more than once, so I think I had
tested 30K docs with 1875 deletes (6.25%).
I just removed the index and recreated it, so I now have 15K docs and 1875
deletes (12.5%). On the Mac Pro I now see the patch at 4.0% slower
(4672 ms to 4859 ms), and on a Debian Linux box (kernel 2.6.22.1, Java
1.5.0_08-b03) I see it 0.8% slower (7298 ms to 7357 ms).
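The percentages above follow directly from the raw counts and timings. Here is a small Java sketch that re-derives them. The class and method names are made up for illustration.

```java
// Re-derives the delete ratios and slowdowns quoted above from the raw numbers.
// DeleteBenchMath is a hypothetical helper, used only for illustration.
class DeleteBenchMath {

    // Fraction of documents deleted, as a percentage.
    static double deletePercent(int deletes, int docs) {
        return 100.0 * deletes / docs;
    }

    // Relative slowdown of the patched run versus the baseline, as a percentage.
    static double slowdownPercent(long baselineMs, long patchedMs) {
        return 100.0 * (patchedMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        System.out.printf("30K docs: %.2f%% deleted%n", deletePercent(1875, 30000)); // 6.25%
        System.out.printf("15K docs: %.2f%% deleted%n", deletePercent(1875, 15000)); // 12.50%
        System.out.printf("Mac Pro:  %.1f%% slower%n", slowdownPercent(4672, 4859)); // 4.0%
        System.out.printf("Linux:    %.1f%% slower%n", slowdownPercent(7298, 7357)); // 0.8%
    }
}
```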
bq. The Mac can be somewhat unreliable for performance results
I've actually found it to be quite reliable. What I love most about
it is that, as long as you shut down all extraneous processes, it gives
very repeatable results. I haven't found the same to be true (or at
least, less so) of the various Linux and Windows machines I've used.
bq. OpenBitSet didn't seem to make much of a difference
This is very hard to believe -- the nextSetBit impl in BitVector (in
the patch) is extremely inefficient. OpenBitSet's impl ought to be
much faster.
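The sketch below makes the nextSetBit point concrete. It is a hypothetical illustration and is not the patch's code or OpenBitSet's. It compares two approaches:

* A bit-at-a-time scan over a byte[]. This assumes a BitVector-like layout, where bit i lives in bits[i >> 3] at position i & 7.
* A word-at-a-time scan over a long[], roughly the approach a long[]-backed set like OpenBitSet takes. It skips empty 64-bit words and finds the lowest set bit with Long.numberOfTrailingZeros.

```java
// Contrasts two ways to find the next set bit at or after index 'from'.
// Hypothetical illustration only; not taken from BitVector or OpenBitSet.
class NextSetBitSketch {

    // Naive: test one bit at a time (bit i is in bits[i >> 3], position i & 7).
    // Cost grows with the distance to the next set bit.
    static int nextSetBitNaive(byte[] bits, int size, int from) {
        for (int i = from; i < size; i++) {
            if ((bits[i >> 3] & (1 << (i & 7))) != 0) {
                return i;
            }
        }
        return -1;
    }

    // Word-at-a-time: skip all-zero 64-bit words, then locate the lowest
    // set bit of the first non-empty word with numberOfTrailingZeros.
    static int nextSetBitWords(long[] words, int from) {
        int w = from >>> 6;
        if (w >= words.length) {
            return -1;
        }
        // Clear bits below 'from' in the first word (Java masks the shift count to 0..63).
        long word = words[w] & (-1L << from);
        while (true) {
            if (word != 0) {
                return (w << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++w == words.length) {
                return -1;
            }
            word = words[w];
        }
    }

    public static void main(String[] args) {
        int size = 1000;
        byte[] bytes = new byte[(size + 7) >> 3];
        long[] words = new long[(size + 63) >> 6];
        for (int doc : new int[] {3, 64, 999}) {
            bytes[doc >> 3] |= (byte) (1 << (doc & 7));
            words[doc >> 6] |= 1L << doc;
        }
        System.out.println(nextSetBitNaive(bytes, size, 4)); // 64
        System.out.println(nextSetBitWords(words, 4));       // 64
        System.out.println(nextSetBitWords(words, 65));      // 999
    }
}
```

With sparse deletes, the naive loop pays one iteration per bit between deleted docs. The word-based loop pays roughly one iteration per 64 bits.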
{quote}
The other option is something like P4Delta which stores the doc
ids in a compressed form solely for iterating.
{quote}
I think that will be too costly here (but is a good fit for
postings).
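P4Delta itself is a more elaborate block-based scheme. As a much simpler stand-in, the sketch below shows plain delta (gap) encoding of sorted deleted doc ids with variable-byte codes. This is explicitly not P4Delta, and the class name is hypothetical. It shows the basic trade-off: storage is compact, but ids can only be recovered by decoding sequentially. There is no cheap random-access "is doc N deleted?" check.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a compressed doc id list (NOT P4Delta): gaps between
// sorted ids, each written as a variable-length byte sequence, 7 payload bits per byte.
class DeltaVIntDocIds {

    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int gap = doc - prev;  // gaps stay small when deletes are dense
            prev = doc;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);  // high bit set: more bytes follow
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    // Decoding is strictly sequential: reaching the n-th id requires
    // decoding every earlier gap first.
    static List<Integer> decode(byte[] data) {
        List<Integer> docs = new ArrayList<>();
        int pos = 0;
        int prev = 0;
        while (pos < data.length) {
            int gap = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;
            docs.add(prev);
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] deleted = {5, 130, 131, 20000};
        byte[] encoded = encode(deleted);
        System.out.println(encoded.length + " bytes"); // 6 bytes
        System.out.println(decode(encoded));           // [5, 130, 131, 20000]
    }
}
```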
bq. Is this what you mean by sparse representation?
Actually I meant a simple sorted list of ints, but even for that I'm
worried about the skipTo cost (if we use a normal binary search). I'm
not sure it can be made fast enough (i.e., faster than the random
access we have today).
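A sorted int[] of deleted doc ids might look roughly like the hypothetical sketch below. It is plain Java, and no Lucene types are assumed. A random-access "is this doc deleted?" check becomes an O(log n) binary search, compared with the O(1) bit test a bit vector gives today, which is the cost in question. skipTo can at least restrict its binary search to the not-yet-visited tail of the array.

```java
import java.util.Arrays;

// Hypothetical sparse representation of deleted docs: a sorted int[] of doc ids.
class SortedIntDeletes {
    private final int[] docs;  // sorted ascending, no duplicates
    private int pos = -1;      // current iterator position; -1 means before the first

    SortedIntDeletes(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    // Random-access check via binary search: O(log n) per call, versus the
    // O(1) bit test a bit vector gives.
    boolean isDeleted(int doc) {
        return Arrays.binarySearch(docs, doc) >= 0;
    }

    // Advance to the next deleted doc; returns false once the array is exhausted.
    boolean next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length;
    }

    // Current deleted doc id; only valid after next()/skipTo() returned true.
    int doc() {
        return docs[pos];
    }

    // Move to the first deleted doc >= target that lies beyond the current
    // position, binary-searching only the unvisited tail of the array.
    boolean skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) {
            pos = docs.length;
            return false;
        }
        int i = Arrays.binarySearch(docs, from, docs.length, target);
        pos = i >= 0 ? i : -i - 1;  // -i - 1 is the insertion point
        return pos < docs.length;
    }

    public static void main(String[] args) {
        SortedIntDeletes d = new SortedIntDeletes(new int[] {2, 9, 40, 41});
        System.out.println(d.isDeleted(9));  // true
        System.out.println(d.isDeleted(10)); // false
        d.skipTo(10);
        System.out.println(d.doc());         // 40
    }
}
```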
> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
> Key: LUCENE-1476
> URL: https://issues.apache.org/jira/browse/LUCENE-1476
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Priority: Trivial
> Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch,
> LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff,
> quasi_iterator_deletions_r2.diff, searchdeletes.alg, sortBench2.py,
> sortCollate2.py, TestDeletesDocIdSet.java
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from
> IndexReader.
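Here is a rough sketch of what exposing deleted docs as a DocIdSet could look like:

* **Stand-in classes.** To stay self-contained, it defines minimal classes that only mimic the doc()/next()/skipTo() shape of Lucene 2.4's DocIdSet and DocIdSetIterator. They are not the real Lucene classes.
* **Storage.** It uses java.util.BitSet rather than BitVector.
* **Naming.** DeletedDocsSet is a hypothetical name.

```java
import java.util.BitSet;

// Minimal stand-ins that mimic the doc()/next()/skipTo() shape of Lucene 2.4's
// DocIdSetIterator and DocIdSet; these are NOT the real Lucene classes.
abstract class SimpleDocIdSetIterator {
    abstract int doc();
    abstract boolean next();
    abstract boolean skipTo(int target);
}

abstract class SimpleDocIdSet {
    abstract SimpleDocIdSetIterator iterator();
}

// Hypothetical deleted-docs set backed by a java.util.BitSet.
class DeletedDocsSet extends SimpleDocIdSet {
    private final BitSet deleted;

    DeletedDocsSet(BitSet deleted) {
        this.deleted = deleted;
    }

    @Override
    SimpleDocIdSetIterator iterator() {
        return new SimpleDocIdSetIterator() {
            private int doc = -1;           // -1 = not started
            private boolean exhausted = false;

            @Override
            int doc() {
                return doc;
            }

            @Override
            boolean next() {
                return skipTo(doc + 1);
            }

            // Moves to the first deleted doc >= target that is beyond the current doc.
            @Override
            boolean skipTo(int target) {
                if (exhausted) {
                    return false;
                }
                int found = deleted.nextSetBit(Math.max(target, doc + 1));
                if (found < 0) {            // nextSetBit returns -1 when no set bit remains
                    exhausted = true;
                    doc = Integer.MAX_VALUE;
                    return false;
                }
                doc = found;
                return true;
            }
        };
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(17);
        bits.set(18);
        SimpleDocIdSetIterator it = new DeletedDocsSet(bits).iterator();
        while (it.next()) {
            System.out.print(it.doc() + " "); // prints 3 17 18
        }
        System.out.println();
    }
}
```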
--
This message is automatically generated by JIRA.