[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1476:
---------------------------------------

    Attachment: hacked-deliterator.patch

Alas... I had a bug in my original test (my SegmentTermDocs was incorrectly returning some deleted docs). But even with that bug, I can't reproduce my original "it's faster at < 10% deletions" result.

Here are my results using a pre-computed array of deleted docIDs:

||% deletes||query||hits||qps||qpsnew||% change||
|0%|147|4984|5560.1|5392.5|-3.0%|
|0%|text|97191|347.3|334.1|-3.8%|
|0%|1 AND 2|234634|22.9|22.8|-0.4%|
|0%|1|386435|88.4|86.0|-2.7%|
|0%|1 OR 2|535584|20.9|20.8|-0.5%|
|1%|147|4933|5082.0|3643.5|-28.3%|
|1%|text|96143|313.9|304.9|-2.9%|
|1%|1 AND 2|232250|22.1|22.3|0.9%|
|1%|1|382498|81.0|82.3|1.6%|
|1%|1 OR 2|530212|20.2|20.2|0.0%|
|2%|147|4883|5133.6|3299.6|-35.7%|
|2%|text|95190|315.8|289.7|-8.3%|
|2%|1 AND 2|229870|22.2|22.1|-0.5%|
|2%|1|378641|81.2|80.9|-0.4%|
|2%|1 OR 2|524873|20.3|20.2|-0.5%|
|5%|147|4729|5073.6|2405.2|-52.6%|
|5%|text|92293|315.2|259.0|-17.8%|
|5%|1 AND 2|222859|22.5|22.0|-2.2%|
|5%|1|367000|81.0|77.6|-4.2%|
|5%|1 OR 2|508632|20.4|19.7|-3.4%|
|10%|147|4475|5049.6|1738.8|-65.6%|
|10%|text|87504|314.8|232.6|-26.1%|
|10%|1 AND 2|210982|22.9|21.7|-5.2%|
|10%|1|347664|81.5|74.0|-9.2%|
|10%|1 OR 2|481792|21.2|20.2|-4.7%|
|20%|147|4012|5045.0|1117.6|-77.8%|
|20%|text|77980|317.2|208.9|-34.1%|
|20%|1 AND 2|187605|23.9|21.4|-10.5%|
|20%|1|309040|82.0|68.2|-16.8%|
|20%|1 OR 2|428232|22.3|20.2|-9.4%|
|50%|147|2463|5283.2|522.3|-90.1%|
|50%|text|48331|336.9|176.4|-47.6%|
|50%|1 AND 2|116887|28.4|23.0|-19.0%|
|50%|1|193154|86.4|63.5|-26.5%|
|50%|1 OR 2|267525|27.6|22.4|-18.8%|

I've attached my patch, but note that some tests fail because I don't update the list of deleted docs when deleteDocument is called.

I'm now feeling like we're gonna have to keep random-access to deleted docs...
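The sharpest drop in the table is on the rare query `147`, where qpsnew falls by 90% at 50% deletions while the high-hit queries lose far less. One plausible explanation, not confirmed by the patch itself, is that stepping through a sorted array of deleted docIDs costs work proportional to the number of deletions below each hit. A random-access bit test, by contrast, costs O(1) per hit. The sketch below contrasts the two, assuming the iterator advances linearly. All names (`DeletedDocsSketch`, `countLiveRandomAccess`, `countLiveIterator`, `IterResult`) are hypothetical and are not Lucene APIs. The 1,000,000-doc index, 10% deletion pattern, and 5,000-hit term are made-up illustration values, not the benchmark's data.

```java
/**
 * Sketch: two ways to skip deleted documents while scanning a postings list.
 * All names here are hypothetical; they are not Lucene APIs.
 */
public class DeletedDocsSketch {

    /** Result of the iterator-based scan: live hits plus deletion-cursor steps taken. */
    static final class IterResult {
        final int live;
        final long cursorSteps;

        IterResult(int live, long cursorSteps) {
            this.live = live;
            this.cursorSteps = cursorSteps;
        }
    }

    /** Random access: one bit test per posting, regardless of how many docs are deleted. */
    static int countLiveRandomAccess(int[] postings, boolean[] deleted) {
        int live = 0;
        for (int doc : postings) {
            if (!deleted[doc]) {
                live++;
            }
        }
        return live;
    }

    /**
     * Iteration: walk a sorted array of deleted docIDs in lockstep with the
     * sorted postings, advancing the deletions cursor one entry at a time.
     * The step count tracks the number of deletions below the last hit,
     * not the number of hits.
     */
    static IterResult countLiveIterator(int[] postings, int[] sortedDeleted) {
        int live = 0;
        long steps = 0;
        int d = 0;
        for (int doc : postings) {
            // Linear advance: skip every deleted docID smaller than this hit.
            while (d < sortedDeleted.length && sortedDeleted[d] < doc) {
                d++;
                steps++;
            }
            boolean isDeleted = d < sortedDeleted.length && sortedDeleted[d] == doc;
            if (!isDeleted) {
                live++;
            }
        }
        return new IterResult(live, steps);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        // 10% deletions: every 10th docID is deleted.
        boolean[] deleted = new boolean[maxDoc];
        int[] sortedDeleted = new int[maxDoc / 10];
        for (int i = 0; i < sortedDeleted.length; i++) {
            sortedDeleted[i] = i * 10;
            deleted[i * 10] = true;
        }
        // A rare term with 5,000 hits spread across the index (docIDs i * 199).
        int[] postings = new int[5_000];
        for (int i = 0; i < postings.length; i++) {
            postings[i] = i * 199;
        }

        int liveRandom = countLiveRandomAccess(postings, deleted);
        IterResult iter = countLiveIterator(postings, sortedDeleted);
        // prints "random access: 4500 live, 5000 bit tests"
        System.out.println("random access: " + liveRandom + " live, "
                + postings.length + " bit tests");
        // prints "iterator:      4500 live, 99481 cursor steps"
        System.out.println("iterator:      " + iter.live + " live, "
                + iter.cursorSteps + " cursor steps");
    }
}
```

Both strategies find the same 4,500 live hits, but the linear iterator touches roughly 20 times as many entries as there are hits. That matches the direction, though not necessarily the magnitude, of the `147` rows. A skipping iterator (for example, binary or galloping search over the deleted array) would likely narrow the gap, but it would still cost more per hit than a single bit test. That is consistent with the conclusion above about keeping random access.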
> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: hacked-deliterator.patch, LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff, quasi_iterator_deletions_r3.diff, searchdeletes.alg, sortBench2.py, sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org