[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1476:
---------------------------------------

    Attachment: hacked-deliterator.patch

Alas... I had a bug in my original test (my SegmentTermDocs was
incorrectly returning some deleted docs). But even with that bug I
can't reproduce my original finding that "it's faster at < 10% deletions".
Here are my results using a pre-computed array of deleted docIDs:

||% deleted||query||hits||qps (baseline)||qps (patch)||% change||
|0%|147|   4984|5560.1|5392.5| -3.0%|
|0%|text|  97191| 347.3| 334.1| -3.8%|
|0%|1 AND 2| 234634|  22.9|  22.8| -0.4%|
|0%|1| 386435|  88.4|  86.0| -2.7%|
|0%|1 OR 2| 535584|  20.9|  20.8| -0.5%|
|1%|147|   4933|5082.0|3643.5|-28.3%|
|1%|text|  96143| 313.9| 304.9| -2.9%|
|1%|1 AND 2| 232250|  22.1|  22.3|  0.9%|
|1%|1| 382498|  81.0|  82.3|  1.6%|
|1%|1 OR 2| 530212|  20.2|  20.2|  0.0%|
|2%|147|   4883|5133.6|3299.6|-35.7%|
|2%|text|  95190| 315.8| 289.7| -8.3%|
|2%|1 AND 2| 229870|  22.2|  22.1| -0.5%|
|2%|1| 378641|  81.2|  80.9| -0.4%|
|2%|1 OR 2| 524873|  20.3|  20.2| -0.5%|
|5%|147|   4729|5073.6|2405.2|-52.6%|
|5%|text|  92293| 315.2| 259.0|-17.8%|
|5%|1 AND 2| 222859|  22.5|  22.0| -2.2%|
|5%|1| 367000|  81.0|  77.6| -4.2%|
|5%|1 OR 2| 508632|  20.4|  19.7| -3.4%|
|10%|147|   4475|5049.6|1738.8|-65.6%|
|10%|text|  87504| 314.8| 232.6|-26.1%|
|10%|1 AND 2| 210982|  22.9|  21.7| -5.2%|
|10%|1| 347664|  81.5|  74.0| -9.2%|
|10%|1 OR 2| 481792|  21.2|  20.2| -4.7%|
|20%|147|   4012|5045.0|1117.6|-77.8%|
|20%|text|  77980| 317.2| 208.9|-34.1%|
|20%|1 AND 2| 187605|  23.9|  21.4|-10.5%|
|20%|1| 309040|  82.0|  68.2|-16.8%|
|20%|1 OR 2| 428232|  22.3|  20.2| -9.4%|
|50%|147|   2463|5283.2| 522.3|-90.1%|
|50%|text|  48331| 336.9| 176.4|-47.6%|
|50%|1 AND 2| 116887|  28.4|  23.0|-19.0%|
|50%|1| 193154|  86.4|  63.5|-26.5%|
|50%|1 OR 2| 267525|  27.6|  22.4|-18.8%|
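To make the comparison concrete, here is a minimal standalone sketch of the two ways a postings loop can skip deleted docs: a random-access bit check per candidate doc (what BitVector gives us today) versus walking a sorted, pre-computed int[] of deleted docIDs in lockstep with the postings. The class and method names are hypothetical and do not mirror SegmentTermDocs or the attached patch; java.util.BitSet stands in for Lucene's BitVector.

```java
import java.util.BitSet;

// Hypothetical sketch: not Lucene code, just the two skipping strategies.
class DeletedDocsSketch {

    // Random access: one bit lookup per candidate doc.
    static int countLiveRandomAccess(int[] postings, BitSet deleted) {
        int live = 0;
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                live++;
            }
        }
        return live;
    }

    // Iterator style: a cursor over the sorted deleted docIDs advances
    // in step with the (sorted) postings.
    static int countLiveSortedArray(int[] postings, int[] sortedDeleted) {
        int live = 0;
        int d = 0;
        for (int doc : postings) {
            // Step past every deleted docID smaller than this posting.
            while (d < sortedDeleted.length && sortedDeleted[d] < doc) {
                d++;
            }
            if (d < sortedDeleted.length && sortedDeleted[d] == doc) {
                continue; // this doc is deleted
            }
            live++;
        }
        return live;
    }

    public static void main(String[] args) {
        int[] postings = {1, 3, 4, 7, 9, 12};
        int[] deletedIds = {3, 5, 9};
        BitSet deleted = new BitSet();
        for (int id : deletedIds) {
            deleted.set(id);
        }
        System.out.println(countLiveRandomAccess(postings, deleted));   // 4
        System.out.println(countLiveSortedArray(postings, deletedIds)); // 4
    }
}
```

In this linear-cursor sketch a rare term still has to step past every deleted docID below its last hit, so its cost grows with the number of deletions rather than with the number of hits. That is one plausible reading of why the rare "147" query degrades most in the table, though the patch itself may advance through the deleted docs differently.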

I've attached my patch, but note that some tests fail because I don't update 
the list of deleted docs when deleteDocument is called.
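For reference, here is a minimal sketch of what keeping such an array current on every delete would involve, assuming the deleted docIDs live in a sorted, duplicate-free int[] (a hypothetical stand-in, not the patch's actual data structure). Each delete costs a binary search plus an O(n) array copy, whereas a BitVector delete is a single bit set.

```java
import java.util.Arrays;

// Hypothetical sketch of a sorted deleted-docs array kept up to date.
class SortedDeletes {
    private int[] ids = new int[0]; // sorted ascending, no duplicates

    // Returns true if docId was newly deleted, false if it already was.
    boolean delete(int docId) {
        int pos = Arrays.binarySearch(ids, docId);
        if (pos >= 0) {
            return false; // already deleted
        }
        int insertAt = -pos - 1; // insertion point reported by binarySearch
        int[] grown = new int[ids.length + 1];
        System.arraycopy(ids, 0, grown, 0, insertAt);
        grown[insertAt] = docId;
        System.arraycopy(ids, insertAt, grown, insertAt + 1, ids.length - insertAt);
        ids = grown;
        return true;
    }

    // Random-access check: O(log n) here versus O(1) for a bit vector.
    boolean isDeleted(int docId) {
        return Arrays.binarySearch(ids, docId) >= 0;
    }

    int[] snapshot() {
        return ids.clone();
    }

    public static void main(String[] args) {
        SortedDeletes del = new SortedDeletes();
        del.delete(9);
        del.delete(3);
        del.delete(5);
        del.delete(3); // duplicate, ignored
        System.out.println(Arrays.toString(del.snapshot())); // [3, 5, 9]
    }
}
```

Answering isDeleted(doc) from the array also costs a binary search rather than one bit lookup, so a sorted array alone gives up cheap random access.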

I'm now feeling like we're gonna have to keep random access to deleted docs...

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: hacked-deliterator.patch, LUCENE-1476.patch, 
> LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, 
> quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff, 
> quasi_iterator_deletions_r3.diff, searchdeletes.alg, sortBench2.py, 
> sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.

-- 
This message is automatically generated by JIRA.