[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657924#action_12657924 ]
Michael McCandless commented on LUCENE-1483:
--------------------------------------------
I set the queue size to 1000 and reran the tests:
||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|1|wiki|score|147|1000| 4984| 0.3|1356.8| 0.3|1361.3| 0.3%|
|1|wiki|score|text|1000| 97191| 0.3| 224.0| 0.3| 223.0| -0.4%|
|1|wiki|score|1|1000| 386435| 0.3| 73.6| 0.3| 72.8| -1.1%|
|1|wiki|doc|147|1000| 4984| 0.3|1527.0| 0.3|1475.0| -3.4%|
|1|wiki|doc|text|1000| 97191| 0.3| 182.5| 0.3| 235.4| 29.0%|
|1|wiki|doc|1|1000| 386435| 0.3| 50.6| 0.3| 67.7| 33.8%|
|1|wiki|doc|<all>|1000|2000000| 0.1| 22.1| 0.1| 37.8| 71.0%|
|1|simple|int|text|1000|2000000| 0.7| 10.1| 0.7| 12.8| 26.7%|
|1|simple|int|<all>|1000|2000000| 0.6| 19.0| 0.6| 30.5| 60.5%|
|1|simple|country|text|1000|2000000| 0.9| 10.1| 0.7| 12.5| 23.8%|
|1|simple|country|<all>|1000|2000000| 0.9| 19.5| 0.6| 29.1| 49.2%|
|1|wiki|title|147|1000| 4984| 4.0| 733.1| 2.0| 732.2| -0.1%|
|1|wiki|title|text|1000| 97191| 4.1| 109.1| 2.1| 114.7| 5.1%|
|1|wiki|title|1|1000| 386435| 4.1| 47.1| 2.1| 55.4| 17.6%|
|1|wiki|title|<all>|1000|2000000| 4.1| 19.4| 2.1| 30.5| 57.2%|
||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|10|wiki|score|147|1000| 4984| 0.3|1259.4| 0.3|1274.0| 1.2%|
|10|wiki|score|text|1000| 97191| 0.4| 215.2| 0.4| 220.0| 2.2%|
|10|wiki|score|1|1000| 386435| 0.4| 69.6| 0.4| 72.0| 3.4%|
|10|wiki|doc|147|1000| 4984| 0.3|1409.0| 0.3|1394.7| -1.0%|
|10|wiki|doc|text|1000| 97191| 0.4| 192.0| 0.4| 232.5| 21.1%|
|10|wiki|doc|1|1000| 386435| 0.4| 53.0| 0.4| 66.3| 25.1%|
|10|wiki|doc|<all>|1000|2000000| 0.1| 11.9| 0.1| 37.5|215.1%|
|10|simple|int|text|1000|2000000| 1.2| 9.8| 0.6| 12.8| 30.6%|
|10|simple|int|<all>|1000|2000000| 1.2| 11.0| 0.8| 30.2|174.5%|
|10|simple|country|text|1000|2000000| 1.1| 9.8| 0.6| 12.4| 26.5%|
|10|simple|country|<all>|1000|2000000| 1.1| 11.0| 0.5| 29.1|164.5%|
|10|wiki|title|147|1000| 4984| 26.0| 655.2| 2.1| 84.7|-87.1%|
|10|wiki|title|text|1000| 97191| 26.3| 100.4| 2.2| 77.8|-22.5%|
|10|wiki|title|1|1000| 386435| 26.0| 42.3| 2.6| 48.4| 14.4%|
|10|wiki|title|<all>|1000|2000000| 26.1| 10.9| 2.6| 28.5|161.5%|
||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|100|wiki|score|147|1000| 4984| 0.4| 704.1| 0.5| 677.5| -3.8%|
|100|wiki|score|text|1000| 97191| 0.4| 169.5| 0.5| 186.0| 9.7%|
|100|wiki|score|1|1000| 386435| 0.4| 56.5| 0.5| 67.9| 20.2%|
|100|wiki|doc|147|1000| 4984| 0.4| 785.0| 0.4| 724.0| -7.8%|
|100|wiki|doc|text|1000| 97191| 0.4| 159.9| 0.4| 204.7| 28.0%|
|100|wiki|doc|1|1000| 386435| 0.4| 44.9| 0.4| 64.8| 44.3%|
|100|wiki|doc|<all>|1000|2000000| 0.2| 7.8| 0.1| 40.4|417.9%|
|100|simple|int|text|1000|2000000| 3.3| 8.4| 1.4| 10.3| 22.6%|
|100|simple|int|<all>|1000|2000000| 3.4| 7.4| 1.1| 32.4|337.8%|
|100|simple|country|text|1000|2000000| 1.4| 8.6| 0.7| 10.0| 16.3%|
|100|simple|country|<all>|1000|2000000| 1.5| 7.3| 0.6| 28.6|291.8%|
|100|wiki|title|147|1000| 4984| 189.0| 446.3| 2.4| 19.8|-95.6%|
|100|wiki|title|text|1000| 97191| 188.5| 87.7| 2.3| 27.5|-68.6%|
|100|wiki|title|1|1000| 386435| 190.4| 41.1| 2.7| 24.6|-40.1%|
|100|wiki|title|<all>|1000|2000000| 189.2| 7.4| 3.0| 18.4|148.6%|
Performance clearly gets worse when these conditions combine: the
query hits few docs, the queue is large, the index has a large number
of segments, and the sort is by a unique String field (like title).
The slowdown for "147" at 100 segments is quite bad.
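For reference, the pctg column appears to be the relative change from qps to qpsnew. That definition is not stated in the tables, so treat it as an assumption; the small sketch below (the helper name pct_change is made up for illustration) checks it against the title/"147" rows at 1, 10 and 100 segments:

```python
def pct_change(qps_old, qps_new):
    """Relative change in queries/sec, in percent, rounded to one decimal."""
    return round((qps_new - qps_old) / qps_old * 100.0, 1)


# (qps, qpsnew) for sortBy=title, query=147, copied from the tables above.
title_147 = {
    1: (733.1, 732.2),
    10: (655.2, 84.7),
    100: (446.3, 19.8),
}

for num_seg, (old, new) in title_147.items():
    print(num_seg, pct_change(old, new))
```

This gives -0.1, -87.1 and -95.6, matching the pctg values reported for those rows, i.e. roughly a 22x drop in throughput at 100 segments.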
So... I wonder how often users of Lucene set a very large queue size
(to do some sort of post filtering, which could be more efficiently
done as a real Filter, but...). I think it may be a non-trivial number,
so... what to do?
E.g., we could offer a different collector that's better optimized
towards collecting a large topN (probably doing the top-level
FieldCache that's done today)? Or, we could explore a hybrid approach
whereby a slot is only switched to the current segment when it's first
visited again (instead of updating all of them on switching readers)?
Or... something else?
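To make the hybrid idea concrete, here is a rough sketch in Python. It is not the code in the patch: the class LazyOrdSlots and all of its method names are hypothetical, and it assumes each segment exposes its unique terms as a sorted list so a slot's string value can be mapped to an ordinal by binary search. Each slot remembers which segment its ordinal belongs to and is re-mapped only when it is next compared.

```python
from bisect import bisect_left


class LazyOrdSlots:
    """Queue slots for a string sort that compares by per-segment ordinal.

    Illustrative only: names and structure are assumptions, not Lucene's API.
    """

    def __init__(self, size):
        self.values = [None] * size  # string value held by each slot
        self.ords = [0] * size       # ordinal, valid only for segment gens[slot]
        self.exact = [False] * size  # True if the value occurs in that segment
        self.gens = [-1] * size      # segment generation the ordinal refers to
        self.terms = []              # sorted unique terms of the current segment
        self.gen = -1
        self.conversions = 0         # number of binary searches performed

    def set_next_segment(self, sorted_terms):
        # O(1) switch: no slot is touched here, unlike an eager re-map
        # that would binary-search every slot on every segment change.
        self.terms = sorted_terms
        self.gen += 1

    def copy(self, slot, doc_ord):
        # Store a competitive hit from the current segment into a slot.
        self.values[slot] = self.terms[doc_ord]
        self.ords[slot] = doc_ord
        self.exact[slot] = True
        self.gens[slot] = self.gen

    def _convert(self, slot):
        # Re-map a stale slot into the current segment's ordinal space.
        if self.gens[slot] == self.gen:
            return
        value = self.values[slot]
        i = bisect_left(self.terms, value)
        self.ords[slot] = i
        self.exact[slot] = i < len(self.terms) and self.terms[i] == value
        self.gens[slot] = self.gen
        self.conversions += 1

    def compare_to_doc(self, slot, doc_ord):
        # Negative if the slot sorts before the doc, positive if after, 0 if equal.
        self._convert(slot)
        diff = self.ords[slot] - doc_ord
        if diff == 0 and not self.exact[slot]:
            return -1  # slot value falls just before terms[doc_ord]
        return diff
```

With this layout, switching segments costs nothing up front, and a slot that is never compared while a segment is being searched is never re-mapped for it. Whether that wins in practice would depend on how many slots the priority queue actually touches per segment, which only a benchmark run could show.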
> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9
> Reporter: Mark Miller
> Priority: Minor
> Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing
> for individual segment reloading on reopen.