[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

Michael McCandless (JIRA) Thu, 18 Dec 2008 13:54:14 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657924#action_12657924
 ]


Michael McCandless commented on LUCENE-1483:
--------------------------------------------

I set the queue size to 1000 and reran the tests:

||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|1|wiki|score|147|1000|   4984|   0.3|1356.8|   0.3|1361.3|  0.3%|
|1|wiki|score|text|1000|  97191|   0.3| 224.0|   0.3| 223.0| -0.4%|
|1|wiki|score|1|1000| 386435|   0.3|  73.6|   0.3|  72.8| -1.1%|
|1|wiki|doc|147|1000|   4984|   0.3|1527.0|   0.3|1475.0| -3.4%|
|1|wiki|doc|text|1000|  97191|   0.3| 182.5|   0.3| 235.4| 29.0%|
|1|wiki|doc|1|1000| 386435|   0.3|  50.6|   0.3|  67.7| 33.8%|
|1|wiki|doc|<all>|1000|2000000|   0.1|  22.1|   0.1|  37.8| 71.0%|
|1|simple|int|text|1000|2000000|   0.7|  10.1|   0.7|  12.8| 26.7%|
|1|simple|int|<all>|1000|2000000|   0.6|  19.0|   0.6|  30.5| 60.5%|
|1|simple|country|text|1000|2000000|   0.9|  10.1|   0.7|  12.5| 23.8%|
|1|simple|country|<all>|1000|2000000|   0.9|  19.5|   0.6|  29.1| 49.2%|
|1|wiki|title|147|1000|   4984|   4.0| 733.1|   2.0| 732.2| -0.1%|
|1|wiki|title|text|1000|  97191|   4.1| 109.1|   2.1| 114.7|  5.1%|
|1|wiki|title|1|1000| 386435|   4.1|  47.1|   2.1|  55.4| 17.6%|
|1|wiki|title|<all>|1000|2000000|   4.1|  19.4|   2.1|  30.5| 57.2%|
||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|10|wiki|score|147|1000|   4984|   0.3|1259.4|   0.3|1274.0|  1.2%|
|10|wiki|score|text|1000|  97191|   0.4| 215.2|   0.4| 220.0|  2.2%|
|10|wiki|score|1|1000| 386435|   0.4|  69.6|   0.4|  72.0|  3.4%|
|10|wiki|doc|147|1000|   4984|   0.3|1409.0|   0.3|1394.7| -1.0%|
|10|wiki|doc|text|1000|  97191|   0.4| 192.0|   0.4| 232.5| 21.1%|
|10|wiki|doc|1|1000| 386435|   0.4|  53.0|   0.4|  66.3| 25.1%|
|10|wiki|doc|<all>|1000|2000000|   0.1|  11.9|   0.1|  37.5|215.1%|
|10|simple|int|text|1000|2000000|   1.2|   9.8|   0.6|  12.8| 30.6%|
|10|simple|int|<all>|1000|2000000|   1.2|  11.0|   0.8|  30.2|174.5%|
|10|simple|country|text|1000|2000000|   1.1|   9.8|   0.6|  12.4| 26.5%|
|10|simple|country|<all>|1000|2000000|   1.1|  11.0|   0.5|  29.1|164.5%|
|10|wiki|title|147|1000|   4984|  26.0| 655.2|   2.1|  84.7|-87.1%|
|10|wiki|title|text|1000|  97191|  26.3| 100.4|   2.2|  77.8|-22.5%|
|10|wiki|title|1|1000| 386435|  26.0|  42.3|   2.6|  48.4| 14.4%|
|10|wiki|title|<all>|1000|2000000|  26.1|  10.9|   2.6|  28.5|161.5%|
||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|100|wiki|score|147|1000|   4984|   0.4| 704.1|   0.5| 677.5| -3.8%|
|100|wiki|score|text|1000|  97191|   0.4| 169.5|   0.5| 186.0|  9.7%|
|100|wiki|score|1|1000| 386435|   0.4|  56.5|   0.5|  67.9| 20.2%|
|100|wiki|doc|147|1000|   4984|   0.4| 785.0|   0.4| 724.0| -7.8%|
|100|wiki|doc|text|1000|  97191|   0.4| 159.9|   0.4| 204.7| 28.0%|
|100|wiki|doc|1|1000| 386435|   0.4|  44.9|   0.4|  64.8| 44.3%|
|100|wiki|doc|<all>|1000|2000000|   0.2|   7.8|   0.1|  40.4|417.9%|
|100|simple|int|text|1000|2000000|   3.3|   8.4|   1.4|  10.3| 22.6%|
|100|simple|int|<all>|1000|2000000|   3.4|   7.4|   1.1|  32.4|337.8%|
|100|simple|country|text|1000|2000000|   1.4|   8.6|   0.7|  10.0| 16.3%|
|100|simple|country|<all>|1000|2000000|   1.5|   7.3|   0.6|  28.6|291.8%|
|100|wiki|title|147|1000|   4984| 189.0| 446.3|   2.4|  19.8|-95.6%|
|100|wiki|title|text|1000|  97191| 188.5|  87.7|   2.3|  27.5|-68.6%|
|100|wiki|title|1|1000| 386435| 190.4|  41.1|   2.7|  24.6|-40.1%|
|100|wiki|title|<all>|1000|2000000| 189.2|   7.4|   3.0|  18.4|148.6%|
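The pctg column appears to be the relative change in queries/sec from the old code (qps) to the patched code (qpsnew), i.e. (qpsnew - qps) / qps * 100. That formula is inferred by checking rows; the comment does not state it. A quick Java check against a few rows (the class name is hypothetical):

```java
/** Hypothetical helper: recomputes the pctg column from the qps columns. */
public class Pctg {
    /** Relative change in queries/sec, as a percentage of the old qps. */
    public static double pctg(double qps, double qpsNew) {
        return (qpsNew - qps) / qps * 100.0;
    }

    public static void main(String[] args) {
        // {qps, qpsnew} pairs copied from the table above
        double[][] rows = {
            {1356.8, 1361.3},  // 1 seg,    score, "147"
            {655.2, 84.7},     // 10 segs,  title, "147"
            {446.3, 19.8},     // 100 segs, title, "147"
            {7.8, 40.4},       // 100 segs, doc,   <all>
        };
        for (double[] r : rows) {
            System.out.printf("%.1f -> %.1f: %.1f%%%n", r[0], r[1], pctg(r[0], r[1]));
        }
    }
}
```

This prints 0.3%, -87.1%, -95.6% and 417.9%, matching the table. Because the qps values are rounded to one decimal, a recomputed percentage could occasionally differ from the table in the last digit.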

Performance clearly gets worse when several conditions combine: the
query hits relatively few docs, the queue is large, the index has many
segments, and the sort is on a unique String field (like title).

The slowdown for "147" at 100 segments is quite bad: 446.3 qps drops
to 19.8 qps (-95.6%).
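One plausible reading of these numbers, as a simplified model and not the actual patch: with per-segment String sorting, a queue slot's ord is only meaningful within one segment, so switching readers means re-mapping every filled slot into the new segment's ord space, for example by a binary search over that segment's sorted terms. That costs roughly numSeg * topN lookups no matter how many docs match, which would dominate for a query like "147" that hits only 4984 docs. The class name, the sorted String[] per segment, and the even/odd ord trick are all assumptions for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of eager slot conversion for a per-segment
 * String sort (not the LUCENE-1483 patch itself). Each segment is assumed to
 * expose its unique terms as a sorted String[].
 */
public class EagerSlotConversion {
    final String[] slotValues;  // String value held by each queue slot
    final int[] slotOrds;       // ord of each slot, valid only for the current segment
    int numFilled;              // number of slots currently in use
    long binarySearches;        // counts conversion work, for illustration

    public EagerSlotConversion(int queueSize) {
        slotValues = new String[queueSize];
        slotOrds = new int[queueSize];
    }

    /**
     * Maps a value into a segment's ord space. Exact matches get the even ord
     * 2*i; absent values get the odd ord just below their insertion point, so
     * the relative order of slots and hits is preserved.
     */
    public static int toSegmentOrd(String[] sortedTerms, String value) {
        int idx = Arrays.binarySearch(sortedTerms, value);
        return idx >= 0 ? 2 * idx : 2 * (-idx - 1) - 1;
    }

    /** Eager strategy: every filled slot is re-mapped on every segment switch. */
    public void setNextSegment(String[] sortedTerms) {
        for (int slot = 0; slot < numFilled; slot++) {
            slotOrds[slot] = toSegmentOrd(sortedTerms, slotValues[slot]);
            binarySearches++;
        }
    }

    public static void main(String[] args) {
        EagerSlotConversion q = new EagerSlotConversion(1000);
        for (int i = 0; i < 1000; i++) {
            q.slotValues[i] = "title" + i;  // queue already full (topN = 1000)
        }
        q.numFilled = 1000;
        String[] segmentTerms = {"title1", "title5", "zzz"};  // toy segment
        for (int seg = 0; seg < 100; seg++) {
            q.setNextSegment(segmentTerms);  // 100 segments, as in the last table
        }
        System.out.println(q.binarySearches);  // 100 switches * 1000 slots
    }
}
```

This prints 100000: the conversion work is fixed by numSeg and topN, independent of the hit count. In a real segment each lookup also costs about log2(number of unique terms) comparisons.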

So... I wonder how often users of Lucene set a very large queue size
(to do some sort of post-filtering, which could be done more efficiently
as a real Filter, but...).  I think it may be a non-trivial number,
so... what to do?

E.g., we could offer a different collector that's better optimized
for collecting a large topN (probably using the top-level FieldCache,
as is done today)?  Or we could explore a hybrid approach whereby a
slot is switched to the current segment only when it's first visited
again (instead of updating all of them when switching readers)?
Or... something else?
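The hybrid idea could look roughly like this, as a simplified and hypothetical model (class and method names are made up, not Lucene API; each segment is assumed to expose its unique terms as a sorted String[]). Each slot remembers which segment its ord was computed for, switching segments touches no slots, and a slot is converted only the first time it is compared in the new segment. Conversion work then scales with the slots actually visited rather than with numSeg * topN.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the lazy ("hybrid") idea: a slot is moved into the
 * current segment's ord space only when it is first visited there.
 */
public class LazySlotConversion {
    final String[] slotValues;  // String value held by each queue slot
    final int[] slotOrds;       // cached ord of each slot
    final int[] slotSegment;    // segment that slotOrds[slot] belongs to (-1 = none)
    String[] currentTerms;      // sorted unique terms of the current segment
    int currentSegment = -1;
    long binarySearches;        // counts conversion work, for illustration

    public LazySlotConversion(int queueSize) {
        slotValues = new String[queueSize];
        slotOrds = new int[queueSize];
        slotSegment = new int[queueSize];
        Arrays.fill(slotSegment, -1);
    }

    /** Even ord 2*i for an exact match, odd ord below the insertion point otherwise. */
    public static int toSegmentOrd(String[] sortedTerms, String value) {
        int idx = Arrays.binarySearch(sortedTerms, value);
        return idx >= 0 ? 2 * idx : 2 * (-idx - 1) - 1;
    }

    /** Switching segments is O(1): no slot is touched here. */
    public void setNextSegment(String[] sortedTerms) {
        currentTerms = sortedTerms;
        currentSegment++;
    }

    /** Returns the slot's ord in the current segment, converting it on first visit. */
    int ordFor(int slot) {
        if (slotSegment[slot] != currentSegment) {
            slotOrds[slot] = toSegmentOrd(currentTerms, slotValues[slot]);
            slotSegment[slot] = currentSegment;
            binarySearches++;
        }
        return slotOrds[slot];
    }

    /** Compares two queue slots; both are converted lazily if needed. */
    public int compare(int slot1, int slot2) {
        return Integer.compare(ordFor(slot1), ordFor(slot2));
    }

    /** Compares a slot against a hit whose term index is termIndex in the current segment. */
    public int compareSlotToHit(int slot, int termIndex) {
        return Integer.compare(ordFor(slot), 2 * termIndex);
    }

    /** Stores a competitive hit into a slot; its ord is already current. */
    public void copy(int slot, int termIndex) {
        slotValues[slot] = currentTerms[termIndex];
        slotOrds[slot] = 2 * termIndex;
        slotSegment[slot] = currentSegment;
    }

    public static void main(String[] args) {
        LazySlotConversion q = new LazySlotConversion(1000);
        for (int i = 0; i < 1000; i++) {
            q.slotValues[i] = "title" + i;  // slots filled in earlier segments
        }
        String[] segmentTerms = {"title1", "title5", "zzz"};  // toy segment
        for (int seg = 0; seg < 100; seg++) {
            q.setNextSegment(segmentTerms);
            // Few hits: only the bottom slot (0 here) is compared, twice.
            q.compareSlotToHit(0, 0);
            q.compareSlotToHit(0, 0);
        }
        System.out.println(q.binarySearches);  // one conversion per segment
    }
}
```

With the same 1000 slots and 100 segments this prints 100, because only slot 0 is ever visited and its second comparison in each segment hits the cached ord. The trade-off is an extra per-slot check on every comparison.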

> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
