[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657924#action_12657924 ]
Michael McCandless commented on LUCENE-1483:
--------------------------------------------

I set the queue size to 1000 and reran the tests:

||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|1|wiki|score|147|1000| 4984| 0.3|1356.8| 0.3|1361.3| 0.3%|
|1|wiki|score|text|1000| 97191| 0.3| 224.0| 0.3| 223.0| -0.4%|
|1|wiki|score|1|1000| 386435| 0.3| 73.6| 0.3| 72.8| -1.1%|
|1|wiki|doc|147|1000| 4984| 0.3|1527.0| 0.3|1475.0| -3.4%|
|1|wiki|doc|text|1000| 97191| 0.3| 182.5| 0.3| 235.4| 29.0%|
|1|wiki|doc|1|1000| 386435| 0.3| 50.6| 0.3| 67.7| 33.8%|
|1|wiki|doc|<all>|1000|2000000| 0.1| 22.1| 0.1| 37.8| 71.0%|
|1|simple|int|text|1000|2000000| 0.7| 10.1| 0.7| 12.8| 26.7%|
|1|simple|int|<all>|1000|2000000| 0.6| 19.0| 0.6| 30.5| 60.5%|
|1|simple|country|text|1000|2000000| 0.9| 10.1| 0.7| 12.5| 23.8%|
|1|simple|country|<all>|1000|2000000| 0.9| 19.5| 0.6| 29.1| 49.2%|
|1|wiki|title|147|1000| 4984| 4.0| 733.1| 2.0| 732.2| -0.1%|
|1|wiki|title|text|1000| 97191| 4.1| 109.1| 2.1| 114.7| 5.1%|
|1|wiki|title|1|1000| 386435| 4.1| 47.1| 2.1| 55.4| 17.6%|
|1|wiki|title|<all>|1000|2000000| 4.1| 19.4| 2.1| 30.5| 57.2%|

||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|10|wiki|score|147|1000| 4984| 0.3|1259.4| 0.3|1274.0| 1.2%|
|10|wiki|score|text|1000| 97191| 0.4| 215.2| 0.4| 220.0| 2.2%|
|10|wiki|score|1|1000| 386435| 0.4| 69.6| 0.4| 72.0| 3.4%|
|10|wiki|doc|147|1000| 4984| 0.3|1409.0| 0.3|1394.7| -1.0%|
|10|wiki|doc|text|1000| 97191| 0.4| 192.0| 0.4| 232.5| 21.1%|
|10|wiki|doc|1|1000| 386435| 0.4| 53.0| 0.4| 66.3| 25.1%|
|10|wiki|doc|<all>|1000|2000000| 0.1| 11.9| 0.1| 37.5|215.1%|
|10|simple|int|text|1000|2000000| 1.2| 9.8| 0.6| 12.8| 30.6%|
|10|simple|int|<all>|1000|2000000| 1.2| 11.0| 0.8| 30.2|174.5%|
|10|simple|country|text|1000|2000000| 1.1| 9.8| 0.6| 12.4| 26.5%|
|10|simple|country|<all>|1000|2000000| 1.1| 11.0| 0.5| 29.1|164.5%|
|10|wiki|title|147|1000| 4984| 26.0| 655.2| 2.1| 84.7|-87.1%|
|10|wiki|title|text|1000| 97191| 26.3| 100.4| 2.2| 77.8|-22.5%|
|10|wiki|title|1|1000| 386435| 26.0| 42.3| 2.6| 48.4| 14.4%|
|10|wiki|title|<all>|1000|2000000| 26.1| 10.9| 2.6| 28.5|161.5%|

||numSeg||index||sortBy||query||topN||hits||warm||qps||warmnew||qpsnew||pctg||
|100|wiki|score|147|1000| 4984| 0.4| 704.1| 0.5| 677.5| -3.8%|
|100|wiki|score|text|1000| 97191| 0.4| 169.5| 0.5| 186.0| 9.7%|
|100|wiki|score|1|1000| 386435| 0.4| 56.5| 0.5| 67.9| 20.2%|
|100|wiki|doc|147|1000| 4984| 0.4| 785.0| 0.4| 724.0| -7.8%|
|100|wiki|doc|text|1000| 97191| 0.4| 159.9| 0.4| 204.7| 28.0%|
|100|wiki|doc|1|1000| 386435| 0.4| 44.9| 0.4| 64.8| 44.3%|
|100|wiki|doc|<all>|1000|2000000| 0.2| 7.8| 0.1| 40.4|417.9%|
|100|simple|int|text|1000|2000000| 3.3| 8.4| 1.4| 10.3| 22.6%|
|100|simple|int|<all>|1000|2000000| 3.4| 7.4| 1.1| 32.4|337.8%|
|100|simple|country|text|1000|2000000| 1.4| 8.6| 0.7| 10.0| 16.3%|
|100|simple|country|<all>|1000|2000000| 1.5| 7.3| 0.6| 28.6|291.8%|
|100|wiki|title|147|1000| 4984| 189.0| 446.3| 2.4| 19.8|-95.6%|
|100|wiki|title|text|1000| 97191| 188.5| 87.7| 2.3| 27.5|-68.6%|
|100|wiki|title|1|1000| 386435| 190.4| 41.1| 2.7| 24.6|-40.1%|
|100|wiki|title|<all>|1000|2000000| 189.2| 7.4| 3.0| 18.4|148.6%|

Performance clearly gets worse when these conditions combine: the query hits few docs, the queue is large, the index has a large number of segments, and the sort is by a unique String field (like title). The slowdown for "147" at 100 segments is quite bad (446.3 qps down to 19.8 qps, -95.6%).

So... I wonder how often users of Lucene set a very large queue size (to do some sort of post filtering, which could be more efficiently done as a real Filter, but...). I think it may be a non-trivial number, so... what to do?

EG we could offer a different collector that's better optimized towards collecting a large topN (probably doing the toplevel FieldCache that's done today)?

Or, we could explore a hybrid approach whereby a slot is only switched to the current segment when it's first visited again (instead of updating all of them on switching readers)?
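
To make that hybrid idea concrete, here is a minimal sketch of lazy per-slot conversion. It is not what the attached patches do. Every name in it (`LazySlotDemo`, `Slot`, `setNextSegment`, `ordOf`) is hypothetical, and the "doubled ord" encoding for values missing from a segment is an assumption made purely for illustration. Each slot remembers which segment its ord was computed for, and is re-resolved against the current segment only when a comparison actually touches it:

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Lucene or the patch.
class LazySlotDemo {
    /** One queue slot: keeps the String value plus an ord valid for one segment. */
    static final class Slot {
        final String value;
        int ord;       // doubled ord, meaningful only while gen == currentGen
        int gen = -1;  // segment generation the ord was computed for
        Slot(String value) { this.value = value; }
    }

    private String[] sortedTerms = new String[0]; // current segment's terms, sorted
    private int currentGen = 0;
    int conversions = 0; // how many slots were actually converted

    /** Switching segments is O(1): no slot in the queue is touched here. */
    void setNextSegment(String[] sortedTermsOfSegment) {
        sortedTerms = sortedTermsOfSegment;
        currentGen++;
    }

    /** Returns the slot's ord in the current segment, converting it on first use. */
    int ordOf(Slot slot) {
        if (slot.gen != currentGen) {
            int idx = Arrays.binarySearch(sortedTerms, slot.value);
            // Doubled ords: term i maps to 2*i+1; a value absent from this segment
            // maps to the even number lying between its two neighbouring terms.
            slot.ord = idx >= 0 ? 2 * idx + 1 : 2 * (-idx - 1);
            slot.gen = currentGen;
            conversions++;
        }
        return slot.ord;
    }

    public static void main(String[] args) {
        LazySlotDemo demo = new LazySlotDemo();
        Slot kiwi = new Slot("kiwi");
        Slot mango = new Slot("mango");

        demo.setNextSegment(new String[] {"apple", "kiwi", "pear"});
        System.out.println("kiwi ord: " + demo.ordOf(kiwi));
        System.out.println("mango ord: " + demo.ordOf(mango));

        // After the switch only the slot that gets compared is converted.
        demo.setNextSegment(new String[] {"banana", "mango"});
        System.out.println("mango ord: " + demo.ordOf(mango));
        System.out.println("conversions so far: " + demo.conversions);
    }
}
```

The trade-off is an extra generation check on every comparison, in exchange for not paying one conversion per slot on every reader switch. Whether that helps the "147" case above would need to be measured.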
Or... something else?

> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing
> for individual segment reloading on reopen.