[ https://issues.apache.org/jira/browse/LUCENE-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564195#comment-16564195 ]
Adrien Grand commented on LUCENE-8060:
--------------------------------------
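To give some perspective, here are the results of luceneutil on wikimediumall
with this patch, which only changes defaults. Some queries get a very serious
speedup: term queries and several disjunctions run many times faster, while a
couple of conjunctions are somewhat slower.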
{noformat}
Task                  QPS baseline    StdDev  QPS patch    StdDev   Pct diff
AndHighMed                   45.93    (1.4%)      40.67    (2.7%)     -11.4%  ( -15% - -7%)
AndHighLow                  836.13    (2.5%)     771.97    (4.0%)      -7.7%  ( -13% - -1%)
HighTermMonthSort            22.10   (12.7%)      20.76    (9.3%)      -6.1%  ( -24% - 18%)
IntNRQ                        8.35   (11.7%)       7.96   (14.1%)      -4.7%  ( -27% - 23%)
Prefix3                      69.56    (4.6%)      67.11    (4.4%)      -3.5%  ( -11% - 5%)
Wildcard                     41.95    (4.3%)      40.48    (5.0%)      -3.5%  ( -12% - 6%)
HighSpanNear                 13.65    (1.6%)      13.41    (2.0%)      -1.8%  ( -5% - 1%)
LowSpanNear                   7.84    (2.1%)       7.72    (3.3%)      -1.6%  ( -6% - 3%)
MedSpanNear                  15.41    (4.0%)      15.21    (3.8%)      -1.3%  ( -8% - 6%)
LowSloppyPhrase               6.67    (2.4%)       6.59    (3.7%)      -1.1%  ( -7% - 5%)
Respell                     260.19    (2.1%)     258.37    (2.6%)      -0.7%  ( -5% - 4%)
OrHighHigh                    9.49    (3.8%)      10.45    (2.1%)      10.0%  ( 4% - 16%)
MedSloppyPhrase              42.89    (2.5%)      49.14    (4.7%)      14.6%  ( 7% - 22%)
HighTermDayOfYearSort        19.88    (5.5%)      23.16    (5.4%)      16.5%  ( 5% - 29%)
LowPhrase                    21.20    (1.5%)      24.98    (1.7%)      17.8%  ( 14% - 21%)
HighSloppyPhrase              8.70    (4.1%)      12.41    (5.8%)      42.7%  ( 31% - 54%)
OrNotHighLow                703.04    (1.9%)    1024.70    (4.9%)      45.8%  ( 38% - 53%)
AndHighHigh                  22.95    (1.2%)      35.81    (4.4%)      56.0%  ( 49% - 62%)
HighPhrase                    6.41    (4.8%)      10.25    (2.8%)      60.0%  ( 49% - 70%)
Fuzzy1                       71.70    (2.8%)     130.01    (7.0%)      81.3%  ( 69% - 93%)
MedPhrase                     3.69    (7.0%)       7.05    (3.2%)      90.7%  ( 75% - 108%)
OrHighMed                    27.98    (3.5%)      68.75    (7.7%)     145.7%  ( 129% - 162%)
Fuzzy2                       15.03    (3.3%)      43.46   (11.5%)     189.1%  ( 168% - 210%)
LowTerm                     312.94    (4.6%)    1939.36   (35.1%)     519.7%  ( 458% - 586%)
OrHighLow                    47.32    (4.4%)     658.31   (46.8%)    1291.1%  (1187% - 1404%)
MedTerm                      82.45    (3.0%)    1532.45   (94.1%)    1758.6%  (1612% - 1913%)
OrNotHighMed                 55.34    (3.3%)    1075.65   (47.4%)    1843.8%  (1736% - 1958%)
OrHighNotLow                 50.30    (2.8%)    1369.26  (113.1%)    2622.1%  (2438% - 2815%)
OrHighNotHigh                22.97    (4.2%)    1371.00  (205.8%)    5869.7%  (5430% - 6347%)
OrNotHighHigh                15.33    (4.0%)    1070.48  (254.4%)    6881.0%  (6368% - 7435%)
OrHighNotMed                 11.76    (2.9%)    1097.78  (315.6%)    9235.1%  (8669% - 9833%)
HighTerm                     12.71    (3.4%)    1549.32  (581.9%)   12094.4%  (11132% - 13123%)
{noformat}
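
For readers who want to sanity-check the "Pct diff" column, here is a minimal Java sketch that recomputes the relative change from the two QPS columns. The class and method names are made up for illustration and are not part of luceneutil. Because the table only shows rounded QPS values, the recomputed figure should land close to the reported one but will not always match it to the last digit.

```java
class PctDiff {
    /** Percentage change of the patch QPS relative to the baseline QPS. */
    static double pctDiff(double baselineQps, double patchQps) {
        return 100.0 * (patchQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // Rounded QPS values copied from the AndHighMed and OrHighLow rows.
        System.out.printf("AndHighMed: %.2f%%%n", pctDiff(45.93, 40.67));
        System.out.printf("OrHighLow:  %.2f%%%n", pctDiff(47.32, 658.31));
    }
}
```

A value of 1291% for OrHighLow means the patched run serves roughly 14 times as many queries per second as the baseline, not 12.9 times, since the percentage is measured on top of the baseline.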
> Enable top-docs collection optimizations by default
> ---------------------------------------------------
>
> Key: LUCENE-8060
> URL: https://issues.apache.org/jira/browse/LUCENE-8060
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: master (8.0)
>
> Attachments: LUCENE-8060.patch
>
>
> We now have several optimizations that apply when hit counts are not
> required (sorted indexes, MAXSCORE, short-circuiting of phrase queries), but
> our users won't benefit from them unless we disable exact hit counts by
> default or require them to tell us whether hit counts are required.
> I think making hit counts approximate by default is going to be a bit trappy,
> so I'm leaning towards requiring users to tell us explicitly whether they
> need total hit counts. I can think of two ways to do that: either by passing
> a boolean to the IndexSearcher constructor or by adding a boolean to all
> methods that produce TopDocs instances. I like the latter better, but I'm
> open to discussion or other ideas.
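
To make the trade-off in the description concrete, here is a toy Java sketch of the second option (a per-call boolean). It is not Lucene code: `TopHitsSketch`, `Result`, `search` and `needsTotalHits` are hypothetical names, and the input array is assumed to be already sorted best-first, standing in for a sorted index. It only shows why an exact hit count forces a visit to every match, while dropping that requirement lets collection stop after the top n.

```java
import java.util.Arrays;

class TopHitsSketch {
    /** Toy result: top scores, a hit count, whether the count is exact, and work done. */
    static final class Result {
        final double[] topScores;
        final int hitCount;   // exact total, or only a lower bound when exact == false
        final boolean exact;
        final int visited;    // number of matches actually visited

        Result(double[] topScores, int hitCount, boolean exact, int visited) {
            this.topScores = topScores;
            this.hitCount = hitCount;
            this.exact = exact;
            this.visited = visited;
        }
    }

    /**
     * Collects the top n entries of scoresBestFirst (assumed sorted best-first, n >= 0).
     * The needsTotalHits flag is the hypothetical per-call boolean from the description.
     */
    static Result search(double[] scoresBestFirst, int n, boolean needsTotalHits) {
        int k = Math.min(n, scoresBestFirst.length);
        double[] top = Arrays.copyOf(scoresBestFirst, k);
        if (needsTotalHits) {
            // An exact count was requested, so every match has to be visited.
            int count = 0;
            for (double ignored : scoresBestFirst) {
                count++;
            }
            return new Result(top, count, true, count);
        }
        // No exact count needed: stop as soon as the top k have been collected.
        return new Result(top, k, false, k);
    }
}
```

With five matches and n = 2, the exact call visits all five entries, while the other call visits two and reports its count as a lower bound. Putting the flag on the call rather than on the searcher lets each query choose for itself.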