[ https://issues.apache.org/jira/browse/LUCENE-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564195#comment-16564195 ]

Adrien Grand commented on LUCENE-8060:
--------------------------------------

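To give some perspective, here are the results of luceneutil on wikimediumall 
with this patch, which only changes defaults. Some queries get a very serious 
speedup (HighTerm goes from 12.71 to 1549.32 QPS), while a few conjunctions 
get slower (AndHighMed: -11.4%).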
{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev  Pct diff
              AndHighMed         45.93      (1.4%)       40.67      (2.7%)    -11.4% ( -15% -   -7%)
              AndHighLow        836.13      (2.5%)      771.97      (4.0%)     -7.7% ( -13% -   -1%)
       HighTermMonthSort         22.10     (12.7%)       20.76      (9.3%)     -6.1% ( -24% -   18%)
                  IntNRQ          8.35     (11.7%)        7.96     (14.1%)     -4.7% ( -27% -   23%)
                 Prefix3         69.56      (4.6%)       67.11      (4.4%)     -3.5% ( -11% -    5%)
                Wildcard         41.95      (4.3%)       40.48      (5.0%)     -3.5% ( -12% -    6%)
            HighSpanNear         13.65      (1.6%)       13.41      (2.0%)     -1.8% (  -5% -    1%)
             LowSpanNear          7.84      (2.1%)        7.72      (3.3%)     -1.6% (  -6% -    3%)
             MedSpanNear         15.41      (4.0%)       15.21      (3.8%)     -1.3% (  -8% -    6%)
         LowSloppyPhrase          6.67      (2.4%)        6.59      (3.7%)     -1.1% (  -7% -    5%)
                 Respell        260.19      (2.1%)      258.37      (2.6%)     -0.7% (  -5% -    4%)
              OrHighHigh          9.49      (3.8%)       10.45      (2.1%)     10.0% (   4% -   16%)
         MedSloppyPhrase         42.89      (2.5%)       49.14      (4.7%)     14.6% (   7% -   22%)
   HighTermDayOfYearSort         19.88      (5.5%)       23.16      (5.4%)     16.5% (   5% -   29%)
               LowPhrase         21.20      (1.5%)       24.98      (1.7%)     17.8% (  14% -   21%)
        HighSloppyPhrase          8.70      (4.1%)       12.41      (5.8%)     42.7% (  31% -   54%)
            OrNotHighLow        703.04      (1.9%)     1024.70      (4.9%)     45.8% (  38% -   53%)
             AndHighHigh         22.95      (1.2%)       35.81      (4.4%)     56.0% (  49% -   62%)
              HighPhrase          6.41      (4.8%)       10.25      (2.8%)     60.0% (  49% -   70%)
                  Fuzzy1         71.70      (2.8%)      130.01      (7.0%)     81.3% (  69% -   93%)
               MedPhrase          3.69      (7.0%)        7.05      (3.2%)     90.7% (  75% -  108%)
               OrHighMed         27.98      (3.5%)       68.75      (7.7%)    145.7% ( 129% -  162%)
                  Fuzzy2         15.03      (3.3%)       43.46     (11.5%)    189.1% ( 168% -  210%)
                 LowTerm        312.94      (4.6%)     1939.36     (35.1%)    519.7% ( 458% -  586%)
               OrHighLow         47.32      (4.4%)      658.31     (46.8%)   1291.1% (1187% - 1404%)
                 MedTerm         82.45      (3.0%)     1532.45     (94.1%)   1758.6% (1612% - 1913%)
            OrNotHighMed         55.34      (3.3%)     1075.65     (47.4%)   1843.8% (1736% - 1958%)
            OrHighNotLow         50.30      (2.8%)     1369.26    (113.1%)   2622.1% (2438% - 2815%)
           OrHighNotHigh         22.97      (4.2%)     1371.00    (205.8%)   5869.7% (5430% - 6347%)
           OrNotHighHigh         15.33      (4.0%)     1070.48    (254.4%)   6881.0% (6368% - 7435%)
            OrHighNotMed         11.76      (2.9%)     1097.78    (315.6%)   9235.1% (8669% - 9833%)
                HighTerm         12.71      (3.4%)     1549.32    (581.9%)  12094.4% (11132% - 13123%)
{noformat}

> Enable top-docs collection optimizations by default
> ---------------------------------------------------
>
>                 Key: LUCENE-8060
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8060
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (8.0)
>
>         Attachments: LUCENE-8060.patch
>
>
> We are getting optimizations when hit counts are not required (sorted 
> indexes, MAXSCORE, short-circuiting of phrase queries) but our users won't 
> benefit from them unless we disable exact hit counts by default or we require 
> them to tell us whether hit counts are required.
> I think making hit counts approximate by default is going to be a bit trappy, 
> so I'm rather leaning towards requiring users to tell us explicitly whether 
> they need total hit counts. I can think of two ways to do that: either by 
> passing a boolean to the IndexSearcher constructor or by adding a boolean to 
> all methods that produce TopDocs instances. I like the latter better, but 
> I'm open to discussion and other ideas.
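
For illustration only, here is a toy sketch of the second option from the description (a boolean on the method that produces the top docs), assuming a sorted index where matches already arrive best-first. It is not Lucene code: `HitCountSketch`, `Result`, `search` and `trackTotalHits` are hypothetical names made up for this example. It only shows why dropping the exact hit count lets collection stop early.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, NOT Lucene API: every name in this class is hypothetical. */
public class HitCountSketch {

    /** Outcome of one search call. */
    public static final class Result {
        public final List<Integer> topDocs; // ids of the best hits, best first
        public final int totalHits;         // exact count, or a lower bound if !exact
        public final boolean exact;         // false when collection stopped early
        public final int visited;           // number of matches actually examined

        Result(List<Integer> topDocs, int totalHits, boolean exact, int visited) {
            this.topDocs = topDocs;
            this.totalHits = totalHits;
            this.exact = exact;
            this.visited = visited;
        }
    }

    /**
     * Collects the first numHits matches.
     *
     * Assumption: matchesInSortOrder holds the matching doc ids in index sort
     * order (best first), so the first numHits entries are the final top hits.
     * When trackTotalHits is false, the loop stops as soon as the top hits are
     * known; when it is true, every match must be visited to count it.
     */
    public static Result search(int[] matchesInSortOrder, int numHits, boolean trackTotalHits) {
        List<Integer> top = new ArrayList<>();
        int visited = 0;
        for (int doc : matchesInSortOrder) {
            if (!trackTotalHits && top.size() == numHits) {
                break; // top hits are final and the caller does not need the count
            }
            visited++;
            if (top.size() < numHits) {
                top.add(doc);
            }
        }
        // The count is exact only if every match was visited.
        boolean exact = visited == matchesInSortOrder.length;
        return new Result(top, visited, exact, visited);
    }
}
```

With six matches and `numHits = 2`, calling `search(matches, 2, true)` visits all six matches and returns an exact count, while `search(matches, 2, false)` visits only two and returns the same top docs with a lower-bound count. The caller states the need for a count per call, which is the trade-off the description leans towards.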



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
