[GitHub] [lucene-solr] jimczi edited a comment on issue #904: LUCENE-8992: Share minimum score across segment in concurrent search

GitBox Mon, 07 Oct 2019 01:27:45 -0700

jimczi edited a comment on issue #904: LUCENE-8992: Share minimum score across 
segment in concurrent search
URL: https://github.com/apache/lucene-solr/pull/904#issuecomment-538892799
 
 
   We discussed this offline with @jpountz. Implementing the check directly in 
the `IndexSearcher` proved difficult, so it seems better to decouple the check 
of the global minimum score from the update of the local one. I pushed a 
[commit](https://github.com/apache/lucene-solr/pull/904/commits/47f9bf6f31ce2b429c77807c2d08111d9ff93b07)
 that implements this idea by checking the global minimum score every 1024 
documents. Since the rate of the checks is now known, I also switched to a 
`LongAccumulator` in order to speed up the updates. Finally, I added the maximum 
document id associated with the current global minimum score (the maximum of 
the local minimum scores), so that the `TopScoreDocCollector` can require the 
next float for leaves that come after the document id registered with that 
global score. Here's the result of the benchmark on `TopScoreDocCollector` 
using wikimedium:
   ````
                   Task  QPS baseline      StdDev   QPS patch      StdDev                 Pct diff
   HighIntervalsOrdered          6.23      (0.0%)        5.98      (0.0%)    -4.0% (  -3% -   -3%)
           HighSpanNear          5.47      (0.0%)        5.29      (0.0%)    -3.3% (  -3% -   -3%)
            LowSpanNear         19.06      (0.0%)       18.06      (0.0%)    -5.2% (  -5% -   -5%)
              OrHighMed         71.30      (0.0%)       67.68      (0.0%)    -5.1% (  -5% -   -5%)
            MedSpanNear         17.86      (0.0%)       16.96      (0.0%)    -5.0% (  -5% -   -5%)
                Respell         29.32      (0.0%)       28.16      (0.0%)    -4.0% (  -3% -   -3%)
             AndHighMed        107.02      (0.0%)      102.92      (0.0%)    -3.8% (  -3% -   -3%)
                 Fuzzy2         43.22      (0.0%)       41.87      (0.0%)    -3.1% (  -3% -   -3%)
                 IntNRQ         58.96      (0.0%)       57.60      (0.0%)    -2.3% (  -2% -   -2%)
                 Fuzzy1         55.31      (0.0%)       54.05      (0.0%)    -2.3% (  -2% -   -2%)
              LowPhrase         39.99      (0.0%)       39.19      (0.0%)    -2.0% (  -1% -   -1%)
        LowSloppyPhrase         23.71      (0.0%)       23.51      (0.0%)    -0.8% (   0% -    0%)
             AndHighLow        820.39      (0.0%)      815.03      (0.0%)    -0.7% (   0% -    0%)
             HighPhrase         65.78      (0.0%)       65.64      (0.0%)    -0.2% (   0% -    0%)
        MedSloppyPhrase         18.55      (0.0%)       18.89      (0.0%)     1.8% (   1% -    1%)
       HighSloppyPhrase          7.06      (0.0%)        7.22      (0.0%)     2.4% (   2% -    2%)
               Wildcard         63.42      (0.0%)       65.49      (0.0%)     3.3% (   3% -    3%)
              MedPhrase         59.06      (0.0%)       61.16      (0.0%)     3.5% (   3% -    3%)
                Prefix3         72.62      (0.0%)       75.86      (0.0%)     4.5% (   4% -    4%)
           OrNotHighLow        777.16      (0.0%)      812.50      (0.0%)     4.5% (   4% -    4%)
            AndHighHigh         31.35      (0.0%)       33.66      (0.0%)     7.3% (   7% -    7%)
             OrHighHigh         18.98      (0.0%)       20.95      (0.0%)    10.4% (  10% -   10%)
                LowTerm        598.85      (0.0%)      745.84      (0.0%)    24.5% (  24% -   24%)
           OrNotHighMed        388.26      (0.0%)      549.63      (0.0%)    41.6% (  41% -   41%)
                MedTerm        386.78      (0.0%)      595.30      (0.0%)    53.9% (  53% -   53%)
           OrHighNotMed        308.92      (0.0%)      496.75      (0.0%)    60.8% (  60% -   60%)
               HighTerm        310.13      (0.0%)      515.95      (0.0%)    66.4% (  66% -   66%)
           OrHighNotLow        304.05      (0.0%)      521.76      (0.0%)    71.6% (  71% -   71%)
          OrHighNotHigh        273.30      (0.0%)      470.54      (0.0%)    72.2% (  72% -   72%)
          OrNotHighHigh        296.77      (0.0%)      512.47      (0.0%)    72.7% (  72% -   72%)
              OrHighLow        108.61      (0.0%)      325.00      (0.0%)   199.2% ( 199% -  199%)
   ````
   
   Note that I ran the benchmark against a version that predates LUCENE-8978. 
The results against the code already committed in LUCENE-8978 show small 
regressions on some queries (high-phrase) and better results on others 
(highorlow), but overall they are comparable. I have a slight preference for 
this version because its behavior does not depend on the rate of updates of the 
local minimum score.
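
   For illustration only, here is a minimal sketch of how a shared minimum score 
could be packed together with a document id into a `LongAccumulator` and 
consulted every 1024 documents. It is not the code from the commit: the class 
`SharedMinScore` and all of its method names are hypothetical, and it assumes 
scores are non-negative so that the raw float bits sort in the same order as the 
floats themselves.

```java
import java.util.concurrent.atomic.LongAccumulator;

/** Hypothetical sketch of a shared minimum score; not the class used in the commit. */
class SharedMinScore {
    /** Check interval taken from the comment: one global check every 1024 documents. */
    static final int INTERVAL = 1024;

    /** Holds the maximum encoded (score, docId) pair published by any leaf. */
    private final LongAccumulator acc = new LongAccumulator(Long::max, Long.MIN_VALUE);

    /** Score in the high 32 bits, doc id in the low 32 bits (assumes score >= 0). */
    static long encode(int docId, float score) {
        return (((long) Float.floatToIntBits(score)) << 32) | (docId & 0xFFFFFFFFL);
    }

    static float score(long encoded) {
        return Float.intBitsToFloat((int) (encoded >> 32));
    }

    static int docId(long encoded) {
        return (int) encoded;
    }

    /** Called by a leaf collector when its local minimum score increases. */
    void publish(int docId, float localMinScore) {
        acc.accumulate(encode(docId, localMinScore));
    }

    /** True when the number of collected documents hits the check interval. */
    static boolean shouldCheck(int collected) {
        return (collected & (INTERVAL - 1)) == 0;
    }

    /**
     * Minimum score a document must reach to be competitive. Documents after the
     * registered doc id lose ties on score, so they must reach the next float up.
     * Returns negative infinity when nothing has been published yet.
     */
    float minCompetitiveScore(int docId) {
        long encoded = acc.get();
        if (encoded == Long.MIN_VALUE) {
            return Float.NEGATIVE_INFINITY;
        }
        float globalMin = score(encoded);
        return docId > docId(encoded) ? Math.nextUp(globalMin) : globalMin;
    }
}
```

   In this sketch a leaf collector would call `publish` whenever its local 
minimum score increases and call `minCompetitiveScore` only when `shouldCheck` 
returns true. Because `Long::max` is applied to the packed value, ties on score 
keep the larger document id, which matches the "maximum document id" described 
above.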


