[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

Luca Cavanna (JIRA) Thu, 09 May 2019 08:20:25 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836467#comment-16836467 ]


Luca Cavanna commented on LUCENE-8796:
--------------------------------------

I have updated the PR after applying Yonik's suggestion and re-run benchmarks a 
few times. The run with the least noise had these results (note that I disabled 
the bitset optimization on both sides):

{{
Report after iter 19:
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff
                HighTerm     1575.07      (5.9%)     1541.27      (6.9%)   -2.1% ( -14% -   11%)
                 MedTerm     1363.22      (6.5%)     1337.03      (7.0%)   -1.9% ( -14% -   12%)
                 LowTerm     1441.86      (4.2%)     1420.77      (5.2%)   -1.5% ( -10% -    8%)
       IntNRQConjMedTerm      280.55      (4.0%)      277.64      (4.1%)   -1.0% (  -8% -    7%)
               MedPhrase      153.84      (3.5%)      152.44      (3.3%)   -0.9% (  -7% -    6%)
                 Prefix3      224.92      (4.0%)      223.13      (3.7%)   -0.8% (  -8% -    7%)
        HighSloppyPhrase       19.70      (3.7%)       19.56      (4.5%)   -0.7% (  -8% -    7%)
         MedSloppyPhrase       18.23      (4.3%)       18.11      (4.7%)   -0.7% (  -9% -    8%)
            OrNotHighMed      586.33      (3.4%)      582.47      (4.9%)   -0.7% (  -8% -    7%)
         LowSloppyPhrase       18.56      (3.6%)       18.46      (3.9%)   -0.5% (  -7% -    7%)
              HighPhrase       22.64      (2.7%)       22.54      (3.0%)   -0.4% (  -6% -    5%)
               LowPhrase      144.10      (3.8%)      143.55      (3.3%)   -0.4% (  -7% -    6%)
              AndHighLow      539.26      (3.7%)      537.25      (3.2%)   -0.4% (  -7% -    6%)
                PKLookup      132.96      (3.0%)      132.48      (4.6%)   -0.4% (  -7% -    7%)
               OrHighMed      115.79      (2.7%)      115.49      (3.5%)   -0.3% (  -6% -    6%)
      PrefixConjHighTerm       36.98      (2.8%)       36.93      (3.4%)   -0.1% (  -6% -    6%)
    WildcardConjHighTerm       45.79      (3.0%)       45.73      (3.1%)   -0.1% (  -6% -    6%)
               OrHighLow      448.91      (3.7%)      448.70      (6.3%)   -0.0% (  -9% -   10%)
                Wildcard       78.89      (3.2%)       78.95      (3.6%)    0.1% (  -6% -    7%)
      IntNRQConjHighTerm       78.35      (2.3%)       78.48      (2.4%)    0.2% (  -4% -    4%)
                  IntNRQ      100.56      (2.7%)      100.84      (2.8%)    0.3% (  -5% -    5%)
            OrHighNotLow      732.45      (2.8%)      734.56      (5.3%)    0.3% (  -7% -    8%)
           OrHighNotHigh      544.87      (2.8%)      546.47      (4.6%)    0.3% (  -6% -    7%)
       IntNRQConjLowTerm      249.20      (4.2%)      249.99      (3.8%)    0.3% (  -7% -    8%)
                 Respell       73.05      (3.1%)       73.28      (3.4%)    0.3% (  -6% -    7%)
              OrHighHigh       35.56      (3.0%)       35.68      (4.2%)    0.3% (  -6% -    7%)
            OrNotHighLow      695.41      (4.8%)      697.88      (6.5%)    0.4% ( -10% -   12%)
             MedSpanNear       59.99      (3.8%)       60.30      (4.0%)    0.5% (  -7% -    8%)
              AndHighMed      190.02      (3.1%)      191.04      (3.6%)    0.5% (  -5% -    7%)
             LowSpanNear       12.73      (3.9%)       12.81      (4.2%)    0.6% (  -7% -    8%)
   HighTermDayOfYearSort       88.42      (7.0%)       89.09      (7.1%)    0.8% ( -12% -   15%)
       PrefixConjLowTerm       54.95      (3.7%)       55.43      (3.8%)    0.9% (  -6% -    8%)
            OrHighNotMed      628.44      (3.4%)      634.02      (6.1%)    0.9% (  -8% -   10%)
            HighSpanNear       28.86      (3.2%)       29.11      (3.5%)    0.9% (  -5% -    7%)
     WildcardConjMedTerm       72.48      (3.4%)       73.19      (4.8%)    1.0% (  -7% -    9%)
                  Fuzzy2       49.17      (9.9%)       49.68     (11.7%)    1.0% ( -18% -   25%)
             AndHighHigh       63.44      (3.8%)       64.11      (3.8%)    1.1% (  -6% -    9%)
                  Fuzzy1       79.43      (9.9%)       80.55      (9.7%)    1.4% ( -16% -   23%)
           OrNotHighHigh      574.89      (3.6%)      584.43      (5.5%)    1.7% (  -7% -   11%)
       PrefixConjMedTerm       79.00      (3.2%)       80.50      (3.6%)    1.9% (  -4% -    8%)
     WildcardConjLowTerm       90.67      (2.9%)       92.49      (3.7%)    2.0% (  -4% -    8%)
       HighTermMonthSort       86.13     (11.8%)       88.79     (12.4%)    3.1% ( -18% -   30%)
}}

I also ran the benchmarks with the bitset optimization enabled on both sides:

{{
Report after iter 19:
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff
                  IntNRQ       63.46     (24.6%)       62.28     (24.2%)   -1.9% ( -40% -   62%)
            OrNotHighMed      596.89      (3.5%)      589.18      (4.6%)   -1.3% (  -9% -    7%)
           OrNotHighHigh      769.65      (3.1%)      760.89      (2.9%)   -1.1% (  -6% -    4%)
                  Fuzzy1       76.45      (6.8%)       75.62      (7.4%)   -1.1% ( -14% -   14%)
           OrHighNotHigh      626.48      (3.4%)      619.67      (2.6%)   -1.1% (  -6% -    5%)
                 MedTerm     1345.24      (3.6%)     1332.07      (3.7%)   -1.0% (  -7% -    6%)
                PKLookup      136.70      (4.0%)      135.57      (3.9%)   -0.8% (  -8% -    7%)
                HighTerm     1103.39      (3.5%)     1095.23      (3.7%)   -0.7% (  -7% -    6%)
            OrNotHighLow      780.85      (3.8%)      775.40      (2.7%)   -0.7% (  -6% -    5%)
            OrHighNotLow      815.32      (4.7%)      810.72      (5.7%)   -0.6% ( -10% -   10%)
                 Prefix3      310.00      (4.9%)      308.31      (3.7%)   -0.5% (  -8% -    8%)
                 LowTerm     1462.30      (3.7%)     1455.27      (4.4%)   -0.5% (  -8% -    7%)
               OrHighLow      446.56      (4.6%)      445.11      (3.2%)   -0.3% (  -7% -    7%)
              AndHighLow      594.90      (2.9%)      593.39      (3.4%)   -0.3% (  -6% -    6%)
                 Respell       64.46      (2.5%)       64.36      (2.6%)   -0.2% (  -5% -    5%)
            OrHighNotMed      685.98      (4.5%)      685.69      (3.7%)   -0.0% (  -7% -    8%)
               OrHighMed       67.90      (5.1%)       67.91      (3.4%)    0.0% (  -8% -    8%)
                  Fuzzy2       50.18      (4.5%)       50.21      (5.8%)    0.1% (  -9% -   10%)
             LowSpanNear       59.27      (3.9%)       59.34      (4.0%)    0.1% (  -7% -    8%)
              OrHighHigh       30.89      (5.2%)       30.94      (3.3%)    0.2% (  -7% -    9%)
               LowPhrase      114.67      (3.1%)      114.87      (2.5%)    0.2% (  -5% -    5%)
              HighPhrase       22.34      (2.7%)       22.42      (2.2%)    0.4% (  -4% -    5%)
             AndHighHigh       59.53      (3.8%)       59.89      (4.4%)    0.6% (  -7% -    9%)
               MedPhrase       29.99      (2.9%)       30.19      (2.3%)    0.7% (  -4% -    6%)
         MedSloppyPhrase       71.57      (3.1%)       72.10      (3.0%)    0.7% (  -5% -    7%)
      IntNRQConjHighTerm      113.74      (7.3%)      114.66      (7.1%)    0.8% ( -12% -   16%)
         LowSloppyPhrase       14.18      (3.4%)       14.30      (2.6%)    0.8% (  -4% -    6%)
       PrefixConjLowTerm       89.05      (4.6%)       89.80      (5.1%)    0.8% (  -8% -   11%)
              AndHighMed      166.34      (3.1%)      167.76      (3.8%)    0.9% (  -5% -    7%)
     WildcardConjMedTerm       51.44      (2.6%)       51.88      (3.0%)    0.9% (  -4% -    6%)
       PrefixConjMedTerm       68.16      (4.8%)       68.80      (4.6%)    0.9% (  -8% -   10%)
      PrefixConjHighTerm       42.34      (6.1%)       42.81      (5.0%)    1.1% (  -9% -   13%)
             MedSpanNear       15.57      (5.5%)       15.74      (5.4%)    1.1% (  -9% -   12%)
     WildcardConjLowTerm       51.56      (3.7%)       52.15      (4.2%)    1.1% (  -6% -    9%)
            HighSpanNear        5.66      (5.8%)        5.73      (5.9%)    1.2% (  -9% -   13%)
       IntNRQConjLowTerm      120.28      (8.5%)      121.67      (8.8%)    1.2% ( -14% -   20%)
    WildcardConjHighTerm       55.43      (3.2%)       56.10      (3.4%)    1.2% (  -5% -    8%)
       IntNRQConjMedTerm       97.79      (8.3%)       98.98      (8.6%)    1.2% ( -14% -   19%)
                Wildcard      106.37      (2.9%)      107.75      (3.6%)    1.3% (  -5% -    7%)
        HighSloppyPhrase       18.21      (4.9%)       18.48      (4.4%)    1.5% (  -7% -   11%)
       HighTermMonthSort      146.10     (11.0%)      148.89     (10.5%)    1.9% ( -17% -   26%)
   HighTermDayOfYearSort       68.62      (6.1%)       70.08      (3.9%)    2.1% (  -7% -   12%)
}}
 
I will next have a look at what Atri is suggesting.

> Use exponential search in IntArrayDocIdSet advance method
> ---------------------------------------------------------
>
>                 Key: LUCENE-8796
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8796
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Luca Cavanna
>            Priority: Minor
>
> When I chatted with [~jpountz], he suggested improving IntArrayDocIdSet by
> making its advance method use exponential search instead of binary search.
> This should help the performance of queries that include conjunctions: since
> ConjunctionDISI uses leap-frogging, it advances through doc IDs in small
> steps, so on average exponential search should be faster than binary search
> when advancing.
>  
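To illustrate the idea, here is a minimal Java sketch of an advance method that uses exponential (galloping) search over a sorted int[] of doc IDs with an explicit length. It is not the code in the PR nor Lucene's IntArrayDocIdSet: the class ExponentialAdvanceSketch, its fields and the intersect helper are hypothetical, and only the NO_MORE_DOCS sentinel value (Integer.MAX_VALUE) mirrors Lucene's DocIdSetIterator. advance() first probes forward from the current position with doubling steps, then binary-searches only the window those probes bracketed; intersect() is a heavily simplified leap-frog loop in the spirit of ConjunctionDISI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical iterator over sorted doc IDs; a sketch, not Lucene's actual implementation. */
class ExponentialAdvanceSketch {
    // Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // strictly increasing doc IDs
    private final int length; // number of valid entries in docs
    private int i = -1;       // index of the current doc
    private int doc = -1;     // current doc ID

    ExponentialAdvanceSketch(int[] docs, int length) {
        this.docs = docs;
        this.length = length;
    }

    int docID() {
        return doc;
    }

    int nextDoc() {
        i = Math.min(i + 1, length); // never step past the end
        return doc = (i < length) ? docs[i] : NO_MORE_DOCS;
    }

    /** Returns the first doc ID >= target; assumes target > docID(), as in Lucene's contract. */
    int advance(int target) {
        int lo = i + 1; // the answer index is >= lo
        if (lo >= length) {
            i = length;
            return doc = NO_MORE_DOCS;
        }
        int hi = lo; // probe position
        int step = 1;
        // Galloping phase: probe start, start+1, start+3, start+7, ... until
        // docs[hi] >= target or the probe runs off the end. Every probe that is
        // still < target moves lo past it.
        while (hi < length && docs[hi] < target) {
            lo = hi + 1;
            hi = (length - hi > step) ? hi + step : length; // clamp instead of overflowing
            step <<= 1;
        }
        // The answer lies in [lo, hi]; hi == length means it may be "no more docs".
        int idx = Arrays.binarySearch(docs, lo, Math.min(hi + 1, length), target);
        i = (idx >= 0) ? idx : -idx - 1; // insertion point when target is absent
        return doc = (i < length) ? docs[i] : NO_MORE_DOCS;
    }

    /** Simplified leap-frog intersection of two iterators (much simpler than ConjunctionDISI). */
    static List<Integer> intersect(ExponentialAdvanceSketch a, ExponentialAdvanceSketch b) {
        List<Integer> out = new ArrayList<>();
        int doc = a.nextDoc();
        while (doc != NO_MORE_DOCS) {
            // Only advance b if it is behind; advance must be called with target > docID().
            int other = (b.docID() < doc) ? b.advance(doc) : b.docID();
            if (other == doc) {
                out.add(doc);
                doc = a.nextDoc();
            } else {
                doc = a.advance(other); // a leaps to (or past) b's position
            }
        }
        return out;
    }
}
```

With a = {1, 3, 5, ..., 19} and b = {2, 3, 8, 9, 15, 100}, intersect(a, b) yields [3, 9, 15]. When the next match is d entries ahead, the galloping phase needs about log2(d) probes, so the short hops a leap-frog conjunction usually makes stay cheap, whereas a plain binary search over the remaining array pays roughly log2 of the whole remaining length on every call.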



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
