Re: [PR] Filter by `maxOtherDoc` first to reduce the overhead of `filterCompetitiveHits` [lucene]

via GitHub Mon, 18 Aug 2025 08:19:14 -0700


HUSTERGS commented on PR #15081:
URL: https://github.com/apache/lucene/pull/15081#issuecomment-3197362760


   Actually, I ran another experiment. I was worried that the `findNextGEQ` 
operation might cause an unexpected slowdown on machines that do not have 
fast SIMD instructions, so I added a new `copyWithMinDocRequired` method to 
`DocAndScoreAccBuffer`. Now, instead of calling `findNextGEQ`, I filter 
those docs directly while copying. Here is the result:
   ```
                               Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                 CombinedOrHighHigh        5.54      (3.2%)        5.41      (3.6%)   -2.5% (  -9% -    4%) 0.023
                      TermTitleSort       51.50      (3.9%)       50.60      (4.1%)   -1.7% (  -9% -    6%) 0.169
                CombinedAndHighHigh        5.60      (2.2%)        5.53      (1.3%)   -1.2% (  -4% -    2%) 0.032
                    DismaxOrHighMed       45.35      (7.4%)       44.93      (9.2%)   -0.9% ( -16% -   16%) 0.726
                           SpanNear        2.51      (4.1%)        2.50      (4.1%)   -0.4% (  -8% -    8%) 0.733
                       CombinedTerm       10.96      (4.0%)       10.91      (4.8%)   -0.4% (  -8% -    8%) 0.761
                    AndHighOrMedMed       13.50      (2.6%)       13.47      (3.0%)   -0.2% (  -5% -    5%) 0.789
                FilteredAndHighHigh       10.23      (2.9%)       10.24      (2.7%)    0.0% (  -5% -    5%) 0.959
                            Term10K      438.89      (6.1%)      439.16      (6.6%)    0.1% ( -11% -   13%) 0.975
                               Term      439.07      (6.0%)      439.36      (6.6%)    0.1% ( -11% -   13%) 0.974
                            TermB1M      439.19      (5.9%)      439.51      (6.6%)    0.1% ( -11% -   13%) 0.971
                       SloppyPhrase        1.12      (3.3%)        1.12      (4.1%)    0.1% (  -7% -    7%) 0.929
                  CombinedOrHighMed       19.36      (5.3%)       19.40      (5.7%)    0.2% ( -10% -   11%) 0.926
               FilteredAndStopWords        8.22      (3.0%)        8.23      (2.5%)    0.2% (  -5% -    5%) 0.849
                            Term100      438.48      (6.0%)      439.22      (6.6%)    0.2% ( -11% -   13%) 0.933
                          TermB1M1P      438.65      (6.0%)      439.45      (6.7%)    0.2% ( -11% -   13%) 0.927
                             Term1M      438.40      (6.1%)      439.36      (6.5%)    0.2% ( -11% -   13%) 0.913
                        CountPhrase        2.64      (3.2%)        2.64      (3.8%)    0.2% (  -6% -    7%) 0.836
             CountFilteredOrHighMed       17.85      (0.7%)       17.90      (0.7%)    0.3% (  -1% -    1%) 0.186
            CountFilteredOrHighHigh       15.76      (0.7%)       15.81      (0.9%)    0.3% (  -1% -    1%) 0.173
                         DismaxTerm      473.34      (4.1%)      475.20      (4.9%)    0.4% (  -8% -    9%) 0.782
                         OrHighRare       93.56      (3.0%)       93.93      (3.3%)    0.4% (  -5% -    6%) 0.683
                CountFilteredIntNRQ       16.34      (1.2%)       16.42      (1.2%)    0.5% (  -1% -    2%) 0.193
                             IntSet      286.37      (4.4%)      287.83      (4.7%)    0.5% (  -8% -    9%) 0.723
                    FilteredPrefix3       68.84      (4.4%)       69.19      (3.6%)    0.5% (  -7% -    8%) 0.682
                    CountAndHighMed       71.34      (2.8%)       71.73      (2.6%)    0.6% (  -4% -    6%) 0.516
                           Wildcard       46.52      (2.5%)       46.78      (2.6%)    0.6% (  -4% -    5%) 0.476
                   IntervalsOrdered        2.43      (2.9%)        2.44      (3.1%)    0.6% (  -5% -    6%) 0.528
                FilteredOrStopWords        7.87      (2.0%)        7.93      (2.1%)    0.7% (  -3% -    4%) 0.251
                            Prefix3       73.58      (4.8%)       74.13      (3.7%)    0.8% (  -7% -    9%) 0.578
                CountFilteredOrMany        4.29      (2.5%)        4.33      (2.6%)    0.9% (  -4% -    6%) 0.278
                   CountAndHighHigh       48.27      (1.6%)       48.70      (1.5%)    0.9% (  -2% -    4%) 0.072
                    CountOrHighHigh       49.25      (1.9%)       49.71      (1.8%)    0.9% (  -2% -    4%) 0.101
                        OrStopWords        8.91      (8.4%)        9.00     (12.2%)    1.0% ( -18% -   23%) 0.771
                  TermDayOfYearSort      252.19      (1.9%)      254.74      (1.9%)    1.0% (  -2% -    4%) 0.093
                             Phrase        7.36      (2.4%)        7.43      (2.1%)    1.0% (  -3% -    5%) 0.145
                       FilteredTerm       60.82      (2.3%)       61.48      (2.6%)    1.1% (  -3% -    6%) 0.165
                     CountOrHighMed       71.77      (2.2%)       72.56      (1.9%)    1.1% (  -2% -    5%) 0.094
                     FilteredPhrase        9.30      (2.9%)        9.41      (2.7%)    1.2% (  -4% -    6%) 0.188
                 CombinedAndHighMed       19.58      (5.1%)       19.81      (5.0%)    1.2% (  -8% -   11%) 0.460
                        CountOrMany        4.81      (2.3%)        4.87      (2.6%)    1.2% (  -3% -    6%) 0.132
                             OrMany        4.17      (6.2%)        4.22      (5.3%)    1.2% (  -9% -   13%) 0.509
                     FilteredOrMany        3.87      (3.0%)        3.92      (2.9%)    1.2% (  -4% -    7%) 0.196
                             IntNRQ       42.46      (2.9%)       42.99      (3.0%)    1.3% (  -4% -    7%) 0.179
                 FilteredOrHighHigh       12.37      (2.8%)       12.54      (2.9%)    1.3% (  -4% -    7%) 0.134
                            Respell       34.57      (2.7%)       35.05      (3.5%)    1.4% (  -4% -    7%) 0.162
                         OrHighHigh       20.69     (10.5%)       20.98     (14.8%)    1.4% ( -21% -   29%) 0.730
                CountFilteredPhrase        8.57      (3.3%)        8.69      (3.3%)    1.4% (  -5% -    8%) 0.181
                      TermMonthSort     2048.69      (2.8%)     2078.78      (1.8%)    1.5% (  -3% -    6%) 0.049
                          CountTerm     5432.73      (2.5%)     5516.44      (2.7%)    1.5% (  -3% -    6%) 0.063
                             Fuzzy2       33.85      (3.7%)       34.38      (3.7%)    1.6% (  -5% -    9%) 0.180
                     FilteredIntNRQ       42.05      (2.9%)       42.75      (3.1%)    1.7% (  -4% -    7%) 0.080
                   DismaxOrHighHigh       33.48      (5.4%)       34.03      (7.0%)    1.7% ( -10% -   14%) 0.400
                  FilteredAnd3Terms       99.67      (3.3%)      101.33      (3.2%)    1.7% (  -4% -    8%) 0.104
                         TermDTSort      136.90      (3.5%)      139.43      (4.0%)    1.9% (  -5% -    9%) 0.118
                  FilteredOrHighMed       36.11      (4.1%)       36.80      (4.3%)    1.9% (  -6% -   10%) 0.146
                             Fuzzy1       37.33      (4.3%)       38.06      (4.1%)    2.0% (  -6% -   10%) 0.136
                   FilteredOr3Terms       41.00      (3.8%)       41.84      (4.2%)    2.0% (  -5% -   10%) 0.106
                 FilteredAndHighMed       30.09      (4.2%)       30.78      (3.6%)    2.3% (  -5% -   10%) 0.061
         FilteredOr2Terms2StopWords       45.64      (4.9%)       46.83      (5.3%)    2.6% (  -7% -   13%) 0.105
                           Or3Terms       59.85      (9.7%)       61.50     (12.9%)    2.8% ( -18% -   28%) 0.444
                          OrHighMed       60.34     (11.4%)       62.06     (15.6%)    2.9% ( -21% -   33%) 0.507
                 Or2Terms2StopWords       53.33      (8.4%)       55.96     (10.6%)    4.9% ( -12% -   26%) 0.103
        FilteredAnd2Terms2StopWords       54.99      (5.3%)       58.02      (5.9%)    5.5% (  -5% -   17%) 0.002
                        AndHighHigh       21.28     (10.6%)       22.68     (10.7%)    6.6% ( -13% -   31%) 0.051
                          And3Terms       65.77      (9.0%)       70.21      (8.8%)    6.8% ( -10% -   27%) 0.017
                         AndHighMed       49.53     (10.6%)       52.99     (10.4%)    7.0% ( -12% -   31%) 0.036
                And2Terms2StopWords       51.74      (8.1%)       55.41      (8.8%)    7.1% (  -9% -   26%) 0.008
                   AndMedOrHighHigh       14.90      (4.2%)       16.03      (4.0%)    7.6% (   0% -   16%) 0.000
                       AndStopWords        8.18      (6.7%)        8.86      (7.4%)    8.3% (  -5% -   24%) 0.000
   ```
   BTW, this benchmark includes #15039, so the setup differs from the previous 
benchmark. The affected tasks also changed, e.g. OrHighHigh no longer gets 
a speedup.
   
   I'll push the newest code later; maybe you can help run the benchmark to 
verify it?
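
   For illustration, here is a minimal sketch of the filter-while-copying idea 
mentioned above. It is not the code from the PR: `SimpleDocAndScoreBuffer` is a 
hypothetical stand-in for `DocAndScoreAccBuffer`, its fields are assumed, and 
it assumes `copyWithMinDocRequired` keeps only entries whose doc ID is at least 
the given minimum.

```java
/** Hypothetical, simplified stand-in for a doc/score buffer; not the Lucene class. */
final class SimpleDocAndScoreBuffer {
  int[] docs = new int[0];
  double[] scores = new double[0];
  int size;

  /**
   * Copies into this buffer only the entries of {@code src} whose doc ID is
   * greater than or equal to {@code minDocRequired} (assumed semantics).
   * The filter is a plain scalar comparison applied during the copy, so no
   * separate search over the doc IDs is needed afterwards.
   */
  void copyWithMinDocRequired(SimpleDocAndScoreBuffer src, int minDocRequired) {
    // Grow the destination arrays if they cannot hold the worst case.
    if (docs.length < src.size) {
      docs = new int[src.size];
      scores = new double[src.size];
    }
    int out = 0;
    for (int i = 0; i < src.size; i++) {
      if (src.docs[i] >= minDocRequired) {
        docs[out] = src.docs[i];
        scores[out] = src.scores[i];
        out++;
      }
    }
    size = out;
  }
}
```

   Copying docs 1, 5, 9, 12 with a minimum of 6 would leave only 9 and 12 in 
the destination buffer, with their scores kept alongside.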


