[PR] Brings back `Scorer#applyAsRequiredClause` [lucene]

via GitHub Fri, 18 Jul 2025 21:12:40 -0700


HUSTERGS opened a new pull request, #14968:
URL: https://github.com/apache/lucene/pull/14968


   ### Description
   Inspired by #14690, this PR tries to bring back the 
`Scorer#applyAsRequiredClause` interface. Unlike #14690, it explores whether 
we can pass the `DocAndScoreAccBuffer` all the way down to the postings, so 
that we can reduce the number of `advance` calls when the buffer holds a dense 
set of doc IDs, e.g. by making use of SIMD again. To that end I added a new 
method, `PostingsEnum#nextRequiredFreqBuffer` (not stable yet). So far I have 
only written the default implementation, and I am still trying to speed up the 
process under `BlockPostingEnum`. 
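
To make the idea concrete, here is a minimal, self-contained sketch of what a default buffer-filtering step could look like. It is a toy model, not the code in this PR: `DocScoreBuffer`, `ToyPostings`, and the use of the term frequency as a score contribution are all hypothetical stand-ins, and only the method name `applyAsRequiredClause` is borrowed from the description above. The point it tries to show is that the buffer is compacted in place and that `advance` is only called when the postings are behind the current candidate.

```java
import java.util.Arrays;

/** Toy model for illustration; none of these types are Lucene classes. */
class RequiredClauseSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for DocAndScoreAccBuffer: parallel arrays plus a live size. */
  static final class DocScoreBuffer {
    final int[] docs;
    final double[] scores;
    int size;

    DocScoreBuffer(int[] docs, double[] scores) {
      this.docs = docs.clone();
      this.scores = scores.clone();
      this.size = docs.length;
    }
  }

  /** Hypothetical stand-in for a postings iterator over sorted doc IDs with frequencies. */
  static final class ToyPostings {
    private final int[] docs;
    private final int[] freqs;
    private int idx = -1;
    int advanceCalls = 0; // instrumentation only, to count calls in the demo

    ToyPostings(int[] docs, int[] freqs) {
      this.docs = docs;
      this.freqs = freqs;
    }

    int docID() {
      if (idx < 0) return -1;
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int freq() {
      return freqs[idx];
    }

    /** Moves to the first doc >= target and returns it, or NO_MORE_DOCS when exhausted. */
    int advance(int target) {
      advanceCalls++;
      if (idx < 0) idx = 0;
      while (idx < docs.length && docs[idx] < target) idx++;
      return docID();
    }
  }

  /** Keeps only buffer entries that also match the postings, compacting the buffer in place. */
  static void applyAsRequiredClause(DocScoreBuffer buffer, ToyPostings postings) {
    int kept = 0;
    for (int i = 0; i < buffer.size; i++) {
      int doc = buffer.docs[i];
      int current = postings.docID();
      if (current < doc) {
        // Only advance when the postings are behind the candidate.
        current = postings.advance(doc);
      }
      if (current == doc) {
        buffer.docs[kept] = doc;
        // Assumption: the frequency is used directly as a toy score contribution.
        buffer.scores[kept] = buffer.scores[i] + postings.freq();
        kept++;
      }
    }
    buffer.size = kept;
  }

  public static void main(String[] args) {
    DocScoreBuffer buffer =
        new DocScoreBuffer(new int[] {1, 3, 5, 8, 9}, new double[] {1, 1, 1, 1, 1});
    ToyPostings postings = new ToyPostings(new int[] {3, 4, 8, 20}, new int[] {2, 1, 3, 1});
    applyAsRequiredClause(buffer, postings);
    System.out.println(Arrays.toString(Arrays.copyOf(buffer.docs, buffer.size)));
    System.out.println(Arrays.toString(Arrays.copyOf(buffer.scores, buffer.size)));
    System.out.println("advance calls: " + postings.advanceCalls);
  }
}
```

In this toy run the buffer shrinks to docs 3 and 8, and `advance` is called three times for five candidates. A block-aware implementation could presumably go further by intersecting a whole decoded block with the buffer at once, which is the part still being explored.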
   
   This is still under development. I know we should be cautious about adding 
new public interfaces (especially two at once!), but I want to share the 
current progress. Below are the luceneutil benchmark results on 
`wikimediumall` with `searchConcurrency=0, taskCountPerCat=5, 
taskRepeatCount=50`, after 20 iterations (against the latest code):
   ```
                       Task  QPS baseline  StdDev  QPS my_modified_version  StdDev               Pct diff  p-value
                       Term        472.96  (6.1%)                   461.21  (6.8%)  -2.5% ( -14% -   11%)  0.224
                     Term1M        472.39  (5.9%)                   460.90  (7.0%)  -2.4% ( -14% -   11%)  0.232
                    Term100        472.69  (5.9%)                   461.52  (6.8%)  -2.4% ( -14% -   10%)  0.239
                  TermB1M1P        472.28  (5.9%)                   461.42  (6.9%)  -2.3% ( -14% -   11%)  0.257
                    Term10K        472.60  (6.0%)                   461.82  (6.8%)  -2.3% ( -14% -   11%)  0.260
                    TermB1M        472.22  (6.0%)                   461.48  (6.9%)  -2.3% ( -14% -   11%)  0.266
           DismaxOrHighHigh         35.68  (3.5%)                    35.03  (3.9%)  -1.8% (  -8% -    5%)  0.118
                     OrMany          4.73  (5.3%)                     4.66  (5.7%)  -1.4% ( -11% -   10%)  0.429
           IntervalsOrdered          2.47  (1.9%)                     2.44  (3.1%)  -1.3% (  -6% -    3%)  0.120
                 OrHighRare         96.38  (6.9%)                    95.20  (6.5%)  -1.2% ( -13% -   13%)  0.564
            DismaxOrHighMed         50.96  (4.1%)                    50.35  (4.2%)  -1.2% (  -9% -    7%)  0.358
               CombinedTerm         11.17  (3.7%)                    11.04  (4.8%)  -1.1% (  -9% -    7%)  0.402
                 DismaxTerm        516.28  (5.0%)                   510.53  (4.1%)  -1.1% (  -9% -    8%)  0.440
                 OrHighHigh         21.39  (2.5%)                    21.22  (2.8%)  -0.8% (  -5% -    4%)  0.319
                AndHighHigh         22.46  (2.4%)                    22.27  (2.5%)  -0.8% (  -5% -    4%)  0.292
                  OrHighMed         68.77  (4.0%)                    68.47  (4.3%)  -0.4% (  -8% -    8%)  0.748
        CountFilteredPhrase          9.21  (2.8%)                     9.17  (3.4%)  -0.4% (  -6% -    6%)  0.675
                   SpanNear          2.51  (4.1%)                     2.51  (4.5%)  -0.3% (  -8% -    8%)  0.817
         Or2Terms2StopWords         62.44  (6.0%)                    62.25  (6.2%)  -0.3% ( -11% -   12%)  0.876
               SloppyPhrase          1.14  (3.2%)                     1.13  (4.1%)  -0.3% (  -7% -    7%)  0.796
        FilteredOrStopWords          8.16  (2.3%)                     8.14  (2.8%)  -0.3% (  -5% -    4%)  0.729
                CountPhrase          2.71  (2.0%)                     2.71  (2.3%)  -0.2% (  -4% -    4%)  0.731
        CountFilteredIntNRQ         16.35  (1.4%)                    16.34  (1.3%)  -0.1% (  -2% -    2%)  0.862
           AndMedOrHighHigh         16.75  (2.1%)                    16.74  (3.3%)  -0.1% (  -5% -    5%)  0.935
             FilteredIntNRQ         42.25  (3.1%)                    42.23  (3.1%)  -0.0% (  -6% -    6%)  0.967
            AndHighOrMedMed         14.11  (3.2%)                    14.11  (3.1%)  -0.0% (  -6% -    6%)  1.000
         FilteredOrHighHigh         13.03  (2.6%)                    13.03  (2.9%)   0.0% (  -5% -    5%)  0.996
                     IntNRQ         42.59  (3.2%)                    42.60  (3.2%)   0.0% (  -6% -    6%)  0.979
     CountFilteredOrHighMed         17.95  (0.8%)                    17.97  (0.9%)   0.1% (  -1% -    1%)  0.806
    CountFilteredOrHighHigh         15.86  (0.9%)                    15.88  (1.0%)   0.1% (  -1% -    2%)  0.668
                    Respell         37.00  (4.4%)                    37.07  (3.9%)   0.2% (  -7% -    8%)  0.897
                 AndHighMed         54.33  (3.3%)                    54.43  (3.6%)   0.2% (  -6% -    7%)  0.861
                     Phrase          7.62  (2.6%)                     7.64  (3.3%)   0.2% (  -5% -    6%)  0.834
                     Fuzzy2         36.78  (4.3%)                    36.89  (4.8%)   0.3% (  -8% -    9%)  0.836
             FilteredPhrase          9.86  (2.6%)                     9.89  (3.1%)   0.3% (  -5% -    6%)  0.737
                     Fuzzy1         40.61  (4.3%)                    40.74  (4.9%)   0.3% (  -8% -    9%)  0.822
          FilteredOrHighMed         39.08  (3.7%)                    39.21  (3.8%)   0.3% (  -6% -    8%)  0.780
               FilteredTerm         65.09  (3.6%)                    65.34  (3.9%)   0.4% (  -6% -    8%)  0.756
            CountAndHighMed         75.20  (2.5%)                    75.50  (3.1%)   0.4% (  -5% -    6%)  0.649
           FilteredOr3Terms         44.00  (3.7%)                    44.21  (3.9%)   0.5% (  -6% -    8%)  0.683
        CombinedAndHighHigh          5.73  (2.1%)                     5.76  (2.0%)   0.5% (  -3% -    4%)  0.430
        CountFilteredOrMany          4.46  (2.6%)                     4.48  (2.9%)   0.6% (  -4% -    6%)  0.511
              TermTitleSort         51.76  (4.4%)                    52.07  (5.6%)   0.6% (  -9% -   11%)  0.707
          FilteredAnd3Terms        101.78  (3.5%)                   102.42  (3.3%)   0.6% (  -5% -    7%)  0.552
           CountAndHighHigh         48.46  (2.3%)                    48.77  (2.5%)   0.6% (  -4% -    5%)  0.406
             CountOrHighMed         77.72  (2.3%)                    78.26  (2.5%)   0.7% (  -4% -    5%)  0.360
        And2Terms2StopWords         60.68  (6.5%)                    61.13  (6.6%)   0.7% ( -11% -   14%)  0.720
         CombinedOrHighHigh          5.62  (3.5%)                     5.66  (4.1%)   0.8% (  -6% -    8%)  0.527
                  And3Terms         72.70  (4.0%)                    73.25  (4.2%)   0.8% (  -7% -    9%)  0.559
 FilteredOr2Terms2StopWords         50.50  (4.8%)                    50.89  (5.0%)   0.8% (  -8% -   11%)  0.624
         CombinedAndHighMed         21.62  (4.9%)                    21.78  (4.5%)   0.8% (  -8% -   10%)  0.608
            CountOrHighHigh         49.83  (2.6%)                    50.25  (2.6%)   0.8% (  -4% -    6%)  0.307
             FilteredOrMany          4.03  (2.7%)                     4.06  (2.8%)   0.9% (  -4% -    6%)  0.324
          CombinedOrHighMed         21.21  (4.8%)                    21.41  (5.1%)   0.9% (  -8% -   11%)  0.552
                CountOrMany          5.02  (3.1%)                     5.07  (2.9%)   0.9% (  -4% -    7%)  0.328
                 TermDTSort        145.50  (5.2%)                   147.04  (4.9%)   1.1% (  -8% -   11%)  0.509
                   Or3Terms         65.31  (3.8%)                    66.02  (4.4%)   1.1% (  -6% -    9%)  0.404
          TermDayOfYearSort        266.30  (3.6%)                   269.35  (3.8%)   1.1% (  -6% -    8%)  0.331
                   Wildcard         47.14  (3.5%)                    47.75  (4.4%)   1.3% (  -6% -    9%)  0.298
                     IntSet        298.82  (4.0%)                   303.31  (5.5%)   1.5% (  -7% -   11%)  0.325
            FilteredPrefix3         69.72  (3.6%)                    70.79  (3.3%)   1.5% (  -5% -    8%)  0.162
              TermMonthSort       2187.76  (4.5%)                  2231.91  (4.8%)   2.0% (  -7% -   11%)  0.173
FilteredAnd2Terms2StopWords         60.66  (4.9%)                    61.89  (5.0%)   2.0% (  -7% -   12%)  0.192
                    Prefix3         74.19  (4.0%)                    75.82  (3.6%)   2.2% (  -5% -   10%)  0.069
                  CountTerm       6268.63  (6.8%)                  6406.68  (7.1%)   2.2% ( -11% -   17%)  0.319
                OrStopWords          8.96  (2.2%)                     9.21  (3.7%)   2.9% (  -3% -    8%)  0.003
               AndStopWords          8.78  (2.9%)                     9.14  (2.8%)   4.1% (  -1% -   10%)  0.000
         FilteredAndHighMed         31.61  (2.4%)                    32.97  (2.2%)   4.3% (   0% -    9%)  0.000
       FilteredAndStopWords          8.34  (2.3%)                     8.86  (2.1%)   6.3% (   1% -   10%)  0.000
        FilteredAndHighHigh         10.32  (2.5%)                    11.00  (1.9%)   6.7% (   2% -   11%)  0.000
   ```
   
   I think it's promising to look into this approach further. If I understand 
correctly, this speedup should come only from the reduced number of virtual 
function calls?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


