Re: [PR] Brings back `Scorer#applyAsRequiredClause` [lucene]

via GitHub Sun, 20 Jul 2025 03:08:11 -0700


HUSTERGS commented on PR #14968:
URL: https://github.com/apache/lucene/pull/14968#issuecomment-3094415292


   > I'd suggest to focus this first PR on the `Scorer#applyAsRequiredClause`
   > API and later see if there's more room for speedups by adding new APIs to
   > `PostingsEnum` in a follow-up PR?
   
   Yeah, I think that's a good idea. I've been experimenting with some details
   of the current version of the code over the past few days.
   I've moved the `PostingsEnum`-related code directly into
   `applyAsRequiredClause` and removed the dependency on the newly introduced
   `NormAndFreqBuffer`. With that change, the luceneutil benchmark no longer
   shows a good performance gain (at least not as good as before), especially
   for the `OrStopWords` query. Here is the result:
   ```
                                  Task   QPS baseline    StdDev   QPS my_modified_version    StdDev   Pct diff                 p-value
                             OrHighMed          69.09    (3.6%)                     67.53   (10.6%)      -2.3% ( -15% -   12%)   0.365
                    Or2Terms2StopWords          63.02    (5.4%)                     61.66    (8.7%)      -2.2% ( -15% -   12%)   0.344
                          CombinedTerm          11.07    (5.1%)                     10.85    (4.6%)      -2.1% ( -11% -    8%)   0.180
                           AndHighHigh          22.49    (2.8%)                     22.03    (9.5%)      -2.1% ( -13% -   10%)   0.354
                            OrHighHigh          21.41    (2.3%)                     20.98    (9.9%)      -2.0% ( -13% -   10%)   0.384
                             And3Terms          73.13    (3.5%)                     71.87    (8.7%)      -1.7% ( -13% -   10%)   0.409
                               TermB1M         473.09    (4.4%)                    465.04    (8.0%)      -1.7% ( -13% -   11%)   0.404
                             TermB1M1P         473.02    (4.4%)                    465.15    (8.1%)      -1.7% ( -13% -   11%)   0.419
                            AndHighMed          54.53    (3.2%)                     53.64   (10.0%)      -1.6% ( -14% -   11%)   0.485
                              Or3Terms          65.73    (3.6%)                     64.70    (9.4%)      -1.6% ( -13% -   11%)   0.484
                               Term10K         472.65    (4.4%)                    465.27    (8.1%)      -1.6% ( -13% -   11%)   0.450
                               Term100         472.96    (4.2%)                    465.67    (8.1%)      -1.5% ( -13% -   11%)   0.452
                                Term1M         472.95    (4.4%)                    465.91    (8.1%)      -1.5% ( -13% -   11%)   0.469
                                  Term         472.92    (4.5%)                    465.98    (8.2%)      -1.5% ( -13% -   11%)   0.481
                   And2Terms2StopWords          61.13    (5.8%)                     60.43    (8.9%)      -1.2% ( -14% -   14%)   0.627
                         TermMonthSort        2219.47    (3.2%)                   2201.87    (3.7%)      -0.8% (  -7% -    6%)   0.467
                       DismaxOrHighMed          50.75    (3.5%)                     50.42    (7.9%)      -0.7% ( -11% -   11%)   0.733
                      AndMedOrHighHigh          16.90    (2.9%)                     16.80    (4.8%)      -0.6% (  -8% -    7%)   0.621
                                IntSet         298.20    (4.4%)                    296.49    (4.0%)      -0.6% (  -8% -    8%)   0.666
                            DismaxTerm         513.00    (4.7%)                    510.11    (6.5%)      -0.6% ( -11% -   11%)   0.753
                      DismaxOrHighHigh          35.47    (3.2%)                     35.31    (6.3%)      -0.4% (  -9% -    9%)   0.776
                      FilteredOr3Terms          44.20    (3.2%)                     44.11    (3.1%)      -0.2% (  -6% -    6%)   0.831
            FilteredOr2Terms2StopWords          50.82    (4.1%)                     50.72    (4.3%)      -0.2% (  -8% -    8%)   0.886
                                Fuzzy1          40.73    (3.7%)                     40.65    (4.9%)      -0.2% (  -8% -    8%)   0.891
                                OrMany           4.69    (3.7%)                      4.69    (6.1%)      -0.1% (  -9% -   10%)   0.950
                    CombinedAndHighMed          21.52    (4.6%)                     21.51    (4.3%)      -0.1% (  -8% -    9%)   0.967
                        CountOrHighMed          78.12    (1.7%)                     78.08    (2.6%)      -0.0% (  -4% -    4%)   0.944
                        FilteredOrMany           4.06    (2.4%)                      4.06    (2.2%)      -0.0% (  -4% -    4%)   0.964
           FilteredAnd2Terms2StopWords          61.06    (4.3%)                     61.04    (6.2%)      -0.0% ( -10% -   10%)   0.986
                     FilteredOrHighMed          39.18    (3.3%)                     39.17    (3.2%)      -0.0% (  -6% -    6%)   0.986
                             CountTerm        6298.40    (4.3%)                   6297.77    (4.5%)      -0.0% (  -8% -    9%)   0.994
                                Fuzzy2          36.89    (3.5%)                     36.90    (4.7%)       0.0% (  -7% -    8%)   0.980
                       CountAndHighMed          75.47    (1.7%)                     75.50    (2.1%)       0.0% (  -3% -    3%)   0.941
                                IntNRQ          42.55    (2.2%)                     42.57    (3.0%)       0.1% (  -4% -    5%)   0.946
                     FilteredAnd3Terms         101.94    (2.3%)                    102.05    (2.9%)       0.1% (  -5% -    5%)   0.900
                CountFilteredOrHighMed          17.95    (0.7%)                     17.98    (0.6%)       0.2% (  -1% -    1%)   0.460
                    FilteredOrHighHigh          13.02    (2.5%)                     13.05    (2.3%)       0.2% (  -4% -    5%)   0.813
                        FilteredIntNRQ          42.16    (2.3%)                     42.24    (3.0%)       0.2% (  -4% -    5%)   0.820
                   CountFilteredIntNRQ          16.31    (0.8%)                     16.35    (1.2%)       0.2% (  -1% -    2%)   0.468
               CountFilteredOrHighHigh          15.86    (0.8%)                     15.90    (0.8%)       0.3% (  -1% -    1%)   0.331
                       CountOrHighHigh          50.16    (2.4%)                     50.29    (2.5%)       0.3% (  -4% -    5%)   0.724
                   CountFilteredPhrase           9.18    (2.5%)                      9.21    (3.4%)       0.3% (  -5% -    6%)   0.771
                              Wildcard          47.34    (3.3%)                     47.48    (3.7%)       0.3% (  -6% -    7%)   0.790
                       AndHighOrMedMed          14.04    (2.2%)                     14.08    (2.6%)       0.3% (  -4% -    5%)   0.688
                      IntervalsOrdered           2.43    (3.4%)                      2.44    (3.3%)       0.3% (  -6% -    7%)   0.760
                           CountOrMany           5.04    (2.9%)                      5.06    (2.8%)       0.4% (  -5% -    6%)   0.696
                   CountFilteredOrMany           4.46    (2.5%)                      4.48    (2.6%)       0.4% (  -4% -    5%)   0.635
                         TermTitleSort          51.93    (4.8%)                     52.13    (5.1%)       0.4% (  -9% -   10%)   0.809
                      CountAndHighHigh          48.66    (2.2%)                     48.85    (2.2%)       0.4% (  -3% -    4%)   0.560
                   CombinedAndHighHigh           5.67    (2.8%)                      5.69    (2.3%)       0.4% (  -4% -    5%)   0.597
                               Prefix3          75.57    (3.8%)                     75.90    (3.3%)       0.4% (  -6% -    7%)   0.699
                       FilteredPrefix3          70.64    (3.3%)                     70.98    (3.1%)       0.5% (  -5% -    7%)   0.637
                               Respell          36.72    (3.5%)                     36.93    (3.6%)       0.6% (  -6% -    7%)   0.603
                              SpanNear           2.45    (5.5%)                      2.46    (5.4%)       0.6% (  -9% -   12%)   0.733
                   FilteredOrStopWords           8.13    (2.2%)                      8.18    (2.0%)       0.7% (  -3% -    5%)   0.329
                          FilteredTerm          64.92    (3.0%)                     65.36    (3.6%)       0.7% (  -5% -    7%)   0.522
                            TermDTSort         144.97    (3.3%)                    146.07    (4.8%)       0.8% (  -7% -    9%)   0.561
                        FilteredPhrase           9.83    (2.2%)                      9.91    (2.6%)       0.8% (  -3% -    5%)   0.297
                          SloppyPhrase           1.12    (5.3%)                      1.13    (4.9%)       0.8% (  -8% -   11%)   0.616
                                Phrase           7.57    (4.3%)                      7.64    (4.3%)       0.9% (  -7% -    9%)   0.490
                     TermDayOfYearSort         264.98    (2.6%)                    267.70    (2.9%)       1.0% (  -4% -    6%)   0.241
                            OrHighRare          94.68    (6.8%)                     95.91    (5.4%)       1.3% ( -10% -   14%)   0.501
                     CombinedOrHighMed          21.05    (5.6%)                     21.35    (4.6%)       1.4% (  -8% -   12%)   0.396
                          AndStopWords           8.87    (3.6%)                      9.01    (7.4%)       1.6% (  -9% -   12%)   0.399
                           CountPhrase           2.65    (4.9%)                      2.69    (3.2%)       1.9% (  -5% -   10%)   0.154
                    CombinedOrHighHigh           5.54    (5.1%)                      5.66    (3.0%)       2.2% (  -5% -   10%)   0.092
                           OrStopWords           8.99    (3.2%)                      9.20    (8.8%)       2.3% (  -9% -   14%)   0.263
                    FilteredAndHighMed          31.76    (2.4%)                     32.52    (4.0%)       2.4% (  -3% -    8%)   0.020
                  FilteredAndStopWords           8.41    (3.1%)                      8.75    (2.0%)       4.0% (  -1% -    9%)   0.000
                   FilteredAndHighHigh          10.41    (3.1%)                     10.87    (1.8%)       4.4% (   0% -    9%)   0.000
   ```
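
   For readers less used to this output, here is a minimal Java sketch of how
   a row can be read. `BenchRow` and its methods are hypothetical names made
   up for this illustration, not part of luceneutil or Lucene, and the 0.05
   cutoff is only an assumed convention. The `Pct diff` figures are recomputed
   from the rounded QPS values printed in the table, so they may differ
   slightly from what the tool derives from unrounded data.

   ```java
   import java.util.Locale;

   /** Hypothetical helper for reading one benchmark row; not a luceneutil class. */
   final class BenchRow {
     final String task;
     final double baselineQps;
     final double candidateQps;
     final double pValue;

     BenchRow(String task, double baselineQps, double candidateQps, double pValue) {
       this.task = task;
       this.baselineQps = baselineQps;
       this.candidateQps = candidateQps;
       this.pValue = pValue;
     }

     /** Relative change of the candidate versus the baseline, in percent. */
     double pctDiff() {
       return (candidateQps - baselineQps) / baselineQps * 100.0;
     }

     /** Assumed convention: a p-value below alpha counts as statistically significant. */
     boolean isSignificant(double alpha) {
       return pValue < alpha;
     }

     public static void main(String[] args) {
       // QPS and p-values copied from three rows of the table above.
       BenchRow[] rows = {
         new BenchRow("OrHighMed", 69.09, 67.53, 0.365),
         new BenchRow("OrStopWords", 8.99, 9.20, 0.263),
         new BenchRow("FilteredAndHighHigh", 10.41, 10.87, 0.000),
       };
       for (BenchRow r : rows) {
         System.out.printf(
             Locale.ROOT,
             "%-20s %+5.1f%% %s%n",
             r.task,
             r.pctDiff(),
             r.isSignificant(0.05) ? "significant" : "not significant");
       }
     }
   }
   ```

   Read this way, only the `Filtered*` conjunction rows at the bottom of the
   table fall below the assumed 0.05 cutoff; the `OrStopWords` change of about
   2.3% does not.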
   
   If I keep using the `NormAndFreqBuffer` (instead of the raw `freqs` and
   `normValues` arrays inside `TermScorer`), performance seems to be better,
   which is a bit strange to me. Here is the result under an identical setup
   (only the related queries are shown below):
   ```
                     CombinedOrHighMed          21.60    (4.0%)                     21.98    (3.8%)       1.8% (  -5% -    9%)   0.151
                           OrStopWords           9.05    (1.4%)                      9.23    (3.1%)       2.0% (  -2% -    6%)   0.009
                    CombinedOrHighHigh           5.68    (2.7%)                      5.81    (2.2%)       2.2% (  -2% -    7%)   0.005
                    FilteredAndHighMed          31.77    (2.2%)                     32.77    (1.5%)       3.1% (   0% -    7%)   0.000
                  FilteredAndStopWords           8.40    (2.4%)                      8.72    (1.9%)       3.8% (   0% -    8%)   0.000
                   FilteredAndHighHigh          10.37    (2.4%)                     10.84    (1.3%)       4.5% (   0% -    8%)   0.000
   ```
   
   I'm not sure what causes the difference :(
   I'll push a new commit using the raw arrays anyway.
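
   To make the two layouts being compared concrete, here is a rough,
   self-contained Java sketch. Everything in it is an assumption for
   illustration only: `NormAndFreqBufferSketch` is a hypothetical stand-in
   whose fields are guessed from the name (the real `NormAndFreqBuffer` lives
   in the PR), `ToyTermScorer` is not Lucene's `TermScorer`, and `toyScore` is
   a placeholder formula rather than any real similarity.

   ```java
   /** Hypothetical stand-in for the PR's NormAndFreqBuffer; its shape is an assumption. */
   final class NormAndFreqBufferSketch {
     int[] freqs = new int[0];
     long[] norms = new long[0];
     int size;

     /** Ensures room for at least minSize entries; old contents are not preserved. */
     void growNoCopy(int minSize) {
       if (freqs.length < minSize) {
         freqs = new int[minSize];
         norms = new long[minSize];
       }
     }
   }

   /** Toy scorer that holds the same per-batch data in the two layouts under discussion. */
   final class ToyTermScorer {
     // Layout A: raw arrays kept directly as scorer fields.
     private int[] freqs = new int[0];
     private long[] normValues = new long[0];
     // Layout B: the same data behind a single buffer object.
     private final NormAndFreqBufferSketch buffer = new NormAndFreqBufferSketch();

     /** Placeholder scoring formula, used only so both paths have something to compute. */
     static float toyScore(int freq, long norm) {
       return freq / (freq + (float) norm);
     }

     float[] scoreWithRawArrays(int[] docFreqs, long[] docNorms) {
       int n = docFreqs.length;
       if (freqs.length < n) {
         freqs = new int[n];
         normValues = new long[n];
       }
       System.arraycopy(docFreqs, 0, freqs, 0, n);
       System.arraycopy(docNorms, 0, normValues, 0, n);
       float[] scores = new float[n];
       for (int i = 0; i < n; i++) {
         scores[i] = toyScore(freqs[i], normValues[i]);
       }
       return scores;
     }

     float[] scoreWithBuffer(int[] docFreqs, long[] docNorms) {
       int n = docFreqs.length;
       buffer.growNoCopy(n);
       System.arraycopy(docFreqs, 0, buffer.freqs, 0, n);
       System.arraycopy(docNorms, 0, buffer.norms, 0, n);
       buffer.size = n;
       float[] scores = new float[buffer.size];
       for (int i = 0; i < buffer.size; i++) {
         scores[i] = toyScore(buffer.freqs[i], buffer.norms[i]);
       }
       return scores;
     }
   }
   ```

   Both methods compute identical scores. The sketch only pins down what is
   being compared and says nothing about why the benchmark numbers differ
   between the two layouts.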


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

