Re: [PR] Brings back `Scorer#applyAsRequiredClause` [lucene]

via GitHub Tue, 12 Aug 2025 04:28:27 -0700


HUSTERGS commented on PR #14968:
URL: https://github.com/apache/lucene/pull/14968#issuecomment-3178922979


   > Another idea: in order to not add new APIs, an alternative would be to 
implement specialized bulk scorers for the case when all scorers are term 
scorers, on the same field (a common case, and arguably the case we're most 
interested in optimizing) and work directly on `ImpactsEnum`, norms, and 
`SimScorer`. This should allow us to do interesting things without introducing 
new APIs, such as reading norms only once per doc ID or vectorizing score 
computations of required/non-essential clauses.
   
   I'm waiting for #15039 to be merged, and I'm looking forward to digging into this a bit more.
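To make the "read norms only once per doc ID" part of the quoted idea concrete, here is a minimal Python sketch. It is not Lucene code and does not use `ImpactsEnum` or `SimScorer`: `CountingNorms`, `bm25`, `score_per_clause`, and `score_shared_norm` are hypothetical names, postings are plain dicts, and textbook BM25 stands in for the similarity. It only illustrates why term clauses on the same field can share a single norm lookup per document.

```python
class CountingNorms:
    """Hypothetical per-field norm store (document lengths) that counts reads."""

    def __init__(self, lengths):
        self.lengths = lengths
        self.reads = 0

    def get(self, doc):
        self.reads += 1
        return self.lengths[doc]


def bm25(freq, doc_len, idf, avg_len, k1=1.2, b=0.75):
    """Textbook BM25 term score; a stand-in for a real similarity scorer."""
    return idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_len / avg_len))


def score_per_clause(postings, idfs, norms, avg_len):
    """Baseline: every term clause looks up the norm for itself."""
    scores = {}
    for term, docs in postings.items():
        for doc, freq in docs.items():
            doc_len = norms.get(doc)  # one read per (term, doc) pair
            scores[doc] = scores.get(doc, 0.0) + bm25(freq, doc_len, idfs[term], avg_len)
    return scores


def score_shared_norm(postings, idfs, norms, avg_len):
    """Specialized: all terms are on the same field, so read the norm once per doc."""
    matching = set()
    for docs in postings.values():
        matching.update(docs)
    scores = {}
    for doc in sorted(matching):
        doc_len = norms.get(doc)  # one read per doc, shared by all clauses
        total = 0.0
        for term, docs in postings.items():
            freq = docs.get(doc)
            if freq is not None:
                total += bm25(freq, doc_len, idfs[term], avg_len)
        scores[doc] = total
    return scores
```

Both functions produce the same scores; the second one reads each matching document's norm once regardless of how many clauses match it, and its inner loop over clauses is the part that could in principle be vectorized.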
   
   
   
   
   > I suspect there are some connections between #15004 and this PR (the affected tasks overlap). Maybe we should wait for #15004 to be merged into the main branch and then compare the performance diff of this PR?
   
   Now that #15004 is merged, I ran the benchmark. Results are below:
   
   ```
                              Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                       CombinedTerm       11.25      (3.8%)       11.03      (4.6%)   -1.9% (  -9% -    6%) 0.150
                         OrHighHigh       21.77      (2.1%)       21.50      (2.0%)   -1.3% (  -5% -    2%) 0.050
                 Or2Terms2StopWords       59.40      (5.5%)       58.70      (5.1%)   -1.2% ( -11% -   10%) 0.485
                            TermB1M      445.35      (2.8%)      440.99      (2.9%)   -1.0% (  -6% -    4%) 0.284
                        AndHighHigh       22.79      (2.7%)       22.57      (2.2%)   -1.0% (  -5% -    4%) 0.214
                            Term100      445.94      (2.9%)      441.68      (2.8%)   -1.0% (  -6% -    4%) 0.291
                               Term      445.68      (2.9%)      441.44      (3.0%)   -1.0% (  -6% -    5%) 0.303
                            Term10K      445.19      (2.9%)      441.01      (2.8%)   -0.9% (  -6% -    4%) 0.298
                          TermB1M1P      445.76      (2.9%)      441.60      (2.9%)   -0.9% (  -6% -    5%) 0.311
                          And3Terms       72.47      (3.7%)       71.80      (3.5%)   -0.9% (  -7% -    6%) 0.420
                             Term1M      445.53      (2.8%)      441.52      (2.9%)   -0.9% (  -6% -    4%) 0.320
                    FilteredPrefix3       71.42      (3.0%)       70.80      (3.6%)   -0.9% (  -7% -    5%) 0.410
                         OrHighRare       95.32      (5.4%)       94.53      (5.0%)   -0.8% ( -10% -   10%) 0.615
                          OrHighMed       66.68      (4.1%)       66.21      (3.6%)   -0.7% (  -7% -    7%) 0.561
                         AndHighMed       54.25      (3.0%)       53.88      (2.8%)   -0.7% (  -6% -    5%) 0.454
                         DismaxTerm      480.63      (3.5%)      477.44      (3.7%)   -0.7% (  -7% -    6%) 0.556
                       FilteredTerm       63.21      (2.6%)       62.84      (2.4%)   -0.6% (  -5% -    4%) 0.460
                    CountAndHighMed       75.83      (1.8%)       75.42      (2.4%)   -0.5% (  -4% -    3%) 0.426
                           Or3Terms       64.39      (3.7%)       64.07      (3.4%)   -0.5% (  -7% -    6%) 0.648
                   DismaxOrHighHigh       35.26      (2.4%)       35.09      (2.6%)   -0.5% (  -5% -    4%) 0.528
                     CountOrHighMed       78.98      (1.7%)       78.64      (1.8%)   -0.4% (  -3% -    3%) 0.447
                            Prefix3       76.18      (3.1%)       75.90      (4.0%)   -0.4% (  -7% -    6%) 0.742
                     FilteredPhrase        9.72      (2.7%)        9.68      (2.2%)   -0.3% (  -5% -    4%) 0.675
                    DismaxOrHighMed       49.38      (3.3%)       49.23      (3.1%)   -0.3% (  -6% -    6%) 0.774
                And2Terms2StopWords       57.85      (5.9%)       57.68      (5.6%)   -0.3% ( -11% -   11%) 0.874
                           Wildcard       47.49      (3.0%)       47.36      (3.4%)   -0.3% (  -6% -    6%) 0.795
                             Phrase        7.53      (3.0%)        7.51      (2.4%)   -0.2% (  -5% -    5%) 0.801
                    AndHighOrMedMed       14.10      (3.4%)       14.08      (3.3%)   -0.2% (  -6% -    6%) 0.864
                  FilteredAnd3Terms      104.14      (2.9%)      103.97      (2.3%)   -0.2% (  -5% -    5%) 0.840
                             IntSet      287.42      (4.0%)      286.99      (3.8%)   -0.2% (  -7% -    7%) 0.903
                FilteredOrStopWords        8.14      (2.1%)        8.13      (2.4%)   -0.1% (  -4% -    4%) 0.844
                 FilteredOrHighHigh       12.87      (2.7%)       12.86      (2.5%)   -0.1% (  -5% -    5%) 0.875
                             Fuzzy1       39.10      (3.8%)       39.05      (3.2%)   -0.1% (  -6% -    7%) 0.911
                  FilteredOrHighMed       38.08      (3.7%)       38.04      (3.2%)   -0.1% (  -6% -    7%) 0.931
                     FilteredIntNRQ       42.38      (2.5%)       42.35      (2.3%)   -0.1% (  -4% -    4%) 0.919
                   FilteredOr3Terms       42.96      (3.7%)       42.95      (3.1%)   -0.0% (  -6% -    6%) 0.983
                   IntervalsOrdered        2.43      (3.9%)        2.42      (3.3%)   -0.0% (  -6% -    7%) 0.985
         FilteredOr2Terms2StopWords       48.16      (4.5%)       48.17      (4.0%)    0.0% (  -8% -    8%) 0.985
                     FilteredOrMany        3.98      (3.4%)        3.98      (2.7%)    0.1% (  -5% -    6%) 0.949
                             Fuzzy2       35.37      (3.5%)       35.42      (3.1%)    0.1% (  -6% -    7%) 0.905
                CountFilteredIntNRQ       16.31      (1.1%)       16.33      (0.9%)    0.1% (  -1% -    2%) 0.673
                        CountPhrase        2.67      (3.8%)        2.67      (3.4%)    0.2% (  -6% -    7%) 0.870
                CountFilteredPhrase        8.89      (3.3%)        8.91      (3.0%)    0.2% (  -5% -    6%) 0.839
             CountFilteredOrHighMed       17.86      (0.6%)       17.89      (0.5%)    0.2% (   0% -    1%) 0.234
            CountFilteredOrHighHigh       15.78      (0.8%)       15.81      (0.7%)    0.2% (  -1% -    1%) 0.334
                             IntNRQ       42.71      (2.5%)       42.80      (2.2%)    0.2% (  -4% -    5%) 0.766
        FilteredAnd2Terms2StopWords       59.46      (4.6%)       59.66      (4.3%)    0.3% (  -8% -    9%) 0.810
                    CountOrHighHigh       50.23      (2.1%)       50.41      (2.0%)    0.4% (  -3% -    4%) 0.558
                  CombinedOrHighMed       20.51      (4.4%)       20.59      (5.0%)    0.4% (  -8% -   10%) 0.799
                        CountOrMany        4.93      (3.3%)        4.95      (3.2%)    0.5% (  -5% -    7%) 0.653
                             OrMany        4.55      (5.4%)        4.57      (4.8%)    0.5% (  -9% -   11%) 0.770
                 CombinedAndHighMed       20.75      (4.2%)       20.86      (4.2%)    0.5% (  -7% -    9%) 0.690
                   CountAndHighHigh       48.78      (1.9%)       49.08      (1.8%)    0.6% (  -2% -    4%) 0.295
                            Respell       35.79      (4.3%)       36.05      (2.5%)    0.7% (  -5% -    7%) 0.519
                 CombinedOrHighHigh        5.65      (3.3%)        5.69      (3.7%)    0.8% (  -6% -    8%) 0.492
                CountFilteredOrMany        4.35      (2.6%)        4.39      (2.6%)    0.8% (  -4% -    6%) 0.332
                          CountTerm     5812.39      (2.7%)     5862.14      (2.9%)    0.9% (  -4% -    6%) 0.335
                       SloppyPhrase        1.14      (4.5%)        1.15      (4.8%)    0.9% (  -8% -   10%) 0.538
                CombinedAndHighHigh        5.71      (1.7%)        5.76      (1.8%)    1.0% (  -2% -    4%) 0.075
                   AndMedOrHighHigh       16.62      (3.2%)       16.78      (3.2%)    1.0% (  -5% -    7%) 0.316
                           SpanNear        2.48      (5.2%)        2.51      (5.3%)    1.0% (  -8% -   12%) 0.538
                       AndStopWords        9.11      (3.0%)        9.31      (1.9%)    2.2% (  -2% -    7%) 0.006
                 FilteredAndHighMed       31.76      (2.6%)       32.53      (1.6%)    2.4% (  -1% -    6%) 0.000
                        OrStopWords        9.17      (1.9%)        9.39      (3.1%)    2.5% (  -2% -    7%) 0.002
               FilteredAndStopWords        8.57      (2.8%)        8.80      (1.3%)    2.7% (  -1% -    6%) 0.000
                FilteredAndHighHigh       10.61      (2.6%)       10.92      (1.0%)    2.9% (   0% -    6%) 0.000
   ```
   
   I'm planning to run another round of benchmarks after https://github.com/mikemccand/luceneutil/pull/436 is merged; maybe the speedup is not real?
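
As a rough way to see which rows are worth re-checking in the next round, the snippet below pulls out the tasks whose p-value is under a threshold. It is only a sketch: the regex assumes each result row sits on a single line in the column order shown above, `significant_rows` and `alpha` are names made up here, and the sample rows are copied from the table.

```python
import re

# One result row: task, baseline QPS (stddev%), candidate QPS (stddev%),
# pct diff, (range), p-value. Assumes the row is on a single line.
ROW = re.compile(
    r"^\s*(?P<task>\S+)\s+(?P<base>[\d.]+)\s+\((?P<base_sd>[\d.]+)%\)"
    r"\s+(?P<cand>[\d.]+)\s+\((?P<cand_sd>[\d.]+)%\)"
    r"\s+(?P<pct>-?[\d.]+)%\s+\(.*?\)\s+(?P<p>[\d.]+)\s*$"
)


def significant_rows(table_text, alpha=0.05):
    """Return (task, pct_diff, p_value) for rows whose p-value is below alpha."""
    hits = []
    for line in table_text.splitlines():
        m = ROW.match(line)
        if m and float(m["p"]) < alpha:
            hits.append((m["task"], float(m["pct"]), float(m["p"])))
    return hits


# A few rows copied from the table above.
SAMPLE = """
  CombinedTerm       11.25      (3.8%)       11.03      (4.6%)   -1.9% (  -9% -    6%) 0.150
  OrHighHigh       21.77      (2.1%)       21.50      (2.0%)   -1.3% (  -5% -    2%) 0.050
  AndStopWords        9.11      (3.0%)        9.31      (1.9%)    2.2% (  -2% -    7%) 0.006
  FilteredAndHighHigh       10.61      (2.6%)       10.92      (1.0%)    2.9% (   0% -    6%) 0.000
"""

for task, pct, p in significant_rows(SAMPLE):
    print(f"{task}: {pct:+.1f}% (p={p:.3f})")
```

With 66 tasks in the table, about three would be expected to fall under p < 0.05 by chance alone if the change had no effect and the tasks were independent, so a handful of small "significant" rows is weak evidence on its own.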
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

