HUSTERGS commented on PR #14968:
URL: https://github.com/apache/lucene/pull/14968#issuecomment-3178922979
> Another idea: in order to not add new APIs, an alternative would be to
implement specialized bulk scorers for the case when all scorers are term
scorers, on the same field (a common case, and arguably the case we're most
interested in optimizing) and work directly on `ImpactsEnum`, norms, and
`SimScorer`. This should allow us to do interesting things without introducing
new APIs, such as reading norms only once per doc ID or vectorizing score
computations of required/non-essential clauses.
I'm waiting for #15039 to merge, and I'm looking forward to digging into this
a bit more.
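
To make the "reading norms only once per doc ID" part of the quoted idea concrete, here is a rough standalone sketch. It is plain Java and does not use `ImpactsEnum`, `SimScorer`, or any other Lucene class; the class name, method names, the simplified BM25-style formula, and the constants are all assumptions made for illustration only. It just shows that when every clause is a term scorer on the same field, the per-document length normalization can be computed once and shared across clauses without changing the score.

```java
/** Standalone sketch: plain Java, no Lucene classes. All names are hypothetical. */
public class SharedNormScoring {
  // Assumed BM25-style parameters, chosen only for this illustration.
  static final float K1 = 1.2f;
  static final float B = 0.75f;

  /** Per-document length normalization; stands in for reading and decoding a norm. */
  public static float normFactor(int docLen, float avgDocLen) {
    return K1 * (1 - B + B * docLen / avgDocLen);
  }

  /** Baseline: every term clause recomputes the norm factor for the same doc. */
  public static float scorePerClause(float[] idfs, float[] freqs, int docLen, float avgDocLen) {
    float sum = 0f;
    for (int i = 0; i < idfs.length; i++) {
      float nf = normFactor(docLen, avgDocLen); // repeated once per clause
      sum += idfs[i] * freqs[i] / (freqs[i] + nf);
    }
    return sum;
  }

  /** Same-field case: compute the norm factor once per doc and reuse it for all clauses. */
  public static float scoreSharedNorm(float[] idfs, float[] freqs, int docLen, float avgDocLen) {
    float nf = normFactor(docLen, avgDocLen); // computed once per doc
    float sum = 0f;
    for (int i = 0; i < idfs.length; i++) {
      sum += idfs[i] * freqs[i] / (freqs[i] + nf);
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] idfs = {2.0f, 0.5f, 1.5f}; // one weight per term clause
    float[] freqs = {3f, 0f, 1f};      // term frequencies in one doc (0 = term absent)
    System.out.println(scorePerClause(idfs, freqs, 120, 100f));
    System.out.println(scoreSharedNorm(idfs, freqs, 120, 100f));
  }
}
```

Both methods perform the same floating-point operations in the same order, so they should return identical scores; the second one only hoists the per-doc work out of the clause loop, and the inner loop then becomes a simple candidate for vectorization.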
> I suspect there is some connections between #15004 and this PR (there are
some overlaps of affected tasks), maybe we should wait for #15004 to be
merged into the main branch and then compare the performance diff of this PR?
Now that #15004 has been merged, I reran the benchmark. Results are below:
```
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                 p-value
CombinedTerm                       11.25  (3.8%)                    11.03  (4.6%)     -1.9%  ( -9% - 6%)    0.150
OrHighHigh                         21.77  (2.1%)                    21.50  (2.0%)     -1.3%  ( -5% - 2%)    0.050
Or2Terms2StopWords                 59.40  (5.5%)                    58.70  (5.1%)     -1.2%  ( -11% - 10%)  0.485
TermB1M                           445.35  (2.8%)                   440.99  (2.9%)     -1.0%  ( -6% - 4%)    0.284
AndHighHigh                        22.79  (2.7%)                    22.57  (2.2%)     -1.0%  ( -5% - 4%)    0.214
Term100                           445.94  (2.9%)                   441.68  (2.8%)     -1.0%  ( -6% - 4%)    0.291
Term                              445.68  (2.9%)                   441.44  (3.0%)     -1.0%  ( -6% - 5%)    0.303
Term10K                           445.19  (2.9%)                   441.01  (2.8%)     -0.9%  ( -6% - 4%)    0.298
TermB1M1P                         445.76  (2.9%)                   441.60  (2.9%)     -0.9%  ( -6% - 5%)    0.311
And3Terms                          72.47  (3.7%)                    71.80  (3.5%)     -0.9%  ( -7% - 6%)    0.420
Term1M                            445.53  (2.8%)                   441.52  (2.9%)     -0.9%  ( -6% - 4%)    0.320
FilteredPrefix3                    71.42  (3.0%)                    70.80  (3.6%)     -0.9%  ( -7% - 5%)    0.410
OrHighRare                         95.32  (5.4%)                    94.53  (5.0%)     -0.8%  ( -10% - 10%)  0.615
OrHighMed                          66.68  (4.1%)                    66.21  (3.6%)     -0.7%  ( -7% - 7%)    0.561
AndHighMed                         54.25  (3.0%)                    53.88  (2.8%)     -0.7%  ( -6% - 5%)    0.454
DismaxTerm                        480.63  (3.5%)                   477.44  (3.7%)     -0.7%  ( -7% - 6%)    0.556
FilteredTerm                       63.21  (2.6%)                    62.84  (2.4%)     -0.6%  ( -5% - 4%)    0.460
CountAndHighMed                    75.83  (1.8%)                    75.42  (2.4%)     -0.5%  ( -4% - 3%)    0.426
Or3Terms                           64.39  (3.7%)                    64.07  (3.4%)     -0.5%  ( -7% - 6%)    0.648
DismaxOrHighHigh                   35.26  (2.4%)                    35.09  (2.6%)     -0.5%  ( -5% - 4%)    0.528
CountOrHighMed                     78.98  (1.7%)                    78.64  (1.8%)     -0.4%  ( -3% - 3%)    0.447
Prefix3                            76.18  (3.1%)                    75.90  (4.0%)     -0.4%  ( -7% - 6%)    0.742
FilteredPhrase                      9.72  (2.7%)                     9.68  (2.2%)     -0.3%  ( -5% - 4%)    0.675
DismaxOrHighMed                    49.38  (3.3%)                    49.23  (3.1%)     -0.3%  ( -6% - 6%)    0.774
And2Terms2StopWords                57.85  (5.9%)                    57.68  (5.6%)     -0.3%  ( -11% - 11%)  0.874
Wildcard                           47.49  (3.0%)                    47.36  (3.4%)     -0.3%  ( -6% - 6%)    0.795
Phrase                              7.53  (3.0%)                     7.51  (2.4%)     -0.2%  ( -5% - 5%)    0.801
AndHighOrMedMed                    14.10  (3.4%)                    14.08  (3.3%)     -0.2%  ( -6% - 6%)    0.864
FilteredAnd3Terms                 104.14  (2.9%)                   103.97  (2.3%)     -0.2%  ( -5% - 5%)    0.840
IntSet                            287.42  (4.0%)                   286.99  (3.8%)     -0.2%  ( -7% - 7%)    0.903
FilteredOrStopWords                 8.14  (2.1%)                     8.13  (2.4%)     -0.1%  ( -4% - 4%)    0.844
FilteredOrHighHigh                 12.87  (2.7%)                    12.86  (2.5%)     -0.1%  ( -5% - 5%)    0.875
Fuzzy1                             39.10  (3.8%)                    39.05  (3.2%)     -0.1%  ( -6% - 7%)    0.911
FilteredOrHighMed                  38.08  (3.7%)                    38.04  (3.2%)     -0.1%  ( -6% - 7%)    0.931
FilteredIntNRQ                     42.38  (2.5%)                    42.35  (2.3%)     -0.1%  ( -4% - 4%)    0.919
FilteredOr3Terms                   42.96  (3.7%)                    42.95  (3.1%)     -0.0%  ( -6% - 6%)    0.983
IntervalsOrdered                    2.43  (3.9%)                     2.42  (3.3%)     -0.0%  ( -6% - 7%)    0.985
FilteredOr2Terms2StopWords         48.16  (4.5%)                    48.17  (4.0%)      0.0%  ( -8% - 8%)    0.985
FilteredOrMany                      3.98  (3.4%)                     3.98  (2.7%)      0.1%  ( -5% - 6%)    0.949
Fuzzy2                             35.37  (3.5%)                    35.42  (3.1%)      0.1%  ( -6% - 7%)    0.905
CountFilteredIntNRQ                16.31  (1.1%)                    16.33  (0.9%)      0.1%  ( -1% - 2%)    0.673
CountPhrase                         2.67  (3.8%)                     2.67  (3.4%)      0.2%  ( -6% - 7%)    0.870
CountFilteredPhrase                 8.89  (3.3%)                     8.91  (3.0%)      0.2%  ( -5% - 6%)    0.839
CountFilteredOrHighMed             17.86  (0.6%)                    17.89  (0.5%)      0.2%  ( 0% - 1%)     0.234
CountFilteredOrHighHigh            15.78  (0.8%)                    15.81  (0.7%)      0.2%  ( -1% - 1%)    0.334
IntNRQ                             42.71  (2.5%)                    42.80  (2.2%)      0.2%  ( -4% - 5%)    0.766
FilteredAnd2Terms2StopWords        59.46  (4.6%)                    59.66  (4.3%)      0.3%  ( -8% - 9%)    0.810
CountOrHighHigh                    50.23  (2.1%)                    50.41  (2.0%)      0.4%  ( -3% - 4%)    0.558
CombinedOrHighMed                  20.51  (4.4%)                    20.59  (5.0%)      0.4%  ( -8% - 10%)   0.799
CountOrMany                         4.93  (3.3%)                     4.95  (3.2%)      0.5%  ( -5% - 7%)    0.653
OrMany                              4.55  (5.4%)                     4.57  (4.8%)      0.5%  ( -9% - 11%)   0.770
CombinedAndHighMed                 20.75  (4.2%)                    20.86  (4.2%)      0.5%  ( -7% - 9%)    0.690
CountAndHighHigh                   48.78  (1.9%)                    49.08  (1.8%)      0.6%  ( -2% - 4%)    0.295
Respell                            35.79  (4.3%)                    36.05  (2.5%)      0.7%  ( -5% - 7%)    0.519
CombinedOrHighHigh                  5.65  (3.3%)                     5.69  (3.7%)      0.8%  ( -6% - 8%)    0.492
CountFilteredOrMany                 4.35  (2.6%)                     4.39  (2.6%)      0.8%  ( -4% - 6%)    0.332
CountTerm                        5812.39  (2.7%)                  5862.14  (2.9%)      0.9%  ( -4% - 6%)    0.335
SloppyPhrase                        1.14  (4.5%)                     1.15  (4.8%)      0.9%  ( -8% - 10%)   0.538
CombinedAndHighHigh                 5.71  (1.7%)                     5.76  (1.8%)      1.0%  ( -2% - 4%)    0.075
AndMedOrHighHigh                   16.62  (3.2%)                    16.78  (3.2%)      1.0%  ( -5% - 7%)    0.316
SpanNear                            2.48  (5.2%)                     2.51  (5.3%)      1.0%  ( -8% - 12%)   0.538
AndStopWords                        9.11  (3.0%)                     9.31  (1.9%)      2.2%  ( -2% - 7%)    0.006
FilteredAndHighMed                 31.76  (2.6%)                    32.53  (1.6%)      2.4%  ( -1% - 6%)    0.000
OrStopWords                         9.17  (1.9%)                     9.39  (3.1%)      2.5%  ( -2% - 7%)    0.002
FilteredAndStopWords                8.57  (2.8%)                     8.80  (1.3%)      2.7%  ( -1% - 6%)    0.000
FilteredAndHighHigh                10.61  (2.6%)                    10.92  (1.0%)      2.9%  ( 0% - 6%)     0.000
```
I'm planning to run another round of benchmarks after
https://github.com/mikemccand/luceneutil/pull/436 is merged; it's possible
the speedup is not real.