HUSTERGS commented on PR #14968:
URL: https://github.com/apache/lucene/pull/14968#issuecomment-3094415292
> I'd suggest to focus this first PR on the `Scorer#applyAsRequiredClause`
API and later see if there's more room for speedups by adding new APIs to
`PostingsEnum` in a follow-up PR?
Yeah, I think that's a good idea. Over the past few days I've been
experimenting with some details of the current version of the code.
I moved the `PostingsEnum`-related code directly into
`applyAsRequiredClause` and removed the dependency on the newly introduced
`NormAndFreqBuffer`. With that change, the luceneutil benchmark no longer
shows a clear performance gain (at least not as good as before), especially
for the `OrStopWords` query. Here is the result:
```
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                p-value
OrHighMed                           69.09   (3.6%)                    67.53  (10.6%)     -2.3% ( -15% - 12%)  0.365
Or2Terms2StopWords                  63.02   (5.4%)                    61.66   (8.7%)     -2.2% ( -15% - 12%)  0.344
CombinedTerm                        11.07   (5.1%)                    10.85   (4.6%)     -2.1% ( -11% - 8%)   0.180
AndHighHigh                         22.49   (2.8%)                    22.03   (9.5%)     -2.1% ( -13% - 10%)  0.354
OrHighHigh                          21.41   (2.3%)                    20.98   (9.9%)     -2.0% ( -13% - 10%)  0.384
And3Terms                           73.13   (3.5%)                    71.87   (8.7%)     -1.7% ( -13% - 10%)  0.409
TermB1M                            473.09   (4.4%)                   465.04   (8.0%)     -1.7% ( -13% - 11%)  0.404
TermB1M1P                          473.02   (4.4%)                   465.15   (8.1%)     -1.7% ( -13% - 11%)  0.419
AndHighMed                          54.53   (3.2%)                    53.64  (10.0%)     -1.6% ( -14% - 11%)  0.485
Or3Terms                            65.73   (3.6%)                    64.70   (9.4%)     -1.6% ( -13% - 11%)  0.484
Term10K                            472.65   (4.4%)                   465.27   (8.1%)     -1.6% ( -13% - 11%)  0.450
Term100                            472.96   (4.2%)                   465.67   (8.1%)     -1.5% ( -13% - 11%)  0.452
Term1M                             472.95   (4.4%)                   465.91   (8.1%)     -1.5% ( -13% - 11%)  0.469
Term                               472.92   (4.5%)                   465.98   (8.2%)     -1.5% ( -13% - 11%)  0.481
And2Terms2StopWords                 61.13   (5.8%)                    60.43   (8.9%)     -1.2% ( -14% - 14%)  0.627
TermMonthSort                     2219.47   (3.2%)                  2201.87   (3.7%)     -0.8% ( -7% - 6%)    0.467
DismaxOrHighMed                     50.75   (3.5%)                    50.42   (7.9%)     -0.7% ( -11% - 11%)  0.733
AndMedOrHighHigh                    16.90   (2.9%)                    16.80   (4.8%)     -0.6% ( -8% - 7%)    0.621
IntSet                             298.20   (4.4%)                   296.49   (4.0%)     -0.6% ( -8% - 8%)    0.666
DismaxTerm                         513.00   (4.7%)                   510.11   (6.5%)     -0.6% ( -11% - 11%)  0.753
DismaxOrHighHigh                    35.47   (3.2%)                    35.31   (6.3%)     -0.4% ( -9% - 9%)    0.776
FilteredOr3Terms                    44.20   (3.2%)                    44.11   (3.1%)     -0.2% ( -6% - 6%)    0.831
FilteredOr2Terms2StopWords          50.82   (4.1%)                    50.72   (4.3%)     -0.2% ( -8% - 8%)    0.886
Fuzzy1                              40.73   (3.7%)                    40.65   (4.9%)     -0.2% ( -8% - 8%)    0.891
OrMany                               4.69   (3.7%)                     4.69   (6.1%)     -0.1% ( -9% - 10%)   0.950
CombinedAndHighMed                  21.52   (4.6%)                    21.51   (4.3%)     -0.1% ( -8% - 9%)    0.967
CountOrHighMed                      78.12   (1.7%)                    78.08   (2.6%)     -0.0% ( -4% - 4%)    0.944
FilteredOrMany                       4.06   (2.4%)                     4.06   (2.2%)     -0.0% ( -4% - 4%)    0.964
FilteredAnd2Terms2StopWords         61.06   (4.3%)                    61.04   (6.2%)     -0.0% ( -10% - 10%)  0.986
FilteredOrHighMed                   39.18   (3.3%)                    39.17   (3.2%)     -0.0% ( -6% - 6%)    0.986
CountTerm                         6298.40   (4.3%)                  6297.77   (4.5%)     -0.0% ( -8% - 9%)    0.994
Fuzzy2                              36.89   (3.5%)                    36.90   (4.7%)      0.0% ( -7% - 8%)    0.980
CountAndHighMed                     75.47   (1.7%)                    75.50   (2.1%)      0.0% ( -3% - 3%)    0.941
IntNRQ                              42.55   (2.2%)                    42.57   (3.0%)      0.1% ( -4% - 5%)    0.946
FilteredAnd3Terms                  101.94   (2.3%)                   102.05   (2.9%)      0.1% ( -5% - 5%)    0.900
CountFilteredOrHighMed              17.95   (0.7%)                    17.98   (0.6%)      0.2% ( -1% - 1%)    0.460
FilteredOrHighHigh                  13.02   (2.5%)                    13.05   (2.3%)      0.2% ( -4% - 5%)    0.813
FilteredIntNRQ                      42.16   (2.3%)                    42.24   (3.0%)      0.2% ( -4% - 5%)    0.820
CountFilteredIntNRQ                 16.31   (0.8%)                    16.35   (1.2%)      0.2% ( -1% - 2%)    0.468
CountFilteredOrHighHigh             15.86   (0.8%)                    15.90   (0.8%)      0.3% ( -1% - 1%)    0.331
CountOrHighHigh                     50.16   (2.4%)                    50.29   (2.5%)      0.3% ( -4% - 5%)    0.724
CountFilteredPhrase                  9.18   (2.5%)                     9.21   (3.4%)      0.3% ( -5% - 6%)    0.771
Wildcard                            47.34   (3.3%)                    47.48   (3.7%)      0.3% ( -6% - 7%)    0.790
AndHighOrMedMed                     14.04   (2.2%)                    14.08   (2.6%)      0.3% ( -4% - 5%)    0.688
IntervalsOrdered                     2.43   (3.4%)                     2.44   (3.3%)      0.3% ( -6% - 7%)    0.760
CountOrMany                          5.04   (2.9%)                     5.06   (2.8%)      0.4% ( -5% - 6%)    0.696
CountFilteredOrMany                  4.46   (2.5%)                     4.48   (2.6%)      0.4% ( -4% - 5%)    0.635
TermTitleSort                       51.93   (4.8%)                    52.13   (5.1%)      0.4% ( -9% - 10%)   0.809
CountAndHighHigh                    48.66   (2.2%)                    48.85   (2.2%)      0.4% ( -3% - 4%)    0.560
CombinedAndHighHigh                  5.67   (2.8%)                     5.69   (2.3%)      0.4% ( -4% - 5%)    0.597
Prefix3                             75.57   (3.8%)                    75.90   (3.3%)      0.4% ( -6% - 7%)    0.699
FilteredPrefix3                     70.64   (3.3%)                    70.98   (3.1%)      0.5% ( -5% - 7%)    0.637
Respell                             36.72   (3.5%)                    36.93   (3.6%)      0.6% ( -6% - 7%)    0.603
SpanNear                             2.45   (5.5%)                     2.46   (5.4%)      0.6% ( -9% - 12%)   0.733
FilteredOrStopWords                  8.13   (2.2%)                     8.18   (2.0%)      0.7% ( -3% - 5%)    0.329
FilteredTerm                        64.92   (3.0%)                    65.36   (3.6%)      0.7% ( -5% - 7%)    0.522
TermDTSort                         144.97   (3.3%)                   146.07   (4.8%)      0.8% ( -7% - 9%)    0.561
FilteredPhrase                       9.83   (2.2%)                     9.91   (2.6%)      0.8% ( -3% - 5%)    0.297
SloppyPhrase                         1.12   (5.3%)                     1.13   (4.9%)      0.8% ( -8% - 11%)   0.616
Phrase                               7.57   (4.3%)                     7.64   (4.3%)      0.9% ( -7% - 9%)    0.490
TermDayOfYearSort                  264.98   (2.6%)                   267.70   (2.9%)      1.0% ( -4% - 6%)    0.241
OrHighRare                          94.68   (6.8%)                    95.91   (5.4%)      1.3% ( -10% - 14%)  0.501
CombinedOrHighMed                   21.05   (5.6%)                    21.35   (4.6%)      1.4% ( -8% - 12%)   0.396
AndStopWords                         8.87   (3.6%)                     9.01   (7.4%)      1.6% ( -9% - 12%)   0.399
CountPhrase                          2.65   (4.9%)                     2.69   (3.2%)      1.9% ( -5% - 10%)   0.154
CombinedOrHighHigh                   5.54   (5.1%)                     5.66   (3.0%)      2.2% ( -5% - 10%)   0.092
OrStopWords                          8.99   (3.2%)                     9.20   (8.8%)      2.3% ( -9% - 14%)   0.263
FilteredAndHighMed                  31.76   (2.4%)                    32.52   (4.0%)      2.4% ( -3% - 8%)    0.020
FilteredAndStopWords                 8.41   (3.1%)                     8.75   (2.0%)      4.0% ( -1% - 9%)    0.000
FilteredAndHighHigh                 10.41   (3.1%)                    10.87   (1.8%)      4.4% ( 0% - 9%)     0.000
```
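To make a table like this easier to scan, here is a minimal Python sketch
that recomputes the percentage difference from the two mean QPS columns and
keeps only the rows whose p-value falls below a threshold. The rows are
copied from the results above; the 0.05 threshold, the helper names, and the
assumption that `Pct diff` is simply the relative change in mean QPS are mine
for illustration and are not part of luceneutil.

```python
# A few rows copied from the benchmark output:
# (task, baseline QPS, my_modified_version QPS, p-value)
ROWS = [
    ("OrHighMed", 69.09, 67.53, 0.365),
    ("OrStopWords", 8.99, 9.20, 0.263),
    ("FilteredAndHighMed", 31.76, 32.52, 0.020),
    ("FilteredAndStopWords", 8.41, 8.75, 0.000),
    ("FilteredAndHighHigh", 10.41, 10.87, 0.000),
]


def pct_diff(baseline, candidate):
    """Relative change of candidate over baseline, in percent (assumed definition)."""
    return 100.0 * (candidate - baseline) / baseline


def significant(rows, alpha=0.05):
    """Return (task, pct diff rounded to 0.1) for rows with p-value below alpha.

    alpha is an illustrative cutoff, not something the benchmark prescribes.
    """
    return [
        (task, round(pct_diff(base, cand), 1))
        for task, base, cand, p in rows
        if p < alpha
    ]


if __name__ == "__main__":
    for task, diff in significant(ROWS):
        print(f"{task}: {diff:+.1f}%")
```

With these rows, only the three `FilteredAnd*` tasks pass the threshold; the
`OrStopWords` change does not.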
If I keep using `NormAndFreqBuffer` (instead of raw `freqs` and `normValues`
arrays inside `TermScorer`), the performance seems to be better, which is a
little strange to me. Here is the result under an identical setup (only the
relevant queries are shown below):
```
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                p-value
CombinedOrHighMed                   21.60   (4.0%)                    21.98   (3.8%)      1.8% ( -5% - 9%)    0.151
OrStopWords                          9.05   (1.4%)                     9.23   (3.1%)      2.0% ( -2% - 6%)    0.009
CombinedOrHighHigh                   5.68   (2.7%)                     5.81   (2.2%)      2.2% ( -2% - 7%)    0.005
FilteredAndHighMed                  31.77   (2.2%)                    32.77   (1.5%)      3.1% ( 0% - 7%)     0.000
FilteredAndStopWords                 8.40   (2.4%)                     8.72   (1.9%)      3.8% ( 0% - 8%)     0.000
FilteredAndHighHigh                 10.37   (2.4%)                    10.84   (1.3%)      4.5% ( 0% - 8%)     0.000
```
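One rough way to line the two runs up, as a Python sketch: for the six tasks
reported in both runs, compare the reported `Pct diff` and check which tasks
fall below a p-value threshold in only one run. The numbers are copied from
the two result blocks; the 0.05 threshold and the function names are
assumptions made for illustration.

```python
# task -> ((pct diff, p-value) with raw arrays, (pct diff, p-value) with NormAndFreqBuffer)
RUNS = {
    "CombinedOrHighMed": ((1.4, 0.396), (1.8, 0.151)),
    "OrStopWords": ((2.3, 0.263), (2.0, 0.009)),
    "CombinedOrHighHigh": ((2.2, 0.092), (2.2, 0.005)),
    "FilteredAndHighMed": ((2.4, 0.020), (3.1, 0.000)),
    "FilteredAndStopWords": ((4.0, 0.000), (3.8, 0.000)),
    "FilteredAndHighHigh": ((4.4, 0.000), (4.5, 0.000)),
}


def gain_deltas(runs):
    """Difference in reported gain (buffer run minus raw-array run), in percentage points."""
    return {
        task: round(buf[0] - raw[0], 1)
        for task, (raw, buf) in runs.items()
    }


def significant_only_with_buffer(runs, alpha=0.05):
    """Tasks whose p-value is below alpha in the buffer run but not in the raw-array run."""
    return [
        task
        for task, (raw, buf) in runs.items()
        if buf[1] < alpha <= raw[1]
    ]


if __name__ == "__main__":
    print(gain_deltas(RUNS))
    print(significant_only_with_buffer(RUNS))
```

In this comparison the reported gains differ by at most 0.7 percentage
points, while `OrStopWords` and `CombinedOrHighHigh` cross the threshold only
in the `NormAndFreqBuffer` run, so part of the gap may be run-to-run noise
rather than a real difference between the two variants.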
I'm not sure what causes the difference :(
I will push a new commit using the raw arrays anyway.