HUSTERGS opened a new pull request, #14968:
URL: https://github.com/apache/lucene/pull/14968
### Description
Inspired by #14690, this PR tries to bring back the
`Scorer#applyAsRequiredClause` interface. Unlike #14690, though, I'm
wondering whether we can pass the `DocAndScoreAccBuffer` all the way down to
the postings, so that we make fewer `advance` calls when the buffer holds a
dense set of doc IDs and can, for example, use SIMD again. To that end I
added a new method, `PostingsEnum#nextRequiredFreqBuffer` (not stable yet).
So far I have only written the default implementation and am still working
on speeding it up in `BlockPostingEnum`.
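To make the idea concrete, here is a toy sketch of the two call patterns. It is not the code in this PR and it does not use Lucene's classes: `DocBuffer`, `ToyPostings`, `filterBuffer`, and `filterPerDoc` are hypothetical names invented for illustration, and the postings list is a plain sorted array. It only shows how handing a whole buffer of candidate doc IDs to the postings replaces one `advance` call per candidate with a single batch call.

```java
import java.util.Arrays;

/** Toy model only: every type here is a hypothetical stand-in, not a Lucene class. */
final class BufferIntersectionSketch {

  /** Hypothetical candidate buffer: the first {@code size} entries are valid, docs ascending. */
  static final class DocBuffer {
    final int[] docs;
    final int[] freqs;
    int size;

    DocBuffer(int... docs) {
      this.docs = docs.clone();
      this.freqs = new int[docs.length];
      this.size = docs.length;
    }
  }

  /** Hypothetical in-memory postings list: sorted doc IDs with parallel term frequencies. */
  static final class ToyPostings {
    private final int[] docs;
    private final int[] freqs;
    private int idx = -1; // -1 means "not positioned yet"
    int advanceCalls = 0;

    ToyPostings(int[] docs, int[] freqs) {
      this.docs = docs;
      this.freqs = freqs;
    }

    int docID() {
      if (idx < 0) return -1;
      return idx < docs.length ? docs[idx] : Integer.MAX_VALUE; // MAX_VALUE = exhausted
    }

    int freq() {
      return freqs[idx];
    }

    /** Moves to the first doc >= target; the caller pays one call per candidate. */
    int advance(int target) {
      advanceCalls++;
      if (idx < 0) idx = 0;
      while (idx < docs.length && docs[idx] < target) idx++;
      return docID();
    }

    /** Batch variant: keeps only buffered docs that this list contains and fills in freqs. */
    void filterBuffer(DocBuffer buffer) {
      int i = Math.max(idx, 0);
      int out = 0;
      for (int k = 0; k < buffer.size && i < docs.length; k++) {
        int target = buffer.docs[k];
        while (i < docs.length && docs[i] < target) i++;
        if (i < docs.length && docs[i] == target) {
          buffer.docs[out] = target;
          buffer.freqs[out++] = freqs[i];
        }
      }
      idx = i;
      buffer.size = out;
    }
  }

  /** Baseline: the caller drives the intersection with one advance() per candidate. */
  static void filterPerDoc(ToyPostings postings, DocBuffer buffer) {
    int out = 0;
    for (int k = 0; k < buffer.size; k++) {
      int target = buffer.docs[k];
      int doc = postings.docID() < target ? postings.advance(target) : postings.docID();
      if (doc == target) {
        buffer.docs[out] = target;
        buffer.freqs[out++] = postings.freq();
      }
    }
    buffer.size = out;
  }

  public static void main(String[] args) {
    int[] docs = {1, 3, 4, 7, 9};
    int[] freqs = {2, 1, 5, 1, 3};

    ToyPostings perDoc = new ToyPostings(docs, freqs);
    DocBuffer a = new DocBuffer(3, 4, 5, 9);
    filterPerDoc(perDoc, a);

    ToyPostings batch = new ToyPostings(docs, freqs);
    DocBuffer b = new DocBuffer(3, 4, 5, 9);
    batch.filterBuffer(b);

    System.out.println(Arrays.toString(Arrays.copyOf(a.docs, a.size)) + " advance calls: " + perDoc.advanceCalls);
    System.out.println(Arrays.toString(Arrays.copyOf(b.docs, b.size)) + " advance calls: " + batch.advanceCalls);
  }
}
```

Both paths keep docs 3, 4, and 9; the per-doc path makes four `advance` calls and the batch path makes none. In this toy model nothing is actually virtual, so it shows only the difference in call counts, not the dispatch cost or any SIMD benefit.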
This is still under development. I know we should be cautious about adding
new public interfaces (especially two at once!), but I want to share the
current progress. Below are the luceneutil benchmark results on
`wikimediumall` with
`searchConcurrency=0, taskCountPerCat=5, taskRepeatCount=50`, after 20
iterations (against the latest code):
```
                       Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
                       Term   472.96 (6.1%)    461.21 (6.8%)   -2.5% (-14% -  11%)  0.224
                     Term1M   472.39 (5.9%)    460.90 (7.0%)   -2.4% (-14% -  11%)  0.232
                    Term100   472.69 (5.9%)    461.52 (6.8%)   -2.4% (-14% -  10%)  0.239
                  TermB1M1P   472.28 (5.9%)    461.42 (6.9%)   -2.3% (-14% -  11%)  0.257
                    Term10K   472.60 (6.0%)    461.82 (6.8%)   -2.3% (-14% -  11%)  0.260
                    TermB1M   472.22 (6.0%)    461.48 (6.9%)   -2.3% (-14% -  11%)  0.266
           DismaxOrHighHigh    35.68 (3.5%)     35.03 (3.9%)   -1.8% ( -8% -   5%)  0.118
                     OrMany     4.73 (5.3%)      4.66 (5.7%)   -1.4% (-11% -  10%)  0.429
           IntervalsOrdered     2.47 (1.9%)      2.44 (3.1%)   -1.3% ( -6% -   3%)  0.120
                 OrHighRare    96.38 (6.9%)     95.20 (6.5%)   -1.2% (-13% -  13%)  0.564
            DismaxOrHighMed    50.96 (4.1%)     50.35 (4.2%)   -1.2% ( -9% -   7%)  0.358
               CombinedTerm    11.17 (3.7%)     11.04 (4.8%)   -1.1% ( -9% -   7%)  0.402
                 DismaxTerm   516.28 (5.0%)    510.53 (4.1%)   -1.1% ( -9% -   8%)  0.440
                 OrHighHigh    21.39 (2.5%)     21.22 (2.8%)   -0.8% ( -5% -   4%)  0.319
                AndHighHigh    22.46 (2.4%)     22.27 (2.5%)   -0.8% ( -5% -   4%)  0.292
                  OrHighMed    68.77 (4.0%)     68.47 (4.3%)   -0.4% ( -8% -   8%)  0.748
        CountFilteredPhrase     9.21 (2.8%)      9.17 (3.4%)   -0.4% ( -6% -   6%)  0.675
                   SpanNear     2.51 (4.1%)      2.51 (4.5%)   -0.3% ( -8% -   8%)  0.817
         Or2Terms2StopWords    62.44 (6.0%)     62.25 (6.2%)   -0.3% (-11% -  12%)  0.876
               SloppyPhrase     1.14 (3.2%)      1.13 (4.1%)   -0.3% ( -7% -   7%)  0.796
        FilteredOrStopWords     8.16 (2.3%)      8.14 (2.8%)   -0.3% ( -5% -   4%)  0.729
                CountPhrase     2.71 (2.0%)      2.71 (2.3%)   -0.2% ( -4% -   4%)  0.731
        CountFilteredIntNRQ    16.35 (1.4%)     16.34 (1.3%)   -0.1% ( -2% -   2%)  0.862
           AndMedOrHighHigh    16.75 (2.1%)     16.74 (3.3%)   -0.1% ( -5% -   5%)  0.935
             FilteredIntNRQ    42.25 (3.1%)     42.23 (3.1%)   -0.0% ( -6% -   6%)  0.967
            AndHighOrMedMed    14.11 (3.2%)     14.11 (3.1%)   -0.0% ( -6% -   6%)  1.000
         FilteredOrHighHigh    13.03 (2.6%)     13.03 (2.9%)    0.0% ( -5% -   5%)  0.996
                     IntNRQ    42.59 (3.2%)     42.60 (3.2%)    0.0% ( -6% -   6%)  0.979
     CountFilteredOrHighMed    17.95 (0.8%)     17.97 (0.9%)    0.1% ( -1% -   1%)  0.806
    CountFilteredOrHighHigh    15.86 (0.9%)     15.88 (1.0%)    0.1% ( -1% -   2%)  0.668
                    Respell    37.00 (4.4%)     37.07 (3.9%)    0.2% ( -7% -   8%)  0.897
                 AndHighMed    54.33 (3.3%)     54.43 (3.6%)    0.2% ( -6% -   7%)  0.861
                     Phrase     7.62 (2.6%)      7.64 (3.3%)    0.2% ( -5% -   6%)  0.834
                     Fuzzy2    36.78 (4.3%)     36.89 (4.8%)    0.3% ( -8% -   9%)  0.836
             FilteredPhrase     9.86 (2.6%)      9.89 (3.1%)    0.3% ( -5% -   6%)  0.737
                     Fuzzy1    40.61 (4.3%)     40.74 (4.9%)    0.3% ( -8% -   9%)  0.822
          FilteredOrHighMed    39.08 (3.7%)     39.21 (3.8%)    0.3% ( -6% -   8%)  0.780
               FilteredTerm    65.09 (3.6%)     65.34 (3.9%)    0.4% ( -6% -   8%)  0.756
            CountAndHighMed    75.20 (2.5%)     75.50 (3.1%)    0.4% ( -5% -   6%)  0.649
           FilteredOr3Terms    44.00 (3.7%)     44.21 (3.9%)    0.5% ( -6% -   8%)  0.683
        CombinedAndHighHigh     5.73 (2.1%)      5.76 (2.0%)    0.5% ( -3% -   4%)  0.430
        CountFilteredOrMany     4.46 (2.6%)      4.48 (2.9%)    0.6% ( -4% -   6%)  0.511
              TermTitleSort    51.76 (4.4%)     52.07 (5.6%)    0.6% ( -9% -  11%)  0.707
          FilteredAnd3Terms   101.78 (3.5%)    102.42 (3.3%)    0.6% ( -5% -   7%)  0.552
           CountAndHighHigh    48.46 (2.3%)     48.77 (2.5%)    0.6% ( -4% -   5%)  0.406
             CountOrHighMed    77.72 (2.3%)     78.26 (2.5%)    0.7% ( -4% -   5%)  0.360
        And2Terms2StopWords    60.68 (6.5%)     61.13 (6.6%)    0.7% (-11% -  14%)  0.720
         CombinedOrHighHigh     5.62 (3.5%)      5.66 (4.1%)    0.8% ( -6% -   8%)  0.527
                  And3Terms    72.70 (4.0%)     73.25 (4.2%)    0.8% ( -7% -   9%)  0.559
 FilteredOr2Terms2StopWords    50.50 (4.8%)     50.89 (5.0%)    0.8% ( -8% -  11%)  0.624
         CombinedAndHighMed    21.62 (4.9%)     21.78 (4.5%)    0.8% ( -8% -  10%)  0.608
            CountOrHighHigh    49.83 (2.6%)     50.25 (2.6%)    0.8% ( -4% -   6%)  0.307
             FilteredOrMany     4.03 (2.7%)      4.06 (2.8%)    0.9% ( -4% -   6%)  0.324
          CombinedOrHighMed    21.21 (4.8%)     21.41 (5.1%)    0.9% ( -8% -  11%)  0.552
                CountOrMany     5.02 (3.1%)      5.07 (2.9%)    0.9% ( -4% -   7%)  0.328
                 TermDTSort   145.50 (5.2%)    147.04 (4.9%)    1.1% ( -8% -  11%)  0.509
                   Or3Terms    65.31 (3.8%)     66.02 (4.4%)    1.1% ( -6% -   9%)  0.404
          TermDayOfYearSort   266.30 (3.6%)    269.35 (3.8%)    1.1% ( -6% -   8%)  0.331
                   Wildcard    47.14 (3.5%)     47.75 (4.4%)    1.3% ( -6% -   9%)  0.298
                     IntSet   298.82 (4.0%)    303.31 (5.5%)    1.5% ( -7% -  11%)  0.325
            FilteredPrefix3    69.72 (3.6%)     70.79 (3.3%)    1.5% ( -5% -   8%)  0.162
              TermMonthSort  2187.76 (4.5%)   2231.91 (4.8%)    2.0% ( -7% -  11%)  0.173
FilteredAnd2Terms2StopWords    60.66 (4.9%)     61.89 (5.0%)    2.0% ( -7% -  12%)  0.192
                    Prefix3    74.19 (4.0%)     75.82 (3.6%)    2.2% ( -5% -  10%)  0.069
                  CountTerm  6268.63 (6.8%)   6406.68 (7.1%)    2.2% (-11% -  17%)  0.319
                OrStopWords     8.96 (2.2%)      9.21 (3.7%)    2.9% ( -3% -   8%)  0.003
               AndStopWords     8.78 (2.9%)      9.14 (2.8%)    4.1% ( -1% -  10%)  0.000
         FilteredAndHighMed    31.61 (2.4%)     32.97 (2.2%)    4.3% (  0% -   9%)  0.000
       FilteredAndStopWords     8.34 (2.3%)      8.86 (2.1%)    6.3% (  1% -  10%)  0.000
        FilteredAndHighHigh    10.32 (2.5%)     11.00 (1.9%)    6.7% (  2% -  11%)  0.000
```
I think it's promising to look into this approach more. If I understand
correctly, this speedup should come only from the reduced number of virtual
function calls?