costin commented on code in PR #16141:
URL: https://github.com/apache/lucene/pull/16141#discussion_r3315771868
##########
lucene/core/src/java/org/apache/lucene/search/Weight.java:
##########
@@ -277,7 +282,12 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
// iterator implementations.
if (twoPhase == null && competitiveIterator == null) {
// Optimize simple iterators with collectors that can't skip
- scoreIterator(collector, acceptDocs, iterator, max);
+ if (scorer instanceof ConstantScoreScorer constantScoreScorer
+ && constantScoreScorer.canBulkCollectDocIdStream()) {
+ scoreIteratorIntoBitSet(collector, acceptDocs, iterator, max);
Review Comment:
Ran the benchmark at the sparse densities mentioned (Apple M3/aarch64).
Results (1M docs, no deletions):
```
Density  Matching docs  Baseline (ops/ms)  intoBitSet (ops/ms)  Winner
0.01%              100                387                  169  Baseline 2.3x
0.1%             1,000                181                  105  Baseline 1.7x
0.2%             2,000                108                   89  Baseline 1.2x
0.3%             3,000                 62                   86  intoBitSet 1.4x  ← crossover
0.5%             5,000                 46                   86  intoBitSet 1.9x
1%              10,000                 47                   85  intoBitSet 1.8x
3%              30,000                  8                   84  intoBitSet 10.9x
```
With deletions the crossover shifts slightly (between 0.1% and 0.5%) but the
pattern is the same.
The crossover is around 0.25% density. Below that, per-window overhead
(clearing + iterating ~244 windows for 1M docs) exceeds the per-doc savings.
Above that, intoBitSet wins and the gap widens fast.
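A rough sketch of the window arithmetic behind that overhead. The 4096-doc window size is an assumption inferred from "~244 windows for 1M docs" (1,000,000 / 4096 ≈ 244), not something read from the patch, and `WindowMath` and its methods are hypothetical names used only for illustration.

```java
final class WindowMath {
  // Assumption: 4096 docs per window, inferred from "~244 windows for 1M docs".
  static final int WINDOW_SIZE = 4096;

  /** Number of windows visited for a segment, counting a trailing partial window. */
  static int windowCount(int maxDoc) {
    return (int) (((long) maxDoc + WINDOW_SIZE - 1) / WINDOW_SIZE);
  }

  /** Average number of matching docs that each window's fixed cost is spread over. */
  static double matchesPerWindow(int maxDoc, long matchingDocs) {
    return (double) matchingDocs / windowCount(maxDoc);
  }

  public static void main(String[] args) {
    int maxDoc = 1_000_000;
    long[] matching = {100, 1_000, 2_000, 3_000, 5_000, 10_000, 30_000};
    System.out.println("windows: " + windowCount(maxDoc));
    for (long m : matching) {
      System.out.printf("%,d matching docs -> %.2f per window%n",
          m, matchesPerWindow(maxDoc, m));
    }
  }
}
```

Under that assumption, 100 matching docs average less than one match per window, so most windows are cleared and scanned for nothing. Near the crossover each window holds roughly ten matches, which is about where the fixed per-window cost starts to pay for itself.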
In absolute terms the regression at extreme sparsity is small. At 0.01%
density (100 matching docs in 1M), the difference is ~3.3µs per query. At 10M
docs, ~33µs. Both are insignificant compared to the milliseconds of a typical
Lucene query or even a single disk seek (50-100µs).
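For reference, the ~3.3µs figure follows from converting the two throughput numbers in the 0.01% row into per-query latency. A minimal sketch of that arithmetic (`LatencyDelta` is a hypothetical helper, and the 10M-doc figure assumes the cost scales linearly with segment size):

```java
final class LatencyDelta {
  /** Converts a throughput in ops/ms into a per-operation latency in microseconds. */
  static double microsPerOp(double opsPerMs) {
    return 1000.0 / opsPerMs;
  }

  /** Extra latency per query, in microseconds, of the candidate over the baseline. */
  static double deltaMicros(double baselineOpsPerMs, double candidateOpsPerMs) {
    return microsPerOp(candidateOpsPerMs) - microsPerOp(baselineOpsPerMs);
  }

  public static void main(String[] args) {
    // 0.01% density row: baseline 387 ops/ms, intoBitSet 169 ops/ms.
    double delta1M = deltaMicros(387, 169);
    // Assumption: the overhead grows linearly with the number of docs.
    double delta10M = delta1M * 10;
    System.out.printf("1M docs:  %.1f us per query%n", delta1M);
    System.out.printf("10M docs: %.1f us per query%n", delta10M);
  }
}
```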
Queries that land on `DefaultBulkScorer` at <0.3% density are also rare in
practice since filters that selective typically resolve through more
specialized scorers.
That said, I can add the density gate if you prefer. Something like `cost >=
(max - min) / 512` would match the observed crossover without being overly
conservative.
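A minimal sketch of what such a gate could look like, written as a standalone helper rather than as the actual change to `Weight.java`. `DensityGate` and `useBitSetPath` are hypothetical names, `cost` stands for the iterator's estimated cost, and clamping `max` to `maxDoc` is an added assumption to cover callers that pass `Integer.MAX_VALUE` as the upper bound.

```java
final class DensityGate {
  // Divisor from the proposed gate: cost >= (max - min) / 512, roughly 0.2% density.
  static final int DENSITY_DIVISOR = 512;

  /**
   * Returns true when the estimated match count is dense enough for the bit set path.
   * Assumption: max is clamped to maxDoc so an unbounded max does not inflate the range.
   */
  static boolean useBitSetPath(long cost, int min, int max, int maxDoc) {
    long upper = Math.min(max, maxDoc);
    long range = Math.max(0L, upper - min);
    return cost >= range / DENSITY_DIVISOR;
  }

  public static void main(String[] args) {
    int maxDoc = 1_000_000;
    long[] costs = {100, 1_000, 2_000, 3_000, 5_000};
    for (long cost : costs) {
      System.out.printf("cost=%,d -> bit set path: %b%n",
          cost, useBitSetPath(cost, 0, Integer.MAX_VALUE, maxDoc));
    }
  }
}
```

For a 1M-doc range the threshold works out to 1,953 docs, so the 0.2% row would already take the bit set path while the 0.1% row would stay on the baseline.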
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]