Re: [PR] Optimize DefaultBulkScorer for ConstantScoreScorer via intoBitSet batching [lucene]

via GitHub Wed, 27 May 2026 23:31:02 -0700


costin commented on code in PR #16141:
URL: https://github.com/apache/lucene/pull/16141#discussion_r3315771868



##########
lucene/core/src/java/org/apache/lucene/search/Weight.java:
##########
@@ -277,7 +282,12 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
       // iterator implementations.
       if (twoPhase == null && competitiveIterator == null) {
         // Optimize simple iterators with collectors that can't skip
-        scoreIterator(collector, acceptDocs, iterator, max);
+        if (scorer instanceof ConstantScoreScorer constantScoreScorer
+            && constantScoreScorer.canBulkCollectDocIdStream()) {
+          scoreIteratorIntoBitSet(collector, acceptDocs, iterator, max);

Review Comment:
   Ran the benchmark at the sparse densities mentioned (Apple M3/aarch64). 
Results (1M docs, no deletions):
   
   ```
   Density   Docs   Baseline (ops/ms)   intoBitSet (ops/ms)   Winner
   0.01%      100       387                 169               Baseline 2.3x
   0.1%     1,000       181                 105               Baseline 1.7x
   0.2%     2,000       108                  89               Baseline 1.2x
   0.3%     3,000        62                  86               intoBitSet 1.4x  ← crossover
   0.5%     5,000        46                  86               intoBitSet 1.9x
   1%      10,000        47                  85               intoBitSet 1.8x
   3%      30,000         8                  84               intoBitSet 10.9x
   ```
   
   With deletions the crossover shifts slightly (between 0.1% and 0.5%) but the 
pattern is the same.
   
   The crossover is around 0.25% density. Below that, per-window overhead 
(clearing + iterating ~244 windows for 1M docs) exceeds the per-doc savings. 
Above that, intoBitSet wins and the gap widens fast.
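   
   For reference, the window arithmetic behind that estimate, as a rough 
sketch. The 4096-doc window size is an assumption inferred from the ~244 
figure, and `WindowMath` and its methods are illustrative names, not Lucene API.
   
   ```java
   // Illustrative only: WindowMath is a hypothetical helper, not part of Lucene.
   final class WindowMath {
     // Assumed window size: 1,000,000 / 4096 is roughly 244, matching the estimate.
     static final int WINDOW_SIZE = 4096;

     // Number of windows needed to cover docCount docs, counting a partial last window.
     static int windowCount(int docCount) {
       return (docCount + WINDOW_SIZE - 1) / WINDOW_SIZE;
     }

     // Average matching docs per window at a given density (0.0025 means 0.25%).
     static double matchesPerWindow(double density) {
       return density * WINDOW_SIZE;
     }

     public static void main(String[] args) {
       System.out.println(windowCount(1_000_000));   // 244 full windows plus one partial
       System.out.println(matchesPerWindow(0.0025)); // about 10 matches per window
     }
   }
   ```
   
   Under that assumption the ~0.25% crossover corresponds to roughly 10 
matches per window, which is about where the fixed per-window cost appears to 
be amortized in the numbers above.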
   
   In absolute terms the regression at extreme sparsity is small. At 0.01% 
density (100 matching docs in 1M), the difference is ~3.3µs per query; scaled 
linearly to 10M docs, ~33µs. Both are insignificant compared to the milliseconds 
of a typical Lucene query or even a single SSD random read (50-100µs).
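   
   The ~3.3µs figure follows directly from the 0.01% row of the table. A 
minimal sketch of the conversion (`LatencyMath` is a made-up name for 
illustration):
   
   ```java
   // Illustrative only: converts the benchmark throughput numbers into per-query latency.
   final class LatencyMath {
     // Throughput in ops/ms to microseconds per operation (1 ms = 1000 µs).
     static double microsPerOp(double opsPerMs) {
       return 1000.0 / opsPerMs;
     }

     // Extra microseconds per query when the candidate is slower than the baseline.
     static double regressionMicros(double baselineOpsPerMs, double candidateOpsPerMs) {
       return microsPerOp(candidateOpsPerMs) - microsPerOp(baselineOpsPerMs);
     }

     public static void main(String[] args) {
       // 0.01% density row: baseline 387 ops/ms, intoBitSet 169 ops/ms.
       System.out.printf("%.2f us%n", regressionMicros(387, 169));
     }
   }
   ```
   
   The 10M-doc number is an extrapolation, assuming the overhead grows 
linearly with the number of windows, not a separate measurement.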
   
   Queries that land on `DefaultBulkScorer` at <0.3% density are also rare in 
practice since filters that selective typically resolve through more 
specialized scorers.
   
   That said, I can add the density gate if you prefer. Something like 
`cost >= (max - min) / 512` (a threshold of about 0.2% density) would roughly 
match the observed crossover without being overly conservative.
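   
   A minimal sketch of what that gate could look like, checked against the 
1M-doc rows of the table. `DensityGate` and `useIntoBitSet` are hypothetical 
names, and the sketch assumes `max` is a real upper bound; a sentinel value 
would need clamping to the segment's doc count first.
   
   ```java
   // Illustrative only: a standalone version of the proposed density gate.
   final class DensityGate {
     // Divisor from the proposal: 1/512 is roughly 0.2% density.
     static final int DIVISOR = 512;

     // True when the estimated match count is dense enough for the intoBitSet path.
     // Assumes max is an actual doc bound, not a sentinel (hypothetical precondition).
     static boolean useIntoBitSet(long cost, int min, int max) {
       long range = (long) max - min; // widen to long to avoid int overflow
       return cost >= range / DIVISOR;
     }

     public static void main(String[] args) {
       long[] costs = {100, 1_000, 2_000, 3_000, 5_000};
       for (long cost : costs) {
         System.out.println(cost + " -> " + useIntoBitSet(cost, 0, 1_000_000));
       }
     }
   }
   ```
   
   With a 1M-doc range the threshold is 1,953 docs, so the 0.2% row (2,000 
docs) would already take the intoBitSet path even though the baseline measured 
1.2x faster there. A smaller divisor would move the cutoff closer to 0.3%.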



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

