[PR] Optimize DefaultBulkScorer for ConstantScoreScorer via intoBitSet batching [lucene]

via GitHub Wed, 27 May 2026 14:21:16 -0700


costin opened a new pull request, #16141:
URL: https://github.com/apache/lucene/pull/16141


   `DefaultBulkScorer` iterates doc-by-doc calling `collector.collect(int)` for 
each match. When the scorer is a `ConstantScoreScorer` with no two-phase 
iterator and no competitive iterator, we can instead batch doc IDs into a 
`FixedBitSet` window via `intoBitSet`, apply the live-docs mask in bulk, and 
hand the result as a `DocIdStream` to the collector.
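
The batching idea above can be sketched as a simplified, self-contained model. This is not the PR's code and does not use Lucene's API: `java.util.BitSet` stands in for `FixedBitSet`, a sorted `int[]` stands in for the scorer's iterator, and a plain count stands in for the collector. The class and method names (`WindowedBatchSketch`, `countPerDoc`, `countBatched`) are hypothetical.

```java
import java.util.BitSet;

/** Simplified model of window-based batching; all names are hypothetical, not Lucene code. */
final class WindowedBatchSketch {
    static final int WINDOW_SIZE = 4096;

    /** Per-doc path: one liveness check and one callback per matching doc. */
    static long countPerDoc(int[] sortedMatches, BitSet liveDocs) {
        long count = 0;
        for (int doc : sortedMatches) {
            if (liveDocs == null || liveDocs.get(doc)) {
                count++; // stands in for collector.collect(doc)
            }
        }
        return count;
    }

    /** Batched path: fill a window, mask deletions in bulk, hand over the whole window. */
    static long countBatched(int[] sortedMatches, BitSet liveDocs, int maxDoc) {
        long count = 0;
        int next = 0; // position in sortedMatches
        BitSet window = new BitSet(WINDOW_SIZE);
        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int upTo = Math.min(base + WINDOW_SIZE, maxDoc);
            window.clear();
            // 1. Load matches in [base, upTo) into the window, relative to base.
            //    This model sets bits one by one; a bitset-backed iterator
            //    could copy whole words instead.
            while (next < sortedMatches.length && sortedMatches[next] < upTo) {
                window.set(sortedMatches[next] - base);
                next++;
            }
            // 2. Apply the live-docs mask with one bulk AND over the window.
            if (liveDocs != null) {
                window.and(liveDocs.get(base, upTo));
            }
            // 3. Hand the window to the consumer; a counting consumer
            //    only needs the number of set bits.
            count += window.cardinality();
        }
        return count;
    }

    public static void main(String[] args) {
        int[] matches = {1, 5, 4096, 5000, 9999};
        BitSet live = new BitSet(10_000);
        live.set(0, 10_000);
        live.clear(5);     // deleted doc
        live.clear(5000);  // deleted doc
        System.out.println(countPerDoc(matches, live) + " " + countBatched(matches, live, 10_000));
    }
}
```

Both methods return the same count for the same input; the difference is that the batched path touches the consumer once per window rather than once per matching doc.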
   
   This replaces per-doc virtual dispatch with bulk bitwise operations over a 
4096-doc window. The new path triggers for `ConstantScoreQuery` filters, 
constant-score multi-term queries (`WildcardQuery`, `PrefixQuery`, 
`RegexpQuery`), and filter clauses in boolean queries.
   
   ### Benchmarks
   
   `DefaultBulkScorerBenchmark` (1M docs, varying match density, with and without 10% deletions)
   
   ```
   JDK 25.0.3+9-LTS (Temurin), JMH 1.37
   JVM args: -Xmx2g -Xms2g -XX:+AlwaysPreTouch
   Warmup: 3 x 5s, Measurement: 5 x 5s, Fork: 2
   Mode: Throughput (ops/ms, higher is better)
   ```
   
   **AMD EPYC 7R32 (c5a.2xlarge)**
   
   | density | deletions | baseline | intoBitSet | speedup |
   |---------|-----------|----------|------------|---------|
   | 1%      | no        | 25.2     | 43.4       | 1.7x    |
   | 1%      | yes       | 16.7     | 27.9       | 1.7x    |
   | 10%     | no        | 2.6      | 43.2       | 17x     |
   | 10%     | yes       | 1.7      | 28.2       | 17x     |
   | 50%     | no        | 0.5      | 61.7       | 115x    |
   | 100%    | no        | 0.2      | 61.9       | 358x    |
   
   **Intel Xeon 8275CL (c5.2xlarge)**
   
   | density | deletions | baseline | intoBitSet | speedup |
   |---------|-----------|----------|------------|---------|
   | 1%      | no        | 25.2     | 41.2       | 1.6x    |
   | 1%      | yes       | 20.2     | 25.9       | 1.3x    |
   | 10%     | no        | 2.5      | 41.2       | 16x     |
   | 10%     | yes       | 2.2      | 25.9       | 12x     |
   | 50%     | no        | 0.5      | 57.9       | 106x    |
   | 100%    | no        | 0.2      | 58.0       | 259x    |
   
   The baseline's cost grows linearly with density (one virtual call per matching 
doc), so its throughput drops roughly tenfold from 1% to 10% density. The 
intoBitSet path stays nearly flat because its cost scales with window size, not 
match count. Real filter queries typically match 1-10% of docs, where the gain 
is 1.3-17x. The 100x+ ratios at high density apply to broad filters (`exists`, 
match-all, wide ranges).
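
As a rough check of the linear-cost reading, the table values can be converted into an implied cost per matching doc. The sketch below assumes one benchmark op is a single pass over the 1M-doc index, which the description does not state explicitly; `CostModelSketch` and `nsPerMatch` are hypothetical names. Under that assumption the AMD baseline rows without deletions work out to about 4 to 5 ns per match at every density, keeping in mind that the table values are rounded.

```java
/** Back-of-the-envelope conversion of the table's throughput figures; names are hypothetical. */
final class CostModelSketch {
    static final int NUM_DOCS = 1_000_000; // index size stated for the benchmark

    /**
     * Implied nanoseconds per matching doc, assuming one op is one pass over
     * the whole index (an assumption, not confirmed by the source).
     */
    static double nsPerMatch(double opsPerMs, double density) {
        double nsPerOp = 1_000_000.0 / opsPerMs; // 1 ms = 1e6 ns
        double matches = density * NUM_DOCS;
        return nsPerOp / matches;
    }

    public static void main(String[] args) {
        // AMD baseline rows without deletions: {density, ops/ms}
        double[][] rows = {{0.01, 25.2}, {0.10, 2.6}, {0.50, 0.5}, {1.00, 0.2}};
        for (double[] row : rows) {
            System.out.printf("density %.0f%%: %.2f ns/match%n",
                    row[0] * 100, nsPerMatch(row[1], row[0]));
        }
    }
}
```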
   
   This microbenchmark isolates scoring+collection with a counting collector. 
End-to-end query improvement will be smaller since collection is one component 
alongside posting decoding and result materialization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

