iprithv opened a new pull request, #15971: URL: https://github.com/apache/lucene/pull/15971
## Description

In `MaxScoreBulkScorer.scoreInnerWindowMultipleEssentialClauses()`, the `cardinality()` call was used solely to pre-size the `docAndScoreAccBuffer` before extracting matches from the bitset via `forEach()`. This resulted in two full passes over the bitset's 64 longs (for `INNER_WINDOW_SIZE=4096`): one for counting, one for extraction.

This change replaces `growNoCopy(windowMatches.cardinality(0, innerWindowSize))` with `growNoCopy(INNER_WINDOW_SIZE)`, eliminating the counting pass entirely. The buffer is reused across inner windows, so the one-time over-allocation (~48KB for `int[]` + `double[]`, i.e. 4096 × (4 + 8) bytes) is negligible.

## Benchmark Results

JMH benchmark on JDK 25, Apple M-series (higher is better):

```
Benchmark                          (matchCount)   Mode  Cnt  Score  Units
oldCardinalityForEach (before)               50  thrpt    3  6.809  ops/us
newForEachNoCardinality (after)              50  thrpt    3  7.686  ops/us  → +12.9% faster
oldCardinalityForEach (before)              128  thrpt    3  3.044  ops/us
newForEachNoCardinality (after)             128  thrpt    3  3.170  ops/us  → +4.1% faster
oldCardinalityForEach (before)              500  thrpt    3  0.466  ops/us
newForEachNoCardinality (after)             500  thrpt    3  0.502  ops/us  → +7.7% faster
oldCardinalityForEach (before)             1000  thrpt    3  0.242  ops/us
newForEachNoCardinality (after)            1000  thrpt    3  0.234  ops/us  → ~same
```

**4-13% improvement** across typical match densities (50-500 docs per window), which is the common range for multi-term BooleanQuery workloads.

## Context

This method is on the hot path for multi-clause BooleanQuery scoring:

`IndexSearcher.search()` → `MaxScoreBulkScorer.score()` → `scoreInnerWindowMultipleEssentialClauses()`

It is invoked for every 4096-doc inner window when a query has 2+ essential clauses. The `cardinality()` call was iterating all 64 longs of the bitset purely to determine a buffer size, which is work that can be avoided by pre-allocating to the maximum possible size.

An `intoArray()`-based approach was also evaluated but proved slower for sparse windows (10-128 matches) due to scanning empty words.
The `forEach()` approach with pre-allocation is the best strategy across all densities.
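
To make the two strategies concrete, here is a minimal, self-contained sketch of the before/after pattern. It is not the Lucene code: `WindowExtractSketch`, `twoPass`, `onePass`, and the plain `long[]`/`int[]` stand-ins for the window bitset and `docAndScoreAccBuffer` are hypothetical names chosen for illustration, and the only values taken from the description are the window size of 4096 and the `int[]` + `double[]` buffer layout.

```java
import java.util.Arrays;

// Hypothetical stand-in for the window bitset and doc buffer; not Lucene's classes.
final class WindowExtractSketch {
  static final int INNER_WINDOW_SIZE = 4096;            // value quoted in the description
  static final int NUM_WORDS = INNER_WINDOW_SIZE / 64;  // 64 longs per window

  // Counting pass: visits every word once, which is what a cardinality call has to do.
  static int cardinality(long[] words) {
    int count = 0;
    for (long w : words) {
      count += Long.bitCount(w);
    }
    return count;
  }

  // Extraction pass: writes the index of each set bit into docs and returns the match count.
  static int extract(long[] words, int[] docs) {
    int size = 0;
    for (int i = 0; i < words.length; i++) {
      long w = words[i];
      while (w != 0) {
        docs[size++] = (i << 6) + Long.numberOfTrailingZeros(w);
        w &= w - 1; // clear the lowest set bit
      }
    }
    return size;
  }

  // Before: count first to size the buffer exactly, then extract (two passes over the words).
  static int[] twoPass(long[] words) {
    int[] docs = new int[cardinality(words)];
    extract(words, docs);
    return docs;
  }

  // After: reuse a buffer already sized for the worst case and extract only (one pass).
  static int onePass(long[] words, int[] reusableDocs) {
    return extract(words, reusableDocs);
  }

  public static void main(String[] args) {
    long[] words = new long[NUM_WORDS];
    for (int doc : new int[] {0, 63, 64, 1000, 4095}) {
      words[doc >> 6] |= 1L << doc; // long shifts use only the low 6 bits of doc
    }
    int[] reusable = new int[INNER_WINDOW_SIZE]; // allocated once, reused per window
    int n = onePass(words, reusable);

    // Both strategies yield the same doc IDs; only the number of passes differs.
    System.out.println(Arrays.equals(twoPass(words), Arrays.copyOf(reusable, n))); // true

    // Worst-case buffer footprint for an int[] plus a double[] of 4096 entries each.
    System.out.println((long) INNER_WINDOW_SIZE * (Integer.BYTES + Double.BYTES)); // 49152
  }
}
```

The trade-off in the sketch mirrors the one described above: the one-pass variant pays a fixed 49152 bytes (48 KiB) up front and in return skips the counting loop on every window.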
