costin opened a new pull request, #16146:
URL: https://github.com/apache/lucene/pull/16146

   This builds on #16141 by allowing `DefaultBulkScorer` to use the 
`DocIdStream` path for dense no-score conjunctions backed by 
`BitSetConjunctionDISI`.
   The useful target is cached-filter conjunctions, especially when the cached 
filter is dense enough for bulk masking to amortize the 4096-doc window cost. 
Those cases show the clearest wins, up to `4.95x` in this short run.
   
   The uncached comparison is _included to explain the gate_: this is not a 
general no-score conjunction optimization. 
   Without a cached `FixedBitSet`, the conjunction can't use `intoBitSet` / 
`andRange` and falls back to per-doc iteration through postings. 
   The gating condition (`canBulkIntoBitSet`) detects this and does not 
activate. Uncached results are mostly neutral/noisy, so the implementation only 
enables this path for `BitSetConjunctionDISI` and keeps sparse/non-bitset 
conjunctions on the per-doc path.
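
To make the difference between the two paths concrete, here is a minimal sketch in plain Java. It does not use Lucene's classes: the filter is a raw `long[]` standing in for a cached `FixedBitSet`, and the class name, `countPerDoc`, and `countBulk` are hypothetical names chosen for illustration. It only shows the general idea of masking a 4096-doc window against a bitset, not how `DefaultBulkScorer` or `BitSetConjunctionDISI` implement it.

```java
import java.util.Arrays;

// Illustrative only: not Lucene code. All names here are assumptions.
final class WindowedConjunctionSketch {
  static final int WINDOW = 4096;        // docs per window, as mentioned in the PR text
  static final int WORDS = WINDOW / 64;  // 64 longs cover one window

  // Per-doc path: probe the filter once for every lead doc.
  static int countPerDoc(int[] leadDocs, long[] filterBits) {
    int count = 0;
    for (int doc : leadDocs) {
      // Java masks the shift count to the low 6 bits, so 1L << doc selects bit (doc % 64).
      if ((filterBits[doc >>> 6] & (1L << doc)) != 0) {
        count++;
      }
    }
    return count;
  }

  // Bulk path: load the lead docs of one window into a scratch bitset,
  // AND it word-by-word with the filter, then popcount the result.
  // Assumes leadDocs is sorted ascending and every doc is < maxDoc.
  static int countBulk(int[] leadDocs, long[] filterBits, int maxDoc) {
    long[] window = new long[WORDS];
    int count = 0;
    int i = 0;
    for (int base = 0; base < maxDoc; base += WINDOW) {
      Arrays.fill(window, 0L);
      int end = Math.min(base + WINDOW, maxDoc);
      while (i < leadDocs.length && leadDocs[i] < end) {
        int rel = leadDocs[i++] - base;
        window[rel >>> 6] |= 1L << rel;
      }
      int firstWord = base >>> 6;
      for (int w = 0; w < WORDS && firstWord + w < filterBits.length; w++) {
        count += Long.bitCount(window[w] & filterBits[firstWord + w]);
      }
    }
    return count;
  }

  public static void main(String[] args) {
    int maxDoc = 10_000;
    long[] filter = new long[(maxDoc + 63) / 64];
    for (int d = 0; d < maxDoc; d += 2) {
      filter[d >>> 6] |= 1L << d;        // filter matches even docs
    }
    int[] lead = new int[(maxDoc + 2) / 3];
    for (int k = 0; k < lead.length; k++) {
      lead[k] = 3 * k;                   // lead matches multiples of 3
    }
    System.out.println(countPerDoc(lead, filter));
    System.out.println(countBulk(lead, filter, maxDoc));
  }
}
```

In this sketch the bulk path does a fixed amount of work per window (clearing and ANDing 64 words) no matter how many lead docs fall into it, which is the kind of per-window cost that has to be amortized.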
   
   JMH run:
   
   `CachedFilterConjunctionBenchmark -p docCount=1000000 -p 
leadSelectivity=0.001,0.03,0.10 -p filterSelectivity=0.01,0.10,0.50,1.0 -p 
variant=baseline,gated -wi 1 -i 2 -w 1s -r 1s -f 1`
   
   Mode: throughput, units: ops/s, higher is better.
   
   #### Platform: c5a.2xlarge, AVX2, JDK 25, 1M docs
   
   #### Cached filter conjunction
   
     | lead  | filter | baseline (ops/s) | gated (ops/s) | speedup |
     |-------|--------|------------------:|--------------:|--------:|
     | 0.001 | 0.01   |           60,997 |        59,139 | 0.97x   |
     | 0.001 | 0.10   |          110,607 |       114,116 | 1.03x   |
     | 0.001 | 0.50   |          105,088 |       104,365 | 0.99x   |
     | 0.001 | 1.0    |          105,648 |       101,335 | 0.96x   |
     | 0.03  | 0.01   |            3,887 |         3,892 | 1.00x   |
     | 0.03  | 0.10   |            4,765 |        10,416 | **2.19x**   |
     | 0.03  | 0.50   |            2,749 |         8,355 | **3.04x**   |
     | 0.03  | 1.0    |            4,472 |        10,865 | **2.43x**   |
     | 0.10  | 0.01   |            2,526 |         2,557 | 1.01x   |
     | 0.10  | 0.10   |            1,274 |         3,797 | **2.98x**   |
     | 0.10  | 0.50   |              799 |         3,954 | **4.95x**   |
     | 0.10  | 1.0    |            1,299 |         3,968 | **3.05x**   |
   
   #### Uncached filter conjunction
   
     | lead  | filter | baseline (ops/s) | gated (ops/s) | speedup |
     |-------|--------|------------------:|--------------:|--------:|
     | 0.001 | 0.01   |           40,074 |        40,700 | 1.02x   |
     | 0.001 | 0.10   |            8,224 |         9,053 | 1.10x   |
     | 0.001 | 0.50   |           25,071 |        22,442 | 0.90x   |
     | 0.001 | 1.0    |           17,426 |        16,511 | 0.95x   |
     | 0.03  | 0.01   |            5,157 |         5,099 | 0.99x   |
     | 0.03  | 0.10   |            1,547 |         1,530 | 0.99x   |
     | 0.03  | 0.50   |            2,048 |         2,013 | 0.98x   |
     | 0.03  | 1.0    |            3,188 |         3,231 | 1.01x   |
     | 0.10  | 0.01   |            3,315 |         3,248 | 0.98x   |
     | 0.10  | 0.10   |              651 |           657 | 1.01x   |
     | 0.10  | 0.50   |              701 |           684 | 0.98x   |
     | 0.10  | 1.0    |            1,173 |         1,181 | 1.01x   |
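
For reading the tables: the speedup column appears to be the gated throughput divided by the baseline throughput, rounded to two decimals. A small sketch of that arithmetic follows; the class and method names are hypothetical and not part of the benchmark.

```java
import java.util.Locale;

// Illustrative only: recomputes the speedup column from two ops/s values.
final class SpeedupSketch {
  // Throughput mode: higher is better, so speedup = gated / baseline.
  static double speedup(double baselineOps, double gatedOps) {
    return gatedOps / baselineOps;
  }

  // Format the way the tables do, e.g. "4.95x".
  static String format(double speedup) {
    return String.format(Locale.ROOT, "%.2fx", speedup);
  }

  public static void main(String[] args) {
    // Cached row lead=0.10, filter=0.50: baseline 799, gated 3,954.
    System.out.println(format(speedup(799, 3954)));
    // Uncached row lead=0.001, filter=0.50: baseline 25,071, gated 22,442.
    System.out.println(format(speedup(25071, 22442)));
  }
}
```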


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

