costin opened a new pull request, #16172:
URL: https://github.com/apache/lucene/pull/16172
When a sorted or sorted-set doc-values field has a skip index, route range
queries through a new BatchDocValuesOrdinalRangeIterator that implements
intoBitSet() for bulk evaluation.
The iterator processes skip blocks in bulk:
- YES blocks: bitSet.set(start, end) sets the entire doc range at once
- YES_IF_PRESENT blocks: only checks doc existence (no ordinal comparison)
- MAYBE blocks: checks ordinals inline
This replaces the per-doc TwoPhaseIterator (approximation + confirmation)
path when a skip index is available.
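
The three block cases above can be sketched with plain Java collections. This is a minimal illustration and not the code in this PR: `SkipBlockSketch`, `Block`, `classify`, the `dense` flag, and the `-1` "no value" marker are all hypothetical names and conventions, the field is assumed single-valued, and the range bounds are assumed to be non-negative ordinals.

```java
import java.util.BitSet;

// Illustrative sketch only: every class, field, and method name here is hypothetical.
final class SkipBlockSketch {

    enum Match { NO, YES, YES_IF_PRESENT, MAYBE }

    /** One skip block covering docs [start, end), with the ordinal bounds recorded for it. */
    static final class Block {
        final int start, end;
        final long minOrd, maxOrd;
        final boolean dense; // assumption: true when every doc in the block has a value

        Block(int start, int end, long minOrd, long maxOrd, boolean dense) {
            this.start = start;
            this.end = end;
            this.minOrd = minOrd;
            this.maxOrd = maxOrd;
            this.dense = dense;
        }
    }

    /** Classifies a block against the inclusive ordinal range [lo, hi]. */
    static Match classify(Block b, long lo, long hi) {
        if (b.maxOrd < lo || b.minOrd > hi) {
            return Match.NO; // block lies entirely outside the range
        }
        if (b.minOrd >= lo && b.maxOrd <= hi) {
            return b.dense ? Match.YES : Match.YES_IF_PRESENT;
        }
        return Match.MAYBE; // block straddles a range boundary
    }

    /** ords[doc] holds the doc's ordinal, or -1 when the doc has no value. Assumes lo >= 0. */
    static BitSet intoBitSet(Block[] blocks, long[] ords, long lo, long hi) {
        BitSet out = new BitSet(ords.length);
        for (Block b : blocks) {
            switch (classify(b, lo, hi)) {
                case YES:
                    out.set(b.start, b.end); // bulk set, no per-doc work
                    break;
                case YES_IF_PRESENT:
                    for (int doc = b.start; doc < b.end; doc++) {
                        if (ords[doc] >= 0) out.set(doc); // existence check only
                    }
                    break;
                case MAYBE:
                    for (int doc = b.start; doc < b.end; doc++) {
                        long ord = ords[doc];
                        if (ord >= lo && ord <= hi) out.set(doc); // full ordinal check
                    }
                    break;
                default:
                    break; // NO: skip the whole block
            }
        }
        return out;
    }

    /** Small fixed input that exercises all four block kinds for the range [2, 3]. */
    static BitSet demo() {
        long[] ords = {2, 3, 2, 3, 2, -1, 3, -1, 0, 2, 9, 3, 7, 8, 9, 7};
        Block[] blocks = {
            new Block(0, 4, 2, 3, true),    // YES
            new Block(4, 8, 2, 3, false),   // YES_IF_PRESENT
            new Block(8, 12, 0, 9, true),   // MAYBE
            new Block(12, 16, 7, 9, true),  // NO
        };
        return intoBitSet(blocks, ords, 2, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch only the MAYBE block pays for a per-doc ordinal comparison; the YES block costs a single `BitSet.set(from, to)` call regardless of how many docs it covers.
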
### Benchmark
AMD EPYC c5a.2xlarge, JDK 25, SortedSetDocValuesField.newSlowRangeQuery,
~25% selectivity (256 of 1024 ordinals match):
| docCount | valuesPerDoc | baseline (ops/s) | candidate (ops/s) | speedup |
|----------|-------------|------------------|-------------------|---------|
| 100K | 1 | 1,210.8 ± 5.0 | 4,071.9 ± 11.6 | **3.36x** |
| 100K | 2 | 465.1 ± 1.0 | 558.7 ± 1.9 | **1.20x** |
| 1M | 1 | 122.1 ± 0.6 | 297.7 ± 4.1 | **2.44x** |
| 1M | 2 | 46.8 ± 0.3 | 55.8 ± 0.6 | **1.19x** |
Single-valued fields see the largest gain (2.4–3.4x) because the singleton
unwrap avoids multi-valued ordinal iteration in YES_IF_PRESENT/MAYBE blocks.
Multi-valued fields still benefit (about 1.2x) from the YES block bulk-set optimization.
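
To illustrate why the per-doc check is cheaper for single-valued fields, here is a hedged sketch contrasting the two cases. `OrdinalCheckSketch` and both method names are hypothetical and not taken from this PR; the multi-valued variant assumes each doc's ordinals arrive in ascending order, so the first ordinal at or above the lower bound decides the match.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class OrdinalCheckSketch {

    /** Single-valued doc: one ordinal, so the range check is a single comparison pair. */
    static boolean matchesSingle(long ord, long lo, long hi) {
        return ord >= lo && ord <= hi;
    }

    /**
     * Multi-valued doc: ordinals are assumed sorted ascending. Scan until the first
     * ordinal >= lo; the doc matches only if that ordinal is also <= hi.
     */
    static boolean matchesMulti(long[] sortedOrds, long lo, long hi) {
        for (long ord : sortedOrds) {
            if (ord < lo) {
                continue; // still below the range, keep scanning
            }
            return ord <= hi; // later ordinals are larger, so this one decides
        }
        return false; // no values, or all ordinals below lo
    }

    public static void main(String[] args) {
        System.out.println(matchesSingle(5, 4, 6));
        System.out.println(matchesMulti(new long[] {1, 5, 9}, 4, 6));
        System.out.println(matchesMulti(new long[] {1, 9}, 4, 6));
    }
}
```

The multi-valued path adds a loop per doc, which is consistent with the smaller speedups in the valuesPerDoc = 2 rows of the table.
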
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]