costin commented on PR #16146:
URL: https://github.com/apache/lucene/pull/16146#issuecomment-4571899340
@iprithv You're right, thanks for catching this! The direct-forwarding
default on `FilterLeafCollector` was too optimistic: it preserves bulk
collection performance for pure delegators, but it silently bypasses subclasses
that only override `collect(int)`. That is a correctness regression relative to
the old `LeafCollector` defaults that I overlooked this the tests passed fully.
I've added a test to guard against this regression and changed the base
default to preserve the existing contract of subclassing instead of
shortcutting.
There is one performance-sensitive exception: `ConstantScoreQuery` wraps
collectors only to replace the scorer passed via `setScorer`; it does not need
per-doc interception. For that case, I added explicit forwarding of
`collect(DocIdStream)` and `collectRange` to the wrapped collector so we don’t
accidentally erase bulk collection through a pure scorer wrapper.
Reran the benchmarks which still shows the intended benefit for cached dense
filters (same setup)
#### Platform: c5a.2xlarge, AVX2, JDK 25, 1M docs
#### Cached filter conjunction
| lead | filter | baseline (ops/s) | ungated (ops/s) | ungated vs base |
gated (ops/s) | gated vs base |
|-------|--------|------------------:|----------------:|----------------:|--------------:|--------------:|
| 0.001 | 0.01 | 61,048 | 59,206 | 0.97x |
60,644 | 0.99x |
| 0.001 | 0.10 | 113,306 | 59,908 | **0.53x** |
114,313 | 1.01x |
| 0.001 | 0.50 | 104,766 | 34,116 | **0.33x** |
102,661 | 0.98x |
| 0.001 | 1.0 | 106,201 | 30,854 | **0.29x** |
106,188 | 1.00x |
| 0.03 | 0.01 | 3,824 | 3,806 | 1.00x |
3,889 | 1.02x |
| 0.03 | 0.10 | 4,401 | 9,932 | **2.26x** |
9,264 | **2.10x** |
| 0.03 | 0.50 | 2,752 | 8,352 | **3.03x** |
8,589 | **3.12x** |
| 0.03 | 1.0 | 4,498 | 10,612 | **2.36x** |
9,965 | **2.22x** |
| 0.10 | 0.01 | 2,599 | 2,667 | 1.03x |
2,509 | 0.97x |
| 0.10 | 0.10 | 1,289 | 3,821 | **2.96x** |
3,792 | **2.94x** |
| 0.10 | 0.50 | 813 | 3,881 | **4.78x** |
3,992 | **4.91x** |
| 0.10 | 1.0 | 1,323 | 3,836 | **2.90x** |
3,930 | **2.97x** |
#### Uncached filter conjunction
| lead | filter | baseline (ops/s) | ungated (ops/s) | ungated vs base |
gated (ops/s) | gated vs base |
|-------|--------|------------------:|----------------:|----------------:|--------------:|--------------:|
| 0.001 | 0.01 | 41,216 | 40,454 | 0.98x |
39,666 | 0.96x |
| 0.001 | 0.10 | 8,476 | 7,197 | **0.85x** |
8,076 | 0.95x |
| 0.001 | 0.50 | 24,548 | 16,146 | **0.66x** |
22,616 | 0.92x |
| 0.001 | 1.0 | 23,345 | 18,069 | **0.77x** |
23,613 | 1.01x |
| 0.03 | 0.01 | 5,195 | 5,553 | 1.07x |
5,145 | 0.99x |
| 0.03 | 0.10 | 1,597 | 1,473 | 0.92x |
1,584 | 0.99x |
| 0.03 | 0.50 | 2,054 | 2,123 | 1.03x |
2,053 | 1.00x |
| 0.03 | 1.0 | 3,161 | 2,637 | **0.83x** |
3,254 | 1.03x |
| 0.10 | 0.01 | 3,223 | 3,306 | 1.03x |
3,362 | 1.04x |
| 0.10 | 0.10 | 647 | 645 | 1.00x |
651 | 1.01x |
| 0.10 | 0.50 | 700 | 727 | 1.04x |
683 | 0.98x |
| 0.10 | 1.0 | 1,163 | 906 | **0.78x** |
1,168 | 1.00x |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]