[PR] Optimize disjunction docIDRunEnd [lucene]

via GitHub Thu, 28 May 2026 04:22:38 -0700


costin opened a new pull request, #16142:
URL: https://github.com/apache/lucene/pull/16142


   `ReqExclBulkScorer` uses the prohibited scorer's `docIDRunEnd()` to skip 
runs of excluded documents. For multi-clause `MUST_NOT` queries this often goes 
through `DisjunctionDISIApproximation`, whose `docIDRunEnd()` previously called 
`topList()` and could linearly scan `otherIterators` on every excluded doc.
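   Below is a minimal, self-contained Java sketch of the skipping pattern. It is not Lucene's `ReqExclBulkScorer`. `ExclusionSkipSketch`, `ProhibitedDocs`, and `collect` are hypothetical names, and the prohibited clause is modeled as a sorted `int[]`. Whenever the current doc is excluded, the loop jumps to the prohibited iterator's run end, so `docIDRunEnd()` is called once per excluded run. With densely interleaved postings, most runs are a single doc, so an expensive `docIDRunEnd()` ends up being paid on nearly every excluded doc.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical model of skipping excluded runs; not Lucene's ReqExclBulkScorer. */
   public class ExclusionSkipSketch {

     /** Prohibited docs as a sorted array, standing in for the prohibited iterator. */
     static final class ProhibitedDocs {
       private final int[] docs;
       private int idx = 0;

       ProhibitedDocs(int[] sortedDocs) {
         this.docs = sortedDocs;
       }

       /** Moves to the first prohibited doc >= target, or returns Integer.MAX_VALUE if none. */
       int advance(int target) {
         while (idx < docs.length && docs[idx] < target) idx++;
         return idx < docs.length ? docs[idx] : Integer.MAX_VALUE;
       }

       /** One past the last doc of the consecutive run that contains the current doc. */
       int docIDRunEnd() {
         int j = idx;
         while (j + 1 < docs.length && docs[j + 1] == docs[j] + 1) j++;
         return docs[j] + 1;
       }
     }

     /** Collects docs in [0, maxDoc) that are not prohibited, skipping whole excluded runs. */
     static List<Integer> collect(int maxDoc, int[] prohibited) {
       ProhibitedDocs excl = new ProhibitedDocs(prohibited);
       List<Integer> hits = new ArrayList<>();
       int doc = 0;
       while (doc < maxDoc) {
         int nextExcluded = excl.advance(doc);
         if (nextExcluded == doc) {
           doc = excl.docIDRunEnd(); // jump past the whole excluded run in one step
         } else {
           // every doc in [doc, nextExcluded) passes the exclusion
           int end = Math.min(nextExcluded, maxDoc);
           for (; doc < end; doc++) hits.add(doc);
         }
       }
       return hits;
     }

     public static void main(String[] args) {
       System.out.println(collect(10, new int[] {1, 2, 3, 6, 7})); // [0, 4, 5, 8, 9]
     }
   }
   ```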
   
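   This PR limits `docIDRunEnd()` to heap-led top clauses that are already
positioned on the current doc. If a linear-scanned clause also matches the
current doc and has a longer run, the returned run can be shorter than before.
That is still correct, because `docIDRunEnd()` only guarantees that every doc in
`[docID(), docIDRunEnd())` matches and may underestimate the true run end (the
default implementation returns `docID() + 1`). It also avoids the per-call scan.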
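   Here is a simplified Java model of the trade-off. It makes a few assumptions. Each clause is reduced to a hypothetical `Clause(docID, runEnd)` record. The split between heap clauses and linear-scanned clauses is fixed up front. The model compares the returned values only, not their cost. When no heap clause is on the current doc, the heap-led estimate falls back to `doc + 1`. That fallback is an assumption of this sketch and not necessarily what the PR does.

   ```java
   /**
    * Hypothetical model of a disjunction's docIDRunEnd(); names are illustrative, not Lucene's.
    * Each clause is positioned on docID() and matches every doc in [docID(), runEnd()).
    */
   public class DisjunctionRunEndSketch {

     record Clause(int docID, int runEnd) {}

     /** Previous-style estimate: heap clauses on doc plus a linear scan of the other clauses. */
     static int fullRunEnd(int doc, Clause[] heapClauses, Clause[] otherClauses) {
       int end = doc + 1; // the run always contains at least the current doc
       for (Clause c : heapClauses) {
         if (c.docID() == doc) end = Math.max(end, c.runEnd());
       }
       for (Clause c : otherClauses) { // O(otherClauses) work on every call
         if (c.docID() == doc) end = Math.max(end, c.runEnd());
       }
       return end;
     }

     /**
      * Heap-led estimate: only heap clauses already on doc. In a real heap these sit at the
      * top; the array scan here just stands in for visiting them.
      */
     static int heapLedRunEnd(int doc, Clause[] heapClauses) {
       int end = doc + 1;
       for (Clause c : heapClauses) {
         if (c.docID() == doc) end = Math.max(end, c.runEnd());
       }
       return end;
     }

     /** Heap clause covers [5, 9); a linear-scanned clause covers [5, 11). */
     static int[] demo() {
       Clause[] heap = {new Clause(5, 9), new Clause(7, 12)};
       Clause[] others = {new Clause(5, 11)};
       return new int[] {fullRunEnd(5, heap, others), heapLedRunEnd(5, heap)};
     }

     public static void main(String[] args) {
       int[] r = demo();
       System.out.println("full=" + r[0] + " heapLed=" + r[1]); // full=11 heapLed=9
     }
   }
   ```

   In `demo()`, the linear-scanned clause covering [5, 11) overlaps the heap clause covering [5, 9). The heap-led estimate therefore stops at 9 instead of 11. Both values are safe, since every doc below them matches. Neither estimate chains through the heap clause positioned at 7, so both fall below the true union run end of 12.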
   
   `MustNotBooleanQueryBenchmark` builds 10M docs with dense interleaved 
`MUST_NOT` term postings and periodic pass-through docs, then runs a 
`MatchAllDocsQuery` filter with multiple prohibited `TermQuery` clauses.
   
   JDK 25.0.3+9-LTS, JMH 1.37, Apple M3 Pro (arm64). Baseline = previous
`topList()`-based implementation. This PR = heap-led `docIDRunEnd()`
implementation. Throughput is in ops/s (higher is better).
   
   | MUST_NOT terms | Baseline (ops/s) | This PR (ops/s) | Change |
   | --- | ---: | ---: | ---: |
   | 3 | 8.856 | 9.820 | +10.9% |
   | 7 | 2.385 | 3.084 | +29.3% |
   | 11 | 1.477 | 2.116 | +43.3% |
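   The Change column is the relative throughput difference, (this PR - baseline) / baseline. The short Java check below recomputes it from the table values. The class name `ThroughputChange` is illustrative only.

   ```java
   import java.util.Locale;

   /** Recomputes the "Change" column from the throughput numbers in the table above. */
   public class ThroughputChange {

     /** Relative change in percent: (pr - baseline) / baseline * 100. */
     static double percentChange(double baseline, double pr) {
       return (pr - baseline) / baseline * 100.0;
     }

     public static void main(String[] args) {
       int[] terms = {3, 7, 11};
       double[] baseline = {8.856, 2.385, 1.477};
       double[] pr = {9.820, 3.084, 2.116};
       for (int i = 0; i < terms.length; i++) {
         // Locale.ROOT keeps '.' as the decimal separator; prints e.g. "3 MUST_NOT terms: +10.9%"
         System.out.printf(Locale.ROOT, "%d MUST_NOT terms: %+.1f%%%n",
             terms[i], percentChange(baseline[i], pr[i]));
       }
     }
   }
   ```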
   
   Run with:
   
   ```bash
   java --add-modules jdk.incubator.vector -Xmx4g -Xms4g -XX:+AlwaysPreTouch \
     --module-path lucene/benchmark-jmh/build/benchmarks \
     --module org.apache.lucene.benchmark.jmh MustNotBooleanQueryBenchmark \
     -f 1 -wf 0 -wi 2 -i 3 -w 2s -r 2s -p docCount=10000000
   ```
   
   Closes #16049
   
   Developed with AI-assisted tooling


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

