romseygeek commented on code in PR #16177:
URL: https://github.com/apache/lucene/pull/16177#discussion_r3346669926
##########
lucene/core/src/java/org/apache/lucene/search/DocValuesRangeIterator.java:
##########
@@ -287,6 +315,41 @@ public int docIDRunEnd() throws IOException {
return blockIterator.docIDRunEnd();
}
+ @Override
+ public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOException {
+ if (numericValues == null) {
+ // Generic doc values: confirm each candidate one doc at a time (the TwoPhaseIterator
+ // default). Only single-valued numeric fields can be range-evaluated block at a time.
+ super.intoBitSet(upTo, bitSet, offset);
+ return;
+ }
+ while (blockIterator.docID() < upTo) {
+ int blockStart = blockIterator.docID();
+ SkipBlockRangeIterator.Match match = blockIterator.getMatch();
+ // For MAYBE blocks docIDRunEnd() is conservative (doc+1), so use the full block boundary to
+ // evaluate the whole block at once.
+ int blockEnd =
+ match == SkipBlockRangeIterator.Match.MAYBE
+ ? Math.min(upTo, blockIterator.blockEnd())
+ : Math.min(upTo, blockIterator.docIDRunEnd());
+ switch (match) {
+ case YES -> bitSet.set(blockStart - offset, blockEnd - offset);
+ case YES_IF_PRESENT -> {
+ // All present values are in range, but the field is sparse: only a presence check.
+ for (int d = blockStart; d < blockEnd; d++) {
Review Comment:
Can we use numericValues.intoBitSet() here?
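To illustrate the suggestion, here is a minimal, self-contained sketch. It does not use Lucene: `SparseDocs` is a hypothetical stand-in for a sparse doc-values iterator, and `java.util.BitSet` stands in for `FixedBitSet`. It assumes the iterator offers a bulk `intoBitSet(upTo, bitSet, offset)` with the same shape as the method in the diff, and only shows that one bulk call can mark the same documents as the per-document presence loop.

```java
import java.util.BitSet;

public class BulkPresenceSketch {

  /** Hypothetical stand-in for a sparse doc-values iterator; not a Lucene class. */
  static final class SparseDocs {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted doc IDs that have a value
    private int idx = -1; // -1 means "not positioned yet"

    SparseDocs(int... docs) {
      this.docs = docs;
    }

    int docID() {
      if (idx < 0) {
        return -1;
      }
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** Moves to the first doc with a value that is >= target. */
    int advance(int target) {
      if (idx < 0) {
        idx = 0;
      }
      while (idx < docs.length && docs[idx] < target) {
        idx++;
      }
      return docID();
    }

    /** Per-document presence check. */
    boolean advanceExact(int target) {
      return advance(target) == target;
    }

    /** Bulk variant: sets a bit for every doc with a value in [docID(), upTo). */
    void intoBitSet(int upTo, BitSet bits, int offset) {
      for (int d = docID(); d < upTo; d = advance(d + 1)) {
        bits.set(d - offset);
      }
    }
  }

  /** The shape of the loop in the diff: one presence check per candidate doc. */
  static BitSet perDoc(SparseDocs values, int blockStart, int blockEnd, int offset) {
    BitSet bits = new BitSet();
    for (int d = blockStart; d < blockEnd; d++) {
      if (values.advanceExact(d)) {
        bits.set(d - offset);
      }
    }
    return bits;
  }

  /** The suggested shape: position once, then let the iterator fill the bit set. */
  static BitSet bulk(SparseDocs values, int blockStart, int blockEnd, int offset) {
    BitSet bits = new BitSet();
    if (values.docID() < blockStart) {
      values.advance(blockStart);
    }
    values.intoBitSet(blockEnd, bits, offset);
    return bits;
  }

  public static void main(String[] args) {
    int[] docs = {3, 8, 9, 15, 40};
    BitSet a = perDoc(new SparseDocs(docs), 8, 32, 8);
    BitSet b = bulk(new SparseDocs(docs), 8, 32, 8);
    System.out.println(a + " " + b + " equal=" + a.equals(b));
  }
}
```

In this toy version the bulk path only visits documents that actually have a value, while the loop probes every doc ID in the block. Whether the real iterator gains anything depends on its own `intoBitSet` implementation.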
##########
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/PhraseScorerBenchmark.java:
##########
@@ -105,6 +114,56 @@ public void tearDown() throws IOException {
dir.close();
}
+ // A constant-score conjunction with a phrase FILTER clause routes through
+ // DenseConjunctionBulkScorer
+ // (the phrase is a two-phase clause whose approximation matches ~50% but whose phrase matches
+ // ~0.1%), so this exercises the unified two-phase bit-set/survivor path. MatchAllDocsQuery forces
+ // a
+ // 2-clause conjunction so it isn't rewritten to the bare phrase.
Review Comment:
I'm not sure this is correct: MatchAllDocsQuery filters get removed at
rewrite time.
##########
lucene/CHANGES.txt:
##########
@@ -141,6 +141,11 @@ Improvements
Optimizations
---------------------
+* GITHUB#16177: Doc-values range queries are now always two-phase iterators. Bulk evaluation is
Review Comment:
Let's put this in 10.5?
Also I think BatchDocValuesRangeIterator hasn't actually been released, so
maybe combine that CHANGES entry here?