sgup432 opened a new issue, #15981:
URL: https://github.com/apache/lucene/issues/15981
### Description
After #15954, SortedNumericDocValuesRangeQuery uses SkipBlockRangeIterator
as its two-phase approximation. I see that SkipBlockRangeIterator.cost()
currently returns NO_MORE_DOCS, which means when multiple DV range queries are
combined in a FILTER conjunction, both DenseConjunctionBulkScorer and
ConjunctionDISI will sort the clauses in arbitrary order as they all report the
same cost.
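
To make the tie concrete, here is a minimal sketch of cost-based lead selection. It is not Lucene's code: `LeadSelectionSketch` and `orderByCost` are hypothetical names, and the only Lucene fact assumed is that `DocIdSetIterator.NO_MORE_DOCS` equals `Integer.MAX_VALUE`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LeadSelectionSketch {
  // Mirrors DocIdSetIterator.NO_MORE_DOCS, which is Integer.MAX_VALUE.
  static final long NO_MORE_DOCS = Integer.MAX_VALUE;

  /**
   * Hypothetical helper: returns the field names ordered by ascending cost,
   * so the first element would be the lead iterator. The sort is stable, so
   * clauses with equal cost keep whatever order they were added in.
   */
  static List<String> orderByCost(List<String> fields, List<Long> costs) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < fields.size(); i++) {
      order.add(i);
    }
    order.sort(Comparator.comparingLong(costs::get));
    List<String> result = new ArrayList<>();
    for (int i : order) {
      result.add(fields.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> fields = List.of("wide", "narrow");
    // Every clause reports NO_MORE_DOCS: the cost carries no information.
    System.out.println(orderByCost(fields, List.of(NO_MORE_DOCS, NO_MORE_DOCS)));
    // With distinct estimates, the cheaper (more selective) clause leads.
    System.out.println(orderByCost(fields, List.of(1000L, 10L)));
  }
}
```

With identical costs the lead is decided by clause order alone, not by selectivity; with distinct estimates the cheaper clause moves to the front.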
I wonder if we should have a better way to do this: choose the most
selective field (the one that eliminates the most docs) as the lead iterator,
since that lets us skip the most docs. The benefit should grow with the number
of fields in the range conjunction: the more fields there are, the more a good
ordering helps.
The DocValuesSkipper already has metadata that could help estimate
selectivity, i.e. global minValue()/maxValue(), per-block min/max, and
docCount().
Some ideas to do this:
- Use skipper.docCount() as the cost. This only differentiates between sparse
and dense fields, though.
- Estimate selectivity as queryRange / fieldRange * docCount. This is rough,
but it differentiates narrow from wide queries.
- Count NO blocks by walking the skip tree at scorer creation. This is more
accurate, but it requires O(num_blocks) extra work up front.
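
As an illustration of the second idea, here is a minimal sketch of the queryRange / fieldRange * docCount estimate. It is not a proposed patch: `RangeCostEstimate` and `estimateCost` are hypothetical names, the inputs are plain values that would come from the skipper's minValue(), maxValue(), and docCount(), and the estimate assumes values are spread uniformly over the field's range, which real data often violates.

```java
public class RangeCostEstimate {
  /**
   * Hypothetical helper: estimates how many docs fall in the inclusive range
   * [queryMin, queryMax], given the field's global min/max and the number of
   * docs that have a value. Assumes a uniform value distribution.
   */
  static long estimateCost(
      long queryMin, long queryMax, long fieldMin, long fieldMax, int docCount) {
    // Clamp the query to the field's range; anything outside matches nothing.
    long lo = Math.max(queryMin, fieldMin);
    long hi = Math.min(queryMax, fieldMax);
    if (lo > hi) {
      return 0; // query range and field range are disjoint
    }
    // Use doubles so that widths close to 2^64 do not overflow long arithmetic.
    double queryWidth = (double) hi - (double) lo + 1;
    double fieldWidth = (double) fieldMax - (double) fieldMin + 1;
    // Round up so an overlapping query never reports a cost of zero.
    return (long) Math.ceil(queryWidth * docCount / fieldWidth);
  }

  public static void main(String[] args) {
    // Narrow query over a field spanning [0, 99] with 1000 docs.
    System.out.println(estimateCost(0, 9, 0, 99, 1000));
    // Unbounded query: the estimate degrades to docCount, as in the first idea.
    System.out.println(estimateCost(Long.MIN_VALUE, Long.MAX_VALUE, 0, 99, 1000));
  }
}
```

The estimate needs only O(1) work per clause, in contrast to the O(num_blocks) walk in the third idea, at the price of being badly off for skewed data.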
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]