sgup432 opened a new issue, #15981:
URL: https://github.com/apache/lucene/issues/15981

   ### Description
   
   After #15954, SortedNumericDocValuesRangeQuery uses SkipBlockRangeIterator 
as its two-phase approximation. SkipBlockRangeIterator.cost() currently 
returns NO_MORE_DOCS, so when multiple DV range queries are combined in a 
FILTER conjunction, both DenseConjunctionBulkScorer and ConjunctionDISI end up 
ordering the clauses arbitrarily, since every clause reports the same cost.
   
   I wonder if we should have a better way to do this: choose the most 
selective field (the one that can eliminate the most docs) as the lead 
iterator, since this lets us skip most of the docs. The benefit should grow 
with the number of fields in the range conjunction: the more fields there are, 
the more a good ordering matters.
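   To illustrate the ordering problem, here is a minimal toy model, not 
Lucene's actual conjunction code. `Clause`, `orderByCost`, and the example 
costs are hypothetical names and numbers introduced only for this sketch; the 
one fact taken from Lucene is that NO_MORE_DOCS equals Integer.MAX_VALUE. It 
assumes the conjunction picks its lead by sorting clauses on ascending cost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LeadSelectionSketch {
  /** Same value as DocIdSetIterator.NO_MORE_DOCS in Lucene. */
  static final long NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for one conjunction clause: a field and its reported cost. */
  record Clause(String field, long cost) {}

  /** Orders clauses by ascending cost; the first element would act as the lead. */
  static List<Clause> orderByCost(List<Clause> clauses) {
    List<Clause> sorted = new ArrayList<>(clauses);
    // ArrayList.sort is stable, so clauses with equal cost keep their input order.
    sorted.sort(Comparator.comparingLong(Clause::cost));
    return sorted;
  }

  public static void main(String[] args) {
    // Every clause reports NO_MORE_DOCS: the lead is whichever clause was added first.
    List<Clause> tied = List.of(
        new Clause("wide_range_field", NO_MORE_DOCS),
        new Clause("narrow_range_field", NO_MORE_DOCS));
    System.out.println("lead with tied costs: " + orderByCost(tied).get(0).field());

    // With (made-up) selectivity estimates, the narrow clause becomes the lead.
    List<Clause> estimated = List.of(
        new Clause("wide_range_field", 900_000L),
        new Clause("narrow_range_field", 1_200L));
    System.out.println("lead with estimates: " + orderByCost(estimated).get(0).field());
  }
}
```

   With tied costs the lead is decided by insertion order rather than 
selectivity; any cost that separates narrow from wide clauses changes that.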
   
   The DocValuesSkipper already has metadata that could help estimate 
selectivity, i.e. global minValue()/maxValue(), per-block min/max, and docCount(). 
Some ideas to do this:
   
    - Use skipper.docCount() as cost. This only differentiates between sparse 
and dense fields, though.
    - Estimate selectivity from queryRange / fieldRange * docCount. Rough, but 
it differentiates narrow from wide queries.
    - Count NO blocks by walking the skip tree at scorer creation. This is more 
accurate, but it is O(num_blocks) and requires extra work up front.
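   The second and third ideas could look roughly like the sketch below. It 
does not use the real DocValuesSkipper API: `Block`, `rangeFractionCost`, and 
`blockWalkCost` are hypothetical names, and blocks are modeled as a flat list 
rather than a multi-level skip tree. The range-fraction estimate assumes values 
are spread uniformly over the field's range, and a "NO block" is read here as a 
block whose min/max range lies entirely outside the query range.

```java
import java.util.List;

final class SkipperCostSketch {
  /** Hypothetical per-block summary: value bounds and number of docs with a value. */
  record Block(long minValue, long maxValue, int docCount) {}

  /** Idea 2: scale docCount by the fraction of the field's value range the query covers. */
  static long rangeFractionCost(long qMin, long qMax, long fMin, long fMax, int docCount) {
    long lo = Math.max(qMin, fMin);
    long hi = Math.min(qMax, fMax);
    if (lo > hi) {
      return 0; // query range and field range do not intersect
    }
    // Doubles avoid long overflow when the ranges span most of the long domain.
    double overlap = (double) hi - (double) lo + 1;
    double fieldWidth = (double) fMax - (double) fMin + 1;
    return (long) Math.ceil(docCount * (overlap / fieldWidth));
  }

  /** Idea 3: visit every block once and sum docCount over blocks that are not NO blocks. */
  static long blockWalkCost(List<Block> blocks, long qMin, long qMax) {
    long cost = 0;
    for (Block b : blocks) {
      boolean noBlock = b.maxValue() < qMin || b.minValue() > qMax;
      if (!noBlock) {
        cost += b.docCount();
      }
    }
    return cost;
  }

  public static void main(String[] args) {
    List<Block> blocks = List.of(
        new Block(0, 9, 100), new Block(10, 19, 100), new Block(20, 29, 50));
    System.out.println(rangeFractionCost(0, 24, 0, 99, 1000));
    System.out.println(blockWalkCost(blocks, 12, 15));
  }
}
```

   The block walk gives an upper bound on matching docs that reflects how 
values are actually distributed across blocks, at the price of touching every 
block once; the range fraction is O(1) but can be far off on skewed data.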
   
   
   



