gashutos opened a new issue, #13313:
URL: https://github.com/apache/lucene/issues/13313

   ### Description
   
   ### Background
   Lucene sort queries are using skipping logic for faster execution and skip 
non-competitive documents by updating its competitive iterator whenever it 
updates its bottom value in priority queue. 
[Reference](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java#L205C1-L206C1)
   This works for fields indexed with BKD. 
   In case of `after` value, `after` value is considerd as topValue we try to 
execute skipping logic with both `topValue` and `afterValue`.  The skipping 
logic has a 
[constraint](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java#L274),
 and with that, the estimated number of competitive documents must reduce to 
`1/8` to intersect BKD skipping logic. That can be problamatic in like 
described below.
   
   ### Problem
   With #12333 change, we started using `topValue` to skip documents in case 
`topValue` is able to skip `7/8` number of documents. But what if we only able 
to skip `6/8` number of documents with `topvalue` and those all non-competitive 
documents `6/8` are ahead of `after` document in docsIdIterator ?
   This problem specifically comes in `timeseries` workload.
   i.e. we have time series segment where documents ids and document values are 
in nearly sorted order and lets assume we have `100` documents and its time 
field values are from 1,2,3,....100. 
   Now if I trigger sort query `sort` on this field with `after` values as 
`87`, we wont able to skip first 86 documents and they will end up being in 
comparison here in 
[PagingFieldCollector](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java#L274).
   Assume this happening for millions of documents in case of time series 
workload.
   
   ### Proposed solution
   Lets invoke skipping logic in case `topValue` is known but `bottomvalue` is 
unknown ir-respective of number of docuements we are able to skip. That will be 
one time invocation of skipping logic in case `after` value is specific and we 
will skip all documents which are non-competitive w.r.t after value. And from 
next iteration, we will know about `bottomValue` once priority queue is full 
and can have constraint of 7/8 documents skipping.....


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to