[I] [RFC] Improve skipping logic for after values in sort query [lucene]

via GitHub Wed, 17 Apr 2024 00:40:51 -0700


gashutos opened a new issue, #13313:
URL: https://github.com/apache/lucene/issues/13313

### Description

### Background
Lucene sort queries use skipping logic to speed up execution: they skip
non-competitive documents by updating the competitive iterator whenever the
bottom value of the priority queue is updated.
[Reference](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java#L205C1-L206C1)
This works for fields indexed with BKD trees.
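To make the roles of the two bounds concrete, here is a simplified model of which values stay competitive for an ascending sort on a long field: the `after` (top) value gives a lower bound and the bottom of a full queue gives an upper bound. It is only a sketch under assumptions: `CompetitiveRange` and its methods are hypothetical, it scans an in-memory array instead of a BKD tree, and it treats equal values as non-competitive instead of breaking ties on doc ID as Lucene does.

```java
import java.util.stream.LongStream;

/**
 * Hypothetical, simplified model (not Lucene's NumericComparator) of the value range
 * that stays competitive for an ascending sort. A null bound means "not known yet".
 * Ties are treated as non-competitive here; Lucene actually breaks them on doc ID.
 */
public class CompetitiveRange {
  static boolean isCompetitive(long value, Long topValue, Long bottomValue) {
    // Lower bound from the `after` (top) value: these values belong to earlier pages.
    if (topValue != null && value <= topValue) {
      return false;
    }
    // Upper bound from the bottom of a full queue: these values cannot enter the top-k.
    if (bottomValue != null && value >= bottomValue) {
      return false;
    }
    return true;
  }

  static long countCompetitive(long[] docValues, Long topValue, Long bottomValue) {
    return LongStream.of(docValues).filter(v -> isCompetitive(v, topValue, bottomValue)).count();
  }

  public static void main(String[] args) {
    long[] values = LongStream.rangeClosed(1, 100).toArray(); // values 1..100
    System.out.println(countCompetitive(values, null, null)); // 100: no bounds yet
    System.out.println(countCompetitive(values, 87L, null));  // 13: values 88..100
    System.out.println(countCompetitive(values, 87L, 95L));   // 7: values 88..94
  }
}
```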
When an `after` value is provided, it is treated as `topValue`, and we try to
execute the skipping logic with both `topValue` and the bottom value. The
skipping logic has a
[constraint](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java#L274):
the BKD tree is only intersected to update the competitive iterator when the
estimated number of competitive documents drops below `1/8` of the current
iterator's cost. That can be problematic in cases like the one described below.

### Problem
With the #12333 change, we started using `topValue` to skip documents when
`topValue` alone can skip `7/8` of the documents. But what if `topValue` can
only skip `6/8` of the documents, and all of those non-competitive documents
come before the `after` document in the doc ID iterator?
This problem shows up specifically in `timeseries` workloads.
For example, consider a time series segment where document IDs and document
values are in nearly sorted order, and assume we have `100` documents whose
time field values are 1, 2, 3, ..., 100.
If I now run a sort query on this field with `87` as the `after` value, we
won't be able to skip the first 86 documents: the 13 documents with values
above 87 are not fewer than `1/8` of 100 (12, rounded down), so the constraint
above prevents skipping. All of those documents will end up being compared here in
[PagingFieldCollector](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java#L274).
Now imagine this happening for millions of documents in a time series
workload.
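The numbers in this example can be checked with a small simulation. It is a sketch under assumptions: doc IDs and values are perfectly sorted (values 1..100 in doc order), the threshold is the document count shifted right by 3, and tie handling for the `after` document itself is ignored; `AfterValueExample` and its methods are hypothetical.

```java
import java.util.stream.LongStream;

/** Hypothetical worked example of the time series case: 100 docs, after = 87. */
public class AfterValueExample {
  // Documents whose value is strictly below `after` were already returned on earlier pages.
  static long countBelow(long[] values, long after) {
    return LongStream.of(values).filter(v -> v < after).count();
  }

  // Documents whose value is strictly above `after` can still be competitive.
  static long countAbove(long[] values, long after) {
    return LongStream.of(values).filter(v -> v > after).count();
  }

  // With only topValue known, skipping happens only if fewer than 1/8 of the docs stay competitive.
  static boolean skipsWithTopValueOnly(long[] values, long after) {
    long threshold = ((long) values.length) >>> 3; // 100 docs -> 12
    return countAbove(values, after) < threshold;
  }

  public static void main(String[] args) {
    long[] values = LongStream.rangeClosed(1, 100).toArray(); // doc IDs and values in the same order
    long after = 87;
    System.out.println(countBelow(values, after));            // 86
    System.out.println(countAbove(values, after));            // 13
    System.out.println(skipsWithTopValueOnly(values, after)); // false: 13 is not below 12
  }
}
```

With `after = 90` only 10 documents stay competitive, which is below 12, so skipping would kick in; at `87` it does not, and all 86 earlier documents reach the collector.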

### Proposed solution
Let's invoke the skipping logic when `topValue` is known but `bottomValue` is
unknown, irrespective of the number of documents we are able to skip. That will
be a one-time invocation of the skipping logic when an `after` value is
specified, and it will skip all documents that are non-competitive with respect
to the `after` value. From the next iteration onward, we will know the
`bottomValue` once the priority queue is full, and the `7/8` skipping
constraint can apply again.
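The proposed decision rule can be sketched as follows. This is only an illustration of the idea, not a patch: `ProposedSkipPolicy`, `shouldUpdateCompetitiveIterator`, and its parameters are hypothetical names, and "bottom known" is modelled as "the priority queue is full".

```java
/** Hypothetical sketch of the proposed rule; names are illustrative, not Lucene APIs. */
public class ProposedSkipPolicy {
  // Whether the one-time skip based on the `after` value has already been performed.
  private boolean afterSkipDone = false;

  boolean shouldUpdateCompetitiveIterator(
      boolean topValueSet, boolean bottomKnown, long estimatedCompetitiveDocs, long iteratorCost) {
    if (topValueSet && !bottomKnown && !afterSkipDone) {
      // Proposed: skip everything before `after` once, ignoring the 1/8 constraint.
      afterSkipDone = true;
      return true;
    }
    if (!bottomKnown) {
      // The `after` bound never changes, so retrying it cannot skip anything new.
      return false;
    }
    // Once the queue is full, fall back to the existing 1/8 (i.e. skip 7/8) constraint.
    return estimatedCompetitiveDocs < (iteratorCost >>> 3);
  }

  public static void main(String[] args) {
    ProposedSkipPolicy policy = new ProposedSkipPolicy();
    System.out.println(policy.shouldUpdateCompetitiveIterator(true, false, 13, 100));   // true: one-time skip
    System.out.println(policy.shouldUpdateCompetitiveIterator(true, false, 13, 100));   // false: already done
    System.out.println(policy.shouldUpdateCompetitiveIterator(true, true, 200, 1_000)); // false: 200 >= 125
    System.out.println(policy.shouldUpdateCompetitiveIterator(true, true, 100, 1_000)); // true: 100 < 125
  }
}
```

The extra cost is bounded to a single BKD intersection per segment, which is why the sketch allows it to bypass the `1/8` check only once.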
