Re: Efficient sort on SortedDocValues

Adrien Grand Mon, 07 Nov 2022 02:42:32 -0800

Hi Andrei,

The case that you are describing got optimized in Lucene 9.4.0 in the case
when your field is also indexed with a StringField:
https://github.com/apache/lucene/pull/1023. See annotation ER at
http://people.apache.org/~mikemccand/lucenebench/TermMonthSort.html.


The way it works is that Lucene will automatically leverage the inverted
index in order to only look at documents that compare better than the
current k-th document in the priority queue.

To make it work with your test case, you will need to:
 - index a StringField with the same name and same value
 - change values to be less random if possible, since this optimization
works better on low-cardinality fields than on high-cardinality fields





On Mon, Nov 7, 2022 at 9:45 AM Mikhail Khludnev <[email protected]> wrote:

> Hello, Andrei.
> Docs are scored in-order (see Weight.scoreAll(), scoreRange()), just
> because underneath postings API is in-order. There are a few
> shortcuts/optimizations, but they only omit some iterations/segments like
> checking competitive scores and so one.
>
> On Sun, Nov 6, 2022 at 1:35 AM Solodin, Andrei (TR Technology)
> <[email protected]> wrote:
>
> > One more thing. While the test case passes now, it still iterates in
> index
> > order. Which means that it still collects ~6.4K docs out of 10k matches.
> > This is an improvement, but I am still wondering why it's not possible to
> > iterate in the field older. Seems like that would provide substantial
> > improvement.
> >
> > From: Solodin, Andrei (TR Technology)
> > Sent: Saturday, November 5, 2022 5:18 PM
> > To: [email protected]
> > Subject: RE: Efficient sort on SortedDocValues
> >
> > I just realized that the problem is that the field needs to be indexed as
> > well. Now it works. But I noticed that this only works in Lucene 9. Does
> > not work in Lucene 8 (specifically 8.11.2). This must be new
> functionality
> > in Lucene 9?
> >
> > Thanks
> >
> >
> > From: Solodin, Andrei (TR Technology)
> > Sent: Saturday, November 5, 2022 1:07 PM
> > To: [email protected]<mailto:[email protected]>
> > Subject: Efficient sort on SortedDocValues
> >
> > Hello Lucene community, while looking into how to efficiently sort on a
> > field value, I came across a couple of things that I don't quite
> > understand. My assumption was that if I execute a search and sort on a
> > SortedDocValues field, lucene would only iterate over the docs in the
> order
> > of the field values or at least collect only competitive docs (docs that
> > made it into the topN queue). Neither of those things seems to be
> > happening. Instead, the iteration is happening in index order and all
> > matched docs are collected. Looking at the code, I see that the
> > optimizations are only possible if the index is sorted in the field order
> > to begin with, which is not possible for our use case. We may have dozens
> > of such fields in our index, thus there isn't any one field that can be
> > used to sort the index. So I guess my question if what I am trying to
> > achieve is possible? I tried to look though Solr codebase, but so far
> > couldn't come up with anything. Code example is here
> > https://pastebin.com/i05E2wZy  . I am using 9.4.1. Thanks in advance.
> >
> > Andrei
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Adrien

Re: Efficient sort on SortedDocValues

Reply via email to