Thanks Mikhail & Adrien for the help

> This is the same principle that we apply for block-max WAND so
> theoretically that would work, though in practice it might be a bit
> hard to implement due to the fact that we don't have the APIs that you
> will need.

Ah, I did not know block-max WAND is now in Lucene! So what I am proposing
looks identical to block-max WAND.

The heavy lifting is already done in the Lucene codebase. I think it should be
straightforward for us to wrap DocValues in a custom Codec that tracks
per-block min-max ords. We will give this a shot anyway and see how it goes.
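
For illustration, the block min-max idea could be sketched roughly as below. This is plain Java with no Lucene APIs; the class name BlockOrdStats, its fields, and the in-memory ordByDoc array are all hypothetical stand-ins for whatever a real Codec would write and read per segment. It also assumes the top-N queue is already full and that ties are broken by doc ID, so a later doc with an equal ord is not competitive.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-block min/max sort ordinals for one segment. */
final class BlockOrdStats {
    final int blockSize;
    final int maxDoc;
    final int[] minOrd; // minOrd[b] = smallest ord among the docs in block b
    final int[] maxOrd; // maxOrd[b] = largest ord among the docs in block b

    /** ordByDoc[doc] is the sort ordinal of doc; an assumption for this sketch. */
    BlockOrdStats(int[] ordByDoc, int blockSize) {
        this.blockSize = blockSize;
        this.maxDoc = ordByDoc.length;
        int blocks = (maxDoc + blockSize - 1) / blockSize;
        minOrd = new int[blocks];
        maxOrd = new int[blocks];
        Arrays.fill(minOrd, Integer.MAX_VALUE);
        Arrays.fill(maxOrd, Integer.MIN_VALUE);
        for (int doc = 0; doc < maxDoc; doc++) {
            int b = doc / blockSize;
            minOrd[b] = Math.min(minOrd[b], ordByDoc[doc]);
            maxOrd[b] = Math.max(maxOrd[b], ordByDoc[doc]);
        }
    }

    /**
     * Returns the first doc ID >= target that lies in a block which may still
     * hold a competitive hit, or maxDoc if no such block remains.
     * bottomOrd is the worst ord currently held in the (full) top-N queue.
     */
    int seek(int target, int bottomOrd, boolean sortDesc) {
        if (target >= maxDoc) {
            return maxDoc;
        }
        for (int b = target / blockSize; b < minOrd.length; b++) {
            // Descending sort wants larger ords, ascending wants smaller ones.
            boolean competitive = sortDesc ? maxOrd[b] > bottomOrd
                                           : minOrd[b] < bottomOrd;
            if (competitive) {
                return Math.max(target, b * blockSize);
            }
        }
        return maxDoc;
    }
}
```

A collector would call seek with the next doc ID and the current bottom ord of its queue, then advance the iterator to the returned doc. How much this saves depends entirely on how the sort values are distributed across blocks.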

> Directly index the field as a term frequency instead of doc
> values, e.g. using FeatureField. One downside is that you can only
> sort in one order efficiently.

Thanks for the suggestion. Sure, we will try FeatureField too!


On Tue, Jul 2, 2019 at 6:52 PM Adrien Grand wrote:

> Hello,
> This is the same principle that we apply for block-max WAND so
> theoretically that would work, though in practice it might be a bit
> hard to implement due to the fact that we don't have the APIs that you
> will need.
> I have considered the idea of adding information about blocks to doc
> values a couple times, but I think it'd be better to either:
>  - Directly index the field as a term frequency instead of doc
> values, e.g. using FeatureField. One downside is that you can only
> sort in one order efficiently.
>  - Or using LongDistanceFeatureQuery if your field is also indexed
> with points, by passing the max value of your index as the "origin" if
> you want to sort in decreasing order and the min value if you want to
> sort in increasing order. This would be a bit less efficient than
> FeatureField but would allow sorting in either ascending or descending
> order.
> On Tue, Jul 2, 2019 at 3:01 PM Ravikumar Govindarajan wrote:
> >
> > Our Sort Fields utilize DocValues..
> >
> > Let's say I collect min-max ords of a Sort Field for a block of documents
> > (128, 256 etc.) at index time via a Codec and store them as part of
> > DocValues at a Segment level.
> >
> > During query time, could we take advantage of these stats when a Top-N
> > query with a Sort Field is requested?
> >
> > Typically, what I had in mind is a SortStats class with the following
> > method:
> >
> > int seek(int max-doc-seen-till-now, int min-sort-ord-seen-till-now,
> >          boolean sortDesc) {
> >   // 1. Fetch the doc-ranges that have ord >= min-sort-ord-seen-till-now
> >   // 2. Return the least doc-range >= max-doc-seen-till-now (if sortDesc=true)
> >   //    Return the least doc-range <= max-doc-seen-till-now (if sortDesc=false)
> > }
> >
> > The Top-N Collector can keep track of the max-doc-seen-till-now and
> > min-sort-ord-seen-till-now variables during query time and then call the
> > seek method for a possible skip of blocks of documents that may
> > otherwise be needlessly offered to and popped out of the priority queue.
> >
> > I understand this simplistic logic depends on sort-field data distribution
> > and won't work for multi-sort-field queries or out-of-order scoring etc.
> >
> > But, in general will this be a good idea to explore or something that is
> > best not attempted?
> >
> > Any help is much appreciated
> >
> > --
> > Ravi
> --
> Adrien