Thanks Mikhail & Adrien for the help.

> This is the same principle that we apply for block-max WAND so
> theoretically that would work, though in practice it might be a bit
> hard to implement due to the fact that we don't have the APIs that you
> will need.

Aah, did not know block-max WAND is now in Lucene! So what I am proposing
looks identical to BM-WAND. The heavy lifting is already done in the Lucene
codebase. I think it should be straightforward for us to wrap DocValues in a
custom Codec to track block min-max ords. We shall give this a shot anyway &
see how it goes.

> Directly index the field into as a term frequency instead of doc
> values, e.g. using FeatureField. One downside is that you can only
> sort in one order efficiently.

Thanks for the suggestion. Sure, will try & dabble with FeatureField too!

--
Ravi

On Tue, Jul 2, 2019 at 6:52 PM Adrien Grand <jpou...@gmail.com> wrote:

> Hello,
>
> This is the same principle that we apply for block-max WAND so
> theoretically that would work, though in practice it might be a bit
> hard to implement due to the fact that we don't have the APIs that you
> will need.
>
> I have considered the idea of adding information about blocks to doc
> values a couple times, but I think it'd be better to either:
> - Directly index the field into as a term frequency instead of doc
> values, e.g. using FeatureField. One downside is that you can only
> sort in one order efficiently.
> - Or using LongDistanceFeatureQuery if your field is also indexed
> with points, by passing the max value of your index as the "origin" if
> you want to sort in decreasing order and the min value if you want to
> sort in increasing order. This would be a bit less efficient than
> FeatureField but would allow sorting in either ascending or descending
> order.
>
> On Tue, Jul 2, 2019 at 3:01 PM Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> >
> > Our Sort Fields utilize DocValues..
> >
> > Lets say I collect min-max ords of a Sort Field for a block of documents
> > (128, 256 etc..) at index-time via Codec & store it as part of DocValues at
> > a Segment level..
> >
> > During query time, could we take advantage of this Stats when Top-N query
> > with Sort Field is requested?
> >
> > Typically, what I had in mind is a SortStats class with the following method
> >
> > int seek(int max-doc-seen-till-now, int min-sort-ord-seen-till-now,
> >          boolean sortDesc) {
> >   // 1. Fetch the doc-ranges that has >= min-sort-ord-seen-till-now
> >   // 2. Return the least doc-range >= max-doc-seen-till-now (If SortDesc=true)
> >   //    Return the least doc-range <= max-doc-seen-till-now (If SortDesc=false)
> > }
> >
> > Top-N Collector can keep track of the max-doc-seen-till-now &
> > min-sort-ord-seen-till-now variable during query time & then call the
> > SortStats.seek() for a possible skip of blocks of documents that may
> > otherwise be needlessly offered & popped out from the priority queue
> >
> > I understand this simplistic logic depends on sort-field data distribution
> > & won't work for multi-sort field queries or out-of-order scoring etc..
> >
> > But, in general will this be a good idea to explore or something that is
> > best not attempted?
> >
> > Any help is much appreciated
> >
> > --
> > Ravi
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
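
For readers following the thread, here is a minimal sketch of the block min/max skipping idea from the original question. It is plain Java with no Lucene dependency and is not a Lucene API or codec. The class name SortStatsSketch, the array-backed layout, the topNDescending helper, and the exact seek() semantics (return the next doc id whose block could still beat the current queue bottom) are all assumptions made for illustration, and one possible reading of the pseudocode quoted above.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only: per-block min/max ords kept in plain arrays.
class SortStatsSketch {

    /** Min and max sort ordinal for each fixed-size block of doc ids. */
    static final class SortStats {
        final int blockSize;
        final int maxDoc;
        final int[] minOrd;
        final int[] maxOrd;

        SortStats(int[] ords, int blockSize) {
            this.blockSize = blockSize;
            this.maxDoc = ords.length;
            int blocks = (ords.length + blockSize - 1) / blockSize;
            minOrd = new int[blocks];
            maxOrd = new int[blocks];
            for (int b = 0; b < blocks; b++) {
                int lo = Integer.MAX_VALUE;
                int hi = Integer.MIN_VALUE;
                int end = Math.min(ords.length, (b + 1) * blockSize);
                for (int d = b * blockSize; d < end; d++) {
                    lo = Math.min(lo, ords[d]);
                    hi = Math.max(hi, ords[d]);
                }
                minOrd[b] = lo;
                maxOrd[b] = hi;
            }
        }

        /**
         * Returns the first doc id >= doc whose block may contain an ord that
         * beats bottomOrd, or maxDoc if no such block remains. Ties are treated
         * as non-competitive, assuming earlier docs win on equal values.
         */
        int seek(int doc, int bottomOrd, boolean sortDesc) {
            for (int b = doc / blockSize; b < minOrd.length; b++) {
                boolean competitive = sortDesc ? maxOrd[b] > bottomOrd
                                               : minOrd[b] < bottomOrd;
                if (competitive) {
                    return Math.max(doc, b * blockSize);
                }
            }
            return maxDoc;
        }
    }

    /**
     * Collects the n largest ords into a min-heap, skipping blocks that cannot
     * be competitive once the heap is full. Returns the number of docs visited.
     */
    static int topNDescending(int[] ords, SortStats stats, int n,
                              PriorityQueue<Integer> queue) {
        int visited = 0;
        int doc = 0;
        while (doc < ords.length) {
            if (queue.size() == n) {
                doc = stats.seek(doc, queue.peek(), true);
                if (doc >= ords.length) {
                    break;
                }
            }
            visited++;
            int ord = ords[doc];
            if (queue.size() < n) {
                queue.add(ord);
            } else if (ord > queue.peek()) {
                queue.poll();
                queue.add(ord);
            }
            doc++;
        }
        return visited;
    }
}
```

As the original message notes, how much this skips depends on the data distribution: when high and low values are interleaved within every block, few blocks can be ruled out. The sketch also covers only a single sort field and in-order doc id iteration.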