Re: Processing query clause combinations at indexing time

Adrien Grand Tue, 15 Dec 2020 08:20:11 -0800

I like this idea. I can think of several users who have a priori knowledge
of frequently used filters and would appreciate having Lucene take care of
transparently optimizing the execution of such filters instead of having to
do it manually.


I'm not sure a separate project is the best option: it makes it more
challenging to keep up to date with releases, more challenging for users to
find it, etc. I'd rather add this feature to the Lucene repository, perhaps
as a new module or as part of an existing module?


On Tue, Dec 15, 2020 at 4:41 PM Michael Sokolov <msoko...@gmail.com> wrote:

> I feel like there could be some considerable overlap with features
> provided by Luwak, which was contributed to Lucene fairly recently,
> and I think does the query inversion work required for this; maybe
> more of it already exists here? I don't know if that module handles
> the query rewriting, or the term indexing you're talking about though.
>
> On Mon, Dec 14, 2020 at 11:25 PM Atri Sharma <a...@apache.org> wrote:
> >
> > +1
> >
> > I would suggest that this be an independent project hosted on GitHub
> > (there have been similar projects in the past that have seen success that
> > way).
> >
> > On Tue, 15 Dec 2020, 09:37 David Smiley, <dsmi...@apache.org> wrote:
> >>
> >> Great optimization!
> >>
> >> I'm dubious about it being a good contribution to Lucene itself, however,
> >> because what you propose fits cleanly above Lucene.  Even at an ES/Solr
> >> layer (which I know you don't use, but hypothetically speaking), I'm
> >> dubious there as well.
> >>
> >> ~ David Smiley
> >> Apache Lucene/Solr Search Developer
> >> http://www.linkedin.com/in/davidwsmiley
> >>
> >>
> >> On Mon, Dec 14, 2020 at 2:37 PM Michael Froh <msf...@gmail.com> wrote:
> >>>
> >>> My team at work has a neat feature that we've built on top of Lucene
> >>> that has provided a substantial (20%+) increase in maximum qps and some
> >>> reduction in query latency.
> >>>
> >>> Basically, we run a training process that looks at historical queries
> >>> to find frequently co-occurring combinations of required clauses, say "+A
> >>> +B +C +D". Then at indexing time, if a document satisfies one of these
> >>> known combinations, we add a new term to the doc, like "opto:ABCD". At
> >>> query time, we can then replace the required clauses with a single
> >>> TermQuery for the "optimized" term.
> >>>
> >>> It adds a little bit of extra work at indexing time and requires the
> >>> offline training step, but we've found that it yields a significant boost
> >>> at query time.
> >>>
> >>> We're interested in open-sourcing this feature. Is it something worth
> >>> adding to Lucene? Since it doesn't require any core changes, maybe as a
> >>> module?
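
The indexing-time and query-time steps described in the original message
can be illustrated with a minimal sketch. This is not the poster's
implementation and does not use Lucene: it models the inverted index as a
plain dict of term -> doc ids, and KNOWN_COMBOS, combo_term, index_terms,
rewrite_required, build_index and search are all hypothetical names chosen
for illustration. It only assumes that the training step produces a list of
clause combinations.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical output of the offline training step: combinations of
# required clauses that frequently co-occur in historical queries.
KNOWN_COMBOS: List[FrozenSet[str]] = [frozenset({"A", "B", "C", "D"})]


def combo_term(combo: FrozenSet[str]) -> str:
    """Synthetic term for a combination, e.g. 'opto:ABCD'.

    Simplification: plain concatenation is only unambiguous for
    single-character clause names, as in the example from the thread.
    """
    return "opto:" + "".join(sorted(combo))


def index_terms(doc_terms: Iterable[str]) -> Set[str]:
    """Indexing time: add one synthetic term per known combination
    that the document fully satisfies."""
    terms = set(doc_terms)
    extra = {combo_term(c) for c in KNOWN_COMBOS if c <= terms}
    return terms | extra


def rewrite_required(required: Iterable[str]) -> Set[str]:
    """Query time: replace each fully covered known combination with
    its single synthetic term; uncovered clauses are kept as-is."""
    remaining = set(required)
    rewritten: Set[str] = set()
    for combo in KNOWN_COMBOS:
        if combo <= remaining:
            remaining -= combo
            rewritten.add(combo_term(combo))
    return remaining | rewritten


def build_index(docs: Dict[int, Iterable[str]]) -> Dict[str, Set[int]]:
    """Toy inverted index: term -> set of matching doc ids."""
    index: Dict[str, Set[int]] = {}
    for doc_id, terms in docs.items():
        for term in index_terms(terms):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], required: Iterable[str]) -> Set[int]:
    """Intersect the postings of the (rewritten) required clauses."""
    postings = [index.get(t, set()) for t in rewrite_required(required)]
    return set.intersection(*postings) if postings else set()
```

In this sketch a query for "+A +B +C +D +E" intersects two postings lists
(the synthetic term and E) instead of five, which is where a saving would
come from; the matching documents are unchanged because a document carries
the synthetic term exactly when it satisfies every clause in the
combination.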

-- 
Adrien
