Re: Processing query clause combinations at indexing time

Atri Sharma Tue, 15 Dec 2020 08:23:40 -0800

In that case, I would be interested to know if this can be merged into
Luwak.


On Tue, 15 Dec 2020, 21:50 Adrien Grand, <jpou...@gmail.com> wrote:

> I like this idea. I can think of several users who have a priori knowledge
> of frequently used filters and would appreciate having Lucene take care of
> transparently optimizing the execution of such filters instead of having to
> do it manually.
>
> I'm not sure a separate project is the best option: it makes it more
> challenging to keep up-to-date with releases, more challenging for users to
> find it, etc. I'd rather add this feature to the Lucene repository, as a
> new module or as part of an existing module?
>
>
> On Tue, Dec 15, 2020 at 4:41 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
>> I feel like there could be some considerable overlap with features
>> provided by Luwak, which was contributed to Lucene fairly recently,
>> and I think does the query inversion work required for this; maybe
>> more of it already exists here? I don't know if that module handles
>> the query rewriting, or the term indexing you're talking about though.
>>
>> On Mon, Dec 14, 2020 at 11:25 PM Atri Sharma <a...@apache.org> wrote:
>> >
>> > +1
>> >
>> > I would suggest that this be an independent project hosted on GitHub
>> (there have been similar projects in the past that have seen success that
>> way)
>> >
>> > On Tue, 15 Dec 2020, 09:37 David Smiley, <dsmi...@apache.org> wrote:
>> >>
>> >> Great optimization!
>> >>
>> >> I'm dubious about it being a good contribution to Lucene itself, however,
>> because what you propose fits cleanly above Lucene. Even at an ES/Solr
>> layer (which I know you don't use, but hypothetically speaking), I'm
>> dubious there as well.
>> >>
>> >> ~ David Smiley
>> >> Apache Lucene/Solr Search Developer
>> >> http://www.linkedin.com/in/davidwsmiley
>> >>
>> >>
>> >> On Mon, Dec 14, 2020 at 2:37 PM Michael Froh <msf...@gmail.com> wrote:
>> >>>
>> >>> My team at work has a neat feature that we've built on top of Lucene
>> that has provided a substantial (20%+) increase in maximum QPS and some
>> reduction in query latency.
>> >>>
>> >>> Basically, we run a training process that looks at historical queries
>> to find frequently co-occurring combinations of required clauses, say "+A
>> +B +C +D". Then at indexing time, if a document satisfies one of these
>> known combinations, we add a new term to the doc, like "opto:ABCD". At
>> query time, we can then replace the required clauses with a single
>> TermQuery for the "optimized" term.
>> >>>
>> >>> It adds a little bit of extra work at indexing time and requires the
>> offline training step, but we've found that it yields a significant boost
>> at query time.
>> >>>
>> >>> We're interested in open-sourcing this feature. Is it something worth
>> adding to Lucene? Since it doesn't require any core changes, maybe as a
>> module?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> --
> Adrien
>
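
A minimal sketch of the clause-combination idea Michael Froh describes in the quoted thread, written in plain Python rather than against Lucene's Java API: documents and queries are modeled as sets of terms, and a query is a pure conjunction of required clauses. The field prefix `opto`, the `|`-delimited term format, and the helpers `mine_combinations`, `tag_document`, `rewrite_query` and `matches` are all hypothetical. The thread doesn't say how the real training step works, so this one only counts exactly repeated sets of required clauses.

```python
from collections import Counter

# Hypothetical field prefix, modeled on the "opto:ABCD" example in the thread.
OPTO_FIELD = "opto"


def mine_combinations(query_log, min_size=2, min_count=2):
    """Offline "training": find required-clause sets that recur in the log.

    Simplification: only exact repeats of a query's full set of required
    clauses are counted; a real training step could also mine subsets.
    """
    counts = Counter(frozenset(q) for q in query_log if len(q) >= min_size)
    return [combo for combo, n in counts.most_common() if n >= min_count]


def combo_term(combo):
    # Sorted and "|"-delimited so {"AB", "C"} and {"A", "BC"} get distinct
    # terms (assumes "|" never appears inside an ordinary term).
    return OPTO_FIELD + ":" + "|".join(sorted(combo))


def tag_document(doc_terms, combos):
    """Index time: add one extra term per known combination the doc satisfies."""
    return set(doc_terms) | {combo_term(c) for c in combos if c <= doc_terms}


def rewrite_query(required, combos):
    """Query time: swap covered required clauses for a single combination term.

    Larger combinations are tried first; clauses not covered by any
    combination are kept as ordinary required terms.
    """
    remaining = set(required)
    rewritten = set()
    for combo in sorted(combos, key=len, reverse=True):
        if combo <= remaining:
            rewritten.add(combo_term(combo))
            remaining -= combo
    return rewritten | remaining


def matches(doc_terms, required):
    """A pure conjunction matches when every required term is present."""
    return set(required) <= set(doc_terms)


if __name__ == "__main__":
    log = [{"A", "B", "C", "D"}, {"A", "B", "C", "D"}, {"X", "Y"}]
    combos = mine_combinations(log)
    doc = tag_document({"A", "B", "C", "D", "Z"}, combos)
    query = rewrite_query({"A", "B", "C", "D", "E"}, combos)
    print(sorted(query))  # prints ['E', 'opto:A|B|C|D']
    print(matches(doc, query))  # prints False (the doc has no "E")
```

Because a tagged document carries `opto:A|B|C|D` exactly when it contains all of A, B, C and D, the rewritten conjunction matches the same documents as the original. It does not preserve scores, though: collapsing scoring clauses into a single term query changes relevance, so the rewrite fits most naturally on non-scoring filter clauses. Combinations mined after documents were indexed only help once those documents are reindexed.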
