See also CommonGrams, which is a very similar concept:
https://github.com/apache/lucene-solr/tree/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/commongrams

On Tue, Dec 15, 2020 at 12:08 PM Robert Muir <[email protected]> wrote:
>
> I wonder if it can be done in a fairly clean way. This sounds similar
> to using a ShingleFilter to do this optimization, but adding some
> conditionals so that the index is smaller? Now that we have
> ConditionalTokenFilter (for branching), can the feature be implemented
> cleanly?
>
> Ideally it wouldn't require a lot of new code, something like checking
> a "set" + ConditionalTokenFilter + ShingleFilter?
>
> On Mon, Dec 14, 2020 at 2:37 PM Michael Froh <[email protected]> wrote:
> >
> > My team at work has a neat feature that we've built on top of Lucene that
> > has provided a substantial (20%+) increase in maximum QPS and some
> > reduction in query latency.
> >
> > Basically, we run a training process that looks at historical queries to 
> > find frequently co-occurring combinations of required clauses, say "+A +B 
> > +C +D". Then at indexing time, if a document satisfies one of these known 
> > combinations, we add a new term to the doc, like "opto:ABCD". At query 
> > time, we can then replace the required clauses with a single TermQuery for 
> > the "optimized" term.
> >
> > It adds a little bit of extra work at indexing time and requires the 
> > offline training step, but we've found that it yields a significant boost 
> > at query time.
> >
> > We're interested in open-sourcing this feature. Is it something worth 
> > adding to Lucene? Since it doesn't require any core changes, maybe as a 
> > module?
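
For readers following the thread, here is a minimal sketch of the idea in the original message: trained combinations of required terms, an extra "opto:" term added at indexing time, and required clauses collapsed into that single term at query time. It is plain Java with no Lucene dependency, and it is not the implementation described above. The class name ComboTermSketch, its method names, and the choice to build the combined term by concatenating the sorted terms are all assumptions made for illustration; only the "opto:ABCD" shape comes from the message.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from Lucene or from the thread.
final class ComboTermSketch {
    // Prefix taken from the "opto:ABCD" example in the original message.
    static final String PREFIX = "opto:";

    // Canonical combined term: the sorted terms concatenated after the prefix.
    // Assumption: a real implementation would likely want a delimiter, since
    // plain concatenation is ambiguous for multi-character terms.
    static String comboTerm(Collection<String> terms) {
        return PREFIX + String.join("", new TreeSet<>(terms));
    }

    // Indexing time: for each trained combination fully contained in the
    // document's terms, produce one extra term to add to the document.
    static Set<String> extraTermsForDoc(Set<String> docTerms, List<Set<String>> trainedCombos) {
        Set<String> extra = new TreeSet<>();
        for (Set<String> combo : trainedCombos) {
            if (docTerms.containsAll(combo)) {
                extra.add(comboTerm(combo));
            }
        }
        return extra;
    }

    // Query time: replace each trained combination found among the required
    // terms with its combined term; uncovered required terms are kept as-is.
    static Set<String> rewriteRequired(Set<String> required, List<Set<String>> trainedCombos) {
        Set<String> remaining = new TreeSet<>(required);
        Set<String> rewritten = new TreeSet<>();
        for (Set<String> combo : trainedCombos) {
            if (remaining.containsAll(combo)) {
                remaining.removeAll(combo);
                rewritten.add(comboTerm(combo));
            }
        }
        rewritten.addAll(remaining);
        return rewritten;
    }

    public static void main(String[] args) {
        List<Set<String>> combos = List.of(Set.of("A", "B", "C", "D"));
        System.out.println(extraTermsForDoc(Set.of("A", "B", "C", "D", "E"), combos));
        System.out.println(rewriteRequired(Set.of("A", "B", "C", "D", "E"), combos));
    }
}
```

In this sketch the rewrite is greedy over the trained list, so a combination is only applied when all of its terms are still unclaimed by an earlier one. The rewritten query stays equivalent to the original because a document carries the combined term only if it contains every term in that combination.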
