I wonder whether it can be done fairly cleanly. This sounds similar to
using a ShingleFilter for this optimization, but with some conditionals
added so that the index stays smaller. Now that we have
ConditionalTokenFilter (for branching), could the feature be implemented
cleanly on top of it?

Ideally it wouldn't require much new code: something like checking
a "set" combined with ConditionalTokenFilter and ShingleFilter?
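For illustration only, the "set + shingles" idea can be approximated in plain Java without any Lucene classes: build adjacent word n-grams (roughly what a ShingleFilter emits) and keep only those found in a known set (the role a ConditionalTokenFilter-style check would play). The helper name `conditionalShingles`, the space separator, and the `maxSize` parameter are hypothetical choices for this sketch, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ConditionalShingles {
    // Hypothetical helper: emits adjacent n-grams of 2..maxSize tokens,
    // joined with a single space, keeping only those present in `known`.
    static List<String> conditionalShingles(List<String> tokens, Set<String> known, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder(tokens.get(start));
            // The shingle spans tokens[start..end], i.e. end - start + 1 tokens.
            for (int end = start + 1; end < tokens.size() && end - start < maxSize; end++) {
                sb.append(' ').append(tokens.get(end));
                String shingle = sb.toString();
                if (known.contains(shingle)) {
                    out.add(shingle); // only "known" shingles are indexed
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("new", "york", "city", "hotels");
        Set<String> known = Set.of("new york", "york city hotels");
        System.out.println(conditionalShingles(tokens, known, 3)); // [new york, york city hotels]
    }
}
```

Note that shingles only cover adjacent tokens, so this sketch captures co-occurring terms that appear next to each other in the token stream, not arbitrary combinations of required clauses.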

On Mon, Dec 14, 2020 at 2:37 PM Michael Froh <[email protected]> wrote:
>
> My team at work has a neat feature that we've built on top of Lucene that has 
> provided a substantial (20%+) increase in maximum qps and some reduction in 
> query latency.
>
> Basically, we run a training process that looks at historical queries to find 
> frequently co-occurring combinations of required clauses, say "+A +B +C +D". 
> Then at indexing time, if a document satisfies one of these known 
> combinations, we add a new term to the doc, like "opto:ABCD". At query time, 
> we can then replace the required clauses with a single TermQuery for the 
> "optimized" term.
>
> It adds a little bit of extra work at indexing time and requires the offline 
> training step, but we've found that it yields a significant boost at query 
> time.
>
> We're interested in open-sourcing this feature. Is it something worth adding 
> to Lucene? Since it doesn't require any core changes, maybe as a module?
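The index-time and query-time halves of the scheme described in the quoted message can be sketched in plain Java, modeling a document as a set of terms and a query as a list of required terms. This is a minimal sketch under those assumptions, not the team's implementation and not Lucene API: the `COMBOS` list stands in for the output of the offline training step, and `enrichDocument` / `rewriteRequired` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OptoTerms {
    // Hypothetical stand-in for combinations mined offline from historical queries.
    static final List<List<String>> COMBOS = List.of(List.of("A", "B", "C", "D"));

    // Mirrors the "opto:ABCD" naming from the message; real terms longer than
    // one character would need a delimiter to avoid collisions (AB+C vs A+BC).
    static String optoTerm(List<String> combo) {
        return "opto:" + String.join("", combo);
    }

    // Index time: if the document contains every term of a known combination,
    // add the synthetic term alongside the original terms.
    static Set<String> enrichDocument(Set<String> docTerms) {
        Set<String> enriched = new HashSet<>(docTerms);
        for (List<String> combo : COMBOS) {
            if (docTerms.containsAll(combo)) {
                enriched.add(optoTerm(combo));
            }
        }
        return enriched;
    }

    // Query time: if the required terms include a known combination,
    // replace those clauses with the single synthetic term.
    static List<String> rewriteRequired(List<String> required) {
        List<String> rewritten = new ArrayList<>(required);
        for (List<String> combo : COMBOS) {
            if (rewritten.containsAll(combo)) {
                rewritten.removeAll(combo);
                rewritten.add(optoTerm(combo));
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        System.out.println(enrichDocument(Set.of("A", "B", "C", "D", "E")).contains("opto:ABCD")); // true
        System.out.println(rewriteRequired(List.of("A", "B", "C", "D", "E"))); // [E, opto:ABCD]
    }
}
```

Because every document matching all of A, B, C and D receives the synthetic term at index time, the rewritten query matches the same documents while evaluating one posting list instead of intersecting four.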
