See also CommonGrams, which is a very similar concept: https://github.com/apache/lucene-solr/tree/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/commongrams

On Tue, Dec 15, 2020 at 12:08 PM Robert Muir wrote:
>
> I wonder if it can be done in a fairly clean way. This sounds similar
> to using a ShingleFilter to do this optimization, but adding some
> conditionals so that the index is smaller? Now that we have
> ConditionalTokenFilter (for branching), can the feature be implemented
> cleanly?
>
> Ideally it wouldn't require a lot of new code, something like checking
> a "set" + conditionaltokenfilter + shinglefilter?
>
> On Mon, Dec 14, 2020 at 2:37 PM Michael Froh wrote:
> >
> > My team at work has a neat feature that we've built on top of Lucene that
> > has provided a substantial (20%+) increase in maximum qps and some
> > reduction in query latency.
> >
> > Basically, we run a training process that looks at historical queries to
> > find frequently co-occurring combinations of required clauses, say "+A +B
> > +C +D". Then at indexing time, if a document satisfies one of these known
> > combinations, we add a new term to the doc, like "opto:ABCD". At query
> > time, we can then replace the required clauses with a single TermQuery for
> > the "optimized" term.
> >
> > It adds a little bit of extra work at indexing time and requires the
> > offline training step, but we've found that it yields a significant boost
> > at query time.
> >
> > We're interested in open-sourcing this feature. Is it something worth
> > adding to Lucene? Since it doesn't require any core changes, maybe as a
> > module?
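
For readers following the thread, here is a minimal sketch of the index-time and query-time steps described in the original message. It is plain Python over sets of terms, not the Lucene API and not the poster's implementation. The KNOWN_COMBINATIONS table stands in for the output of the offline training step, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical result of the offline training step: each frequently
# co-occurring set of required terms maps to one synthetic term.
KNOWN_COMBINATIONS: Dict[FrozenSet[str], str] = {
    frozenset({"A", "B", "C", "D"}): "opto:ABCD",
}


def index_terms(doc_terms: Iterable[str]) -> Set[str]:
    """Index time: return the document's terms plus one optimized term
    for every known combination the document fully satisfies."""
    terms = set(doc_terms)
    extra = {opt for combo, opt in KNOWN_COMBINATIONS.items() if combo <= terms}
    return terms | extra


def rewrite_required(required: Iterable[str]) -> Set[str]:
    """Query time: replace each known combination contained in the
    required clauses with its single optimized term."""
    remaining = set(required)
    rewritten: Set[str] = set()
    # Try larger combinations first so the biggest replacement is applied.
    for combo, opt in sorted(KNOWN_COMBINATIONS.items(), key=lambda kv: -len(kv[0])):
        if combo <= remaining:
            remaining -= combo
            rewritten.add(opt)
    return rewritten | remaining


def matches(indexed_terms: Set[str], required: Iterable[str]) -> bool:
    """A document matches when it contains every required term."""
    return set(required) <= indexed_terms
```

Because the optimized term is added only when a document contains every term in the combination, the rewritten query matches the same documents as the original one while intersecting fewer postings lists. For example, rewrite_required(["A", "B", "C", "D", "E"]) yields the two required terms "opto:ABCD" and "E".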
