My team at work has built a feature on top of Lucene that has given us a
substantial (20%+) increase in maximum QPS and some reduction in query
latency.

Basically, we run a training process that looks at historical queries to
find frequently co-occurring combinations of required clauses, say "+A +B
+C +D". Then at indexing time, if a document satisfies one of these known
combinations, we add a new term to the doc, like "opto:ABCD". At query
time, we can then replace the required clauses with a single TermQuery for
the "optimized" term.
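
To make the three steps concrete, here is a minimal toy sketch in plain
Python. It is not our implementation and does not use any Lucene APIs: it
assumes every required clause is a single term, models a document as a set
of terms, and all function names (train, index_terms, rewrite, matches) are
made up for illustration. The "opto:" prefix is the one from the example
above.

```python
from collections import Counter
from itertools import combinations

PREFIX = "opto:"  # synthetic-term prefix from the example above


def combo_term(combo):
    """Name the synthetic term for a combination, e.g. {A,B,C,D} -> 'opto:ABCD'."""
    return PREFIX + "".join(sorted(combo))


def train(query_log, min_size=2, top_k=1):
    """Offline step: count co-occurring required-clause combinations.

    Enumerating every subset is exponential in query size; that is
    acceptable for a toy, but a real trainer would need to prune.
    """
    counts = Counter()
    for required in query_log:
        terms = sorted(set(required))
        for size in range(min_size, len(terms) + 1):
            counts.update(frozenset(c) for c in combinations(terms, size))
    # Most frequent first; on ties prefer the larger combination,
    # since it replaces more clauses.
    ranked = sorted(counts.items(),
                    key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])))
    return [combo for combo, _ in ranked[:top_k]]


def index_terms(doc_terms, combos):
    """Indexing step: add the synthetic term for each combination the doc satisfies."""
    terms = set(doc_terms)
    for combo in combos:
        if combo <= terms:
            terms.add(combo_term(combo))
    return terms


def rewrite(required, combos):
    """Query step: swap a known combination for its single synthetic term."""
    required = set(required)
    for combo in sorted(combos, key=len, reverse=True):
        if combo <= required:
            return (required - combo) | {combo_term(combo)}
    return required


def matches(required, indexed_terms):
    """A doc matches when it contains every required term."""
    return set(required) <= indexed_terms


log = [["A", "B", "C", "D"], ["A", "B", "C", "D", "E"],
       ["A", "B", "C", "D"], ["F", "G"]]
combos = train(log)
doc = index_terms({"A", "B", "C", "D", "X"}, combos)
query = rewrite(["A", "B", "C", "D", "X"], combos)
print(sorted(query), matches(query, doc))
```

In this toy the rewritten query has two required terms instead of five, and
it still matches exactly the documents the original query matched, because
the synthetic term is only added to documents that satisfy the whole
combination.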

It adds a little bit of extra work at indexing time and requires the
offline training step, but we've found that it yields a significant boost
at query time.

We're interested in open-sourcing this feature. Is it something worth
adding to Lucene? Since it doesn't require any core changes, maybe as a
module?
