Hi Froh-

Thanks for raising this and sorry I missed your tag in GH#13201 back in
June (had some vacation and was generally away). I'd be interested to see
what others think as well, but I'll at least commit to looking through your
PR tomorrow or Monday to get a better handle on what's being proposed. We
went through a few iterations of this originally before we landed on the
current version. One promising approach was to have a more intelligent
query that would load some number of terms up-front to get a better cost
estimate before making a decision, but it required a custom query
implementation that generally didn't get favorable feedback (it's nice to
be able to use the existing IndexOrDocValuesQuery abstraction instead). I
can dig up some of that conversation if it's helpful, but I'd like to better
understand what you've got in mind first.

Unwinding a bit though, I also agree in general that we should be able
to do a better job estimating cost here. I think the tricky part is how we
go about doing that effectively. Thanks again for kicking off this thread!

Cheers,
-Greg

On Thu, Aug 1, 2024 at 5:58 PM Michael Froh <msf...@gmail.com> wrote:

> Hi there,
>
> For a few months, some of us have been running into issues with the cost
> estimate from AbstractMultiTermQueryConstantScoreWrapper. (
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L300
> )
>
> In https://github.com/apache/lucene/issues/13029, the problem was raised
> in terms of queries not being cached, because the estimated cost was too
> high.
>
> We've also run into problems in OpenSearch, since we started wrapping
> MultiTermQueries in IndexOrDocValuesQuery. The MTQ gets an exaggerated cost
> estimate, so IndexOrDocValuesQuery decides it should be a DV query, even
> though the MTQ would really only match a handful of docs (and should be
> the lead iterator).
>
> I opened a PR back in March (https://github.com/apache/lucene/pull/13201)
> to try to handle the case where a MultiTermQuery matches a small number of
> terms. Since Mayya pulled the rewrite logic that expands up to 16 terms (to
> rewrite as a Boolean disjunction) earlier in the workflow (in
> https://github.com/apache/lucene/pull/13454), we get the better cost
> estimate for MTQs on few terms "for free".
>
> What do folks think?
>
> Thanks,
> Froh
>
