Re: Slow DV equivalent of TermInSetQuery

Adrien Grand Tue, 26 Oct 2021 12:52:36 -0700

I opened https://issues.apache.org/jira/browse/LUCENE-10207 about these
ideas.


On Tue, Oct 26, 2021 at 7:52 PM Robert Muir <[email protected]> wrote:

> On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand <[email protected]> wrote:
> >
> > > And then we could make an IndexOrDocValuesQuery with both the
> > > TermInSetQuery and this SDV.newSlowInSetQuery?
> >
> > Unfortunately IndexOrDocValuesQuery relies on the fact that the "index"
> > query can evaluate its cost (ScorerSupplier#cost) without doing anything
> > costly, which isn't the case for TermInSetQuery.
> >
> > So we'd need to make some changes. Estimating the cost of a
> > TermInSetQuery in general without seeking the terms is a hard problem, but
> > maybe we could specialize the unique key case to return the number of terms
> > as the cost?
>
> Yes, we know that each term in the terms dict has only a single document
> when terms.size() == terms.getSumDocFreq(): there's only one posting for
> each term.
> But we can probably generalize the cost estimation a bit more, just
> based on these two stats?

-- 
Adrien
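
The cost estimate discussed in the quoted message could look roughly like the sketch below. It is only an illustration, not the change made in LUCENE-10207: the class name, the method name, and the average-postings heuristic for the non-unique case are assumptions. The two statistics are the ones named above (Terms#size() and Terms#getSumDocFreq(), both of which may return -1 when unavailable); they are passed in as plain longs so the sketch runs without Lucene on the classpath.

```java
// Hypothetical helper: estimates a TermInSetQuery-style cost from index
// statistics alone, without seeking any term.
final class TermInSetCostSketch {

    private TermInSetCostSketch() {}

    /**
     * @param termCount      value of Terms#size() for the field, or -1 if unknown
     * @param sumDocFreq     value of Terms#getSumDocFreq(), or -1 if unknown
     * @param queryTermCount number of terms in the query's term set
     * @param maxDoc         number of documents in the segment (upper bound)
     */
    static long estimateCost(long termCount, long sumDocFreq,
                             long queryTermCount, long maxDoc) {
        // Missing statistics: fall back to the most pessimistic bound.
        if (termCount <= 0 || sumDocFreq < 0) {
            return maxDoc;
        }
        // Unique-key case from the thread: one posting per term, so the
        // query can match at most one document per query term.
        if (termCount == sumDocFreq) {
            return Math.min(queryTermCount, maxDoc);
        }
        // Assumed generalization: average postings per term, rounded up.
        long avgDocFreq = Math.max(1, (sumDocFreq + termCount - 1) / termCount);
        // Cap at maxDoc, checking before multiplying to avoid overflow.
        if (queryTermCount > maxDoc / avgDocFreq) {
            return maxDoc;
        }
        return queryTermCount * avgDocFreq;
    }
}
```

For a unique-key field with 1000 terms and 1000 postings, a 5-term query gets a cost of 5. The averaging branch is a heuristic and can underestimate when the query terms are much more frequent than the average term.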
