I opened https://issues.apache.org/jira/browse/LUCENE-10207 about these ideas.

On Tue, Oct 26, 2021 at 7:52 PM Robert Muir wrote:

> On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand wrote:
> >
> > > And then we could make an IndexOrDocValuesQuery with both the
> > > TermInSetQuery and this SDV.newSlowInSetQuery?
> >
> > Unfortunately IndexOrDocValuesQuery relies on the fact that the "index"
> > query can evaluate its cost (ScorerSupplier#cost) without doing anything
> > costly, which isn't the case for TermInSetQuery.
> >
> > So we'd need to make some changes. Estimating the cost of a
> > TermInSetQuery in general without seeking the terms is a hard problem, but
> > maybe we could specialize the unique key case to return the number of terms
> > as the cost?
>
> Yes we know each term in terms dict only has a single document, when
> terms.size() == terms.getSumDocFreq(): there's only one posting for
> each term.
> But we can probably generalize a cost estimation a bit more, just
> based on these two stats?

--
Adrien
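
To make the idea in the quoted thread concrete, here is a rough sketch of a cost estimate built only from the two stats mentioned (terms.size() and terms.getSumDocFreq()). It is plain Java with no Lucene dependency. The class name, the method name, the docCount cap and the "average postings per term" heuristic are my own assumptions for illustration, not what TermInSetQuery does and not necessarily what the JIRA issue will end up with.

```java
public final class TermInSetCostSketch {
    private TermInSetCostSketch() {}

    /**
     * Hypothetical estimate of how many postings a set of query terms could match,
     * without seeking any term.
     *
     * @param termCount      value of terms.size(); may be -1 when the codec does not store it
     * @param sumDocFreq     value of terms.getSumDocFreq()
     * @param queryTermCount number of terms in the query
     * @param docCount       upper bound on the result (assumed: docs that have the field)
     */
    public static long estimateCost(long termCount, long sumDocFreq,
                                    long queryTermCount, long docCount) {
        if (queryTermCount <= 0) {
            return 0;
        }
        if (termCount <= 0 || sumDocFreq < termCount) {
            // Stats missing or inconsistent: fall back to the worst case.
            return docCount;
        }
        if (termCount == sumDocFreq) {
            // Unique-key case: exactly one posting per term,
            // so the number of query terms bounds the cost.
            return Math.min(queryTermCount, docCount);
        }
        // Assumed generalization: average postings per term, rounded up.
        long avgPostingsPerTerm = (sumDocFreq + termCount - 1) / termCount;
        long estimate;
        try {
            estimate = Math.multiplyExact(queryTermCount, avgPostingsPerTerm);
        } catch (ArithmeticException overflow) {
            estimate = Long.MAX_VALUE;
        }
        return Math.min(estimate, docCount);
    }

    public static void main(String[] args) {
        // Unique-key field: 1M terms, 1M postings, 50 query terms.
        System.out.println(estimateCost(1_000_000L, 1_000_000L, 50L, 1_000_000L)); // 50
        // Skewed field: 1,000 terms, 250,000 postings, 50 query terms.
        System.out.println(estimateCost(1_000L, 250_000L, 50L, 1_000_000L)); // 12500
    }
}
```

The average is only a guess: it underestimates when the query terms are the frequent ones and overestimates when they are rare, so it would probably need a safety factor before IndexOrDocValuesQuery could rely on it.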
