[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436292#comment-17436292 ]
Robert Muir commented on LUCENE-10207:
--------------------------------------
Hmm, why would the cost ever be getDocCount() [number of docs]? I guess my
question is: when estimating "work", shouldn't it be based on sumDocFreq
(number of postings) rather than docCount (number of docs)?
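
To make the distinction concrete, here is a toy sketch (not Lucene code) that
computes both statistics from a small in-memory postings map. The class and
method names are hypothetical; they only mirror what docCount and sumDocFreq
mean, assuming a field where a document can contain several terms.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FieldStatsSketch {
    /** Distinct documents that have at least one term (analogue of docCount). */
    static int docCount(Map<String, int[]> postings) {
        Set<Integer> docs = new HashSet<>();
        for (int[] docIds : postings.values()) {
            for (int doc : docIds) {
                docs.add(doc);
            }
        }
        return docs.size();
    }

    /** Total number of postings across all terms (analogue of sumDocFreq). */
    static long sumDocFreq(Map<String, int[]> postings) {
        long sum = 0;
        for (int[] docIds : postings.values()) {
            sum += docIds.length;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical multi-valued field: term -> IDs of documents containing it.
        Map<String, int[]> postings = Map.of(
            "red", new int[] {0, 1, 2},
            "green", new int[] {1, 2},
            "blue", new int[] {2});
        System.out.println("docCount=" + docCount(postings));
        System.out.println("sumDocFreq=" + sumDocFreq(postings));
    }
}
```

With three documents and six postings, visiting every posting costs twice what
docCount alone would suggest, which is the gap the question above is about.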
> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
> Key: LUCENE-10207
> URL: https://issues.apache.org/jira/browse/LUCENE-10207
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-10207_multitermquery.patch
>
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a
> query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery.
> However, IndexOrDocValuesQuery only works well if the "index" query can give
> an estimation of the cost of the query without doing anything expensive (like
> looking up all terms of the TermInSetQuery in the terms dict). Maybe we could
> implement it for primary keys (terms.size() == sumDocFreq) by returning the
> number of terms of the query? Another idea is to multiply the number of terms
> by the average postings length, though this could be dangerous if the field
> has a Zipfian distribution and some terms have a much higher doc frequency
> than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently
> [~mikemccand] and [~gsmiller] again independently. So it looks like there is
> interest in this. Here is an email thread where this was recently discussed:
> https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
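
The two heuristics in the description above can be sketched roughly as follows.
This is a standalone illustration, not the attached patch: the class name,
method name, and parameters are hypothetical, and capping the estimate at
sumDocFreq is an assumption added here (the query cannot visit more postings
than the field has).

```java
public class TermInSetCostSketch {
    /**
     * Cheap cost estimate that never touches the terms dictionary.
     *
     * @param queryTermCount number of terms in the query
     * @param fieldTermCount number of unique terms in the field (terms.size())
     * @param sumDocFreq     total number of postings in the field
     */
    static long estimateCost(long queryTermCount, long fieldTermCount, long sumDocFreq) {
        if (queryTermCount <= 0 || fieldTermCount <= 0 || sumDocFreq <= 0) {
            return 0; // empty query, empty field, or statistics unavailable
        }
        if (fieldTermCount == sumDocFreq) {
            // Primary-key-like field: every term matches exactly one document,
            // so the query matches at most one document per query term.
            return Math.min(queryTermCount, sumDocFreq);
        }
        // Otherwise multiply by the average postings length (rounded up).
        // This can underestimate badly when doc frequencies are skewed.
        long avgPostings = (sumDocFreq + fieldTermCount - 1) / fieldTermCount;
        if (queryTermCount > sumDocFreq / avgPostings) {
            return sumDocFreq; // assumed cap; also avoids overflow in the product
        }
        return queryTermCount * avgPostings;
    }

    public static void main(String[] args) {
        System.out.println(estimateCost(5, 1000, 1000));  // primary-key case
        System.out.println(estimateCost(10, 100, 1000));  // average-length case
        System.out.println(estimateCost(500, 100, 1000)); // capped case
    }
}
```

The average-length branch is exactly where the Zipfian concern applies: a
single very frequent term in the set can make the real cost far higher than
this estimate.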