[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery

Robert Muir (Jira) Thu, 28 Oct 2021 02:43:09 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435290#comment-17435290
 ]


Robert Muir commented on LUCENE-10207:
--------------------------------------

Well, it also depends on which case we optimize for. If the set is large 
(thousands of terms), I can't see the current approach of thousands of 
brute-force seekExact calls ever being faster than just a few seekCeil calls. 
IMO that's probably the case we should optimize for with this query, as that's 
how people tend to use it (doing stupid joins). You can probably accelerate 
this more by index-sorting on the ID field...
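
To make the comparison above concrete, here is a minimal sketch that only 
counts seeks. It does not use Lucene: a java.util.TreeSet stands in for the 
terms dictionary, TreeSet.contains stands in for seekExact, and 
TreeSet.ceiling stands in for seekCeil. The class and method names are 
hypothetical, and per-seek cost in a real terms dictionary is not modeled, so 
this illustrates the idea and is not a substitute for a benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from Lucene.
final class SeekCeilSketch {

    /** Matched terms plus the number of dictionary seeks performed. */
    static final class Result {
        final List<String> matches = new ArrayList<>();
        int seeks;
    }

    /** One exact lookup per query term (stand-in for repeated seekExact). */
    static Result bruteForce(TreeSet<String> dict, List<String> sortedQueryTerms) {
        Result r = new Result();
        for (String term : sortedQueryTerms) {
            r.seeks++;
            if (dict.contains(term)) {
                r.matches.add(term);
            }
        }
        return r;
    }

    /**
     * Ceiling-based walk (stand-in for seekCeil): each seek lands on the next
     * dictionary term at or after the target, which lets us skip every query
     * term that sorts before the term we landed on.
     */
    static Result leapfrog(TreeSet<String> dict, List<String> sortedQueryTerms) {
        Result r = new Result();
        int i = 0;
        while (i < sortedQueryTerms.size()) {
            String target = sortedQueryTerms.get(i);
            r.seeks++;
            String found = dict.ceiling(target);
            if (found == null) {
                break; // dictionary exhausted, no later query term can match
            }
            // Skip query terms that sort before the term we landed on.
            while (i < sortedQueryTerms.size()
                    && sortedQueryTerms.get(i).compareTo(found) < 0) {
                i++;
            }
            // We are already positioned on 'found', so no extra seek is needed.
            if (i < sortedQueryTerms.size() && sortedQueryTerms.get(i).equals(found)) {
                r.matches.add(found);
                i++;
            }
        }
        return r;
    }
}
```

Both methods expect the query terms in sorted order. The ceiling-based walk 
saves seeks only when some query terms are absent from the dictionary; when 
every query term is present it performs one seek per term, the same as the 
brute-force loop.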

Anyway, if we care about this stuff, we need a benchmark. I can't really go any 
further with it because I don't want to make decisions based on intuition on 
JIRA like this; it is too frustrating, and IMO it is no good excuse for 
duplicating a bunch of MultiTermQuery's code.

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                 Key: LUCENE-10207
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10207
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-10207_multitermquery.patch
>
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a 
> query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. 
> However IndexOrDocValuesQuery only works well if the "index" query can give 
> an estimation of the cost of the query without doing anything expensive (like 
> looking up all terms of the TermInSetQuery in the terms dict). Maybe we could 
> implement it for primary keys (terms.size() == sumDocFreq) by returning the 
> number of terms of the query? Another idea is to multiply the number of terms 
> by the average postings length, though this could be dangerous if the field 
> has a Zipfian distribution and some terms have a much higher doc frequency 
> than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently 
> [~mikemccand] and [~gsmiller] again independently. So it looks like there is 
> interest in this. Here is an email thread where this was recently discussed: 
> https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
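
The two cost heuristics floated in the issue description could be sketched 
roughly as below. This is plain Java over three numbers, not Lucene code: the 
class and method names are hypothetical, and the inputs are assumed to come 
from field statistics such as Terms.size() and Terms.getSumDocFreq(). 
Returning -1 when statistics are missing is a convention chosen for this 
sketch only.

```java
// Hypothetical illustration of the heuristics from the issue description.
final class TermInSetCostSketch {

    /**
     * @param queryTermCount number of terms in the query's term set
     * @param fieldTermCount number of unique terms in the field, or -1 if unknown
     * @param sumDocFreq     sum of doc frequencies over all terms of the field
     * @return an estimated number of matching postings, or -1 if no estimate
     *         can be made from the statistics alone
     */
    static long estimateCost(long queryTermCount, long fieldTermCount, long sumDocFreq) {
        if (fieldTermCount <= 0 || sumDocFreq < 0) {
            return -1; // statistics unavailable; the caller needs another strategy
        }
        if (fieldTermCount == sumDocFreq) {
            // Primary-key-like field: every term has exactly one posting,
            // so the query matches at most one document per query term.
            return queryTermCount;
        }
        // Otherwise scale by the average postings length, rounded up. As the
        // issue notes, this can badly underestimate on skewed (Zipfian) fields.
        long avgPostingsLength = (sumDocFreq + fieldTermCount - 1) / fieldTermCount;
        return queryTermCount * avgPostingsLength;
    }
}
```

For example, a 1000-term query against a field with 100 unique terms and a 
sumDocFreq of 1050 gets an average postings length of 11 after rounding up, so 
the estimate is 11000, regardless of which terms the query actually contains.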



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
