[jira] [Commented] (LUCENE-10061) CombinedFieldsQuery needs dynamic pruning support

Zach Chen (Jira) Fri, 15 Oct 2021 21:50:08 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429522#comment-17429522
 ]


Zach Chen commented on LUCENE-10061:
------------------------------------

Hi [~jpountz], I'm interested in working on this one, but I have a question 
about its potential implementation and would like to get some advice on it.

I found https://issues.apache.org/jira/browse/LUCENE-8312 while researching 
this, and I think the solution here should be very similar (using merged 
impacts to prune docs that are not competitive), except perhaps for how the 
impacts get merged. For SynonymQuery, I understand that impacts can be merged 
effectively by summing term frequencies for each unique norm value, since the 
impacts all come from the same field. I'm not sure how that could be done 
efficiently for CombinedFieldsQuery, though. If I understand correctly, to 
merge impacts from multiple fields for CombinedFieldsQuery, we may need to 
compute every possible sum of competitive {freq, norm} pairs across all 
fields, and then find the competitive ones among those sums. So with 4 fields 
and a list of 4 competitive impacts each, in the worst case we may need to 
compute 4 * 4 * 4 * 4 = 256 merged impacts (each of the form 
{field1FreqA + field2FreqB + field3FreqC + field4FreqD, field1NormA + 
field2NormB + field3NormC + field4NormD}), and then filter out the ones that 
are not competitive. This seems inefficient.
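
To make the concern concrete, here is a minimal sketch of the brute-force 
merge described above. It is only an illustration under assumptions, not 
Lucene code: ImpactMergeSketch, its nested Impact class, allCombinations and 
competitiveOnly are hypothetical names, norms are treated as plain numbers 
that can be added and compared as signed longs, and an impact is taken to be 
competitive when no other impact has both a frequency at least as high and a 
norm at least as low.

```java
import java.util.ArrayList;
import java.util.List;

final class ImpactMergeSketch {

  /** Hypothetical stand-in for a {freq, norm} pair; this is not Lucene's own Impact class. */
  static final class Impact {
    final long freq;
    final long norm;

    Impact(long freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }
  }

  /** Picks one impact per field in every possible way and sums the freqs and the norms. */
  static List<Impact> allCombinations(List<List<Impact>> perField) {
    List<Impact> sums = new ArrayList<>();
    sums.add(new Impact(0, 0));
    for (List<Impact> fieldImpacts : perField) {
      List<Impact> next = new ArrayList<>(sums.size() * fieldImpacts.size());
      for (Impact partial : sums) {
        for (Impact impact : fieldImpacts) {
          next.add(new Impact(partial.freq + impact.freq, partial.norm + impact.norm));
        }
      }
      sums = next; // grows by a factor of fieldImpacts.size() per field
    }
    return sums;
  }

  /** Drops every impact that another impact beats or ties on both freq and norm. */
  static List<Impact> competitiveOnly(List<Impact> impacts) {
    List<Impact> sorted = new ArrayList<>(impacts);
    // Norm ascending; for equal norms, freq descending.
    sorted.sort((a, b) -> a.norm != b.norm
        ? Long.compare(a.norm, b.norm)
        : Long.compare(b.freq, a.freq));
    List<Impact> kept = new ArrayList<>();
    long bestFreq = Long.MIN_VALUE;
    for (Impact impact : sorted) {
      // Everything seen so far has a norm <= this one, so only a higher freq can compete.
      if (impact.freq > bestFreq) {
        kept.add(impact);
        bestFreq = impact.freq;
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Made-up data: 4 fields with 4 impacts each.
    List<List<Impact>> perField = new ArrayList<>();
    for (int field = 0; field < 4; field++) {
      List<Impact> impacts = new ArrayList<>();
      for (int i = 1; i <= 4; i++) {
        impacts.add(new Impact((long) i * (field + 1), 10L * i + field));
      }
      perField.add(impacts);
    }
    List<Impact> merged = allCombinations(perField);
    System.out.println("combinations: " + merged.size()); // 4 * 4 * 4 * 4 = 256
    System.out.println("competitive:  " + competitiveOnly(merged).size());
  }
}
```

The number of combinations is the product of the per-field list sizes, so the 
cost grows exponentially with the number of fields, which is the inefficiency 
in question.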

Do you have any suggestions on this, or on whether using impacts is the right 
approach for CombinedFieldsQuery pruning support to begin with?

> CombinedFieldsQuery needs dynamic pruning support
> -------------------------------------------------
>
>                 Key: LUCENE-10061
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10061
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore, 
> forcing Lucene to collect all matches in order to figure the top-k hits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

