[ https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429522#comment-17429522 ]
Zach Chen commented on LUCENE-10061:
------------------------------------

Hi [~jpountz],

I'm interested in working on this one, but I have a question about its potential implementation and would like to get some advice.

While researching this, I found https://issues.apache.org/jira/browse/LUCENE-8312 and thought the solution here should be very similar (using merged impacts to prune docs that are not competitive), except perhaps for how the impacts get merged. For SynonymQuery, I understand that impacts can be merged effectively by summing term frequencies for each unique norm value, since the impacts all come from the same field. I'm not sure how that could be done efficiently for CombinedFieldsQuery.

If I understand it correctly, in order to merge impacts from multiple fields for CombinedFieldsQuery, we may need to compute all possible sums of competitive {freq, norm} pairs across all fields, and then find the competitive ones among them again. So with 4 fields, each with a list of 4 competitive impacts, the impacts merge may need to compute 4 * 4 * 4 * 4 = 256 combinations of merged impacts in the worst case ({field1FreqA + field2FreqB + field3FreqC + field4FreqD, field1NormA + field2NormB + field3NormC + field4NormD}), and then filter out the ones that are not competitive. This seems inefficient.

Do you have any suggestions on this? Or is using impacts for CombinedFieldsQuery pruning support the right approach to begin with?

> CombinedFieldsQuery needs dynamic pruning support
> -------------------------------------------------
>
>                 Key: LUCENE-10061
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10061
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore,
> forcing Lucene to collect all matches in order to figure the top-k hits.
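
For illustration only, the worst case described in the comment above can be sketched in plain Java. This is not Lucene code: the names Impact, allCombinations and competitive are hypothetical, norms are treated as plain field lengths where a smaller value is more competitive, and per-field weights are ignored. The sketch only shows that the number of summed candidates is the product of the per-field list sizes before any filtering happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's Impacts API.
class ImpactMergeSketch {

  /** Simplified {freq, norm} pair; norm is assumed to be a plain length. */
  static final class Impact {
    final int freq;
    final long norm;

    Impact(int freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }
  }

  /** Picks one impact per field and sums them, for every possible choice. */
  static List<Impact> allCombinations(List<List<Impact>> perField) {
    List<Impact> acc = new ArrayList<>();
    acc.add(new Impact(0, 0L));
    for (List<Impact> field : perField) {
      List<Impact> next = new ArrayList<>(acc.size() * field.size());
      for (Impact a : acc) {
        for (Impact b : field) {
          next.add(new Impact(a.freq + b.freq, a.norm + b.norm));
        }
      }
      acc = next; // size is now multiplied by field.size()
    }
    return acc;
  }

  /** Drops every impact for which another one has freq >= and norm <=. */
  static List<Impact> competitive(List<Impact> impacts) {
    List<Impact> sorted = new ArrayList<>(impacts);
    // Norm ascending, then freq descending within the same norm.
    sorted.sort((a, b) -> a.norm != b.norm
        ? Long.compare(a.norm, b.norm)
        : Integer.compare(b.freq, a.freq));
    List<Impact> result = new ArrayList<>();
    int maxFreq = Integer.MIN_VALUE;
    for (Impact i : sorted) {
      // Everything seen so far has norm <= i.norm, so i survives only
      // if it brings a strictly higher freq.
      if (i.freq > maxFreq) {
        result.add(i);
        maxFreq = i.freq;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up impacts: 4 fields with the same 4-entry list each.
    List<Impact> field = List.of(
        new Impact(1, 1), new Impact(2, 4), new Impact(3, 9), new Impact(4, 16));
    List<Impact> summed = allCombinations(List.of(field, field, field, field));
    System.out.println("candidates before filtering: " + summed.size());
    System.out.println("competitive after filtering: " + competitive(summed).size());
  }
}
```

The filtering pass itself is cheap (a sort plus one scan); the cost concern is the candidate list it has to be fed, which grows multiplicatively with each additional field.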