[ https://issues.apache.org/jira/browse/LUCENE-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485557#comment-17485557 ]
ASF subversion and git services commented on LUCENE-10236:
----------------------------------------------------------

Commit a17d2ebcd5af4f3d51e0265370931f9ad397dd81 in lucene's branch refs/heads/branch_9x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a17d2eb ]

LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (9.1.0 Backporting) (#588)

> CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10236
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10236
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/sandbox
>            Reporter: Zach Chen
>            Assignee: Zach Chen
>            Priority: Minor
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> This is a spin-off issue from the discussion in
> [https://github.com/apache/lucene/pull/418#issuecomment-967790816], for a
> quick fix in CombinedFieldQuery scoring.
>
> Currently, CombinedFieldQuery uses a constructed
> [fields|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421]
> object to create a MultiNormsLeafSimScorer for scoring. Because that object is
> [built by looping over
> fieldTerms|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414],
> it may contain duplicated field-weight pairs, which results in duplicated
> norms being added during the scoring calculation in MultiNormsLeafSimScorer.
>
> E.g. for a CombinedFieldQuery with two fields and two values matching a
> particular doc:
> {code:java}
> CombinedFieldQuery query =
>     new CombinedFieldQuery.Builder()
>         .addField("field1", (float) 1.0)
>         .addField("field2", (float) 1.0)
>         .addTerm(new BytesRef("foo"))
>         .addTerm(new BytesRef("zoo"))
>         .build(); {code}
> I would expect the scoring to be based on the following:
> # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) +
> freq(field1:zoo) + freq(field2:zoo)
> # Sum of norms on doc = norm(field1) + norm(field2)
>
> but the current logic uses the following for scoring:
> # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) +
> freq(field1:zoo) + freq(field2:zoo)
> # Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) +
> norm(field2)
>
> In addition, this differs from how MultiNormsLeafSimScorer is constructed
> in the CombinedFieldQuery explain function, which [uses
> fieldAndWeights.values()|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389]
> and so does not contain duplicated field-weight pairs.
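
The norm double-counting described in the issue can be illustrated with a small standalone sketch. This is not Lucene code: NormSumSketch, FieldWeight, fromFieldTerms, fromValues, and sumNorms are hypothetical names, the norm values 10 and 20 are made up, and a "norm" is treated here as a plain per-field number. The sketch only assumes, as the issue states, that one list is built by looping over every (field, term) pair while the other comes from the values of a map keyed by field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration only; none of these types are Lucene classes.
final class NormSumSketch {

  // Hypothetical stand-in for a field-weight pair.
  static final class FieldWeight {
    final String field;
    final float weight;

    FieldWeight(String field, float weight) {
      this.field = field;
      this.weight = weight;
    }
  }

  // One entry per (field, term) pair, so a field repeats once per term.
  static List<FieldWeight> fromFieldTerms(
      Map<String, FieldWeight> fieldAndWeights, List<String[]> fieldTerms) {
    List<FieldWeight> fields = new ArrayList<>();
    for (String[] fieldTerm : fieldTerms) {
      fields.add(fieldAndWeights.get(fieldTerm[0])); // fieldTerm = {field, text}
    }
    return fields;
  }

  // One entry per distinct field, as with fieldAndWeights.values().
  static List<FieldWeight> fromValues(Map<String, FieldWeight> fieldAndWeights) {
    return new ArrayList<>(fieldAndWeights.values());
  }

  // Weighted sum of the per-field norms for a single document.
  static double sumNorms(List<FieldWeight> fields, Map<String, Double> normByField) {
    double sum = 0.0;
    for (FieldWeight fw : fields) {
      sum += fw.weight * normByField.get(fw.field);
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<String, FieldWeight> fieldAndWeights = new LinkedHashMap<>();
    fieldAndWeights.put("field1", new FieldWeight("field1", 1.0f));
    fieldAndWeights.put("field2", new FieldWeight("field2", 1.0f));

    // Two fields x two terms = four (field, term) pairs.
    List<String[]> fieldTerms = new ArrayList<>();
    for (String field : fieldAndWeights.keySet()) {
      for (String text : new String[] {"foo", "zoo"}) {
        fieldTerms.add(new String[] {field, text});
      }
    }

    // Made-up norm values for one document.
    Map<String, Double> normByField = new LinkedHashMap<>();
    normByField.put("field1", 10.0);
    normByField.put("field2", 20.0);

    // Each norm is counted twice: prints 60.0
    System.out.println(sumNorms(fromFieldTerms(fieldAndWeights, fieldTerms), normByField));
    // Each norm is counted once: prints 30.0
    System.out.println(sumNorms(fromValues(fieldAndWeights), normByField));
  }
}
```

With two terms per field, the list built from the (field, term) pairs holds each field twice, so the norm sum doubles, while the frequency sum is the same in both cases. That mismatch is what switching to the map's values is meant to remove.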