[ https://issues.apache.org/jira/browse/LUCY-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marvin Humphrey updated LUCY-180:
---------------------------------
Attachment: LUCY-180.patch
Here is an improved patch, which keeps the optimizations wherever they do not affect scoring.
* When an ORQuery has only one clause, the child matcher will be handed
down unwrapped.
* However, when an ORQuery has multiple clauses but only one clause
matches in a segment, an ORMatcher will be handed down to keep scoring
consistent across all segments.
* A RequiredOptionalQuery will always produce either a
RequiredOptionalMatcher or NULL. It will no longer return the unwrapped
required child matcher when the optional clause cannot match.
* RequiredOptionalMatcher was missing its coord multiplier; this has been fixed.
* ORMatcher and RequiredOptionalMatcher have been hardened so that they
can accept NULL child matchers.
> ORQuery, ANDQuery, RequiredOptionalQuery optimizations affect scoring
> ---------------------------------------------------------------------
>
> Key: LUCY-180
> URL: https://issues.apache.org/jira/browse/LUCY-180
> Project: Lucy
> Issue Type: Bug
> Affects Versions: 0.1.0 (incubating), 0.2.0 (incubating), 0.2.1 (incubating)
> Reporter: Marvin Humphrey
> Assignee: Marvin Humphrey
> Fix For: 0.2.2 (incubating), 0.3.0 (incubating)
>
> Attachments: LUCY-180-minimal.patch, LUCY-180.patch
>
>
> ORQuery, ANDQuery, and RequiredOptionalQuery all have optimizations which kick
> in when only one child Query can match: they all compile down to the inner
> Matcher.
> In the case of ORQuery and RequiredOptionalQuery, this optimization can kick
> in per-segment, resulting in an ORMatcher/RequiredOptionalMatcher for some
> segments and e.g. a child TermMatcher for others. This skews scoring because
> coord() affects the ORMatcher/RequiredOptionalMatcher, but not the TermMatcher
> -- the ORMatcher/RequiredOptionalMatcher damps the score of the matching term
> by a coord() multiplier which is typically less than 1.0, but the TermMatcher
> contributes 100% of its score. The punchline is that two documents in
> different segments which present identical match criteria can produce
> different scores, depending on whether terms not present in the document are
> represented in the segment.
> In addition, ORQuery may compile down to a smaller ORMatcher when
> e.g. only 3 out of 5 OR'd terms are present in a segment. This skews scoring
> for similar reasons.
> To present consistent scoring across all segments, Queries should always
> compile down to the same Matcher node structure for each segment. By the time
> you are compiling per-segment Matchers, it is too late to re-calculate the
> weighting, so you can't optimize the Matcher structure when you find that e.g.
> one of two terms doesn't exist in a given segment.
> -In addition, when compiling down to a single child Matcher, ORQuery,
> ANDQuery, and RequiredOptionalQuery all discard custom boosts. This is
> solvable by moving the optimization from Compiler_Make_Matcher() up into
> Query_Make_Compiler().-