span query matches too many docs when two query terms are the same unless inOrder=true
--------------------------------------------------------------------------------------
Key: LUCENE-3120
URL: https://issues.apache.org/jira/browse/LUCENE-3120
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Reporter: Doron Cohen
Priority: Minor
Fix For: 3.2, 4.0
Spinoff of a user list discussion: [SpanNearQuery - inOrder parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
With 3 documents:
* "a b x c d"
* "a b b d"
* "a b x b y d"
Here are a few queries (the number in parentheses is the expected number of hits).
These work *as expected*:
* (1) in-order, slop=0, "b", "x", "b"
* (1) in-order, slop=0, "b", "b"
* (2) in-order, slop=1, "b", "b"
These return *too many* hits:
* (1) any-order, slop=0, "b", "x", "b"
* (1) any-order, slop=1, "b", "x", "b"
* (1) any-order, slop=2, "b", "x", "b"
* (1) any-order, slop=3, "b", "x", "b"
These also return *too many* hits:
* (1) any-order, slop=0, "b", "b"
* (2) any-order, slop=1, "b", "b"
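To make the expected counts concrete, here is a small, self-contained Java model of span-near matching over single-term clauses. It is a sketch, not Lucene's span implementation: the class and method names (SpanNearModel, matches, countHits) are hypothetical, and the slop rule, (last position - first position + 1) - number of clauses, is an assumption chosen because it reproduces the expected counts listed above. The allowOverlap flag models the suspected cause discussed below: letting two identical clauses match the same term position.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of SpanNearQuery-style matching over
// single-term clauses. NOT Lucene's implementation.
final class SpanNearModel {

    // The three example documents from this issue, one token per position.
    static final List<String[]> DOCS = Arrays.asList(
            "a b x c d".split(" "),
            "a b b d".split(" "),
            "a b x b y d".split(" "));

    // True if some assignment of positions to clauses satisfies the constraints.
    // Assumed slop rule: (last - first + 1) - number of clauses, i.e. the
    // number of non-matching positions inside the match window.
    static boolean matches(String[] doc, String[] terms, int slop,
                           boolean inOrder, boolean allowOverlap) {
        return search(doc, terms, slop, inOrder, allowOverlap, new int[terms.length], 0);
    }

    // Backtracking search: try every position for clause `clause`, then recurse.
    private static boolean search(String[] doc, String[] terms, int slop,
                                  boolean inOrder, boolean allowOverlap,
                                  int[] pos, int clause) {
        if (clause == terms.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int p : pos) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            // With overlap the window can be smaller than the clause count,
            // giving a negative "slop" that is always accepted.
            return (max - min + 1) - terms.length <= slop;
        }
        for (int p = 0; p < doc.length; p++) {
            if (!doc[p].equals(terms[clause])) continue;
            // In-order: strictly increasing positions, so clauses never overlap.
            if (inOrder && clause > 0 && p <= pos[clause - 1]) continue;
            // Any-order without overlap: each position may be used only once.
            if (!allowOverlap && used(pos, clause, p)) continue;
            pos[clause] = p;
            if (search(doc, terms, slop, inOrder, allowOverlap, pos, clause + 1)) return true;
        }
        return false;
    }

    // Is position p already assigned to one of the first `upTo` clauses?
    private static boolean used(int[] pos, int upTo, int p) {
        for (int i = 0; i < upTo; i++) {
            if (pos[i] == p) return true;
        }
        return false;
    }

    // Number of example documents matched by the query.
    static int countHits(String[] terms, int slop, boolean inOrder, boolean allowOverlap) {
        int hits = 0;
        for (String[] doc : DOCS) {
            if (matches(doc, terms, slop, inOrder, allowOverlap)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] bxb = {"b", "x", "b"};
        String[] bb = {"b", "b"};
        System.out.println("in-order b,x,b slop=0: " + countHits(bxb, 0, true, false));
        System.out.println("in-order b,b slop=0: " + countHits(bb, 0, true, false));
        System.out.println("in-order b,b slop=1: " + countHits(bb, 1, true, false));
        for (int slop = 0; slop <= 3; slop++) {
            System.out.printf("any-order b,x,b slop=%d: distinct=%d overlap=%d%n",
                    slop, countHits(bxb, slop, false, false), countHits(bxb, slop, false, true));
        }
        for (int slop = 0; slop <= 1; slop++) {
            System.out.printf("any-order b,b slop=%d: distinct=%d overlap=%d%n",
                    slop, countHits(bb, slop, false, false), countHits(bb, slop, false, true));
        }
    }
}
```

With allowOverlap=false the model returns the expected count for every query listed above. With allowOverlap=true, the single "b" in "a b x c d" satisfies both "b" clauses, so that document also matches the any-order "b", "x", "b" queries, and all three documents match the any-order "b", "b" queries. These overlap numbers come from the model only; the actual Lucene hit counts are not given here.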
Each of the above returns the expected number of hits when run as a phrase query with the same slop (phrase queries have no in-order option).
As Hoss pointed out, this seems related to a known issue with overlapping spans ([non-overlapping Span queries|http://markmail.org/message/7jxn5eysjagjwlon]), so we might decide to close this bug after all, but I would like to at least have a JUnit test that exposes the behavior attached to this JIRA issue.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira