[ 
https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083965#comment-16083965
 ] 

Michael Gibney edited comment on LUCENE-7848 at 7/12/17 5:36 PM:
-----------------------------------------------------------------

"Could be a bug somewhere in span queries."^ -- I think the remaining problem 
here is that only one branch (the shortest) of a SpanOrQuery is evaluated, and 
the "spanOr" is then reported as a match (or not) using the width/end position 
of that shortest branch. When the branches of a "spanOr" differ in length (as 
they routinely will with GraphFilters such as the one in the above test), only 
the shorter branch is evaluated. If a longer branch also matches, it shifts the 
position of subsequent tokens, so the enclosing "spanNear" sees a 
larger-than-expected slop and fails to match. 
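
To make the failure mode concrete, here is a minimal sketch in plain Java. It 
does not use any Lucene classes: Span, matchesShortestOnly, matchesAnyBranch 
and the positions are all made up for illustration, and end positions are 
assumed to be exclusive. It models a "spanOr" whose two branches start at the 
same position but end at different ones, followed by a term that an ordered, 
slop-0 "spanNear" expects immediately after the "spanOr".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ShortestBranchDemo {
    /** Hypothetical stand-in for one branch match: [start, end), end exclusive. */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Ordered, slop-0 check that only looks at the branch with the smallest end. */
    static boolean matchesShortestOnly(List<Span> branches, int nextTermPos) {
        Span shortest = branches.stream()
                .min(Comparator.comparingInt((Span s) -> s.end))
                .get();
        return shortest.end == nextTermPos;
    }

    /** Same check, but every branch sharing the start position is tried. */
    static boolean matchesAnyBranch(List<Span> branches, int nextTermPos) {
        return branches.stream().anyMatch(s -> s.end == nextTermPos);
    }

    public static void main(String[] args) {
        // Two branches start at position 0: a one-token form and a two-token split form.
        List<Span> branches = Arrays.asList(new Span(0, 1), new Span(0, 2));
        int nextTermPos = 2; // the following term sits right after the longer branch

        System.out.println(matchesShortestOnly(branches, nextTermPos)); // false
        System.out.println(matchesAnyBranch(branches, nextTermPos));    // true
    }
}
```

In this toy model the shortest-only check misses the match because it assumes 
the "spanOr" ends at position 1, while the next term actually follows the 
longer branch at position 2.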

[^LUCENE-7848-branching-spanOr.patch] adjusts SpanOrQuery to support repeated 
calls to nextStartPosition() which return the same startPosition, but different 
endPositions. The subSpan clauses of the "spanOr" are popped off the 
priorityQueue, retained, and restored upon exhaustion of subSpans (when it's 
time to move on to the next potential match). Some corresponding changes were 
necessary to make NearSpansOrdered aware of the new "spanOr" behavior, and 
conditionally evaluate as many branches of "spanOr" clauses as necessary to 
match (or not) on the full "spanNear".
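
The pop/retain/restore idea can be sketched roughly as follows. This is a toy 
model in plain Java, not the code from the patch: BranchingOrSketch, SubSpan 
and endPositionsAt are hypothetical names, each sub-span is a fixed 
[start, end) pair, and the interaction with NearSpansOrdered is left out 
entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BranchingOrSketch {
    /** Hypothetical sub-span: one fixed [start, end) candidate from a "spanOr" clause. */
    static final class SubSpan {
        final int start, end;
        SubSpan(int start, int end) { this.start = start; this.end = end; }
    }

    // Sub-spans ordered by start position, then by end position.
    private final PriorityQueue<SubSpan> queue = new PriorityQueue<>(
            Comparator.comparingInt((SubSpan s) -> s.start).thenComparingInt(s -> s.end));
    // Sub-spans popped while enumerating branches of the current start position.
    private final List<SubSpan> retained = new ArrayList<>();

    BranchingOrSketch(List<SubSpan> subSpans) {
        queue.addAll(subSpans);
    }

    /**
     * Enumerates every end position available at the given start: sub-spans are
     * popped off the queue and retained, then restored once they are exhausted.
     */
    List<Integer> endPositionsAt(int start) {
        List<Integer> ends = new ArrayList<>();
        while (queue.peek() != null && queue.peek().start == start) {
            SubSpan top = queue.poll(); // pop
            retained.add(top);          // retain
            ends.add(top.end);
        }
        queue.addAll(retained);         // restore
        retained.clear();
        return ends;
    }

    public static void main(String[] args) {
        BranchingOrSketch or = new BranchingOrSketch(Arrays.asList(
                new SubSpan(0, 1), new SubSpan(0, 2), new SubSpan(3, 4)));
        System.out.println(or.endPositionsAt(0)); // [1, 2]
        System.out.println(or.endPositionsAt(0)); // [1, 2] again, since the queue was restored
    }
}
```

An enclosing ordered-near check could then try each returned end position in 
turn instead of committing to the first (shortest) one.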

Other code that calls the modified "spanOr" may also need changes to account 
for its new behavior, but with this patch applied, all the tests in 
TestWordDelimiterGraphFilter pass (including the new testLucene7848()). 

EDIT: the original patch had a bug; a corrected version was re-uploaded a few 
hours after the initial post.


> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --------------------------------------------------------------
>
>                 Key: LUCENE-7848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7848
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 6.5, 6.6
>            Reporter: Jim Ferenczi
>         Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch, 
> LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates 
> a graph phrase query. 
> Instead it should use SpanNearQuery.addGap for pos incr > 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

