[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly

2017-07-14 Thread Dawid Weiss (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087019#comment-16087019 ]

Dawid Weiss edited comment on LUCENE-7848 at 7/14/17 8:19 AM:
--

Hi Jim. Thanks for the analysis. I do understand these two queries should be 
identical, but they produce different match results. That's why I thought it's 
probably a span query issue rather than the builder's (whether you pull those 
gaps out of the OR or push them inside it shouldn't matter).

I'm on holiday this time, but I'll keep looking at LUCENE-7398; perhaps it 
sheds some light on what's going on.


> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --
>
> Key: LUCENE-7848
> URL: https://issues.apache.org/jira/browse/LUCENE-7848
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 6.5, 6.6
>Reporter: Jim Ferenczi
> Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch, 
> LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates 
> a graph phrase query. 
> Instead it should use SpanNearQuery.addGap for pos incr > 1.
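The behavior the description asks for can be sketched as follows. This is a minimal, self-contained model, not Lucene code: `GapAwarePhrase`, `Token` and `buildClauses` are hypothetical names, and the string clauses only stand in for the term clauses and `addGap` calls one would make on `SpanNearQuery`'s builder. It assumes a linear (non-graph) token stream where every increment is at least 1, and that an increment of n leaves n - 1 empty positions, hence a gap of width n - 1.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Lucene code) of turning an analyzed token stream
 * into the clauses of an ordered span-near query while honoring position
 * increments greater than 1.
 */
public class GapAwarePhrase {

    /** One analyzed token: its term text and its position increment. */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /**
     * Emits "term:<text>" for each token and "gap:<n>" wherever the position
     * increment exceeds 1, standing in for an addGap(n) call. The increment of
     * the first token is not turned into a gap, since nothing precedes it.
     */
    static List<String> buildClauses(List<Token> tokens) {
        List<String> clauses = new ArrayList<>();
        boolean first = true;
        for (Token t : tokens) {
            if (t.posInc < 1) {
                // Stacked tokens (synonyms, graph branches) are out of scope here.
                throw new IllegalArgumentException("posInc < 1 not handled in this sketch");
            }
            if (!first && t.posInc > 1) {
                // An increment of 2 means one empty position between terms, and so on.
                clauses.add("gap:" + (t.posInc - 1));
            }
            clauses.add("term:" + t.term);
            first = false;
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Assumed input: a stop word removed between "quick" and "fox",
        // so "fox" arrives with a position increment of 2.
        List<Token> tokens = List.of(new Token("quick", 1), new Token("fox", 2));
        System.out.println(buildClauses(tokens));
    }
}
```

Running `main` prints `[term:quick, gap:1, term:fox]`: the removed position between the two terms becomes a one-position gap instead of being silently dropped.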



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly

2017-07-12 Thread Michael Gibney (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083965#comment-16083965 ]

Michael Gibney edited comment on LUCENE-7848 at 7/12/17 5:36 PM:
-

"Could be a bug somewhere in span queries."^ I think the remaining problem 
here is that only one branch (the shortest) of a SpanOrQuery is evaluated, at 
which point the "spanOr" is designated a match (or not) based on the 
width/positionEnd of that shortest branch. When the branches of a "spanOr" 
differ in length (as they routinely will when GraphFilters are used, as in the 
above test), only the shorter branch is evaluated; but if a longer branch also 
matches, it affects the position offset of subsequent tokens, so the enclosing 
"spanNear" sees a larger-than-expected slop and fails to match. 

[^LUCENE-7848-branching-spanOr.patch] adjusts SpanOrQuery to support repeated 
calls to nextStartPosition() that return the same start position but different 
end positions. The subSpan clauses of the "spanOr" are popped off the 
priority queue, retained, and restored once the subSpans are exhausted (i.e., 
when it's time to move on to the next potential match). Some corresponding 
changes were needed to make NearSpansOrdered aware of the new "spanOr" 
behavior, so that it conditionally evaluates as many branches of "spanOr" 
clauses as necessary to decide whether the full "nearSpan" matches.
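The failure mode and the idea behind the patch can be illustrated with a toy model. This is not Lucene's Spans API: `BranchingSpanOr` and its methods are hypothetical, the indexed layout ("xxx,special" and "xxx" stacked at position 0, "special" at 1, and a hypothetical following term "foo" at 2) is an assumption, and the matcher is a brute-force backtracking search rather than the way NearSpansOrdered actually iterates. The `shortestOnly` flag mimics evaluating only the shortest "spanOr" branch; turning it off lets the ordered, zero-slop "spanNear" retry longer branches that start at the same position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy model (not Lucene's Spans API) of an ordered, zero-slop
 * "spanNear" whose clauses are "spanOr"s of branches with different lengths.
 */
public class BranchingSpanOr {

    /** End positions of every branch of one "spanOr" slot that matches at pos. */
    static List<Integer> branchEnds(List<Set<String>> doc, List<List<String>> slot,
                                    int pos, boolean shortestOnly) {
        List<Integer> ends = new ArrayList<>();
        for (List<String> branch : slot) {
            int end = pos + branch.size();
            if (end > doc.size()) continue;
            boolean ok = true;
            for (int i = 0; i < branch.size() && ok; i++) {
                ok = doc.get(pos + i).contains(branch.get(i));
            }
            if (ok) ends.add(end);
        }
        ends.sort(null); // natural order: shortest branch first
        if (shortestOnly && ends.size() > 1) {
            return ends.subList(0, 1); // only the shortest branch is considered
        }
        return ends;
    }

    /** True if the remaining slots match back to back (slop 0) starting at pos. */
    static boolean matchFrom(List<Set<String>> doc, List<List<List<String>>> slots,
                             int slot, int pos, boolean shortestOnly) {
        if (slot == slots.size()) return true;
        for (int end : branchEnds(doc, slots.get(slot), pos, shortestOnly)) {
            // Backtrack: if a shorter branch leads nowhere, try the next longer one.
            if (matchFrom(doc, slots, slot + 1, end, shortestOnly)) return true;
        }
        return false;
    }

    /** Tries every start position in the document. */
    static boolean matches(List<Set<String>> doc, List<List<List<String>>> slots,
                           boolean shortestOnly) {
        for (int start = 0; start < doc.size(); start++) {
            if (matchFrom(doc, slots, 0, start, shortestOnly)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed indexed positions: 0 -> {"xxx,special", "xxx"}, 1 -> {"special"}, 2 -> {"foo"}.
        List<Set<String>> doc = List.of(
            Set.of("xxx,special", "xxx"), Set.of("special"), Set.of("foo"));
        // Query graph: ("xxx,special" | "xxx special") followed by "foo".
        List<List<List<String>>> query = List.of(
            List.of(List.of("xxx,special"), List.of("xxx", "special")),
            List.of(List.of("foo")));
        System.out.println("shortest branch only: " + matches(doc, query, true));
        System.out.println("all branches: " + matches(doc, query, false));
    }
}
```

With this layout, the shortest-branch-only run reports no match (the one-token branch ends at position 1, where "foo" is absent), while the run that considers all branches matches via "xxx special" ending at position 2.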

There may be other code that calls the modified "spanOr" and would need to be 
made aware of its new behavior, but with this patch applied, all tests in 
TestWordDelimiterGraphFilter pass (including the new testLucene7848()). 

EDIT: the original patch had a bug; it was re-uploaded a few hours after it was 
initially posted.


> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --
>
> Key: LUCENE-7848
> URL: https://issues.apache.org/jira/browse/LUCENE-7848
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 6.5, 6.6
>Reporter: Jim Ferenczi
> Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch, 
> LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates 
> a graph phrase query. 
> Instead it should use SpanNearQuery.addGap for pos incr > 1.






[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly

2017-06-19 Thread Jim Ferenczi (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053875#comment-16053875 ]

Jim Ferenczi edited comment on LUCENE-7848 at 6/19/17 12:05 PM:


Hi Dawid,
Sorry, I am also on vacation this week, but looking at your example, it seems 
that it's a problem with graph tokens in general. If you have side paths of 
different lengths at indexing time, you need to use the flatten graph filter. 
However, it will not be able to index the correct positions for this example, 
since "xxx,special" and "xxx", "special" would need to be indexed as a graph, 
and Lucene does not handle graphs at indexing time. I wonder why your manual 
query works. I might be missing something, but shouldn't this query fail too, 
unless you used another configuration for the WDGF (preserve original = false, 
for instance, should work at indexing time)?


> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --
>
> Key: LUCENE-7848
> URL: https://issues.apache.org/jira/browse/LUCENE-7848
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 6.5, 6.6
>Reporter: Jim Ferenczi
> Attachments: capture-3.png, LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates 
> a graph phrase query. 
> Instead it should use SpanNearQuery.addGap for pos incr > 1.






[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly

2017-05-23 Thread Erik Hatcher (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021368#comment-16021368 ]

Erik Hatcher edited comment on LUCENE-7848 at 5/23/17 3:57 PM:
---

I hit a snag with QueryBuilder#createSpanQuery too, and (for the SOLR-1485 
work) created org.apache.solr.util.PayloadUtils with a createSpanQuery method. 
It also doesn't take gaps into account yet (the basic use cases don't involve 
sophisticated analysis there, so keeping it simple initially was intentional), 
but I did have to work through some Lucene analysis API hurdles that I think 
QueryBuilder's createSpanQuery should fix along the way too.

See my comment and implementation here: 
https://github.com/apache/lucene-solr/blob/5d42177b9290b61c658154e42223408944cd4bc1/solr/core/src/java/org/apache/solr/util/PayloadUtils.java#L106-L128


> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --
>
> Key: LUCENE-7848
> URL: https://issues.apache.org/jira/browse/LUCENE-7848
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 6.5, 6.6
>Reporter: Jim Ferenczi
>
> Position increments greater than 1 are ignored when the query builder creates 
> a graph phrase query. 
> Instead it should use SpanNearQuery.addGap for pos incr > 1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
