[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087019#comment-16087019 ]

Dawid Weiss edited comment on LUCENE-7848 at 7/14/17 8:19 AM:

Hi Jim. Thanks for the analysis -- I do understand that these two queries should be identical, but they produce different match results -- that's why I thought it's probably a span query issue rather than the builder's (whether you pull those gaps or push them inside the "or" shouldn't matter). I'm on holiday this time, but I'll keep looking at LUCENE-7398; perhaps it sheds some light on what's going on.

was (Author: dweiss):
Hi Jim. Thanks for the analysis -- I do understand these two queries should be identical, but they have a different match result -- that's why I thought it's probably a span query issue rather than the builder's (whether you pull those gaps or push them inside the or shouldn't matter). This time I'm on holidays, but I'll keep looking at LUCENE-7389, perhaps it sheds some light on what's going on.

> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
>
> Key: LUCENE-7848
> URL: https://issues.apache.org/jira/browse/LUCENE-7848
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 6.5, 6.6
> Reporter: Jim Ferenczi
> Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch, LUCENE-7848.patch, LUCENE-7848.patch
>
> Position increments greater than 1 are ignored when the query builder creates a graph phrase query.
> Instead it should use SpanNearQuery.addGap for pos incr > 1.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083965#comment-16083965 ]

Michael Gibney edited comment on LUCENE-7848 at 7/12/17 5:36 PM:

"Could be a bug somewhere in span queries."^ -- I think the remaining problem here is that only one branch (the shortest) of a SpanOrQuery is evaluated, at which point the "spanOr" is designated a match (or not) of the width/positionEnd of the shortest branch. When the branches of a "spanOr" differ in length (as they will as a matter of course for uses of GraphFilters such as in the above test), the shorter branch is evaluated, but if a longer branch is also a match, it affects the offset of subsequent tokens, and the enclosing "spanNear" sees a larger-than-expected slop, and fails to match.

[^LUCENE-7848-branching-spanOr.patch] adjusts SpanOrQuery to support repeated calls to nextStartPosition() which return the same startPosition, but different endPositions. The subSpan clauses of the "spanOr" are popped off the priorityQueue, retained, and restored upon exhaustion of subSpans (when it's time to move on to the next potential match). Some corresponding changes were necessary to make NearSpansOrdered aware of the new "spanOr" behavior, and conditionally evaluate as many branches of "spanOr" clauses as necessary to match (or not) on the full "nearSpan".

There may be other modifications needed in code that can call the modified "spanOr" and would need to be aware of its new behavior, but with this patch applied, all the tests in the TestWordDelimiterGraphFilter pass (including the new testLucene7848()).

EDIT: original patch had a bug, was re-uploaded a few hours after initially posted.
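The branching problem described in this comment can be illustrated with a toy model. The sketch below is not the attached patch and uses no Lucene classes: spans are plain {start, end} pairs, the enclosing "spanNear" is reduced to an ordered check with slop 0, and the class name, method names and the sample document are all hypothetical. It only aims to show why collapsing a "spanOr" to its shortest branch can hide a match that a longer branch would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: spans are {start, end} position pairs, not Lucene Spans objects.
class SpanOrBranchingSketch {

    // Ordered near with slop 0, where each "or" start position is reduced to its
    // shortest branch before the following clause is checked.
    static boolean matchShortestBranchOnly(List<int[]> orSpans, List<Integer> nextStarts) {
        for (int[] span : orSpans) {
            int shortestEnd = span[1];
            for (int[] other : orSpans) {
                if (other[0] == span[0]) {
                    shortestEnd = Math.min(shortestEnd, other[1]);
                }
            }
            // The next clause must begin exactly where the "or" span ends.
            if (nextStarts.contains(shortestEnd)) {
                return true;
            }
        }
        return false;
    }

    // Same check, but every branch end is tried for a given start position.
    static boolean matchAllBranches(List<int[]> orSpans, List<Integer> nextStarts) {
        for (int[] span : orSpans) {
            if (nextStarts.contains(span[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical document: "wi"@0, "fi"@1, "router"@2, with the term "w" also at 0.
        // An "or" over "w" and the phrase "wi fi" yields two spans sharing start 0.
        List<int[]> orSpans = new ArrayList<>();
        orSpans.add(new int[] {0, 1}); // short branch
        orSpans.add(new int[] {0, 2}); // long branch
        List<Integer> routerStarts = Arrays.asList(2);

        System.out.println(matchShortestBranchOnly(orSpans, routerStarts)); // false
        System.out.println(matchAllBranches(orSpans, routerStarts));        // true
    }
}
```

In this model the short branch ends at position 1, so "router" at position 2 appears to be one position too far away, while the long branch ends exactly where "router" starts. Trying every end position for the same start is the effect the comment attributes to the patch; how the patch achieves it inside SpanOrQuery and NearSpansOrdered is not modeled here.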
[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053875#comment-16053875 ]

Jim Ferenczi edited comment on LUCENE-7848 at 6/19/17 12:05 PM:

Hi Dawid,

Sorry, I am also on vacation this week, but looking at your example it seems to be a problem with graph tokens in general. If you have side paths of different lengths at indexing time, you need to use the flatten graph filter. It will not be able to index the correct positions for this example, though, since "xxx,special" and "xxx", "special" should be indexed as a graph and Lucene does not handle graphs at indexing time.

I wonder why your manual query works. I might be missing something, but this query should also not work unless you used another configuration for the WDGF (preserve original = false, for instance, should work at indexing time)?

was (Author: jim.ferenczi):
Hi David, Sorry I am also on vacations this week but looking at your example it seems that it's a problem with graph token in general. If you have side paths with different length at indexing time you need to use the flatten graph filter. Though it will not be able to index the correct positions for this example since "xxx,special" and "xxx", "special" should be indexed as a graph and Lucene does not handle graph at indexing time. I wonder why your manual query works, I might be missing something but this query should also not work unless you used another configuration for the WDGF (preserve original = false for instance should work at indexing time) ?
[jira] [Comment Edited] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021368#comment-16021368 ]

Erik Hatcher edited comment on LUCENE-7848 at 5/23/17 3:57 PM:

I hit a snag with QueryBuilder#createSpanQuery too, and created (for the SOLR-1485 work) org.apache.solr.util.PayloadUtils with a createSpanQuery method. It currently also doesn't take gaps into account (but the basic use cases don't involve sophisticated analysis there, so it was intentional to keep it initially simple), but I did have to work through some Lucene analysis API hurdles that I think QueryBuilder's createSpanQuery should fix along the way too.

See my comment and implementation here: https://github.com/apache/lucene-solr/blob/5d42177b9290b61c658154e42223408944cd4bc1/solr/core/src/java/org/apache/solr/util/PayloadUtils.java#L106-L128

was (Author: ehatcher):
I hit a snag with QueryBuilder#createSpanQuery too, and created (for the SOLR-1485 work) org.apache.solr.util.PayloadUtils with a createSpanQuery method. It currently also doesn't take into account for gaps (but the basic use cases don't involve sophisticated analysis there, so it was intentional to keep it initially simple), but I did have to work through some Lucene analysis API hurdles that I think QueryBuilder's createSpanQuery should fix along the way too. See my comment and implementation here: https://github.com/apache/lucene-solr/blob/5d42177b9290b61c658154e42223408944cd4bc1/solr/core/src/java/org/apache/solr/util/PayloadUtils.java#L106-L128
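The issue description asks for position increments greater than 1 to be turned into gaps rather than ignored. A minimal sketch of that bookkeeping follows, in plain Java with no Lucene dependency. The class name, the plan method and the "clause(...)"/"gap(...)" strings are hypothetical stand-ins; in Lucene the corresponding steps would presumably be addClause and addGap calls on a SpanNearQuery.Builder. The handling of a leading gap is an assumption of this sketch, not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: builds a textual plan instead of a real span query.
class GapAwarePhraseSketch {

    // Turns (term, positionIncrement) pairs into an ordered plan of clauses and gaps.
    // A position increment of n > 1 means n - 1 positions were skipped before the term.
    static List<String> plan(String[] terms, int[] posIncs) {
        List<String> steps = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            int skipped = posIncs[i] - 1;
            // Assumption: a gap before the first clause is dropped, because there is
            // no previous clause to attach it to.
            if (skipped > 0 && !steps.isEmpty()) {
                steps.add("gap(" + skipped + ")");
            }
            steps.add("clause(" + terms[i] + ")");
        }
        return steps;
    }

    public static void main(String[] args) {
        // Hypothetical analysis of "fox of the wild" with "of" and "the" removed as
        // stop words: "fox" has increment 1, "wild" has increment 3.
        List<String> steps = plan(new String[] {"fox", "wild"}, new int[] {1, 3});
        System.out.println(steps); // [clause(fox), gap(2), clause(wild)]
    }
}
```

Ignoring the increment, as the bug report describes, would correspond to emitting only the two clauses, so the resulting phrase would require "fox" and "wild" to be adjacent.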