[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney updated LUCENE-7848:
---
Attachment: LUCENE-7848-delimOnly-token-offset.patch

I think the remaining problem is that WordDelimiterGraphFilter is swallowing delim-only tokens and leaving a gap even when PRESERVE_ORIGINAL is true. [^LUCENE-7848-delimOnly-token-offset.patch] fixes this (and addresses the problematic gaps).

> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --
>
> Key: LUCENE-7848
> URL: https://issues.apache.org/jira/browse/LUCENE-7848
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 6.5, 6.6
> Reporter: Jim Ferenczi
> Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch, LUCENE-7848-delimOnly-token-offset.patch, LUCENE-7848.patch, LUCENE-7848.patch
>
> Position increments greater than 1 are ignored when the query builder creates a graph phrase query.
> Instead it should use SpanNearQuery.addGap for pos incr > 1.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
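As a rough illustration of the gap described in the message above, the following stand-alone Java model shows how dropping a delimiter-only token, without emitting anything at its position, produces a position increment greater than 1 on the next token. It does not use Lucene and is not the logic of the attached patch: the class name DelimOnlyGapSketch, the keepDelimiterOnly switch, and the "no letter or digit" test for a delimiter-only token are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of position increments; hypothetical, not Lucene code. */
final class DelimOnlyGapSketch {

    /** Assumption for this sketch: a token with no letter or digit is "delimiter-only". */
    static boolean isDelimiterOnly(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (Character.isLetterOrDigit(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the position increment of each emitted token.
     * keepDelimiterOnly is a hypothetical switch: when false, delimiter-only
     * tokens are dropped and their position is added to the next token.
     */
    static List<Integer> increments(List<String> tokens, boolean keepDelimiterOnly) {
        List<Integer> emitted = new ArrayList<>();
        int pending = 1;
        for (String token : tokens) {
            if (!keepDelimiterOnly && isDelimiterOnly(token)) {
                pending++; // the position is consumed, but nothing is emitted
                continue;
            }
            emitted.add(pending);
            pending = 1;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("foo", "-", "bar");
        System.out.println(increments(tokens, false)); // [1, 2]
        System.out.println(increments(tokens, true));  // [1, 1, 1]
    }
}
```

In this model the second case leaves no gap because a token is emitted at every position, which is the outcome the message expects when the original token is preserved.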
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney updated LUCENE-7848:
---
Attachment: (was: LUCENE-7848-branching-spanOr.patch)
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney updated LUCENE-7848:
---
Attachment: LUCENE-7848-branching-spanOr.patch
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney updated LUCENE-7848:
---
Attachment: (was: LUCENE-7848-branching-spanOr.patch)
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney updated LUCENE-7848:
---
Attachment: LUCENE-7848-branching-spanOr.patch

Sorry, updated patch.
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney updated LUCENE-7848:
---
Attachment: LUCENE-7848-branching-spanOr.patch

"Could be a bug somewhere in span queries."^ -- I think the remaining problem here is that only one branch (the shortest) of a SpanOrQuery is evaluated, at which point the "spanOr" is designated a match (or not) with the width/positionEnd of the shortest branch. When the branches of a "spanOr" differ in length (as they will as a matter of course with graph filters such as the one in the above test), only the shorter branch is evaluated. If a longer branch also matches, subsequent tokens sit at a later position than the shorter branch implies, so the enclosing "spanNear" sees a larger-than-expected slop and fails to match.

[^LUCENE-7848-branching-spanOr.patch] adjusts SpanOrQuery to support repeated calls to nextStartPosition() that return the same startPosition but different endPositions. The subSpan clauses of the "spanOr" are popped off the priorityQueue, retained, and restored upon exhaustion of subSpans (when it's time to move on to the next potential match). Some corresponding changes were necessary to make NearSpansOrdered aware of the new "spanOr" behavior, so that it conditionally evaluates as many branches of "spanOr" clauses as necessary to match (or not) on the full "nearSpan".

There may be other modifications needed in code that calls the modified "spanOr" and needs to be aware of its new behavior, but with this patch applied, all the tests in TestWordDelimiterGraphFilter pass (including the new testLucene7848()).
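The branch-length problem described in the message above can be sketched with a small stand-alone Java model. It is not Lucene code and not the attached patch: the class SpanOrBranchSketch, its Span type, and the example positions are hypothetical. The assumed scenario is a "spanOr" with two branches starting at position 0, one ending at position 1 and one ending at position 2, followed by a term at position 2 inside an ordered, zero-slop "spanNear".

```java
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of the branch-length problem; hypothetical, not Lucene code. */
final class SpanOrBranchSketch {

    /** A candidate match covering token positions [start, end), end exclusive. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Ordered, zero-slop check that only looks at the shortest branch. */
    static boolean matchesShortestOnly(List<Span> orBranches, int nextStart) {
        Span shortest = null;
        for (Span s : orBranches) {
            if (shortest == null || s.end < shortest.end) {
                shortest = s;
            }
        }
        return shortest != null && shortest.end == nextStart;
    }

    /** The same check, but trying the end position of every branch. */
    static boolean matchesAnyBranch(List<Span> orBranches, int nextStart) {
        for (Span s : orBranches) {
            if (s.end == nextStart) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two branches with the same start but different ends.
        List<Span> branches = Arrays.asList(new Span(0, 1), new Span(0, 2));
        int nextStart = 2; // position of the term that follows the "spanOr"

        System.out.println(matchesShortestOnly(branches, nextStart)); // false
        System.out.println(matchesAnyBranch(branches, nextStart));    // true
    }
}
```

With only the shortest branch considered, the following term appears one position too far away and the zero-slop check fails; trying every end position for the same start lets the longer branch match.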
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-7848:
Attachment: capture-3.png

Token graph for the input (indexing and search are the same).
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-7848:
Attachment: LUCENE-7848.patch

Here's a test (testLucene7848) that reproduces the behavior observed in Solr. To me this should work (right)? I haven't looked at the emitted token streams vs. the query yet -- I have to switch context now, but it'd be a good start for figuring out what's happening.
[jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Ferenczi updated LUCENE-7848:
-
Attachment: LUCENE-7848.patch

Here is a simple patch that supports gaps in QueryBuilder#createSpanQuery and QueryBuilder#analyzeGraphPhrase. QueryBuilder#createSpanQuery could also handle zero increments, but that's probably another issue.
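The idea in the message above, turning position increments greater than 1 into gaps, can be sketched without Lucene as follows. This is a minimal model, not the attached patch: GapPlanSketch and its Token type are hypothetical, "clause(x)" stands in for adding a span clause, and "gap(n)" stands in for a call such as SpanNearQuery.addGap named in the issue description. The sketch assumes a gap before the first token is ignored and leaves zero increments untouched, since the message treats those as a separate issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of gap handling; hypothetical, does not depend on Lucene. */
final class GapPlanSketch {

    /** Hypothetical token: a term plus its position increment. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Turns analyzed tokens into an ordered list of builder steps.
     * "clause(x)" models adding a span clause; "gap(n)" models adding a gap of n positions.
     */
    static List<String> plan(List<Token> tokens) {
        List<String> steps = new ArrayList<>();
        boolean first = true;
        for (Token token : tokens) {
            int inc = token.positionIncrement;
            // An increment of k > 1 means k - 1 positions were skipped.
            // Assumption: a gap before the very first token is ignored.
            if (!first && inc > 1) {
                steps.add("gap(" + (inc - 1) + ")");
            }
            steps.add("clause(" + token.term + ")");
            first = false;
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("quick", 1),
            new Token("fox", 2),   // one position was skipped before "fox"
            new Token("jumps", 1));
        System.out.println(plan(tokens)); // [clause(quick), gap(1), clause(fox), clause(jumps)]
    }
}
```

Ignoring the increment, as the issue description says the query builder currently does, would drop the gap(1) step and require "quick" and "fox" to be adjacent.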