[ https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465301#comment-15465301 ]
Christoph Goller edited comment on LUCENE-7398 at 9/5/16 4:07 PM:
------------------------------------------------------------------

Paul's 20160814 patch almost convinced me. Unfortunately, it does not fix the case where an intermediate span has a longer match that reduces the overall sloppiness but overlaps with a match of a subsequent span, and consequently requires advancing the subsequent span.

Here is an example:

Document: w1 w2 w3 w4 w5
Query: near/0(w1, or(w2, near/0(w2, w3, w4)), or(w5, near/0(w4, w5)))

Add the following code to the end of TestSpanCollection.testNestedNearQuery():

{code}
// q1..q4, FIELD, searcher and spans are assumed to be defined by the surrounding test class.
// or(w2, near/0(w2, w3, w4))
SpanNearQuery q234 = new SpanNearQuery(new SpanQuery[]{q2, q3, q4}, 0, true);
SpanOrQuery q2234 = new SpanOrQuery(q2, q234);
// or(w5, near/0(w4, w5))
SpanTermQuery p5 = new SpanTermQuery(new Term(FIELD, "w5"));
SpanNearQuery q45 = new SpanNearQuery(new SpanQuery[]{q4, p5}, 0, true);
SpanOrQuery q455 = new SpanOrQuery(q45, p5);
// near/0(w1, or(...), or(...))
SpanNearQuery q1q2234q445 = new SpanNearQuery(new SpanQuery[]{q1, q2234, q455}, 0, true);
spans = q1q2234q445.createWeight(searcher, false, 1f).getSpans(searcher.getIndexReader().leaves().get(0), SpanWeight.Postings.POSITIONS);
// The query should match document 0.
assertEquals(0, spans.advance(0));
{code}

I think we can only fix it if we give up lazy iteration. I don't think this is so bad for performance. If we implement clever caching of positions in spans, a complete backtracking would only consist of a few additional int comparisons. The expensive operation is iterating over all span positions (IO), and we already do this in advancePosition(Spans, int), don't we?

> Nested Span Queries are buggy
> -----------------------------
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 5.5, 6.x
> Reporter: Christoph Goller
> Assignee: Alan Woodward
> Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398.patch,
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping],
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as
> "coordinate gene research".
> It does not match "coordinate gene mapping
> research" with Lucene 5.5 or 6.1; it did, however, match with Lucene 4.10.4. It
> probably stopped working with the changes on SpanQueries in 5.3. I will
> attach a unit test that shows the problem.
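
The w1 w2 w3 w4 w5 example from the comment can be modelled without Lucene, which may help to see why backtracking over cached positions is cheap. The sketch below is only a toy model under stated assumptions: each clause of the outer near/0 query is represented as a cached array of {start, end} matches (end exclusive, w1 at position 0), and lazyMatch is a deliberately simplified stand-in for non-backtracking iteration, not Lucene's actual ordered-near implementation. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of an ordered near/0 query over cached sub-span matches (not Lucene code). */
class NestedNearSketch {

    /**
     * Simplified lazy strategy: for each clause take the first cached match that
     * starts at or after the previous end, and never revisit an earlier choice.
     */
    static boolean lazyMatch(List<int[][]> clauses) {
        int prevEnd = -1; // -1 means "no previous clause yet"
        for (int[][] candidates : clauses) {
            int[] chosen = null;
            for (int[] c : candidates) {
                if (prevEnd < 0 || c[0] >= prevEnd) {
                    chosen = c;
                    break;
                }
            }
            if (chosen == null) {
                return false; // clause has no usable match
            }
            if (prevEnd >= 0 && chosen[0] != prevEnd) {
                return false; // gap between clauses, so slop would be > 0
            }
            prevEnd = chosen[1];
        }
        return true;
    }

    /** Full backtracking: try every cached match of clause i; each step is one int comparison. */
    static boolean backtrackingMatch(List<int[][]> clauses, int i, int prevEnd) {
        if (i == clauses.size()) {
            return true; // every clause found an adjacent match
        }
        for (int[] c : clauses.get(i)) {
            boolean adjacent = (i == 0) || c[0] == prevEnd;
            if (adjacent && backtrackingMatch(clauses, i + 1, c[1])) {
                return true;
            }
        }
        return false;
    }

    /** Cached matches for the document "w1 w2 w3 w4 w5". */
    static List<int[][]> exampleClauses() {
        return Arrays.asList(
            new int[][] {{0, 1}},          // w1
            new int[][] {{1, 2}, {1, 4}},  // or(w2, near/0(w2, w3, w4))
            new int[][] {{3, 5}, {4, 5}}); // or(w5, near/0(w4, w5))
    }

    public static void main(String[] args) {
        List<int[][]> clauses = exampleClauses();
        System.out.println("lazy: " + lazyMatch(clauses));                        // prints "lazy: false"
        System.out.println("backtracking: " + backtrackingMatch(clauses, 0, -1)); // prints "backtracking: true"
    }
}
```

In this model the lazy walk picks w2 at {1, 2} and then finds no third-clause match starting at position 2, while backtracking retries the middle clause with {1, 4} and pairs it with w5 at {4, 5}. The extra work is a handful of comparisons on positions that are already cached.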