[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407775#comment-15407775
 ] 

Paul Elschot commented on LUCENE-7398:
--------------------------------------

To complete the picture here for the ordered case: shrinkToAfterShortestMatch() 
was replaced by lazy iteration in LUCENE-6537. Some points from there:
- Lazy iteration should return the same document matches, but it will return 
some extra Span hits within each document, so scores might be different.
- Repeated matches from a non-nested ordered span near occur only when the first 
term repeats and there is enough slop; for the query t1 t2 with slop 1:
  t1 t1 t2 matches twice,
  t1 t2 t2 matches once.
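
The two counts above can be reproduced with a small stand-alone simulation. 
This is only a sketch of forward-only iteration for two single-term clauses, 
not the Lucene code: the class RepeatedMatchSketch and the method countHits 
are hypothetical names, and slop is assumed to be the number of positions 
between the two terms.

```java
class RepeatedMatchSketch {

    /**
     * Counts span hits of the ordered query "first second" in a toy model:
     * the first term visits each of its positions in turn, and the second
     * term is only ever moved forward to its first position after the first term.
     */
    static int countHits(String[] doc, String first, String second, int maxSlop) {
        int hits = 0;
        int secondPos = -1; // forward-only position of the second term
        for (int firstPos = 0; firstPos < doc.length; firstPos++) {
            if (!doc[firstPos].equals(first)) {
                continue;
            }
            if (secondPos <= firstPos) {
                secondPos = firstPos + 1; // restore order: second must follow first
            }
            while (secondPos < doc.length && !doc[secondPos].equals(second)) {
                secondPos++;
            }
            if (secondPos >= doc.length) {
                break; // second term exhausted, no further hits possible
            }
            int slop = secondPos - firstPos - 1; // positions between the two terms
            if (slop <= maxSlop) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits(new String[] {"t1", "t1", "t2"}, "t1", "t2", 1)); // prints 2
        System.out.println(countHits(new String[] {"t1", "t2", "t2"}, "t1", "t2", 1)); // prints 1
    }
}
```

In this model the second hit in t1 t1 t2 comes from the repeated first term; 
a repeated second term is never revisited, so t1 t2 t2 gives a single hit.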

Nevertheless, from the gene research example above one can see that the current 
lazy iteration misses a document that used to match.

So, is it possible to change the current implementation so that it matches more 
documents correctly, while still being lazy?
Here lazy means that all subspans are only moved forward, and a test for a 
match is only done after at least one of the subspans has been moved forward.

The current implementation is based on the first subspans moving forward 
followed by a stretchToOrder().
After that, as long as there is no match (i.e. too much slop), we could 
additionally move each of the intermediate subspans forward until the order is lost.
(This would be somewhat similar to shrinkToAfterShortestMatch(), but based on 
the actual slop, and not on the length of the match.)
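
A toy model may make the idea more concrete. The sketch below is not the 
Lucene implementation: Span, Cursor, matches() and tryIntermediate() are 
hypothetical names, slop is assumed to be the sum of the gaps between 
adjacent subspans, positions are relative to "coordinate", and the or-clause 
is assumed to yield the shorter span "gene" before "gene mapping". The 
intermediate subspans are moved in forward order, which is an arbitrary choice.

```java
class LazyNearSketch {

    /** A span covering token positions [start, end). */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Forward-only cursor over the spans of one clause, sorted by start, then end. */
    static final class Cursor {
        private final Span[] spans;
        private int idx = 0;
        Cursor(Span... spans) { this.spans = spans; }
        boolean exhausted() { return idx >= spans.length; }
        Span current() { return spans[idx]; }
        void next() { idx++; }
    }

    /** Moves later cursors forward until each starts at or after the end of the previous one. */
    static boolean stretchToOrder(Cursor[] subs) {
        for (int i = 1; i < subs.length; i++) {
            while (!subs[i].exhausted() && subs[i].current().start < subs[i - 1].current().end) {
                subs[i].next();
            }
            if (subs[i].exhausted()) return false;
        }
        return true;
    }

    /** Sum of the gaps between adjacent subspans; only meaningful while they are in order. */
    static int slop(Cursor[] subs) {
        int gaps = 0;
        for (int i = 1; i < subs.length; i++) {
            gaps += subs[i].current().start - subs[i - 1].current().end;
        }
        return gaps;
    }

    /** Moves intermediate cursor i forward until a match, exhaustion, or loss of order. */
    static boolean tryIntermediate(Cursor[] subs, int i, int maxSlop) {
        while (true) {
            subs[i].next();
            if (subs[i].exhausted()) return false;
            if (subs[i].current().end > subs[i + 1].current().start) return false; // order lost
            if (slop(subs) <= maxSlop) return true;
        }
    }

    /** Returns true if the toy document matches; moveIntermediates enables the proposed extra step. */
    static boolean matches(Cursor[] subs, int maxSlop, boolean moveIntermediates) {
        while (!subs[0].exhausted()) {
            if (!stretchToOrder(subs)) return false;
            if (slop(subs) <= maxSlop) return true;
            if (moveIntermediates) {
                for (int i = 1; i < subs.length - 1; i++) {
                    if (tryIntermediate(subs, i, maxSlop)) return true;
                    if (!stretchToOrder(subs)) return false; // restore order, forward only
                    if (slop(subs) <= maxSlop) return true;
                }
            }
            subs[0].next();
        }
        return false;
    }

    /** "coordinate gene mapping research" with coordinate at relative position 0. */
    static Cursor[] geneExample() {
        return new Cursor[] {
            new Cursor(new Span(0, 1)),                 // coordinate
            new Cursor(new Span(1, 2), new Span(1, 3)), // spanOr: gene, then gene mapping
            new Cursor(new Span(3, 4))                  // research
        };
    }

    public static void main(String[] args) {
        System.out.println(matches(geneExample(), 0, false)); // prints false
        System.out.println(matches(geneExample(), 0, true));  // prints true
    }
}
```

Without the extra step the model stops at "gene" with a slop of 1 and then 
exhausts the first subspans, so the document is missed; with it, the 
or-clause is moved on to "gene mapping" and the slop drops to 0.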

Would that help?
If so, in which order should the intermediate spans be moved forward? 
shrinkToAfterShortestMatch() used to work backwards, but forwards could also be 
done.

> Nested Span Queries are buggy
> -----------------------------
>
>                 Key: LUCENE-7398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7398
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5, 6.x
>            Reporter: Christoph Goller
>            Assignee: Alan Woodward
>            Priority: Critical
>         Attachments: LUCENE-7398.patch, LUCENE-7398.patch, 
> TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



