[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484910#comment-15484910
 ] 

Paul Elschot commented on LUCENE-7398:
--------------------------------------

As to missing matches due to lazy iteration, I'd prefer to add an option that 
allows a choice between the current behaviour, the above patch (because I think 
it is slightly better than the previous 4.10 behaviour), one that misses no 
matches, and perhaps more.
For example, would anyone like a SpanWindowQuery that only uses span start 
positions? That would at least allow an easy complete implementation.
And we need to document the current behaviour: ordered means no overlap, and 
unordered allows overlap.
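
To make the lazy-iteration problem concrete, here is a toy sketch in plain Java. 
It is not Lucene code: the Span class and the term, or, nearLazy and 
nearExhaustive helpers are hypothetical names, positions come from a 
whitespace split with punctuation dropped, and nearLazy is only a simplified 
model of a non-backtracking ordered near with slop 0. It runs the nested query 
from the issue description against the example document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types or methods are Lucene classes.
class NestedSpanSketch {

    /** A span over token positions [start, end). */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    /** All spans of a single term. */
    static List<Span> term(List<String> doc, String t) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < doc.size(); i++) {
            if (doc.get(i).equals(t)) out.add(new Span(i, i + 1));
        }
        return out;
    }

    /** Union of the clauses' spans, sorted by start, then by end. */
    @SafeVarargs
    static List<Span> or(List<Span>... clauses) {
        List<Span> out = new ArrayList<>();
        for (List<Span> c : clauses) out.addAll(c);
        out.sort((a, b) -> a.start != b.start
                ? Integer.compare(a.start, b.start)
                : Integer.compare(a.end, b.end));
        return out;
    }

    /** Ordered, slop 0, exhaustive: tries every combination of sub-spans. */
    static List<Span> nearExhaustive(List<List<Span>> clauses) {
        List<Span> out = new ArrayList<>();
        for (Span first : clauses.get(0)) {
            extend(clauses, 1, first.start, first.end, out);
        }
        return out;
    }

    static void extend(List<List<Span>> clauses, int i, int start, int prevEnd, List<Span> out) {
        if (i == clauses.size()) { out.add(new Span(start, prevEnd)); return; }
        for (Span s : clauses.get(i)) {
            // Slop 0 and ordered: the next sub-span must begin where the previous one ended.
            if (s.start == prevEnd) extend(clauses, i + 1, start, s.end, out);
        }
    }

    /**
     * Ordered, slop 0, lazy: for each first span, takes only the first candidate
     * of every later clause that starts at or after the previous end, and never
     * goes back to try a longer alternative with the same start.
     */
    static List<Span> nearLazy(List<List<Span>> clauses) {
        List<Span> out = new ArrayList<>();
        for (Span first : clauses.get(0)) {
            int prevEnd = first.end;
            int gaps = 0;
            boolean complete = true;
            for (int i = 1; i < clauses.size() && complete; i++) {
                Span chosen = null;
                for (Span s : clauses.get(i)) {
                    if (s.start >= prevEnd) { chosen = s; break; }
                }
                if (chosen == null) {
                    complete = false;
                } else {
                    gaps += chosen.start - prevEnd;
                    prevEnd = chosen.end;
                }
            }
            if (complete && gaps == 0) out.add(new Span(first.start, prevEnd));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
            "human genome organization hugo is trying to coordinate gene mapping research worldwide"
                .split(" "));
        List<Span> geneMapping = nearExhaustive(Arrays.asList(term(doc, "gene"), term(doc, "mapping")));
        List<Span> middle = or(geneMapping, term(doc, "gene"));
        List<List<Span>> outer = Arrays.asList(term(doc, "coordinate"), middle, term(doc, "research"));
        System.out.println("lazy:       " + nearLazy(outer));
        System.out.println("exhaustive: " + nearExhaustive(outer));
    }
}
```

In this model the lazy variant settles on the shorter "gene" span, finds a gap 
before "research" and gives up, while the exhaustive variant also tries the 
longer "gene mapping" span and reports the match.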

To improve scoring consistency, we could start by requiring that span near 
queries score the same as phrases.
There is a problem for nested span queries in that current similarities have a 
tf component over a complete document field, and this tf does not play well 
with the sloppy frequency for SpanNear over SpanOr. I'd like each term 
occurrence of a SpanTerm to contribute the same (idf-like) weight to a 
SpanNear, but that cannot currently be done because the spans of a SpanOr do 
not have a weight. So when mixing terms with SpanOr it will be hard to get the 
same scoring as a boolean Or over PhraseQueries. I don't know how to resolve 
this; we may have to add something to the similarities for it.
SpanBoostQuery would only make sense when the individual Spans occurrences can 
carry a weight.
I'd prefer span scoring consistency to have its own jira issue(s).
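
As a rough illustration of the weighting problem, the following toy arithmetic 
compares a single query-wide weight with a per-occurrence weight. Everything 
here is an assumption made for illustration: the idf-like values are invented, 
the weightOf helper is hypothetical, tf, norms and sloppy frequency are left 
out, and the "query-wide" model (one weight built from every distinct term the 
span query mentions) is a simplification, not a description of any Lucene 
similarity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy arithmetic only: the weights are invented and nothing here is Lucene API.
class SpanWeightSketch {

    /** Sums the (hypothetical) idf-like weight of each listed term. */
    static double weightOf(Map<String, Double> idf, List<String> terms) {
        double sum = 0.0;
        for (String t : terms) {
            sum += idf.get(t);
        }
        return sum;
    }

    /** Invented weights for the four terms of the example query. */
    static Map<String, Double> exampleIdf() {
        Map<String, Double> idf = new HashMap<>();
        idf.put("coordinate", 2.0);
        idf.put("gene", 1.5);
        idf.put("mapping", 3.0);
        idf.put("research", 1.0);
        return idf;
    }

    public static void main(String[] args) {
        Map<String, Double> idf = exampleIdf();

        // Every distinct term the nested span query mentions.
        List<String> queryTerms = Arrays.asList("coordinate", "gene", "mapping", "research");
        // Terms of the SpanOr branch that matches the text "coordinate gene research";
        // this is also the term list of the equivalent phrase.
        List<String> matchedTerms = Arrays.asList("coordinate", "gene", "research");

        System.out.println("query-wide weight:     " + weightOf(idf, queryTerms));
        System.out.println("per-occurrence weight: " + weightOf(idf, matchedTerms));
    }
}
```

With these made-up numbers the query-wide weight also counts "mapping" although 
it did not occur in the match, whereas the per-occurrence weight equals what 
the phrase "coordinate gene research" would get on its own.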



> Nested Span Queries are buggy
> -----------------------------
>
>                 Key: LUCENE-7398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7398
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5, 6.x
>            Reporter: Christoph Goller
>            Assignee: Alan Woodward
>            Priority: Critical
>         Attachments: LUCENE-7398-20160814.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



