[ https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484910#comment-15484910 ]
Paul Elschot commented on LUCENE-7398:
--------------------------------------

As to missing matches due to lazy iteration, I'd prefer to add an option to allow a choice between the current behaviour, the above patch (because I think it is slightly better than the previous 4.10 behaviour), one that misses no matches, and perhaps more.

For example, would anyone like a SpanWindowQuery that only uses span start positions? That would at least allow an easy complete implementation.

And we need to document the current behaviour: ordered spans do not overlap, while non-ordered spans may overlap.

To improve scoring consistency, we could start by requiring that span near queries score the same as phrases.

There is a problem for nested span queries in that current similarities have a tf component over a complete document field, and this tf does not play well with the sloppy frequency for SpanNear over SpanOr.

I'd like each term occurrence of a SpanTerm to contribute the same (idf-like) weight to a SpanNear, but that cannot currently be done because the spans of a SpanOr do not have a weight. So when mixing terms with SpanOr it will be hard to get the same scoring as a boolean Or over PhraseQueries. I don't know how to resolve this; we may have to add something to the similarities for this.

SpanBoostQuery would only make sense when the individual Spans occurrences can carry a weight.

I'd prefer span scoring consistency to have its own jira issue(s).

> Nested Span Queries are buggy
> -----------------------------
>
>                 Key: LUCENE-7398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7398
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>   Affects Versions: 5.5, 6.x
>            Reporter: Christoph Goller
>            Assignee: Alan Woodward
>            Priority: Critical
>         Attachments: LUCENE-7398-20160814.patch, LUCENE-7398.patch,
>                      LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping],
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as
> "coordinate gene research". It does not match "coordinate gene mapping
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It
> probably stopped working with the changes on SpanQueries in 5.3. I will
> attach a unit test that shows the problem.
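
To make the expected result concrete, here is a minimal, self-contained sketch of a matcher "that misses no matches" for the query quoted above. It is a toy model and not Lucene code: the class ExhaustiveSpanSketch and the helpers term, or, nearOrdered, tokenize, matchedText and issueQuery are hypothetical names invented for this illustration, and the tokenizer (lowercase, split on non-letters) is an assumption, not the analyzer used in the report. Every sub-query eagerly lists all of its spans, and the ordered near tries every combination, which is roughly the opposite trade-off of lazy iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: none of these names are Lucene classes or methods.
final class ExhaustiveSpanSketch {

    /** A match covering token positions [start, end). */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
        @Override public boolean equals(Object o) {
            return o instanceof Span && ((Span) o).start == start && ((Span) o).end == end;
        }
        @Override public int hashCode() { return 31 * start + end; }
    }

    /** Produces every span of a (sub)query in a tokenized document. */
    interface SpanSource { List<Span> spans(String[] tokens); }

    static SpanSource term(String text) {
        return tokens -> {
            List<Span> out = new ArrayList<>();
            for (int i = 0; i < tokens.length; i++) {
                if (tokens[i].equals(text)) out.add(new Span(i, i + 1));
            }
            return out;
        };
    }

    /** Union of all clause spans; duplicates are dropped. */
    static SpanSource or(SpanSource... clauses) {
        return tokens -> {
            Set<Span> out = new LinkedHashSet<>();
            for (SpanSource c : clauses) out.addAll(c.spans(tokens));
            return new ArrayList<>(out);
        };
    }

    /** Ordered, non-overlapping near: tries every combination of sub-spans. */
    static SpanSource nearOrdered(int slop, SpanSource... clauses) {
        return tokens -> {
            List<List<Span>> subs = new ArrayList<>();
            for (SpanSource c : clauses) subs.add(c.spans(tokens));
            Set<Span> out = new LinkedHashSet<>();
            for (Span first : subs.get(0)) extend(subs, 1, first.start, first.end, slop, out);
            return new ArrayList<>(out);
        };
    }

    private static void extend(List<List<Span>> subs, int clause, int start,
                               int prevEnd, int slopLeft, Set<Span> out) {
        if (clause == subs.size()) { out.add(new Span(start, prevEnd)); return; }
        for (Span s : subs.get(clause)) {
            int gap = s.start - prevEnd;  // negative gap would mean overlap or wrong order
            if (gap >= 0 && gap <= slopLeft) {
                extend(subs, clause + 1, start, s.end, slopLeft - gap, out);
            }
        }
    }

    /** Assumed tokenizer: lowercase, split on anything that is not a-z. */
    static String[] tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out.toArray(new String[0]);
    }

    static List<String> matchedText(SpanSource query, String[] tokens) {
        List<String> out = new ArrayList<>();
        for (Span s : query.spans(tokens)) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, s.start, s.end)));
        }
        return out;
    }

    /** The nested query from the issue description, in the toy model. */
    static SpanSource issueQuery() {
        return nearOrdered(0,
            term("coordinate"),
            or(nearOrdered(0, term("gene"), term("mapping")), term("gene")),
            term("research"));
    }

    public static void main(String[] args) {
        String doc = "Human Genome Organization , HUGO , is trying to coordinate gene "
                   + "mapping research worldwide.";
        System.out.println(matchedText(issueQuery(), tokenize(doc)));
        // prints [coordinate gene mapping research]
        System.out.println(matchedText(issueQuery(), tokenize("coordinate gene research")));
        // prints [coordinate gene research]
    }
}
```

Enumerating every combination is simple to reason about, but its cost grows with the number of sub-span combinations, so this is only meant to pin down what a complete result looks like, not to suggest an implementation strategy for the real Spans classes.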