[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-10-04 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398.patch

Patch of 4 Oct 2016.

This is the patch of 25 Sep 2016, but without the UNORDERED_STARTPOS case.

In a nutshell this:
- adds ORDERED_LOOKAHEAD, 
- is backward compatible,
- tries to document the limitations of the matching methods for SpanNearQuery.


> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-09-25 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398-20160925.patch

Patch of 25 Sep 2016.
Compared to the previous patch, this removes the ORDERED_STARTPOS case, because 
I don't know whether that is needed.
Also this restores backward compatibility.

Compared to master, this has:
Four MatchNear methods, two are the current ones, they are called ORDERED_LAZY 
and UNORDERED_LAZY, and these are used when the current builder and 
constructors use a boolean ordered argument.

The third case is ORDERED_LOOKAHEAD, which is from the patch of 18 August.

The last case is UNORDERED_STARTPOS, which is simpler than UNORDERED_LAZY, 
hopefully a little faster, and with better completeness of the result.

Javadocs for all four cases have been added.

All test cases from here have been added, and where necessary they have been 
modified to use ORDERED_LOOKAHEAD and to not do span collection. These tests 
pass.

For the last case, UNORDERED_STARTPOS, no test cases have been added yet. This 
is still to be done. Does anyone have more difficult cases?

Minor point: the collect() method was moved to the superclass ConjunctionSpans.

Feedback welcome, especially on the javadocs of SpanNearQuery.MatchNear.

Instead of adding backtracking methods, it might be better to do counting of 
input spans in a matching window. I'm hoping that the UNORDERED_STARTPOS case 
can be extended for that. Any ideas there?

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-09-24 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398-20160924.patch

Patch of 24 Sep 2016, work in progress.

This introduces SpanNearQuery.MatchNear to choose the matching method.

The ORDERED_LAZY case is still the patch of 14 August, this should be changed 
back to the current implementation, and be used to implement ORDERED_LOOKAHEAD.

This implements MatchNear.UNORDERED_STARTPOS and uses that as the default 
implementation for the unordered case.
The implementation of UNORDERED_STARTPOS is in NearSpansUnorderedStartPos, 
which is simpler than the current NearSpansUnordered, there is no SpansCell.
I'd expect this StartPos implementation to be a little faster, so I also 
implemented it as default for the unordered case.  In only one test case the 
UNORDERED_LAZY method is needed to pass the test.

The question is whether it is ok to change the default unordered implementation 
to only use the span start positions.

The collect() method is moved to the superclass ConjunctionSpans, this 
simplification might be done at another issue.

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398.patch, LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-08-14 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398-20160814.patch

Patch of 14 August 2016.
This uses the span position queue from master, and adds the above w1..w5 test 
case with slop 1.

This reintroduces a shrink in shrinkToDecreaseSlop:
{code}
  /** The subSpans are ordered in the same doc and matchSlop is too big.
   * Try and decrease the slop by calling nextStartPosition() on all subSpans 
except the last one in reverse order.
   * Return true iff an ordered match was found with small enough slop.
   */
  private boolean shrinkToDecreaseSlop() throws IOException {
  ...
  }
{code}

This also adds a FIXME to collect():
{code}
  /** FIXME: the subspans may be after the current match. */
{code}
Payload collection can still be added, I left that to be done.

It only fails the test case TestSpanCollection.testNestedNearQuery reporting 
that a term was not collected properly, the other search tests pass.



> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-08-02 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7398:
--
Attachment: LUCENE-7398.patch

Patch against master.

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398.patch, LUCENE-7398.patch, 
> TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-08-02 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7398:
--
Attachment: LUCENE-7398.patch

Patch with fix, incorporating Christoph's test - will commit tomorrow.

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-07-29 Thread Christoph Goller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Goller updated LUCENE-7398:
-
Attachment: TestSpanCollection.java

Please find attatched an extended TestSpanCollection.java that shows the 
problem.

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Priority: Critical
> Attachments: TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org