[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-06-15 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371-5x.patch

Smaller patch, after LUCENE-6466 and LUCENE-6537

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 5.3, Trunk
>
> Attachments: LUCENE-6371-5x.patch, LUCENE-6371-5x.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-06-10 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371-5x.patch

Here's a patch for 5x.  What with reversions and overlapping commits, it turns 
out the easiest thing to do was to merge the patches for LUCENE-6466, 
LUCENE-6537 and LUCENE-6371 into one.

All tests are passing, but I want to beast this against Java 7 for a bit to 
check that LUCENE-6490 is fixed.

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 5.3, Trunk
>
> Attachments: LUCENE-6371-5x.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-06-10 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

Updated patch, following on from LUCENE-6537.  I'd like to commit this, then 
backport LUCENE-6490 and LUCENE-6537.

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 5.3, Trunk
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-06-09 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

Here's a patch that makes NearSpansOrdered non-lazy in the way Adrien 
described, and simplifies the SpanCollector accordingly.

Should I break out the changes to NearSpansOrdered into their own issue?  It 
seems like a big enough change in its own right, really.

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: Trunk, 5.3
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-06-08 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

The patch again, this time taken from the correct point in the source tree :-/  
I've fixed the javadoc comment as well.

bq. should we consider removing it entirely?

I don't think so, it's a pretty fundamental operation.  One way of simplifying 
it might be to make SpanCollector final, and have it collect either everything 
or nothing, so that creating subcollectors is easier.  But that then makes it 
difficult to move payload collection out of core.  Or maybe instead we could 
make SpanCollector implement Cloneable, and move the responsibility of building 
subcollectors directly into NearSpansOrdered?

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: Trunk, 5.3
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-05-29 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

Patch updated to trunk.

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: Trunk, 5.3
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-05-25 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

Here's a patch taking into account all the comments here and on LUCENE-6494.
* SpanCollector becomes an interface again, so payload collection is entirely 
defined in the .payloads package
* BufferedSpanCollector is removed, replaced by a simple array of 
SpanCollectors in NearSpansOrdered.  SpanCollector has two methods to deal with 
this, newSubCollectors() and collectedComposite(), to create and then replay.
* SpanCollectors are passed through in getSpans().  A null passed here means no 
collection, and there's a default getSpans() call on SpanWeight that always 
passes a null collector.
* I've removed SpanSimilarity, in favour of passing a map of Terms to 
TermContexts to the SpanWeight constructor.  If this is null, then scoring 
isn't required; if not, then SpanWeight builds a SimScorer and passes that to 
its scorer.
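
The two null conventions in the list above (a null collector means no collection, a null term map means no scoring) can be sketched as follows. This is only an illustration under stated assumptions: `SketchWeight`, `PositionCollector`, `getMatches()` and the per-term weight map are hypothetical stand-ins, not the SpanWeight API from the patch.

```java
import java.util.Map;

// Hypothetical stand-in for a collector; a null reference means "do not collect".
interface PositionCollector {
  void collectLeaf(String term, int position);
}

// Hypothetical stand-in for a weight; not the Lucene SpanWeight class.
final class SketchWeight {
  private final String term;
  private final Map<String, Float> termWeights; // null => scoring is not required

  SketchWeight(String term, Map<String, Float> termWeights) {
    this.term = term;
    this.termWeights = termWeights;
  }

  boolean needsScores() {
    return termWeights != null;
  }

  // Convenience overload: always passes a null collector.
  int getMatches(String[] doc) {
    return getMatches(doc, null);
  }

  // Counts matches of the term, reporting positions only if a collector is given.
  int getMatches(String[] doc, PositionCollector collector) {
    int matches = 0;
    for (int pos = 0; pos < doc.length; pos++) {
      if (doc[pos].equals(term)) {
        matches++;
        if (collector != null) {
          collector.collectLeaf(term, pos);
        }
      }
    }
    return matches;
  }

  // Returns 0 when the weight was built without term statistics.
  float score(int matches) {
    if (termWeights == null) {
      return 0f;
    }
    Float w = termWeights.get(term);
    return w == null ? 0f : w * matches;
  }
}
```

Callers that only need matching construct the weight with a null map and use the one-argument overload, so neither scoring state nor a collector is ever created.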

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: Trunk, 5.3
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-05-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-6371:

Fix Version/s: (was: 5.2)
   5.3
   Trunk

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: Trunk, 5.3
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-05-19 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Fix Version/s: 5.2

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 5.2
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-05-15 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

Updated with a highlighter fix as well.

There's one change in behaviour here with respect to payload collection, in 
that previously a TermSpans would only read a payload once.  This seems a bit 
odd to me, and it meant that, for example, UnorderedNearQueries with 
overlapping matches would sometimes read incomplete payloads.  I've had to 
change PositionIncrementTest to take this change into account.

I'll look at benchmarking this properly with luceneutil.

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-05-13 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

Updated patch:
* collectLeaf() now takes PostingsEnum and Term
* the default impls are renamed to NO_OP

Changing from Collection<byte[]> to BytesRefArray is a great idea, but I'd like 
to do that in a separate issue as that affects the external SpanQuery API a 
fair amount.  This patch currently only changes internals.

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.






[jira] [Updated] (LUCENE-6371) Improve Spans payload collection

2015-05-13 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-6371:
--
Attachment: LUCENE-6371.patch

I've been playing around with various APIs for this, and I think this one works 
reasonably well.

Spans.isPayloadAvailable() and getPayload() are replaced with a collect() 
method that takes a SpanCollector.  If you want to get payloads from a Spans, 
you do the following:

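{code:java}
// Assumes spans is already positioned on a document (e.g. via nextDoc()).
PayloadSpanCollector collector = new PayloadSpanCollector();
while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
  collector.reset();                         // drop payloads from the previous match
  spans.collect(collector);                  // gather payloads for the current match
  doSomethingWith(collector.getPayloads());
}
{code}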

The actual job of collecting information from postings lists is devolved to the 
collector itself (via SpanCollector.collectLeaf(), called from 
TermSpans.collect()).
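
As a rough illustration of that delegation, here is a minimal, self-contained sketch. All of the `Toy*` types are simplified stand-ins invented for this example, not the classes in the patch, and the `collectLeaf()` signature here (term, position, payload) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are not the Lucene classes.
interface ToyCollector {
  // Invoked by a leaf span with the data available at its current position.
  void collectLeaf(String term, int position, byte[] payload);

  // Clears whatever was gathered for the previous match.
  void reset();
}

interface ToySpans {
  void collect(ToyCollector collector);
}

// Leaf span: the only place that hands "postings" data to the collector.
final class ToyTermSpans implements ToySpans {
  private final String term;
  private final int position;
  private final byte[] payload; // null when the position carries no payload

  ToyTermSpans(String term, int position, byte[] payload) {
    this.term = term;
    this.position = position;
    this.payload = payload;
  }

  @Override
  public void collect(ToyCollector collector) {
    collector.collectLeaf(term, position, payload);
  }
}

// Composite span: forwards the collector to each child in order.
final class ToyNearSpans implements ToySpans {
  private final List<ToySpans> children;

  ToyNearSpans(List<ToySpans> children) {
    this.children = children;
  }

  @Override
  public void collect(ToyCollector collector) {
    for (ToySpans child : children) {
      child.collect(collector);
    }
  }
}

// Collector that keeps only payloads and skips leaves without one.
final class ToyPayloadCollector implements ToyCollector {
  private final List<byte[]> payloads = new ArrayList<>();

  @Override
  public void collectLeaf(String term, int position, byte[] payload) {
    if (payload != null) {
      payloads.add(payload);
    }
  }

  @Override
  public void reset() {
    payloads.clear();
  }

  List<byte[]> getPayloads() {
    return payloads;
  }
}
```

Because the composite only forwards the call, a different collector (for example one recording term positions) can be plugged in without touching the span classes.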

The API is made slightly complicated by the need to buffer collected 
information in NearSpansOrdered, because the algorithm there moves child spans 
on eagerly when finding the smallest possible match, so by the time collect() 
is called we're out of position.  This is dealt with using a 
BufferedSpanCollector, with collectCandidate(Spans) and accept() methods.  The 
default (no-op) collector has a no-op implementation of this, which should get 
optimized away by HotSpot, meaning that we don't need to have separate 
implementations for collecting and non-collecting algorithms, and can do away 
with PayloadNearOrderedSpans.
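
The buffering idea can be sketched roughly as below. This is a guess at the shape, not the patch's BufferedSpanCollector: the `Sketch*` types and the `replay()` method are hypothetical, and `collectCandidate()` here takes an already-extracted leaf rather than a Spans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value object holding what a leaf exposed at one position.
final class SketchLeaf {
  final String term;
  final int position;

  SketchLeaf(String term, int position) {
    this.term = term;
    this.position = position;
  }
}

// Hypothetical downstream collector that receives the replayed leaves.
interface SketchLeafCollector {
  void collectLeaf(SketchLeaf leaf);
}

// Buffers leaf data while the matching algorithm is still advancing children.
final class SketchBufferedCollector {
  private final List<SketchLeaf> candidate = new ArrayList<>();
  private final List<SketchLeaf> accepted = new ArrayList<>();

  // Snapshot a child's data before the algorithm moves that child on.
  void collectCandidate(SketchLeaf leaf) {
    candidate.add(leaf);
  }

  // The current candidate is the best match so far: it replaces the previous one.
  void accept() {
    accepted.clear();
    accepted.addAll(candidate);
    candidate.clear();
  }

  // Hand the last accepted match to the real collector, once the caller asks.
  void replay(SketchLeafCollector target) {
    for (SketchLeaf leaf : accepted) {
      target.collectLeaf(leaf);
    }
  }
}
```

In this sketch, candidates that are never accepted are simply never replayed, so the downstream collector sees only the data from the match that was finally reported, even though the children have since moved past it.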

This patch also moves the PayloadCheck queries to the .payloads package, which 
tidies things up a bit.

All tests pass.

> Improve Spans payload collection
> 
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.


