[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371-5x.patch

Smaller patch, after LUCENE-6466 and LUCENE-6537.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6371-5x.patch, LUCENE-6371-5x.patch,
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371-5x.patch

Here's a patch for 5x. What with reversions and overlapping commits, it turns out the easiest thing to do was to merge the patches for LUCENE-6466, LUCENE-6537 and LUCENE-6371 into one.

All tests are passing, but I want to beast this against Java 7 for a bit to check that LUCENE-6490 is fixed.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6371-5x.patch, LUCENE-6371.patch,
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

Updated patch, following on from LUCENE-6537. I'd like to commit this, then backport LUCENE-6490 and LUCENE-6537.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

Here's a patch that makes NearSpansOrdered non-lazy in the way Adrien described, and simplifies the SpanCollector accordingly.

Should I break out the changes to NearSpansOrdered into their own issue? It seems like a big enough change in its own right, really.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: Trunk, 5.3
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

The patch again, this time taken from the correct point in the source tree :-/ I've fixed the javadoc comment as well.

bq. should we consider removing it entirely?

I don't think so, it's a pretty fundamental operation. One way of simplifying it might be to make SpanCollector final, and have it collect either everything or nothing, so that creating subcollectors is easier. But that then makes it difficult to move payload collection out of core. Or maybe instead we could make SpanCollector implement Cloneable, and move the responsibility of building subcollectors directly into NearSpansOrdered?

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: Trunk, 5.3
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

Patch updated to trunk.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: Trunk, 5.3
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

Here's a patch taking into account all the comments here and on LUCENE-6494.

* SpanCollector becomes an interface again, so payload collection is entirely defined in the .payloads package
* BufferedSpanCollector is removed, replaced by a simple array of SpanCollectors in NearSpansOrdered. SpanCollector has two methods to deal with this, newSubCollectors() and collectedComposite(), to create and then replay.
* SpanCollectors are passed through in getSpans(). A null passed here means no collection, and there's a default getSpans() call on SpanWeight that always passes a null collector.
* I've removed SpanSimilarity, in favour of passing a map of Terms to TermContexts to the SpanWeight constructor. If this is null, then scoring isn't required; if not, then SpanWeight builds a SimScorer and passes that to its scorer.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: Trunk, 5.3
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
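The create-then-replay idea in the bullets above, together with the "null means no collection" convention, can be illustrated with a small self-contained sketch. This is not code from the patch: ToyCollector, RecordingCollector, ToyChild, matchAndCollect() and replay() are all hypothetical names, and the signatures are assumptions made for illustration (the message names newSubCollectors() and collectedComposite() but does not give their signatures).

```java
import java.util.ArrayList;
import java.util.List;

final class SubCollectorSketch {

    /** Hypothetical collector; the signatures are assumptions, not taken from the patch. */
    interface ToyCollector {
        void collectLeaf(String term, int position);
        ToyCollector[] newSubCollectors(int count); // create: one buffer per child
        void replay(ToyCollector[] subs);           // replay: copy buffered leaves into this
    }

    /** Records "term@position" strings for every leaf it is shown. */
    static final class RecordingCollector implements ToyCollector {
        final List<String> seen = new ArrayList<>();

        @Override public void collectLeaf(String term, int position) {
            seen.add(term + "@" + position);
        }

        @Override public ToyCollector[] newSubCollectors(int count) {
            ToyCollector[] subs = new ToyCollector[count];
            for (int i = 0; i < count; i++) {
                subs[i] = new RecordingCollector();
            }
            return subs;
        }

        @Override public void replay(ToyCollector[] subs) {
            // Safe cast in this toy: the subs were created by newSubCollectors() above.
            for (ToyCollector sub : subs) {
                seen.addAll(((RecordingCollector) sub).seen);
            }
        }
    }

    /** Toy child: a term with a list of positions and a cursor. */
    static final class ToyChild {
        final String term;
        final int[] positions;
        int idx = 0;

        ToyChild(String term, int[] positions) {
            this.term = term;
            this.positions = positions;
        }
    }

    /**
     * Toy eager matcher: each child is buffered at its current position and then
     * moved on, so the parent collector can only see the match through the replay.
     * A null collector means nothing is buffered at all.
     */
    static void matchAndCollect(ToyChild[] children, ToyCollector collector) {
        ToyCollector[] subs = collector == null ? null : collector.newSubCollectors(children.length);
        for (int i = 0; i < children.length; i++) {
            ToyChild child = children[i];
            if (subs != null) {
                subs[i].collectLeaf(child.term, child.positions[child.idx]);
            }
            child.idx++; // eager move: the child is now past the matched position
        }
        if (collector != null) {
            collector.replay(subs);
        }
    }

    public static void main(String[] args) {
        ToyChild[] children = {
            new ToyChild("quick", new int[] {3, 9}),
            new ToyChild("fox", new int[] {5, 12})
        };
        RecordingCollector collector = new RecordingCollector();
        matchAndCollect(children, collector);
        System.out.println(collector.seen);
    }
}
```

In this toy, the children have already moved on by the time the parent collector is filled, which is the situation the array of sub-collectors is described as handling; passing null skips both the buffering and the replay.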
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-6371:
--------------------------------
    Fix Version/s:     (was: 5.2)
                   5.3
                   Trunk

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: Trunk, 5.3
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Fix Version/s: 5.2

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: 5.2
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

Updated with a highlighter fix as well.

There's one change in behaviour here with respect to payload collection, in that previously a TermSpans would only read a payload once. This seems a bit odd to me, and it meant that, for example, UnorderedNearQueries with overlapping matches would sometimes read incomplete payloads. I've had to change PositionIncrementTest to take this change into account.

I'll look at benchmarking this properly with luceneutil.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

Updated patch:

* collectLeaf() now takes PostingsEnum and Term
* the default impls are renamed to NO_OP

Changing from Collection to BytesRefArray is a great idea, but I'd like to do that in a separate issue, as that affects the external SpanQuery API a fair amount. This patch currently only changes internals.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
[jira] [Updated] (LUCENE-6371) Improve Spans payload collection
[ https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6371:
----------------------------------
    Attachment: LUCENE-6371.patch

I've been playing around with various APIs for this, and I think this one works reasonably well.

Spans.isPayloadAvailable() and getPayload() are replaced with a collect() method that takes a SpanCollector. If you want to get payloads from a Spans, you do the following:

{code:java}
PayloadSpanCollector collector = new PayloadSpanCollector();
while (spans.nextStartPosition() != NO_MORE_POSITIONS) {
  collector.reset();
  spans.collect(collector);
  doSomethingWith(collector.getPayloads());
}
{code}

The actual job of collecting information from postings lists is devolved to the collector itself (via SpanCollector.collectLeaf(), called from TermSpans.collect()).

The API is made slightly complicated by the need to buffer collected information in NearOrderedSpans, because the algorithm there moves child spans on eagerly when finding the smallest possible match, so by the time collect() is called we're out of position. This is dealt with using a BufferedSpanCollector, with collectCandidate(Spans) and accept() methods. The default (No-op) collector has a no-op implementation of this, which should get optimized away by HotSpot, meaning that we don't need to have separate implementations for collecting and non-collecting algorithms, and can do away with PayloadNearOrderedSpans.

This patch also moves the PayloadCheck queries to the .payloads package, which tidies things up a bit.

All tests pass.

> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.
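To make the collect() loop and the no-op collector described in the message above concrete, here is a minimal, self-contained sketch. It does not use Lucene's classes: ToyCollector, ToyPayloadCollector, ToyTermSpans and payloadsPerMatch() are hypothetical stand-ins, and payloads are modelled as plain strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectorSketch {

    /** Hypothetical callback; a stand-in for the collector idea, not Lucene's interface. */
    interface ToyCollector {
        void collectLeaf(int position, String payload);
        void reset();

        /** Does nothing, so callers that do not collect need no separate code path. */
        ToyCollector NO_OP = new ToyCollector() {
            @Override public void collectLeaf(int position, String payload) { }
            @Override public void reset() { }
        };
    }

    /** Gathers the payloads seen for the current match. */
    static final class ToyPayloadCollector implements ToyCollector {
        private final List<String> payloads = new ArrayList<>();

        @Override public void collectLeaf(int position, String payload) {
            payloads.add(payload);
        }

        @Override public void reset() {
            payloads.clear();
        }

        List<String> getPayloads() {
            return new ArrayList<>(payloads); // defensive copy
        }
    }

    /** Toy leaf spans over in-memory positions and payloads. */
    static final class ToyTermSpans {
        static final int NO_MORE_POSITIONS = Integer.MAX_VALUE;
        private final int[] positions;
        private final String[] payloads;
        private int idx = -1;

        ToyTermSpans(int[] positions, String[] payloads) {
            this.positions = positions;
            this.payloads = payloads;
        }

        int nextStartPosition() {
            idx++;
            return idx < positions.length ? positions[idx] : NO_MORE_POSITIONS;
        }

        /** The leaf hands its current posting to whatever collector it is given. */
        void collect(ToyCollector collector) {
            collector.collectLeaf(positions[idx], payloads[idx]);
        }
    }

    /** Same loop shape as the snippet in the message: reset, collect, use. */
    static List<List<String>> payloadsPerMatch(ToyTermSpans spans) {
        ToyPayloadCollector collector = new ToyPayloadCollector();
        List<List<String>> out = new ArrayList<>();
        while (spans.nextStartPosition() != ToyTermSpans.NO_MORE_POSITIONS) {
            collector.reset();
            spans.collect(collector);
            out.add(collector.getPayloads());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyTermSpans spans = new ToyTermSpans(new int[] {1, 4}, new String[] {"a", "b"});
        System.out.println(payloadsPerMatch(spans));
    }
}
```

Passing ToyCollector.NO_OP to collect() exercises the same code path without gathering anything, which is the property the message relies on to avoid separate collecting and non-collecting implementations.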