[ https://issues.apache.org/jira/browse/LUCENE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-6494:
----------------------------------
    Attachment: LUCENE-6494.patch

Here is a patch.

SpanCollector is changed from an interface to a concrete class, parameterized
by a MatchData type that defines which postings information to collect.  The
no-op implementation is a specialised subclass.
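To make the shape of this change concrete, here is a minimal, self-contained
sketch of a collector parameterized by the type of data it records, with the
no-op case as a specialised subclass.  All names (SpanCollectorSketch,
collectLeaf, PositionCollector, NoOpCollector) are hypothetical and do not
reflect the signatures in the patch or in Lucene; positions and offsets are
passed in as plain ints rather than read from real postings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the patch's API: a concrete collector base class
// parameterized by the type T of postings data it records per match.
abstract class SpanCollectorSketch<T> {
    private final List<T> collected = new ArrayList<>();

    // Subclasses choose which postings information to keep for one leaf match.
    protected abstract T extract(int position, int startOffset, int endOffset);

    // Called once per matching term position.
    public void collectLeaf(int position, int startOffset, int endOffset) {
        T data = extract(position, startOffset, endOffset);
        if (data != null) {
            collected.add(data);
        }
    }

    public List<T> getCollected() {
        return collected;
    }
}

// Records only term positions.
final class PositionCollector extends SpanCollectorSketch<Integer> {
    @Override
    protected Integer extract(int position, int startOffset, int endOffset) {
        return position;
    }
}

// The no-op case as a specialised subclass: it records nothing, so callers
// that only need to know about matches do no extra collection work.
final class NoOpCollector extends SpanCollectorSketch<Void> {
    static final NoOpCollector INSTANCE = new NoOpCollector();

    private NoOpCollector() {}

    @Override
    public void collectLeaf(int position, int startOffset, int endOffset) {
        // Intentionally empty.
    }

    @Override
    protected Void extract(int position, int startOffset, int endOffset) {
        return null;
    }
}
```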

Most of the functionality from PayloadSpanUtil is moved to MatchDataCollector.
This takes an arbitrary query, converts it to a SpanQuery, runs it over any
document in a searcher, and returns a MatchDataIterator<T> that iterates over
the matches for that query within that document.  It ignores things like SpanNot
exclusions and Boolean MUST_NOT clauses, so you should make sure you already
know a document is a match before passing it in.  PayloadSpanUtil retains its
existing methods and constructor for backwards compatibility.
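The caveat about exclusions is easy to trip over, so here is a toy model of
it.  A "document" is a map from term to positions and a "query" has positive
and excluded terms; ToyQuery, ToyMatchCollector and everything else here are
hypothetical illustrations, not Lucene or patch classes.  Collection looks only
at the positive terms, mirroring the behaviour described above where SpanNot
exclusions and MUST_NOT clauses are ignored, so the guarded entry point checks
the full match first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy query: a document matches if it contains any positive term
// and none of the excluded terms.
final class ToyQuery {
    final Set<String> positive;
    final Set<String> excluded;

    ToyQuery(Set<String> positive, Set<String> excluded) {
        this.positive = positive;
        this.excluded = excluded;
    }

    // Full match semantics, including exclusions.
    boolean matches(Map<String, List<Integer>> doc) {
        for (String term : excluded) {
            if (doc.containsKey(term)) {
                return false;
            }
        }
        for (String term : positive) {
            if (doc.containsKey(term)) {
                return true;
            }
        }
        return false;
    }
}

final class ToyMatchCollector {
    // Collects positions for the positive terms only; exclusions are ignored,
    // so the result is only meaningful for documents that already match.
    static Iterator<Integer> positions(ToyQuery query, Map<String, List<Integer>> doc) {
        List<Integer> out = new ArrayList<>();
        for (String term : query.positive) {
            List<Integer> termPositions = doc.get(term);
            if (termPositions != null) {
                out.addAll(termPositions);
            }
        }
        Collections.sort(out);
        return out.iterator();
    }

    // Guarded entry point: returns no matches unless the document really matches.
    static Iterator<Integer> positionsIfMatching(ToyQuery query, Map<String, List<Integer>> doc) {
        if (!query.matches(doc)) {
            return Collections.emptyIterator();
        }
        return positions(query, doc);
    }
}
```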

MatchData implementations for positions, offsets and payloads are all provided,
although at the moment you can only collect one of these at a time; a composite
collector is something I want to look at in another issue.
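A minimal sketch of the three kinds of match data side by side, assuming a
hypothetical Leaf class that exposes the position, offsets and optional payload
of one matching term.  MatchDataKind, Kinds and Leaf are illustrative names
only.  Each collection pass takes exactly one kind, which mirrors the current
one-at-a-time restriction; gathering several at once is the composite collector
left for another issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-match postings view; not a Lucene or patch class.
final class Leaf {
    final int position;
    final int startOffset;
    final int endOffset;
    final byte[] payload; // null when the term has no payload

    Leaf(int position, int startOffset, int endOffset, byte[] payload) {
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.payload = payload;
    }
}

// Hypothetical extractor: one implementation per kind of postings information.
interface MatchDataKind<T> {
    T extract(Leaf leaf);
}

final class Kinds {
    static final MatchDataKind<Integer> POSITIONS = leaf -> leaf.position;
    static final MatchDataKind<int[]> OFFSETS =
            leaf -> new int[] {leaf.startOffset, leaf.endOffset};
    static final MatchDataKind<byte[]> PAYLOADS = leaf -> leaf.payload;

    // A pass collects exactly one kind; null results (e.g. missing payloads)
    // are skipped.
    static <T> List<T> collect(List<Leaf> leaves, MatchDataKind<T> kind) {
        List<T> out = new ArrayList<>();
        for (Leaf leaf : leaves) {
            T data = kind.extract(leaf);
            if (data != null) {
                out.add(data);
            }
        }
        return out;
    }
}
```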

The MatchDataIterator<T> interface is a bit clunky at the moment.  I might look
at moving the field information directly into MatchData and changing this to
look more like other iterators (either Lucene's or Java's).
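One possible shape for that change, sketched with hypothetical names
(FieldMatch, FieldMatchIterator): each match carries its field directly, and
iteration follows the standard java.util.Iterator contract, including throwing
NoSuchElementException when exhausted.  This only illustrates the idea; it is
not the interface in the patch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical match record that carries its field name alongside the data.
final class FieldMatch {
    final String field;
    final int position;

    FieldMatch(String field, int position) {
        this.field = field;
        this.position = position;
    }

    @Override
    public String toString() {
        return field + "@" + position;
    }
}

// Standard Java iterator over matches, backed by parallel arrays for illustration.
final class FieldMatchIterator implements Iterator<FieldMatch> {
    private final String[] fields;
    private final int[] positions;
    private int next = 0;

    FieldMatchIterator(String[] fields, int[] positions) {
        if (fields.length != positions.length) {
            throw new IllegalArgumentException("fields and positions must have the same length");
        }
        this.fields = fields;
        this.positions = positions;
    }

    @Override
    public boolean hasNext() {
        return next < fields.length;
    }

    @Override
    public FieldMatch next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        FieldMatch match = new FieldMatch(fields[next], positions[next]);
        next++;
        return match;
    }
}
```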

There are still lots of javadocs to add, and I'm writing more tests, but I 
thought I'd put this up for comment.  It should allow things like luwak's 
exact-match highlighter to work without requiring all the low-level changes in 
LUCENE-2878.

> Make PayloadSpanUtil apply to other postings information
> --------------------------------------------------------
>
>                 Key: LUCENE-6494
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6494
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>             Fix For: 5.2
>
>         Attachments: LUCENE-6494.patch
>
>
> With the addition of SpanCollectors, we can now get arbitrary postings 
> information from SpanQueries.  PayloadSpanUtil does some rewriting to convert 
> non-span queries into SpanQueries so that it can collect payloads.  It would 
> be good to make this more generic, so that we can collect any postings 
> information from any query (without having to make invasive changes to 
> already optimized Scorers, etc).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
