[ 
https://issues.apache.org/jira/browse/LUCENE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554735#comment-14554735
 ] 

David Smiley commented on LUCENE-6494:
--------------------------------------

bq. We could add a Collection<Term> to MatchData as well, to collect all terms 
from a Spans.   I'm not sure I see why you need the Term for highlighting 
though - can't you just use offsets?

You may be right about not needing the Term.  I'll retract my concerns about 
that for now, at least as they pertain to accurate highlights; I need to build 
a POC to understand what's really needed.  When I first saw SpanCollector it 
seemed very promising, but I'm having second thoughts now.  When I last thought 
about this problem, I ended up wanting a Spans.getChildren() of sorts, just 
like Scorers do.  I still think that would most likely be more elegant.  The 
tricky part, I think, would be handling the buffered case of NearSpansOrdered: 
when I ask for the child spans, it would need to return cached child spans for 
where it matched, not wherever the child spans may have since advanced to.  
Alternatively, SpanCollector is somewhat similar, but its MatchData, as 
written, doesn't capture each leaf's state separately; instead it expands the 
bounds.  This means I currently can't get the offsets of each underlying 
SpanTermQuery match, only the aggregate start/end offsets, which could cover a 
ton of text, and I don't want to highlight everything in between.
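
To make the distinction concrete, here is a minimal sketch in plain Java.  It 
does not use any Lucene API: LeafOffsetsSketch, Offset, aggregate() and 
highlight() are hypothetical names invented for illustration, and the offsets 
are hard-coded rather than read from postings.  It only shows why a single 
widened range is not enough for highlighting, while per-leaf offsets are.

```java
import java.util.List;

// Hypothetical illustration only; none of these types are Lucene classes.
public class LeafOffsetsSketch {

    /** Character-offset range [start, end) of one match. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Collapses the leaf matches into one widened range, which is the
     * behaviour the comment attributes to MatchData as written.
     * Assumes the list is non-empty.
     */
    static Offset aggregate(List<Offset> leaves) {
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (Offset o : leaves) {
            start = Math.min(start, o.start);
            end = Math.max(end, o.end);
        }
        return new Offset(start, end);
    }

    /**
     * Wraps each range in [ ] markers.
     * Assumes ranges are sorted by start and do not overlap.
     */
    static String highlight(String text, List<Offset> ranges) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Offset o : ranges) {
            sb.append(text, pos, o.start);
            sb.append('[').append(text, o.start, o.end).append(']');
            pos = o.end;
        }
        sb.append(text, pos, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "quick brown fox jumps over the lazy dog";
        // Offsets of the two leaf terms "quick" and "dog" in the text above.
        List<Offset> leaves = List.of(new Offset(0, 5), new Offset(36, 39));

        // Per-leaf offsets: only the matching terms are marked.
        // prints: [quick] brown fox jumps over the lazy [dog]
        System.out.println(highlight(text, leaves));

        // Aggregated bounds: everything in between is marked as well.
        // prints: [quick brown fox jumps over the lazy dog]
        System.out.println(highlight(text, List.of(aggregate(leaves))));
    }
}
```

Whether the per-leaf ranges would come from something like a 
Spans.getChildren() or from a collector that records each leaf separately is 
exactly the open question above; the sketch takes no position on that.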

> Make PayloadSpanUtil apply to other postings information
> --------------------------------------------------------
>
>                 Key: LUCENE-6494
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6494
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>             Fix For: 5.2
>
>         Attachments: LUCENE-6494.patch, LUCENE-6494.patch, LUCENE-6494.patch, 
> LUCENE-6494.patch
>
>
> With the addition of SpanCollectors, we can now get arbitrary postings 
> information from SpanQueries.  PayloadSpanUtil does some rewriting to convert 
> non-span queries into SpanQueries so that it can collect payloads.  It would 
> be good to make this more generic, so that we can collect any postings 
> information from any query (without having to make invasive changes to 
> already optimized Scorers, etc).


