To follow up: I hacked my way to the offsets by passing a
WholeBreakIterator and a custom PassageFormatter that just returns the
matches from the single resulting passage. This is suboptimal though,
as there's still some complex logic going on in highlightOffsetsEnums
that could be avoided.
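
For illustration only: once the raw match offsets are out, the overlap
handling mentioned in the quoted message below could look roughly like
this. It is a minimal sketch, not the actual code; HitRanges, Range and
mergeOverlaps are hypothetical names and nothing here is Lucene API. It
assumes half-open [start, end) character offsets and treats touching
ranges as mergeable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not part of Lucene): merges overlapping hit ranges. */
final class HitRanges {

  /** Half-open character range [start, end) within a field value. */
  static final class Range {
    final int start;
    final int end;

    Range(int start, int end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + "," + end + ")";
    }
  }

  /** Returns ranges sorted by start offset, with overlapping or touching ranges merged. */
  static List<Range> mergeOverlaps(List<Range> hits) {
    // Sort a copy so the caller's list is left untouched.
    List<Range> sorted = new ArrayList<>(hits);
    sorted.sort(Comparator.comparingInt((Range r) -> r.start));

    List<Range> merged = new ArrayList<>();
    for (Range r : sorted) {
      int last = merged.size() - 1;
      if (last >= 0 && r.start <= merged.get(last).end) {
        // Overlaps (or touches) the previous range: extend it.
        Range prev = merged.get(last);
        merged.set(last, new Range(prev.start, Math.max(prev.end, r.end)));
      } else {
        merged.add(r);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Range> hits = new ArrayList<>();
    hits.add(new Range(10, 15));
    hits.add(new Range(0, 5));
    hits.add(new Range(3, 8));
    System.out.println(mergeOverlaps(hits)); // prints [[0,8), [10,15)]
  }
}
```

The input would be the (start, end) pairs collected by the formatter,
one pair per match.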

Dawid

On Wed, Jan 11, 2017 at 11:34 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
> Can any of the folks who contributed to UnifiedHighlighter (David?)
> clarify my thinking here?
>
> I have a requirement to extract (for a set of search results) a list
> of exact "hit" ranges (field offsets, with support for multi-term
> queries and span queries). Obviously, I'm only talking about queries
> that relate to field content somehow, but this has always been quite
> problematic and required the use of multiple helper classes
> (WeightedSpanTermExtractor, MultiTermHighlighting, etc.) and pretty
> hairy logic.
>
> So I turned to look at UnifiedHighlighter for help.
>
> Seems like the right way (?) to do it would be to override (and abuse)
> UnifiedHighlighter's getFieldHighlighter method and return a field
> highlighter with an override of:
>
>   protected Passage[] highlightOffsetsEnums(List<OffsetsEnum> offsetsEnums)
>       throws IOException {
>
> so that I can capture and return a separate Passage for each
> OffsetsEnum (I have my own code to deal with overlaps and merging, so
> I can skip this entirely). Then, with a custom no-op PassageFormatter
> I could simply get a list of those offsets.
>
> The problem with this approach is that there is currently no way to
> access offsets in OffsetsEnum -- everything else is protected (so
> subclassable), but OffsetsEnum's offset accessors are package-private.
> Namely these two:
>
>   int startOffset() throws IOException {
>     return postingsEnum.startOffset();
>   }
>
>   int endOffset() throws IOException {
>     return postingsEnum.endOffset();
>   }
>
> Should these two be protected to allow such customizations? (I agree
> it's *very* low-level, but I have a practical use case where this
> would be useful.)
>
> Am I on the right track here?
>
> Separately from that, I think it'd be nice to have some sort of
> generic utility that, for a given document (or a set of documents)
> would return such hit ranges... UnifiedHighlighter seems
>
> Dawid

