dsmiley commented on a change in pull request #1123: LUCENE-9093: Unified highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361827668

##########
File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldHighlighter.java
##########
@@ -159,8 +160,9 @@ public Object highlightFieldForDoc(LeafReader reader, int docId, String content)
           break;
         }
         // advance breakIterator
-        passage.setStartOffset(Math.max(this.breakIterator.preceding(start + 1), 0));
-        passage.setEndOffset(Math.min(this.breakIterator.following(start), contentLength));
+        passage.setStartOffset(Math.max(this.breakIterator.preceding(start + 1), lastPassageEnd));

Review comment:
Oh wait; something occurred to me. The breakIterator.preceding impl doesn't intrinsically know that FieldHighlighter is going to call `Math.max(..., lastPassageEnd)` on it. And I recall you are adding this change here in FieldHighlighter because the updated LengthGoalBreakIterator might want to look further back to the left into a zone that might have been part of a previous Passage.

Maybe `LengthGoalBreakIterator.preceding` should examine `current()` at the start and ensure it doesn't yield a break before that. Then FieldHighlighter wouldn't change. Without this small proposal, the length of this passage will be undersized because LengthGoalBreakIterator doesn't know FieldHighlighter is going to chop off some of the beginning thanks to that `max()`.
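
To make the proposal concrete, here is a rough sketch of the idea using only `java.text.BreakIterator`. It is not the actual `LengthGoalBreakIterator` code: the class `PrecedingFloorSketch`, the method `precedingWithFloor`, and the `lookBack` parameter are hypothetical names made up for illustration, and the fixed-distance look-back is only a stand-in for whatever length-goal logic the real iterator applies. The point is just that the iterator reads `current()` on entry and treats it as a floor, so the caller would not need its own `Math.max(..., lastPassageEnd)`.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class PrecedingFloorSketch {

    /**
     * Hypothetical stand-in for a length-goal style preceding(): it searches for a
     * boundary lookBack characters to the left of offset, but never returns one
     * that lies before the delegate's position on entry (its current()).
     * Assumption: current() still holds the end of the previous passage.
     */
    public static int precedingWithFloor(BreakIterator delegate, int offset, int lookBack) {
        int floor = delegate.current();                    // where the previous call left the iterator
        int begin = delegate.getText().getBeginIndex();
        int target = Math.max(offset - lookBack, begin);   // how far left we would like to reach
        int candidate = delegate.preceding(target);
        if (candidate == BreakIterator.DONE) {
            candidate = begin;                             // no boundary before target: use text start
        }
        // Clamp inside the iterator, so the caller does not have to chop the result afterwards.
        // Note: the delegate's own position is left at the unclamped boundary; a real
        // implementation would have to decide where current() should end up.
        return Math.max(candidate, floor);
    }

    public static void main(String[] args) {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        bi.setText("aaa bbb ccc ddd eee");
        bi.following(7);   // pretend the previous passage ended at the boundary after offset 7
        int start = 12;    // next match starts at "ddd"
        System.out.println(precedingWithFloor(bi, start + 1, 10));
    }
}
```

Because the floor is applied before the iterator settles on a start, a length-aware implementation could then extend the passage to the right to make up for context it could not take on the left, which is what avoids the undersized passage described above.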