Traktormaster commented on a change in pull request #1123: LUCENE-9093: Unified highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361839121
##########
File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldHighlighter.java
##########
@@ -159,8 +160,9 @@ public Object highlightFieldForDoc(LeafReader reader, int docId, String content)
         break;
       }
       // advance breakIterator
-      passage.setStartOffset(Math.max(this.breakIterator.preceding(start + 1), 0));
-      passage.setEndOffset(Math.min(this.breakIterator.following(start), contentLength));
+      passage.setStartOffset(Math.max(this.breakIterator.preceding(start + 1), lastPassageEnd));

Review comment:

> Without this small proposal, the length of this passage will be undersized

That's incorrect. Such a fragment will be undersized either way. The current approach splits `fragsize` statically according to `fragAlignRatio`. Even if the share reserved for the left side is not fully used, the remainder is not spent on the right. We would only be moving the point where the left-hand share of `fragsize` is truncated.

BTW, in the results these would-be-overlapping fragments get merged into a single snippet, so they become one bigger snippet instead of one normal-sized and one undersized.

The only place where we are sure to get an undersized snippet is when a match is at the very beginning or the very end of the text.
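
To illustrate the static split, here is a minimal sketch of the argument in the comment. It is not the code in `FieldHighlighter` or in the break iterator. The class `FragmentBudgetSketch`, the method `passageBounds` and the parameter `leftLimit` are hypothetical names. The sketch ignores word boundaries and assumes `fragAlignRatio` is the share of `fragsize` reserved for the left of the match.

```java
public class FragmentBudgetSketch {

    /**
     * Hypothetical model: returns {start, end} of a passage around a match
     * at [matchStart, matchEnd). Word boundaries are ignored on purpose.
     *
     * @param leftLimit lowest allowed start offset (0, or the end of the
     *                  previous passage in the proposed change)
     */
    public static int[] passageBounds(int matchStart, int matchEnd, int fragsize,
                                      double fragAlignRatio, int leftLimit,
                                      int contentLength) {
        // Assumption: the budget is split once, up front.
        int leftBudget = (int) (fragsize * fragAlignRatio);
        int rightBudget = fragsize - leftBudget;

        // Clamping on the left only truncates the left share...
        int start = Math.max(matchStart - leftBudget, leftLimit);
        // ...the right share is computed independently, so nothing unused
        // on the left is carried over.
        int end = Math.min(matchEnd + rightBudget, contentLength);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int contentLength = 1000;
        int fragsize = 100;
        double ratio = 0.5;

        // Match near the start of the text, clamped at 0 and at a
        // hypothetical lastPassageEnd of 5.
        int[] clampedAtZero = passageBounds(10, 15, fragsize, ratio, 0, contentLength);
        int[] clampedAtPrev = passageBounds(10, 15, fragsize, ratio, 5, contentLength);

        System.out.println(clampedAtZero[0] + ".." + clampedAtZero[1]);
        System.out.println(clampedAtPrev[0] + ".." + clampedAtPrev[1]);
    }
}
```

In this model the end offset is the same whichever left limit is used, so changing the clamp only moves where the left side is cut. The passage stays undersized in both cases.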