[ https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805199#comment-15805199 ]
Jim Ferenczi commented on LUCENE-7620:
--------------------------------------

{quote}
By choosing a lengthGoal on the low side; maybe "too long" will tend not to be a problem? Or see my TODO at the top of the file – essentially choose the break that is closest to the goal instead of always the first following it.
{quote}

Yeah, it depends on how the lengthGoal is perceived. I was looking at it mainly as a boundary to solve "too long" fragments, and this issue is more about "too short" fragments. Maybe that's a different issue then, but I am just afraid that we'll end up with multiple public break iterator impls that must follow a specific pattern to be used. Anyway, this patch is a start toward better highlighting through a custom break iterator, and it solves a real issue. Please push to 6.4 if you think it's ready; we can always discuss the next steps in a follow-up.

Regarding the assertion, I prefer an IllegalStateException with a clear message, but maybe I am too paranoid.

> UnifiedHighlighter: add target character width BreakIterator wrapper
> --------------------------------------------------------------------
>
>                 Key: LUCENE-7620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates
> fragments (aka Passages) by a character width. The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.
> It's useful in its own right and of course it helps users transition to the
> UH. I'd like to do it as a wrapper to another BreakIterator -- perhaps a
> sentence one. In this way you get back Passages that are a number of
> sentences so they will look nice instead of breaking mid-way through a
> sentence.
> And you get some control by specifying a target number of
> characters. This BreakIterator wouldn't be a general purpose
> java.text.BreakIterator since it would assume it's called in a manner exactly
> as the UnifiedHighlighter uses it. It would probably be compatible with the
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your
> BreakIterator config.
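
For readers who want the idea in the description made concrete, here is a minimal sketch. It is not the code from the attached patch. The class name LengthGoalSketch and the method passages are hypothetical, and the sketch returns strings instead of implementing the BreakIterator interface that the UnifiedHighlighter would call. It uses only java.text.BreakIterator from the JDK and picks the first sentence boundary at or after the length goal, which is the "first following it" behaviour mentioned in the quoted comment.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration only; not the class from the LUCENE-7620 patch. */
final class LengthGoalSketch {

    /**
     * Splits text into passages that end on a sentence boundary.
     * Every passage except possibly the last is at least lengthGoal characters long.
     */
    static List<String> passages(String text, int lengthGoal) {
        if (lengthGoal < 1) {
            throw new IllegalArgumentException("lengthGoal must be >= 1");
        }
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);

        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int target = start + lengthGoal;
            int end;
            if (target >= text.length()) {
                // Not enough text left to reach the goal: take the remainder.
                end = text.length();
            } else {
                // following(n) returns the first boundary strictly after n,
                // so following(target - 1) is the first boundary at or after target.
                end = sentences.following(target - 1);
                if (end == BreakIterator.DONE) {
                    end = text.length();
                }
            }
            result.add(text.substring(start, end));
            start = end;
        }
        return result;
    }
}
```

Calling LengthGoalSketch.passages(text, 100) would give passages of roughly the width that SimpleFragmenter uses by default, except that each one runs on to the end of its sentence. Choosing the boundary closest to the goal, as the TODO in the quoted comment suggests, would require comparing the boundary before the target with the one after it.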