[ https://issues.apache.org/jira/browse/LUCENE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099000#comment-13099000 ]
Robert Muir commented on LUCENE-3417:
-------------------------------------

thanks for the patch: all tests pass here with it, and I think the added tests are clear.

> DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.
> ------------------------------------------------------------------------------------------
>
> Key: LUCENE-3417
> URL: https://issues.apache.org/jira/browse/LUCENE-3417
> Project: Lucene - Java
> Issue Type: Bug
> Components: modules/analysis
> Affects Versions: 3.3, 4.0
> Reporter: Njal Karevoll
> Attachments: LUCENE-3417.patch
>
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> Due to an off-by-one error, a subword placed at the end of a compound word
> will not get a token added to the token stream.
> For example (from the unit test in the attached patch):
> Dictionary: {"ab", "cd", "ef"}
> Input: "abcdef"
> Created tokens: {"abcdef", "ab", "cd"}
> Expected tokens: {"abcdef", "ab", "cd", "ef"}
>
> Additionally, it could produce tokens that were shorter than the
> minSubwordSize due to another off-by-one error. For example (again, from the
> attached patch):
> Dictionary: {"abc", "d", "efg"}
> Minimum subword length: 2
> Input: "abcdefg"
> Created tokens: {"abcdef", "abc", "d", "efg"}
> Expected tokens: {"abcdef", "abc", "efg"}
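
For readers following the issue, here is a minimal standalone sketch of the loop bounds the two off-by-one errors concern. It is not the Lucene implementation or the attached patch: the class CompoundSketch, the method decompose, and the maximum subword length of 15 are assumptions made only for illustration. The idea is that start offsets and candidate lengths both need inclusive upper bounds so that a subword ending exactly at the end of the input is still considered, while candidate lengths start at the minimum subword length so that shorter dictionary entries are never emitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from DictionaryCompoundWordTokenFilter.
class CompoundSketch {

    // Returns the original token followed by every dictionary subword found in it.
    static List<String> decompose(String token, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize) {
        List<String> tokens = new ArrayList<>();
        tokens.add(token); // the original token is kept first
        int len = token.length();

        // Start offsets: the last useful start is len - minSubwordSize, inclusive.
        for (int i = 0; i <= len - minSubwordSize; i++) {
            // Candidate lengths: never shorter than minSubwordSize, and i + j
            // is allowed to equal len so a subword at the very end is tested.
            for (int j = minSubwordSize; j <= maxSubwordSize && i + j <= len; j++) {
                String candidate = token.substring(i, i + j);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The two inputs from the issue description; 15 is an assumed maximum.
        System.out.println(decompose("abcdef", Set.of("ab", "cd", "ef"), 2, 15));
        System.out.println(decompose("abcdefg", Set.of("abc", "d", "efg"), 2, 15));
    }
}
```

In this sketch, writing `i + j < len` instead of `i + j <= len` would drop "ef" from the first example, and starting `j` below `minSubwordSize` would let "d" through in the second.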