DictionaryCompoundWordTokenFilter does not properly add tokens from the end of a
compound word.
------------------------------------------------------------------------------------------

                 Key: LUCENE-3417
                 URL: https://issues.apache.org/jira/browse/LUCENE-3417
             Project: Lucene - Java
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 3.3, 4.0
            Reporter: Njal Karevoll


Due to an off-by-one error, a subword located at the end of a compound word is
never added as a token to the token stream.


Example:
Dictionary: {"ab", "cd", "ef"}
word: "abcdef"
Created tokens: {"abcdef", "ab", "cd"}
Expected tokens: {"abcdef", "ab", "cd", "ef"}
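One way such an off-by-one can arise is an exclusive upper bound on the start
position of candidate subwords. The following is a simplified, hypothetical
Java sketch, not Lucene's actual DictionaryCompoundWordTokenFilter code; the
class name, the `decompose` method and the `fixedBound` flag are assumptions
for illustration. With `start < len - minSubwordSize` the loop never reaches
start position 4 in "abcdef", so "ef" is missed, reproducing the created
tokens above; with `<=` the expected tokens are produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CompoundDecomposeSketch {

    // Hypothetical, simplified decomposition: emits the original word plus
    // every dictionary entry found inside it whose length lies between
    // minSubwordSize and maxSubwordSize.
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean fixedBound) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word); // the compound word itself is kept as a token
        int len = word.length();
        // The last valid start position is len - minSubwordSize, so the
        // bound must be inclusive ("<="). The exclusive ("<") variant skips
        // any subword that ends exactly at the last character.
        for (int start = 0;
             fixedBound ? start <= len - minSubwordSize : start < len - minSubwordSize;
             start++) {
            for (int size = minSubwordSize;
                 size <= maxSubwordSize && start + size <= len;
                 size++) {
                String candidate = word.substring(start, start + size);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("ab", "cd", "ef");
        System.out.println(decompose("abcdef", dict, 2, 15, false)); // [abcdef, ab, cd]
        System.out.println(decompose("abcdef", dict, 2, 15, true));  // [abcdef, ab, cd, ef]
    }
}
```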


Additionally, a second off-by-one error could produce subword tokens shorter
than minSubwordSize.
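A minimal sketch of how a length check can be off by one, assuming the
candidate-size loop starts one below minSubwordSize. This is hypothetical Java,
not Lucene's code; `MinSubwordSizeSketch`, `matchesAt` and the `fixed` flag are
illustrative names. With minSubwordSize = 2 and a dictionary containing the
one-letter entry "c", the buggy variant emits "c" from "cab", while the fixed
variant only considers candidates of length 2 or more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MinSubwordSizeSketch {

    // Hypothetical helper: returns the dictionary entries that start at
    // `start` in `word`. Starting the size loop at minSubwordSize - 1 lets
    // an entry one character shorter than minSubwordSize slip through.
    static List<String> matchesAt(String word, int start, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean fixed) {
        List<String> matches = new ArrayList<>();
        int firstSize = fixed ? minSubwordSize : minSubwordSize - 1;
        for (int size = firstSize;
             size <= maxSubwordSize && start + size <= word.length();
             size++) {
            String candidate = word.substring(start, start + size);
            if (dictionary.contains(candidate)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("c", "ab");
        System.out.println(matchesAt("cab", 0, dict, 2, 15, false)); // [c]
        System.out.println(matchesAt("cab", 0, dict, 2, 15, true));  // []
    }
}
```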

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira