[ https://issues.apache.org/jira/browse/LUCENE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi reassigned LUCENE-2909:
--------------------------------------

    Assignee: Koji Sekiguchi

> NGramTokenFilter may generate offsets that exceed the length of original text
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-2909
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2909
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 2.9.4
>            Reporter: Shinya Kasatani
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>         Attachments: TokenFilterOffset.patch
>
>
> When using NGramTokenFilter combined with CharFilters that lengthen the
> original text (such as "ß" -> "ss"), the generated offsets exceed the length
> of the original text.
> This causes InvalidTokenOffsetsException when you try to highlight the text
> in Solr.
> While it is not possible to know the accurate offset of each character once
> you tokenize the whole text with tokenizers like KeywordTokenizer,
> NGramTokenFilter should at least avoid generating invalid offsets.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
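A minimal standalone sketch (not Lucene's actual NGramTokenFilter code, and the class and method names here are hypothetical) of the offset problem described above: when a CharFilter expands "ß" to "ss", the filtered text ("strasse", 7 chars) is longer than the original ("straße", 6 chars), so n-gram end offsets computed against the filtered text can exceed the original length. Clamping offsets to the original length, as the issue suggests, keeps them valid for highlighting:

```java
import java.util.ArrayList;
import java.util.List;

public class NGramOffsetSketch {

    // A token with start/end offsets that are supposed to point into the ORIGINAL text.
    static final class Gram {
        final String text;
        final int start;
        final int end;
        Gram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // N-gram positions are computed from the FILTERED text, which may be longer
    // than the original after a lengthening CharFilter. Clamping to origLen
    // ensures no offset ever exceeds the original text length.
    static List<Gram> ngrams(String filtered, int n, int origLen) {
        List<Gram> out = new ArrayList<>();
        for (int i = 0; i + n <= filtered.length(); i++) {
            int start = Math.min(i, origLen);
            int end = Math.min(i + n, origLen); // clamp: never past the original text
            out.add(new Gram(filtered.substring(i, i + n), start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String original = "straße";   // 6 chars
        String filtered = "strasse";  // 7 chars after a "ß" -> "ss" CharFilter
        for (Gram g : ngrams(filtered, 3, original.length())) {
            System.out.println(g.text + " [" + g.start + "," + g.end + "]");
        }
        // Without the clamp, the last trigram "sse" would end at offset 7,
        // past the 6-char original -- the condition that triggers
        // InvalidTokenOffsetsException in the Solr highlighter.
    }
}
```

As the issue notes, once the text has been run through a length-changing CharFilter and a whole-text tokenizer, exact per-character offsets are unrecoverable; clamping only guarantees the offsets are not invalid.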