[ https://issues.apache.org/jira/browse/LUCENE-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408137#comment-13408137 ]
Robert Muir commented on LUCENE-4201:
-------------------------------------

I thought that might be the case: when I first wrote the tests I used JapaneseAnalyzer and they always passed, so I think this is just the one corner case that MockTokenizer finds.

Not correcting offsets keeps things simple. So, if possible, I think we could just do nothing with iteration marks preceded by surrogate pairs and leave them as-is; otherwise, to actually replace the iteration mark with those characters, we would need offset corrections.

> Add Japanese character filter to normalize iteration marks
> ----------------------------------------------------------
>
>                 Key: LUCENE-4201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4201
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.0, 5.0
>            Reporter: Christian Moen
>         Attachments: LUCENE-4201.patch, LUCENE-4201.patch
>
>
> For some applications it might be useful to normalize kanji and kana
> iteration marks such as 々, ゞ, ゝ, ヽ and ヾ to make sure they are treated
> uniformly.
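For illustration, the trade-off being discussed can be sketched in plain Java (this is a hypothetical standalone sketch, not the LUCENE-4201 patch, and it operates on a String rather than implementing Lucene's CharFilter): the unvoiced iteration marks 々, ゝ and ヽ are replaced by the preceding character, but when the preceding code point is a supplementary character (a surrogate pair), the one-char mark cannot be swapped for two chars without changing offsets, so the mark is left untouched -- the simplification proposed in the comment.

```java
// Hypothetical sketch of iteration-mark normalization with the
// "skip surrogates" simplification; not the actual LUCENE-4201 code.
public class IterationMarkSketch {
    // Unvoiced kanji/kana iteration marks handled by this sketch:
    // 々 (U+3005), ゝ (U+309D), ヽ (U+30FD).
    private static final String MARKS = "\u3005\u309D\u30FD";

    public static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (i > 0 && MARKS.indexOf(c) >= 0) {
                char prev = out.charAt(out.length() - 1);
                if (Character.isLowSurrogate(prev)) {
                    // Preceding code point is a surrogate pair: replacing a
                    // single-char mark with two chars would require offset
                    // correction, so leave the mark as-is.
                    out.append(c);
                } else {
                    // Replace the mark with the preceding character.
                    out.append(prev);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("\u6642\u3005"));       // 時々 -> 時時
        System.out.println(normalize("\u3053\u309D\u308D")); // こゝろ -> こころ
    }
}
```

A real CharFilter would also need to handle the voiced marks (ゞ, ヾ) by voicing the repeated kana, which this sketch omits for brevity.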