[ https://issues.apache.org/jira/browse/LUCENE-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408106#comment-13408106 ]
Christian Moen commented on LUCENE-4201:
----------------------------------------

I've indexed the Japanese Wikipedia using this filter and things look okay. I'm seeing a ~8% performance overhead (versus no filter).

My thinking is that this filter should be available for applications that need it, but it should not be part of our default Japanese configuration.

> Add Japanese character filter to normalize iteration marks
> ----------------------------------------------------------
>
>                 Key: LUCENE-4201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4201
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.0, 5.0
>            Reporter: Christian Moen
>         Attachments: LUCENE-4201.patch
>
>
> For some applications it might be useful to normalize kanji and kana
> iteration marks such as 々, ゞ, ゝ, ヽ and ヾ to make sure they are treated
> uniformly.
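To illustrate what normalizing iteration marks means in practice, here is a minimal Java sketch. It is a plain string function, not a Lucene CharFilter, so it does no offset correction, and the class name IterationMarkSketch is hypothetical. It assumes that each mark repeats only the single preceding character (runs such as 部分々々 for 部分部分 are not handled), that the kanji check covers only the basic CJK Unified Ideographs block, and that ゞ and ヾ voice the repeated kana through a small lookup table, falling back to the plain kana when no voiced form exists.

```java
import java.util.HashMap;
import java.util.Map;

public final class IterationMarkSketch {

    // Pairs of (unvoiced kana, voiced kana) used for the voiced marks ゞ and ヾ.
    private static final String VOICING_PAIRS =
        "かがきぎくぐけげこごさざしじすずせぜそぞただちぢつづてでとどはばひびふぶへべほぼ"
      + "カガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドハバヒビフブヘベホボ";

    private static final Map<Character, Character> VOICED = new HashMap<>();

    static {
        for (int i = 0; i + 1 < VOICING_PAIRS.length(); i += 2) {
            VOICED.put(VOICING_PAIRS.charAt(i), VOICING_PAIRS.charAt(i + 1));
        }
    }

    // Simplified script checks (BMP only; kanji limited to U+4E00..U+9FFF).
    private static boolean isHiragana(char c) { return c >= '\u3041' && c <= '\u3096'; }
    private static boolean isKatakana(char c) { return c >= '\u30A1' && c <= '\u30FA'; }
    private static boolean isKanji(char c)    { return c >= '\u4E00' && c <= '\u9FFF'; }

    /** Replaces each iteration mark with the character it repeats, when the preceding character fits. */
    public static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // The previous *output* character, so a mark repeats an already-expanded character.
            char prev = out.length() > 0 ? out.charAt(out.length() - 1) : '\0';
            char replacement = c;
            switch (c) {
                case '\u3005': // 々 kanji iteration mark
                    if (isKanji(prev)) replacement = prev;
                    break;
                case '\u309D': // ゝ hiragana iteration mark
                    if (isHiragana(prev)) replacement = prev;
                    break;
                case '\u309E': // ゞ voiced hiragana iteration mark
                    if (isHiragana(prev)) replacement = VOICED.getOrDefault(prev, prev);
                    break;
                case '\u30FD': // ヽ katakana iteration mark
                    if (isKatakana(prev)) replacement = prev;
                    break;
                case '\u30FE': // ヾ voiced katakana iteration mark
                    if (isKatakana(prev)) replacement = VOICED.getOrDefault(prev, prev);
                    break;
                default:
                    break; // not an iteration mark: copy unchanged
            }
            out.append(replacement);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("時々"));   // 時時
        System.out.println(normalize("こゝろ")); // こころ
        System.out.println(normalize("いすゞ")); // いすず
        System.out.println(normalize("コヽロ")); // ココロ
    }
}
```

A mark with no suitable preceding character (for example a leading 々) is left as-is rather than dropped, which keeps the sketch conservative about input it cannot interpret.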