[ https://issues.apache.org/jira/browse/LUCENE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237674#comment-13237674 ]
Christian Moen commented on LUCENE-3915:
----------------------------------------

Find attached a draft patch that replaces term attributes with readings.

I saw in Ohtani-san's Twitter feed that Koji had checked this functionality into lucene-gosen, and I'm providing a similar patch here in the hope of supporting the Japanese spell-checking work.

This patch can also convert katakana readings to romaji, and it might make sense to use a romaji representation for the spell-checking.

We probably also need to deal with misspellings that get split into several tokens; these would need to be recomposed using their readings before we do the matching.

Just some thoughts...

> Add Japanese filter to replace term attribute with readings
> -----------------------------------------------------------
>
>                 Key: LUCENE-3915
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3915
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Christian Moen
>            Priority: Minor
>         Attachments: LUCENE-3915.patch
>
>
> Koji and Robert are working on LUCENE-3888 that allows spell-checkers to do
> their similarity matching using a different word than its surface form.
> This approach is very useful for languages such as Japanese where the surface
> form and the form we'd like to use for similarity matching are very different.
> For Japanese, it's useful to use readings for this -- probably with some
> normalization.
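
To make the idea in the comment concrete, here is a rough, standalone sketch of the three steps it mentions: replacing a token's term text with its reading, recomposing several tokens into one key via their readings, and converting a katakana reading to romaji. It is not the attached patch and does not use the Lucene API; the names ReadingSketch, Token, toReadings, recompose and toRomaji are all made up for illustration, the romaji table covers only a handful of characters, and the token readings in main are invented sample data rather than real analyzer output. Unicode escapes are used so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the LUCENE-3915 patch and not a Lucene API.
class ReadingSketch {

    /** Hypothetical token: a surface form plus the katakana reading an analyzer assigned. */
    static final class Token {
        final String surface;
        final String reading; // may be null when the analyzer has no reading
        Token(String surface, String reading) {
            this.surface = surface;
            this.reading = reading;
        }
    }

    // Deliberately tiny katakana-to-romaji table (assumption: one character maps
    // to one syllable). A real table must also handle digraphs, the small tsu,
    // the long-vowel mark and so on.
    private static final Map<Character, String> ROMAJI = new HashMap<>();
    static {
        ROMAJI.put('\u30B9', "su");  // katakana SU
        ROMAJI.put('\u30B7', "shi"); // katakana SI
        ROMAJI.put('\u30B5', "sa");  // katakana SA
        ROMAJI.put('\u30AB', "ka");  // katakana KA
        ROMAJI.put('\u30CA', "na");  // katakana NA
    }

    /** Replace each token's term text with its reading, keeping the surface form if there is none. */
    static List<String> toReadings(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.reading != null ? t.reading : t.surface);
        }
        return out;
    }

    /** Recompose several tokens into a single matching key by concatenating their readings. */
    static String recompose(List<Token> tokens) {
        return String.join("", toReadings(tokens));
    }

    /** Convert katakana to romaji; characters missing from the table pass through unchanged. */
    static String toRomaji(String katakana) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < katakana.length(); i++) {
            char c = katakana.charAt(i);
            String r = ROMAJI.get(c);
            sb.append(r != null ? r : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented sample data: two tokens whose readings are SU and SI.
        List<Token> tokens = Arrays.asList(
                new Token("\u5BFF", "\u30B9"),
                new Token("\u53F8", "\u30B7"));
        String key = recompose(tokens);        // concatenated katakana reading
        System.out.println(toRomaji(key));     // romaji form used for similarity matching
    }
}
```

The point of going through romaji would be that edit distance on a Latin-script key is finer-grained than on katakana, where one character is a whole syllable; whether that actually helps the spell-checker is an open question here.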