[ 
https://issues.apache.org/jira/browse/LUCENE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237674#comment-13237674
 ] 

Christian Moen commented on LUCENE-3915:
----------------------------------------

Find attached a draft patch that replaces term attributes with readings.  I saw 
in Ohtani-san's Twitter feed that Koji had checked this functionality into 
lucene-gosen and I'm providing a similar patch here hoping to support the 
Japanese spell-checking work.

This patch can also convert katakana readings to romaji, and it might make sense 
to use a romaji representation for the spell-checking.  We probably also need 
to handle misspellings that get split into several tokens, recomposing them 
from their readings before we do matching.
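
To make the idea concrete, here is a rough sketch in Python (not the Java code 
in the patch).  The mapping table and the names to_romaji and recompose are 
hypothetical and only for illustration; the table covers just a handful of 
katakana, and a real one would need the full syllabary, long vowels and so on.

```python
# Hypothetical, deliberately tiny katakana-to-romaji table.
# Two-character entries (digraphs such as "キョ") are matched first.
KATAKANA_TO_ROMAJI = {
    "キョ": "kyo",
    "ト": "to",
    "ウ": "u",
    "キ": "ki",
    "ス": "su",
    "シ": "shi",
}


def to_romaji(reading: str) -> str:
    """Convert a katakana reading to romaji with greedy longest-match lookup.

    Characters missing from the table are passed through unchanged.
    """
    out = []
    i = 0
    while i < len(reading):
        pair = reading[i:i + 2]
        if len(pair) == 2 and pair in KATAKANA_TO_ROMAJI:
            out.append(KATAKANA_TO_ROMAJI[pair])
            i += 2
        else:
            ch = reading[i]
            out.append(KATAKANA_TO_ROMAJI.get(ch, ch))
            i += 1
    return "".join(out)


def recompose(readings: list[str]) -> str:
    """Join the readings of adjacent tokens, then romanize the result.

    This models a misspelling that was split into several tokens: the
    recomposed romaji string is what a spell-checker could match against.
    """
    return to_romaji("".join(readings))


if __name__ == "__main__":
    print(to_romaji("スシ"))
    print(recompose(["トウ", "キョウ"]))
```

Joining the readings before romanizing, rather than romanizing each token and 
then joining, keeps a digraph intact if the tokenizer happens to split in the 
middle of it.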

Just some thoughts...
                
> Add Japanese filter to replace term attribute with readings
> -----------------------------------------------------------
>
>                 Key: LUCENE-3915
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3915
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Christian Moen
>            Priority: Minor
>         Attachments: LUCENE-3915.patch
>
>
> Koji and Robert are working on LUCENE-3888, which allows spell-checkers to do 
> their similarity matching using a form of a word other than its surface form.
> This approach is very useful for languages such as Japanese, where the surface 
> form and the form we'd like to use for similarity matching are very different. 
> For Japanese, it's useful to use readings for this -- probably with some 
> normalization.
