[ http://issues.apache.org/jira/browse/LUCENE-444?page=all ]
     
Erik Hatcher closed LUCENE-444:
-------------------------------


I'm closing this issue... but some unit tests would be nice to go along with 
this too, eventually :)

> StandardTokenizer loses Korean characters
> -----------------------------------------
>
>          Key: LUCENE-444
>          URL: http://issues.apache.org/jira/browse/LUCENE-444
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>     Reporter: Cheolgoo Kang
>     Priority: Minor
>      Fix For: 1.9
>  Attachments: StandardTokenizer_Korean.patch
>
> While using StandardAnalyzer, esp. StandardTokenizer, with a Korean text 
> stream, StandardTokenizer ignores the Korean characters. This is because the 
> CJK token definition in the StandardTokenizer.jj JavaCC file doesn't cover 
> the range of Korean syllables described in the Unicode character map.
> This patch adds one line, the Korean syllables range 0xAC00~0xD7AF, to the 
> StandardTokenizer.jj code.
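
The fix above amounts to treating characters in the Hangul Syllables block as
CJK token characters. A minimal sketch of that range check (illustrative only,
not the actual StandardTokenizer.jj grammar; the class and method names are
made up for this example):

```java
// Sketch: test whether a character falls in the Hangul Syllables range
// 0xAC00~0xD7AF that the patch adds to the CJK token definition.
public class HangulRangeCheck {
    // Range bounds taken from the patch description; names are illustrative.
    static final char HANGUL_START = '\uAC00';
    static final char HANGUL_END   = '\uD7AF';

    static boolean isHangulSyllable(char c) {
        return c >= HANGUL_START && c <= HANGUL_END;
    }

    public static void main(String[] args) {
        System.out.println(isHangulSyllable('\uD55C')); // '한', a Korean syllable -> true
        System.out.println(isHangulSyllable('a'));      // Latin letter -> false
    }
}
```

Before the patch, characters in this range matched none of the token
productions, so the tokenizer silently dropped them.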

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
