[ http://issues.apache.org/jira/browse/LUCENE-444?page=all ]

Erik Hatcher closed LUCENE-444:
-------------------------------
I'm closing this issue... but some unit tests would be nice to go along with this too, eventually :)

> StandardTokenizer loses Korean characters
> -----------------------------------------
>
>          Key: LUCENE-444
>          URL: http://issues.apache.org/jira/browse/LUCENE-444
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>     Reporter: Cheolgoo Kang
>     Priority: Minor
>      Fix For: 1.9
>  Attachments: StandardTokenizer_Korean.patch
>
> While using StandardAnalyzer, esp. StandardTokenizer, with a Korean text stream,
> StandardTokenizer ignores the Korean characters. This is because the definition
> of the CJK token in the StandardTokenizer.jj JavaCC file doesn't have a wide
> enough range to cover the Korean syllables described in the Unicode character map.
> This patch adds one line with the 0xAC00~0xD7AF Korean syllables range to the
> StandardTokenizer.jj code.

--
This message is automatically generated by JIRA.
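As a minimal sketch of why the patch matters (plain Java, not the actual JavaCC grammar change; the class and method names here are hypothetical, for illustration only): Hangul syllables occupy the Unicode range 0xAC00~0xD7AF, so a tokenizer whose CJK character class omits that range will drop Korean text entirely.

```java
// Illustrative only: shows that Korean (Hangul) syllables fall inside the
// 0xAC00~0xD7AF range the patch adds to StandardTokenizer.jj, while Latin
// letters do not. Class/method names are hypothetical, not from Lucene.
public class HangulRange {

    // True if ch lies in the Hangul Syllables range added by the patch.
    static boolean isHangulSyllable(char ch) {
        return ch >= 0xAC00 && ch <= 0xD7AF;
    }

    public static void main(String[] args) {
        String korean = "\uD55C\uAE00"; // "한글", two Hangul syllables
        for (char ch : korean.toCharArray()) {
            System.out.println(String.format("U+%04X -> %b",
                    (int) ch, isHangulSyllable(ch)));
        }
        System.out.println("A -> " + isHangulSyllable('A')); // Latin: false
    }
}
```

A tokenizer character class without this range would classify every character of "한글" as untokenizable, which is exactly the reported symptom.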