Joe Shaw (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/LUCENE-692?page=all ]
[snip]
> One of our users reported their inability to search some Korean
> strings. This is because the Hangul Jamo Unicode block is not
> included in the StandardTokenizer.jj file.
> I'm attaching a patch which fixes this, from Young-Ho Cha.
This has already been addressed by a patch committed by Otis to fix the
following issue (in August 2006, after the 2.0.0 release):

https://issues.apache.org/jira/browse/LUCENE-478

Here is the Korean section from the trunk version of StandardTokenizer.jj:

| < KOREAN: // Korean
      [
       "\uac00"-"\ud7af",  // Hangul Syllables
       "\u1100"-"\u11ff"   // Hangul Jamo
       // "\uac00"-"\ud7a3"
      ]
  >

Actually, there is an oddity here -- Otis, you last committed a change to
this file -- do you know why the Hangul Jamo range is included twice, once
in the KOREAN section and again in the LETTER section?

| < #LETTER: // unicode letters
      [
       "\u0041"-"\u005a",
       "\u0061"-"\u007a",
       "\u00c0"-"\u00d6",
       "\u00d8"-"\u00f6",
       "\u00f8"-"\u00ff",
       "\u0100"-"\u1fff",
       "\uffa0"-"\uffdc"
      ]
  >

Steve

> Joe Shaw updated LUCENE-692:
> ----------------------------
>
>     Attachment: lucene-hangul-jamo.patch
>
> Patch to StandardTokenizer.jj which fixes this.
>
>> Hangul Jamo (Korean) support in StandardTokenizer.jj
>> ----------------------------------------------------
>>
>>                 Key: LUCENE-692
>>                 URL: http://issues.apache.org/jira/browse/LUCENE-692
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Analysis
>>    Affects Versions: 1.9, 2.0.0, 2.1, 2.0.1
>>            Reporter: Joe Shaw
>>            Priority: Minor
>>         Attachments: lucene-hangul-jamo.patch
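To make the overlap concrete, here is a small stand-alone Java check. It is not part of Lucene; the class and method names are hypothetical, and the ranges are copied by hand from the KOREAN and #LETTER character classes quoted above. It reports which of the two classes contains a given character, alongside java.lang.Character.UnicodeBlock.of() for comparison. This is only a sketch of the range arithmetic and does not reproduce how JavaCC resolves tokens whose character classes overlap.

```java
/**
 * Hypothetical helper (not Lucene code): tests a char against the
 * KOREAN and #LETTER ranges copied by hand from StandardTokenizer.jj.
 */
public class JamoRangeCheck {

    // Inclusive [lo, hi] pairs copied from the KOREAN token.
    static final char[][] KOREAN = {
        {'\uac00', '\ud7af'}, // Hangul Syllables
        {'\u1100', '\u11ff'}  // Hangul Jamo
    };

    // Inclusive [lo, hi] pairs copied from the #LETTER token.
    static final char[][] LETTER = {
        {'\u0041', '\u005a'}, {'\u0061', '\u007a'},
        {'\u00c0', '\u00d6'}, {'\u00d8', '\u00f6'},
        {'\u00f8', '\u00ff'}, {'\u0100', '\u1fff'},
        {'\uffa0', '\uffdc'}
    };

    // True if c falls inside any of the inclusive ranges.
    static boolean inRanges(char c, char[][] ranges) {
        for (char[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static boolean inKorean(char c) { return inRanges(c, KOREAN); }

    public static boolean inLetter(char c) { return inRanges(c, LETTER); }

    public static void main(String[] args) {
        // Two Hangul Jamo chars, one Hangul syllable, one ASCII letter.
        char[] samples = {'\u1100', '\u11a8', '\uac00', '\u0041'};
        for (char c : samples) {
            System.out.printf("U+%04X block=%s KOREAN=%b LETTER=%b%n",
                (int) c, Character.UnicodeBlock.of(c),
                inKorean(c), inLetter(c));
        }
    }
}
```

Run as-is, it reports U+1100 and U+11A8 (both in the HANGUL_JAMO block) as matching KOREAN and LETTER, because U+1100-U+11FF lies entirely inside LETTER's "\u0100"-"\u1fff" range; U+AC00 matches only KOREAN, and U+0041 only LETTER.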