Otis Gospodnetic wrote:
> I see it only in 1 place (Korean):

U+1100-U+11FF is included in the U+0100-U+1FFF range in LETTER:

| < #LETTER:                    // unicode letters
      [
        "\u0041"-"\u005a",
        "\u0061"-"\u007a",
        "\u00c0"-"\u00d6",
        "\u00d8"-"\u00f6",
        "\u00f8"-"\u00ff",
---------------------------
        "\u0100"-"\u1fff",
---------------------------
        "\uffa0"-"\uffdc"
      ]
  >

> $ grep 11ff src/java/org/apache/lucene/analysis/standard/StandardTokenizer.jj
>        "\u1100"-"\u11ff"  // Hangul Jamo
>
> Maybe I'm not seeing something...
>
> Otis
>
> ----- Original Message ----
> From: Steven Rowe <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, October 20, 2006 1:22:48 PM
> Subject: Re: [jira] Updated: (LUCENE-692) Hangul Jamo (Korean) support in
> StandardTokenizer.jj
>
> Joe Shaw (JIRA) wrote:
>> [ http://issues.apache.org/jira/browse/LUCENE-692?page=all ]
> [snip]
>> One of our users reported their inability to search some Korean
>> strings. This is because the Hangul Jamo Unicode block is not
>> included in the StandardTokenizer.jj file.
>> I'm attaching a patch which fixes this, from Young-Ho Cha.
>
> This has already been addressed by a patch committed by Otis to fix the
> following issue (in August 2006, after the 2.0.0 release):
>
> https://issues.apache.org/jira/browse/LUCENE-478
>
> Here is the Korean section from trunk version of StandardAnalyzer.jj:
>
> | < KOREAN:                   // Korean
>       [
>         "\uac00"-"\ud7af",    // Hangul Syllables
>         "\u1100"-"\u11ff"     // Hangul Jamo
>         // "\uac00"-"\ud7a3"
>       ]
>   >
>
> Actually, there is an oddity here -- Otis, you last committed a change
> to this file -- do you know why the Hangul Jamo range is included twice,
> once in the KOREAN section and again in the LETTER section?
>
> | < #LETTER:                  // unicode letters
>       [
>         "\u0041"-"\u005a",
>         "\u0061"-"\u007a",
>         "\u00c0"-"\u00d6",
>         "\u00d8"-"\u00f6",
>         "\u00f8"-"\u00ff",
>         "\u0100"-"\u1fff",
>         "\uffa0"-"\uffdc"
>       ]
>   >
>
> Steve
>
>> Joe Shaw updated LUCENE-692:
>> ----------------------------
>>
>> Attachment: lucene-hangul-jamo.patch
>>
>> Patch to StandardTokenizer.jj which fixes this.
>>
>>> Hangul Jamo (Korean) support in StandardTokenizer.jj
>>> ----------------------------------------------------
>>>
>>> Key: LUCENE-692
>>> URL: http://issues.apache.org/jira/browse/LUCENE-692
>>> Project: Lucene - Java
>>> Issue Type: Improvement
>>> Components: Analysis
>>> Affects Versions: 1.9, 2.0.0, 2.1, 2.0.1
>>> Reporter: Joe Shaw
>>> Priority: Minor
>>> Attachments: lucene-hangul-jamo.patch
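
For anyone who wants to check the overlap mechanically rather than by eye, here is a minimal Java sketch. It only does the range arithmetic on the endpoints copied from the LETTER and KOREAN snippets quoted in this thread. The class name RangeOverlap and the helper contains() are made up for illustration and are not part of Lucene. Character.UnicodeBlock is the standard JDK class.

```java
public class RangeOverlap {
    // Endpoints copied from the grammar snippets quoted in this thread.
    static final int LETTER_LO = 0x0100, LETTER_HI = 0x1FFF; // "\u0100"-"\u1fff" in LETTER
    static final int JAMO_LO = 0x1100, JAMO_HI = 0x11FF;     // "\u1100"-"\u11ff" in KOREAN
    static final int SYLL_LO = 0xAC00, SYLL_HI = 0xD7AF;     // "\uac00"-"\ud7af" in KOREAN

    /** True if [innerLo, innerHi] lies entirely inside [outerLo, outerHi]. */
    static boolean contains(int outerLo, int outerHi, int innerLo, int innerHi) {
        return outerLo <= innerLo && innerHi <= outerHi;
    }

    public static void main(String[] args) {
        // Hangul Jamo falls inside the wide LETTER range.
        System.out.println("Jamo inside LETTER range:      "
                + contains(LETTER_LO, LETTER_HI, JAMO_LO, JAMO_HI));
        // Hangul Syllables do not.
        System.out.println("Syllables inside LETTER range: "
                + contains(LETTER_LO, LETTER_HI, SYLL_LO, SYLL_HI));
        // Cross-check against the JDK's own Unicode block table.
        System.out.println("U+1100 block is HANGUL_JAMO:   "
                + (Character.UnicodeBlock.of(JAMO_LO) == Character.UnicodeBlock.HANGUL_JAMO));
    }
}
```

This says nothing about which token type the generated tokenizer ends up assigning to a Jamo character. It only confirms that the two character ranges overlap, which is why a grep for 11ff finds a single hit even though the range is covered twice.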