Otis Gospodnetic wrote:
> I see it only in 1 place (Korean):
The Hangul Jamo range (U+1100-U+11FF) is already included in the U+0100-U+1FFF range of LETTER:
| < #LETTER: // unicode letters
[
"\u0041"-"\u005a",
"\u0061"-"\u007a",
"\u00c0"-"\u00d6",
"\u00d8"-"\u00f6",
"\u00f8"-"\u00ff",
---------------------------
"\u0100"-"\u1fff",
---------------------------
"\uffa0"-"\uffdc"
]
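You can confirm the containment with a throwaway Java check. The class below is hypothetical and not part of Lucene. It compares the inclusive bounds quoted in this thread. It also shows that KOREAN's Hangul Syllables range (U+AC00-U+D7AF) lies outside LETTER's U+0100-U+1FFF range, so the Jamo block is the only part of KOREAN that the two tokens share.

```java
// Hypothetical throwaway check (not part of Lucene): compares the inclusive
// code point ranges quoted from StandardTokenizer.jj in this thread.
public class JamoRangeCheck {
    // LETTER sub-range U+0100-U+1FFF
    static final int LETTER_LO = 0x0100, LETTER_HI = 0x1FFF;
    // KOREAN: Hangul Jamo U+1100-U+11FF and Hangul Syllables U+AC00-U+D7AF
    static final int JAMO_LO = 0x1100, JAMO_HI = 0x11FF;
    static final int SYLL_LO = 0xAC00, SYLL_HI = 0xD7AF;

    // True if [innerLo, innerHi] lies entirely within [outerLo, outerHi].
    static boolean contains(int outerLo, int outerHi, int innerLo, int innerHi) {
        return outerLo <= innerLo && innerHi <= outerHi;
    }

    public static void main(String[] args) {
        // true: every Jamo code point is already inside LETTER's range
        System.out.println(contains(LETTER_LO, LETTER_HI, JAMO_LO, JAMO_HI));
        // false: LETTER's U+0100-U+1FFF range does not reach the Syllables block
        System.out.println(contains(LETTER_LO, LETTER_HI, SYLL_LO, SYLL_HI));
        // Cross-check the Jamo block bounds against the JDK's Unicode tables
        System.out.println(Character.UnicodeBlock.of(JAMO_LO));
        System.out.println(Character.UnicodeBlock.of(JAMO_HI));
    }
}
```

Compile and run it with `javac JamoRangeCheck.java && java JamoRangeCheck`.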
>
> $ grep 11ff src/java/org/apache/lucene/analysis/standard/StandardTokenizer.jj
> "\u1100"-"\u11ff" // Hangul Jamo
>
> Maybe I'm not seeing something...
>
> Otis
>
> ----- Original Message ----
> From: Steven Rowe <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, October 20, 2006 1:22:48 PM
> Subject: Re: [jira] Updated: (LUCENE-692) Hangul Jamo (Korean) support in
> StandardTokenizer.jj
>
> Joe Shaw (JIRA) wrote:
>> [ http://issues.apache.org/jira/browse/LUCENE-692?page=all ]
> [snip]
>> One of our users reported their inability to search some Korean
>> strings. This is because the Hangul Jamo Unicode block is not
>> included in the StandardTokenizer.jj file.
>> I'm attaching a patch which fixes this, from Young-Ho Cha.
>
> This has already been addressed by a patch committed by Otis to fix the
> following issue (in August 2006, after the 2.0.0 release):
>
> https://issues.apache.org/jira/browse/LUCENE-478
>
> Here is the Korean section from the trunk version of StandardTokenizer.jj:
>
> | < KOREAN: // Korean
> [
> "\uac00"-"\ud7af", // Hangul Syllables
> "\u1100"-"\u11ff" // Hangul Jamo
> // "\uac00"-"\ud7a3"
> ]
> >
>
> Actually, there is an oddity here -- Otis, you last committed a change
> to this file -- do you know why the Hangul Jamo range is included twice,
> once in the KOREAN section and again in the LETTER section?
>
> | < #LETTER: // unicode letters
> [
> "\u0041"-"\u005a",
> "\u0061"-"\u007a",
> "\u00c0"-"\u00d6",
> "\u00d8"-"\u00f6",
> "\u00f8"-"\u00ff",
> "\u0100"-"\u1fff",
> "\uffa0"-"\uffdc"
> ]
> >
>
> Steve
>
>> Joe Shaw updated LUCENE-692:
>> ----------------------------
>>
>> Attachment: lucene-hangul-jamo.patch
>>
>> Patch to StandardTokenizer.jj which fixes this.
>>
>>> Hangul Jamo (Korean) support in StandardTokenizer.jj
>>> ----------------------------------------------------
>>>
>>> Key: LUCENE-692
>>> URL: http://issues.apache.org/jira/browse/LUCENE-692
>>> Project: Lucene - Java
>>> Issue Type: Improvement
>>> Components: Analysis
>>> Affects Versions: 1.9, 2.0.0, 2.1, 2.0.1
>>> Reporter: Joe Shaw
>>> Priority: Minor
>>> Attachments: lucene-hangul-jamo.patch