Joe Shaw (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/LUCENE-692?page=all ]
[snip]
> One of our users reported their inability to search some Korean
> strings.  This is because the Hangul Jamo Unicode block is not
> included in the StandardTokenizer.jj file.
> I'm attaching a patch which fixes this, from Young-Ho Cha.

This has already been addressed by a patch committed by Otis to fix the
following issue (in August 2006, after the 2.0.0 release):

   https://issues.apache.org/jira/browse/LUCENE-478

Here is the Korean section from the trunk version of StandardTokenizer.jj:

| < KOREAN:                                          // Korean
      [
       "\uac00"-"\ud7af",     // Hangul Syllables
       "\u1100"-"\u11ff"      // Hangul Jamo
       // "\uac00"-"\ud7a3"
      ]
  >

Actually, there is an oddity here -- Otis, you last committed a change
to this file -- do you know why the Hangul Jamo range is included twice,
once explicitly in the KOREAN section and again in the LETTER section,
where "\u0100"-"\u1fff" already covers "\u1100"-"\u11ff"?

| < #LETTER:       // unicode letters
      [
       "\u0041"-"\u005a",
       "\u0061"-"\u007a",
       "\u00c0"-"\u00d6",
       "\u00d8"-"\u00f6",
       "\u00f8"-"\u00ff",
       "\u0100"-"\u1fff",
       "\uffa0"-"\uffdc"
      ]
  >
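
To make the overlap concrete, here is a minimal sketch in plain Java that
checks whether each KOREAN range falls inside one of the LETTER ranges
quoted above. This is not Lucene code: the class HangulRangeCheck, the
nested Range type, and the method coveredByLetter are hypothetical names
made up for illustration, and the ranges are copied by hand from the two
grammar fragments in this message.

```java
public class HangulRangeCheck {

    // Inclusive code point range, mirroring one "\uXXXX"-"\uYYYY" entry
    // in the grammar. Hypothetical helper type, not part of Lucene.
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // True if 'other' lies entirely within this range.
        boolean covers(Range other) {
            return start <= other.start && other.end <= end;
        }
    }

    // The two ranges listed in the KOREAN token.
    static final Range HANGUL_SYLLABLES = new Range(0xAC00, 0xD7AF);
    static final Range HANGUL_JAMO = new Range(0x1100, 0x11FF);

    // The ranges listed in the #LETTER token, in the order quoted above.
    static final Range[] LETTER = {
        new Range(0x0041, 0x005A),
        new Range(0x0061, 0x007A),
        new Range(0x00C0, 0x00D6),
        new Range(0x00D8, 0x00F6),
        new Range(0x00F8, 0x00FF),
        new Range(0x0100, 0x1FFF),
        new Range(0xFFA0, 0xFFDC)
    };

    // True if a single LETTER range fully contains the given range.
    static boolean coveredByLetter(Range r) {
        for (Range letter : LETTER) {
            if (letter.covers(r)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // prints "Hangul Jamo in LETTER: true"
        System.out.println("Hangul Jamo in LETTER: " + coveredByLetter(HANGUL_JAMO));
        // prints "Hangul Syllables in LETTER: false"
        System.out.println("Hangul Syllables in LETTER: " + coveredByLetter(HANGUL_SYLLABLES));
    }
}
```

So of the two KOREAN ranges, only Hangul Jamo is also matched by LETTER;
the Hangul Syllables range is not. The sketch only compares the ranges and
says nothing about which token the generated tokenizer actually picks for
a Jamo character.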

Steve

> Joe Shaw updated LUCENE-692:
> ----------------------------
> 
>     Attachment: lucene-hangul-jamo.patch
> 
> Patch to StandardTokenizer.jj which fixes this.
> 
>> Hangul Jamo (Korean) support in StandardTokenizer.jj
>> ----------------------------------------------------
>>
>>                 Key: LUCENE-692
>>                 URL: http://issues.apache.org/jira/browse/LUCENE-692
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Analysis
>>    Affects Versions: 1.9, 2.0.0, 2.1, 2.0.1
>>            Reporter: Joe Shaw
>>            Priority: Minor
>>         Attachments: lucene-hangul-jamo.patch

