[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798230#comment-13798230
 ] 

SooMyung Lee commented on LUCENE-4956:
--------------------------------------

[~bmargulies] The Korean tokenizer can identify the language (Korean, English 
or Chinese) of each token in a Korean sentence. An eojeol in a Korean sentence 
falls into one of four cases. First, the eojeol consists only of Korean 
letters. Second, it is a combination of Korean letters and alphanumeric 
characters. Third, it consists only of alphanumeric characters. Fourth, it 
consists of Chinese characters. The tokenizer treats the first and second 
cases as Korean, so Korean morphological analysis is performed on them in the 
Korean filter. For the third case, I copied code from the standard filter into 
the Korean filter. In the fourth case, the Korean filter maps each Chinese 
character to its Korean sound and then, if the result is a compound noun, 
decompounds it based on the dictionary.
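
To make the four cases concrete, here is a minimal sketch of how an eojeol 
could be classified by script and routed to a processing step. It is not the 
code from the patch: the names EojeolType, classify_eojeol and 
processing_step are hypothetical, the Unicode ranges are a simplification, 
and sending anything outside the four cases (for example Chinese characters 
mixed with Korean letters) to OTHER is my own assumption.

```python
from enum import Enum


class EojeolType(Enum):
    """Hypothetical labels for the four cases described in the comment."""
    KOREAN = "korean"              # case 1: Korean letters only
    KOREAN_MIXED = "korean_mixed"  # case 2: Korean letters plus alphanumerics
    ALPHANUM = "alphanum"          # case 3: alphanumerics only
    CHINESE = "chinese"            # case 4: Chinese characters only
    OTHER = "other"                # assumption: anything else


def is_hangul(ch: str) -> bool:
    # Hangul Syllables, Hangul Jamo, Hangul Compatibility Jamo blocks.
    cp = ord(ch)
    return (0xAC00 <= cp <= 0xD7A3
            or 0x1100 <= cp <= 0x11FF
            or 0x3130 <= cp <= 0x318F)


def is_hanja(ch: str) -> bool:
    # CJK Unified Ideographs and CJK Compatibility Ideographs blocks.
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0xF900 <= cp <= 0xFAFF


def is_ascii_alnum(ch: str) -> bool:
    return ch.isascii() and ch.isalnum()


def classify_eojeol(eojeol: str) -> EojeolType:
    """Classify one whitespace-delimited unit by the scripts it contains."""
    kinds = set()
    for ch in eojeol:
        if is_hangul(ch):
            kinds.add("hangul")
        elif is_hanja(ch):
            kinds.add("hanja")
        elif is_ascii_alnum(ch):
            kinds.add("alnum")
        else:
            kinds.add("other")
    if kinds == {"hangul"}:
        return EojeolType.KOREAN
    if kinds == {"hangul", "alnum"}:
        return EojeolType.KOREAN_MIXED
    if kinds == {"alnum"}:
        return EojeolType.ALPHANUM
    if kinds == {"hanja"}:
        return EojeolType.CHINESE
    return EojeolType.OTHER


def processing_step(kind: EojeolType) -> str:
    """Name the step the comment associates with each case."""
    if kind in (EojeolType.KOREAN, EojeolType.KOREAN_MIXED):
        return "Korean morphological analysis"
    if kind is EojeolType.ALPHANUM:
        return "standard-filter handling"
    if kind is EojeolType.CHINESE:
        return "map to Korean sound, then dictionary-based decompounding"
    return "unspecified"


if __name__ == "__main__":
    for token in ["검색", "lucene4를", "Solr4", "韓國語"]:
        kind = classify_eojeol(token)
        print(token, kind.name, "->", processing_step(kind))
```

The set of scripts seen in the eojeol decides the case, so the order of the 
characters does not matter. A real tokenizer would also have to deal with 
punctuation and with scripts outside these ranges.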

> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>            Assignee: Christian Moen
>              Labels: newbie
>         Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
> lucene4956.patch, LUCENE-4956.patch
>
>
> The Korean language has specific characteristics. When developing a search 
> service with Lucene & Solr in Korean, there are some problems in searching 
> and indexing. The Korean analyzer solves these problems with a Korean 
> morphological analyzer. It consists of a Korean morphological analyzer, 
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is 
> made for Lucene and Solr. If you develop a search service with Lucene in 
> Korean, choosing the Korean analyzer is the best idea.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

