[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

SooMyung Lee (JIRA) Thu, 17 Oct 2013 12:50:14 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798319#comment-13798319
 ]


SooMyung Lee commented on LUCENE-4956:
--------------------------------------

Hi, all.

As Christian recommended, I'm going to explain how I developed this code, 
because of the license and legal problem that [~jkrupan] mentioned in a 
previous comment. 

I started writing this code and the dictionary in 2006, based on a book by 
Seung-Shik Kang, who is now a professor at Kookmin University.

The dictionary consists of several files, but the major ones are total.dic, 
josa.dic, eomi.dic and syllable.dic. In the first step of developing the 
dictionary, I collected basic stem words for total.dic, and particles for 
josa.dic and eomi.dic, from the book and various websites. Then I surveyed 
online dictionaries to see how the basic stem words can be used. To make 
syllable.dic, I referred only to the book. 
I created the rest of the files myself during development, except for 
mapHanja.dic, which I added two years ago. I'm not sure that this file is free 
of legal problems, because much of its data came from the results of other 
projects, so it would be better to remove that data.

To write the source code, I referred to the book, so the major logic is based 
on the book, except for some utility classes such as String, File and 
Trie.java. I copied most of the utility classes from the Apache Commons 
project, but Trie.java came from another website. I cannot remember the exact 
website now because it happened a long time ago, but I remember reading that 
its license was the Apache license.

I finished the first version in 2008, created an online community on a website 
called Naver, and uploaded the source code there. The community currently has 
over 3700 members.
In 2009, I took part in an open source contest held by a Korean government 
organization. During the contest, I uploaded the source code to SourceForge, 
and the code went through a BlackDuck license test and passed.

I have supported users through the online community 
(http://cafe.naver.com/korlucene). Some users improved the dictionaries and 
source code and posted their changes on the website. I then merged those 
changes and published the result again.

This is the whole process of how I developed the code. If anybody has any 
recommendations, please let me know.

> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>            Assignee: Christian Moen
>              Labels: newbie
>         Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
> lucene4956.patch, LUCENE-4956.patch
>
>
> The Korean language has specific characteristics. When developing a search 
> service with Lucene & Solr in Korean, there are some problems with searching 
> and indexing. The Korean analyzer solves these problems with a Korean 
> morphological analyzer. It consists of a Korean morphological analyzer, 
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is 
> made for Lucene and Solr. If you develop a search service with Lucene in 
> Korean, the Korean analyzer is the best choice.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

