[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798319#comment-13798319 ]
SooMyung Lee commented on LUCENE-4956:
--------------------------------------
Hi, all.
As Christian recommended, I'm going to explain how I developed this code. This is because of the license and legal concerns that [~jkrupan] mentioned in a previous comment.
I started writing this code and dictionary in 2006, based on a book by Seung-Shik Kang, who is now a professor at Kookmin University.
The dictionary consists of several files, but the major ones are total.dic, josa.dic, eomi.dic and syllable.dic. In the first step of building the dictionary, I collected basic stem words for total.dic and particles for josa.dic and eomi.dic from the book and from various websites. Then I checked online dictionaries to see how those basic stem words are used. For syllable.dic, I referred only to the book.
I created the rest of the files myself during development, except for mapHanja.dic, which I added two years ago. I'm not sure this file is free of legal problems, because much of its data came from the results of other projects, so it is better to remove that data.
For the source code, I also referred to the book, so the major logic is based on it. The exceptions are some utility classes such as String, File and Trie.java. I copied most of the utility classes from the Apache Commons project, but Trie.java came from another website. I cannot remember the exact website now because it was a long time ago, but I remember reading its license, and it was the Apache License.
I finished the first version in 2008, created an online community on a website called Naver, and uploaded the source code there. The community currently has over 3,700 members.
In 2009 I entered an open source contest held by a Korean government organization. During the contest, I uploaded the source code to SourceForge, ran a Black Duck license test on it, and passed the test.
I have supported users through the online community (http://cafe.naver.com/korlucene). Some users improved the dictionaries and source code and posted their changes on the website; I merged those changes and released the code again.
That is the whole process by which I developed the code. If anybody has recommendations, please let me know.
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.2
> Reporter: SooMyung Lee
> Assignee: Christian Moen
> Labels: newbie
> Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch,
> lucene4956.patch, LUCENE-4956.patch
>
>
> The Korean language has specific characteristics. When developing a search
> service with Lucene and Solr in Korean, there are some problems in searching
> and indexing. The Korean analyzer solves these problems with a Korean
> morphological analyzer. It consists of a Korean morphological analyzer,
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is
> made for Lucene and Solr. If you develop a search service with Lucene in
> Korean, the Korean analyzer is the best choice.
--
This message was sent by Atlassian JIRA
(v6.1#6144)