[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2014-02-08 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895754#comment-13895754
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi [~daedeqi],
I created SourceForge project and contribute it to Apache Lucene project.
I'm working on fixing some problems mentioned in this Jira issue. 
But the problem you mentioned about 위키백과는 is to be solved if you add the word 
위키 into the dictionary.
There is no perfect way to analyze Korean sentences according to only a 
algorithm such as Porter Stemming in English sentence. So, we use both 
algorithm and dictionary to analyze Korean sentence. The dictionary for the 
Korean analyzer has around 40,000 words. I think most of the basic Korean word 
is included in it. But about many loan words such as 위키 is not included. 
Usually the user of Korean Analyzer in Korea builds his own dictionary for his 
purpose.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
 lucene-4956.patch, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2014-01-12 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868971#comment-13868971
 ] 

SooMyung Lee commented on LUCENE-4956:
--

[~thetaphi‍] I'm trying to change the code to use StandardTokenizer but I found 
a problem. when a text with Chinese characters is passed into the 
StandardTokenizer, It tokenizes Chinese characters into each character. That 
makes it difficult to extract index keywords and map Chinese character to 
Hangul Character. So, to use StandardTokenizer for KoreanAnalyzer, consecutive 
Chinese characters should not be tokenized. 
Can you change the StandardTokenizer as I mentioned?

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
 lucene-4956.patch, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-12-26 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856878#comment-13856878
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi Robert,

Thank you for your effort. 

Yes,  I also found some problems in some case like WordSpaceAnalyzer and 
offset/posincs, 
and I made some improvement. so, I'll post the patch soon.

I want to help you for the progress. 
but, I'm feeling strange when I work with you in this project. It is too 
difficult for me to figure out how I can help you.
Please, let me know something that I can help you more detail. Is it helpful to 
you If I make some test cases ?


 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
 lucene-4956.patch, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-12-26 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857107#comment-13857107
 ] 

SooMyung Lee commented on LUCENE-4956:
--

[~thetapi] Thank you for your comment, Uwe. 
I think I can make some improvements with above the problem like 
WordSpaceAnalyzer, posinc/offset and KoreanTokenizer. I'll upload the patch by 
this weekend.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
 lucene-4956.patch, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-12-26 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857107#comment-13857107
 ] 

SooMyung Lee edited comment on LUCENE-4956 at 12/26/13 9:54 PM:


[~thetaphi] Thank you for your comment, Uwe. 
I think I can make some improvements with above the problem like 
WordSpaceAnalyzer, posinc/offset and KoreanTokenizer. I'll upload the patch by 
this weekend.


was (Author: soomyung):
[~thetapi] Thank you for your comment, Uwe. 
I think I can make some improvements with above the problem like 
WordSpaceAnalyzer, posinc/offset and KoreanTokenizer. I'll upload the patch by 
this weekend.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
 lucene-4956.patch, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-18 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798887#comment-13798887
 ] 

SooMyung Lee commented on LUCENE-4956:
--

[~rcmuir], Thanks again.

I have run a test case with new hanja-hangul mapping files. It works very well.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-17 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798230#comment-13798230
 ] 

SooMyung Lee commented on LUCENE-4956:
--

[~bmargulies] Korean Tokenizer has the feature that identify language (Korean, 
English or Chinese) in Korean sentence. Usually, eojeol in Korean sentence has 
some different cases. First, eojeol consists of only Korean letters, Second, 
eojeol can be a combination of Korean letter and alphanumeric letter. Third, 
eojeol consists of only all alphanumeric letters. Fourth eojeol consists of 
Chinese letters. Tokinizer treat first and second case as Korean so Korean 
Morphological analysis is processed in Korean-filter. In second case, I copied 
code from standard-filter for korean-filter. In third case, Korean-filter map 
Chinese letter to Korean sound and then if it is a compound noun, decompounding 
is processed based on dictionary.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-17 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798230#comment-13798230
 ] 

SooMyung Lee edited comment on LUCENE-4956 at 10/17/13 6:28 PM:


[~bmargulies] Korean Tokenizer has the feature that identify language (Korean, 
English or Chinese) in Korean sentence. Usually, eojeol in Korean sentence has 
some different cases. First, eojeol consists of only Korean letters, Second, 
eojeol can be a combination of Korean letter and alphanumeric letter. Third, 
eojeol consists of only all alphanumeric letters. Fourth eojeol consists of 
Chinese letters. Tokinizer treat first and second case as Korean so Korean 
Morphological analysis is processed in Korean-filter. In third case, I copied 
code from standard-filter for korean-filter. In fourth case, Korean-filter map 
Chinese letter to Korean sound and then if it is a compound noun, decompounding 
is processed based on dictionary.


was (Author: soomyung):
[~bmargulies] Korean Tokenizer has the feature that identify language (Korean, 
English or Chinese) in Korean sentence. Usually, eojeol in Korean sentence has 
some different cases. First, eojeol consists of only Korean letters, Second, 
eojeol can be a combination of Korean letter and alphanumeric letter. Third, 
eojeol consists of only all alphanumeric letters. Fourth eojeol consists of 
Chinese letters. Tokinizer treat first and second case as Korean so Korean 
Morphological analysis is processed in Korean-filter. In second case, I copied 
code from standard-filter for korean-filter. In third case, Korean-filter map 
Chinese letter to Korean sound and then if it is a compound noun, decompounding 
is processed based on dictionary.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-17 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798319#comment-13798319
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi, all.

I' going to explain how I develop this code  as Christian recommended because 
of license and legal problem that [~jkrupan] mentioned in previous comment. 

I started to write this code and dictionary in 2006 based on a book which 
author is Seung-Shik, Kang who is a professor of Kookmin university now.

the dictionary consist of several files but major files are total.dic, 
josa.dic, eomi.dic and syllable.dic. in first step of developing dictionary, I 
collected basic stem words for total.dic and particles for josa.dic and 
eomi.dic from book and various websites. and then I surveyed how basic stem 
words can be used on online dictionaries. and  I only referred to the book to 
make syllable.dic. 
the rest of files is created by myself during developing except for 
mapHanja.dic. I added this file two years ago. I'm not sure that this file has 
not legal problem because many data came from projects result so it is better 
to remove that data.

to make source code, I referred to the book so major logic was based on the 
book except for some utilities classes such as String, File and Trie.java. I  
copied most of utilities classes from apache common project but Trie.java from 
other website. I cannot remember the exact website now because it was happend 
long time ago. but I remember that I read the license that was Apache license.

I finished first version in 2008 and created an online community on a website 
(called Naver) and uploaded the source code.  the number of community members 
are over 3700 currently.
I attended an opensource contest held by Korean government organization in 
2009. During the contest, I uploaded the source code to the Sourceforge and got 
a BlackDuck license test with this code and passed the test.

I have supported users through the online community 
(http://cafe.naver.com/korlucene). so some users improved dictionaries and 
source codes and then posted it on the website. and I merged it and opened it 
again.

This is the wohle process how I developed the code. If anybody has something to 
recommend, Please let me know it.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-17 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798678#comment-13798678
 ] 

SooMyung Lee commented on LUCENE-4956:
--

[~bmargulies] WordSpaceAnalyzer has the feature. If fail to analyze Korean 
Morphology, KoreanFilter makes WordSpaceAnalyzer try to split eojeol. I'll add 
some test case.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-17 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798686#comment-13798686
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi [~rcmuir],

Thank you for your comment. I can reconstitute the hanja-hangul mappings file 
by myself if we cannot find other sources with clear licenses. I can easily get 
hanja list that often appear in Korean sentence. after then I'll look up online 
dictionary. I can start with 3,000~4,000 hanjas that is most often appeared 
Korean sentences.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-17 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798686#comment-13798686
 ] 

SooMyung Lee edited comment on LUCENE-4956 at 10/18/13 1:06 AM:


Hi [~rcmuir],

Thank you for your comment. I can reconstitute the hanja-hangul mappings file 
by myself if we cannot find other sources with clear licenses. I can easily get 
hanja list that often appear in Korean sentence. after then I'll look up online 
dictionary. I can start with 3,000~4,000 hanjas that is most often appeared in 
Korean sentences.


was (Author: soomyung):
Hi [~rcmuir],

Thank you for your comment. I can reconstitute the hanja-hangul mappings file 
by myself if we cannot find other sources with clear licenses. I can easily get 
hanja list that often appear in Korean sentence. after then I'll look up online 
dictionary. I can start with 3,000~4,000 hanjas that is most often appeared 
Korean sentences.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-17 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798796#comment-13798796
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Great, thanks a lot !

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, 
 lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-14 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794683#comment-13794683
 ] 

SooMyung Lee commented on LUCENE-4956:
--

[~thetaphi] Thank you for your advice,
I have opened this source code in sourceforege since 2009 and have many users. 
but, nobody told me the bugs and I also didn't know that. Christian and myself 
will fix the bugs soon. Thank you again.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, 
 LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-08 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789512#comment-13789512
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Christian,

Yes, I worked my last patch against on lucene4956. I'll check the problem and 
inform you how to solve it within today.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, 
 LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-10-04 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786899#comment-13786899
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi Christian,

I didn't hear any news from you since last August. 
Do you have any problem with moving to next step?

I run a Korean developers community for the Korean Analyzer.
I announced that Arirang analyzer will be incorporated into lucene and solr 
soon. 
So, many developers are waiting for that.

I want we go to next step quickly. If you need any help, Please let me know. 


 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, 
 LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-09-10 Thread SooMyung Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SooMyung Lee updated LUCENE-4956:
-

Attachment: lucene-4956.patch

Hi Christian,

I have sync up and I made some modification.
I'm attaching the patch.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, 
 LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-08-14 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739548#comment-13739548
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi, Christian
Thank you for your effort.

I'll review the changes what you made.
I've been also made some improvements. I'll upload the patch soon.


 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch, LUCENE-4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-07-09 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704145#comment-13704145
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi, Steve

I see you created 4.4 branch for releasing.
After I looked over it, I found that the Korean analyzer(Arirang) is missing.

Can you tell me when the korean analyzer can be incorporated into the release.



 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-07-09 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704200#comment-13704200
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi, Christian

I can understand your situation. I know you run the company.

I was just wondering if there is any problem with integrating it.
If you need any help, please let me know it.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-21 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663622#comment-13663622
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi, Christian.

I have named the korean analyzer package as kr but recently I found that It 
is incorrect. kr is the country code of the south korean and kp is the 
country code of the north korea. I think ko is more suitable for the name of 
the korean anzlyzer package. ko is the korean language code. So, I want you 
to rename the korean analyzer package from kr to ko.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-18 Thread SooMyung Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SooMyung Lee updated LUCENE-4956:
-

Attachment: lucene4956.patch

I have changed the source code relating to keyword extraction. I have removed 
the properties relating to keyword extraction and changed the keyword 
extraction logic. I've also added a test case that describe how for the korean 
analyzer work.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-18 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661327#comment-13661327
 ] 

SooMyung Lee edited comment on LUCENE-4956 at 5/18/13 10:06 AM:


I have changed the source code relating to keyword extraction. I have removed 
the properties relating to keyword extraction and changed the keyword 
extraction logic. I've also added a test case that describe how for the korean 
analyzer to work.

  was (Author: soomyung):
I have changed the source code relating to keyword extraction. I have 
removed the properties relating to keyword extraction and changed the keyword 
extraction logic. I've also added a test case that describe how for the korean 
analyzer work.
  
 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-18 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661327#comment-13661327
 ] 

SooMyung Lee edited comment on LUCENE-4956 at 5/18/13 10:46 AM:


Hi, christian. 
I have made some changes of the source code and uploaded that.
I have changed the source code relating to keyword extraction. 
I have removed the properties relating to keyword extraction and changed the 
keyword extraction logic. 
I've also added a test case that describe how for the korean analyzer to work.

I hope this is of some help to you!

  was (Author: soomyung):
I have changed the source code relating to keyword extraction. I have 
removed the properties relating to keyword extraction and changed the keyword 
extraction logic. I've also added a test case that describe how for the korean 
analyzer to work.
  
 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-18 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661327#comment-13661327
 ] 

SooMyung Lee edited comment on LUCENE-4956 at 5/18/13 10:46 AM:


Hi, christian. 

I have made some changes of the source code and uploaded that.
I have changed the source code relating to keyword extraction. 
I have removed the properties relating to keyword extraction and changed the 
keyword extraction logic. 
I've also added a test case that describe how the korean analyzer works.

I hope this is of some help to you!

  was (Author: soomyung):
Hi, christian. 
I have made some changes of the source code and uploaded that.
I have changed the source code relating to keyword extraction. 
I have removed the properties relating to keyword extraction and changed the 
keyword extraction logic. 
I've also added a test case that describe how for the korean analyzer to work.

I hope this is of some help to you!
  
 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar, lucene4956.patch


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-14 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657105#comment-13657105
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi, Christian.
Until this week, I'll prepare some test cases and documents that explain how 
the options work and why those are needed.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-10 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655122#comment-13655122
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Cool! Thanks, Steve

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-09 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13653453#comment-13653453
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi Christian,
Thanks for your great work.

I'd like to ask you to modify the text_kr field type definition in schema.xml 
as follows
{noformat}
fieldType name=text_kr class=solr.TextField positionIncrementGap=100
  analyzer type=index
tokenizer class=solr.KoreanTokenizerFactory/
filter class=solr.KoreanFilterFactory hasOrigin=true 
hasCNoun=true  bigrammable=true/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.StopFilterFactory ignoreCase=true 
words=lang/stopwords_kr.txt/
  /analyzer
  analyzer type=query
tokenizer class=solr.KoreanTokenizerFactory/
filter class=solr.KoreanFilterFactory hasOrigin=false 
hasCNoun=false  bigrammable=false/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.StopFilterFactory ignoreCase=true 
words=lang/stopwords_kr.txt/
  /analyzer  
/fieldType
{noformat}

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-06 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649633#comment-13649633
 ] 

SooMyung Lee commented on LUCENE-4956:
--

[~cm] I'm sorry that I didn't reply to your comment on the last weekend! I'm 
seeing that [~steve_rowe] solved your problem. am I right?
[~steve_rowe] I checked the method. isNounPart() is no more necessary. 
Spaces should be inserted between phrases in a korean sentence, but many people 
are confused in where inserting spaces. 

The isNounPart() method examine if spaces should be inserted at a specific 
position only when a noun existing in the dictionary precede it.
After testing, I found that the method is superfluous.
I'm sorry not to correct the source code before contributing.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-29 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644282#comment-13644282
 ] 

SooMyung Lee edited comment on LUCENE-4956 at 4/29/13 8:07 AM:
---

Hi Steve, What should I do in the present situation, Do I need to make a 
correction to all issues and submit new tarball? Please let me know what I have 
to do to move forward!

  was (Author: soomyung):
Hi Steve and Christian, What should I do in the present situation, Do I 
need to make a correction to all issues and submit new tarball? Please let me 
know what I have to do to move forward!
  
 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-28 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644282#comment-13644282
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi Steve and Christian, What should I do in the present situation, Do I need to 
make a correction to all issues and submit new tarball? Please let me know what 
I have to do to move forward!

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-26 Thread SooMyung Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643527#comment-13643527
 ] 

SooMyung Lee commented on LUCENE-4956:
--

Hi Steve, I think arirang is the best name for the korean analysis modules. 
arirang is the name of traditional korean song. So, I think arirang can 
represent korean analysis modules well.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-24 Thread SooMyung Lee (JIRA)
SooMyung Lee created LUCENE-4956:


 Summary: the korean analyzer that has a korean morphological 
analyzer and dictionaries
 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee


Korean language has specific characteristic. When developing search service 
with lucene  solr in korean, there are some problems in searching and 
indexing. The korean analyer solved the problems with a korean morphological 
anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean 
tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. 
If you develop a search service with lucene in korean, It is best idea to 
choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-24 Thread SooMyung Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SooMyung Lee updated LUCENE-4956:
-

Description: Korean language has specific characteristic. When developing 
search service with lucene  solr in korean, there are some problems in 
searching and indexing. The korean analyer solved the problems with a korean 
morphological anlyzer. It consists of a korean morphological analyzer, 
dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is 
made for lucene and solr. If you develop a search service with lucene in 
korean, It is the best idea to choose the korean analyzer.  (was: Korean 
language has specific characteristic. When developing search service with 
lucene  solr in korean, there are some problems in searching and indexing. The 
korean analyer solved the problems with a korean morphological anlyzer. It 
consists of a korean morphological analyzer, dictionaries, a korean tokenizer 
and a korean filter. The korean anlyzer is made for lucene and solr. If you 
develop a search service with lucene in korean, It is best idea to choose the 
korean analyzer.)

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
  Labels: newbie

 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-24 Thread SooMyung Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SooMyung Lee updated LUCENE-4956:
-

Attachment: kr.analyzer.4x.tar

8edffacb15b3964f25054c82c0d4ea92

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 Korean language has specific characteristic. When developing search service 
 with lucene  solr in korean, there are some problems in searching and 
 indexing. The korean analyer solved the problems with a korean morphological 
 anlyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean anlyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, It is the 
 best idea to choose the korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org