[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895754#comment-13895754 ] SooMyung Lee commented on LUCENE-4956: -- Hi [~daedeqi], I created SourceForge project and contribute it to Apache Lucene project. I'm working on fixing some problems mentioned in this Jira issue. But the problem you mentioned about 위키백과는 is to be solved if you add the word 위키 into the dictionary. There is no perfect way to analyze Korean sentences according to only a algorithm such as Porter Stemming in English sentence. So, we use both algorithm and dictionary to analyze Korean sentence. The dictionary for the Korean analyzer has around 40,000 words. I think most of the basic Korean word is included in it. But about many loan words such as 위키 is not included. Usually the user of Korean Analyzer in Korea builds his own dictionary for his purpose. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868971#comment-13868971 ] SooMyung Lee commented on LUCENE-4956: -- [~thetaphi] I'm trying to change the code to use StandardTokenizer but I found a problem. when a text with Chinese characters is passed into the StandardTokenizer, It tokenizes Chinese characters into each character. That makes it difficult to extract index keywords and map Chinese character to Hangul Character. So, to use StandardTokenizer for KoreanAnalyzer, consecutive Chinese characters should not be tokenized. Can you change the StandardTokenizer as I mentioned? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856878#comment-13856878 ] SooMyung Lee commented on LUCENE-4956: -- Hi Robert, Thank you for your effort. Yes, I also found some problems in some case like WordSpaceAnalyzer and offset/posincs, and I made some improvement. so, I'll post the patch soon. I want to help you for the progress. but, I'm feeling strange when I work with you in this project. It is too difficult for me to figure out how I can help you. Please, let me know something that I can help you more detail. Is it helpful to you If I make some test cases ? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857107#comment-13857107 ] SooMyung Lee commented on LUCENE-4956: -- [~thetapi] Thank you for your comment, Uwe. I think I can make some improvements with above the problem like WordSpaceAnalyzer, posinc/offset and KoreanTokenizer. I'll upload the patch by this weekend. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857107#comment-13857107 ] SooMyung Lee edited comment on LUCENE-4956 at 12/26/13 9:54 PM: [~thetaphi] Thank you for your comment, Uwe. I think I can make some improvements with above the problem like WordSpaceAnalyzer, posinc/offset and KoreanTokenizer. I'll upload the patch by this weekend. was (Author: soomyung): [~thetapi] Thank you for your comment, Uwe. I think I can make some improvements with above the problem like WordSpaceAnalyzer, posinc/offset and KoreanTokenizer. I'll upload the patch by this weekend. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798887#comment-13798887 ] SooMyung Lee commented on LUCENE-4956: -- [~rcmuir], Thanks again. I have run a test case with new hanja-hangul mapping files. It works very well. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798230#comment-13798230 ] SooMyung Lee commented on LUCENE-4956: -- [~bmargulies] Korean Tokenizer has the feature that identify language (Korean, English or Chinese) in Korean sentence. Usually, eojeol in Korean sentence has some different cases. First, eojeol consists of only Korean letters, Second, eojeol can be a combination of Korean letter and alphanumeric letter. Third, eojeol consists of only all alphanumeric letters. Fourth eojeol consists of Chinese letters. Tokinizer treat first and second case as Korean so Korean Morphological analysis is processed in Korean-filter. In second case, I copied code from standard-filter for korean-filter. In third case, Korean-filter map Chinese letter to Korean sound and then if it is a compound noun, decompounding is processed based on dictionary. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798230#comment-13798230 ] SooMyung Lee edited comment on LUCENE-4956 at 10/17/13 6:28 PM: [~bmargulies] Korean Tokenizer has the feature that identify language (Korean, English or Chinese) in Korean sentence. Usually, eojeol in Korean sentence has some different cases. First, eojeol consists of only Korean letters, Second, eojeol can be a combination of Korean letter and alphanumeric letter. Third, eojeol consists of only all alphanumeric letters. Fourth eojeol consists of Chinese letters. Tokinizer treat first and second case as Korean so Korean Morphological analysis is processed in Korean-filter. In third case, I copied code from standard-filter for korean-filter. In fourth case, Korean-filter map Chinese letter to Korean sound and then if it is a compound noun, decompounding is processed based on dictionary. was (Author: soomyung): [~bmargulies] Korean Tokenizer has the feature that identify language (Korean, English or Chinese) in Korean sentence. Usually, eojeol in Korean sentence has some different cases. First, eojeol consists of only Korean letters, Second, eojeol can be a combination of Korean letter and alphanumeric letter. Third, eojeol consists of only all alphanumeric letters. Fourth eojeol consists of Chinese letters. Tokinizer treat first and second case as Korean so Korean Morphological analysis is processed in Korean-filter. In second case, I copied code from standard-filter for korean-filter. In third case, Korean-filter map Chinese letter to Korean sound and then if it is a compound noun, decompounding is processed based on dictionary. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798319#comment-13798319 ] SooMyung Lee commented on LUCENE-4956: -- Hi, all. I' going to explain how I develop this code as Christian recommended because of license and legal problem that [~jkrupan] mentioned in previous comment. I started to write this code and dictionary in 2006 based on a book which author is Seung-Shik, Kang who is a professor of Kookmin university now. the dictionary consist of several files but major files are total.dic, josa.dic, eomi.dic and syllable.dic. in first step of developing dictionary, I collected basic stem words for total.dic and particles for josa.dic and eomi.dic from book and various websites. and then I surveyed how basic stem words can be used on online dictionaries. and I only referred to the book to make syllable.dic. the rest of files is created by myself during developing except for mapHanja.dic. I added this file two years ago. I'm not sure that this file has not legal problem because many data came from projects result so it is better to remove that data. to make source code, I referred to the book so major logic was based on the book except for some utilities classes such as String, File and Trie.java. I copied most of utilities classes from apache common project but Trie.java from other website. I cannot remember the exact website now because it was happend long time ago. but I remember that I read the license that was Apache license. I finished first version in 2008 and created an online community on a website (called Naver) and uploaded the source code. the number of community members are over 3700 currently. I attended an opensource contest held by Korean government organization in 2009. During the contest, I uploaded the source code to the Sourceforge and got a BlackDuck license test with this code and passed the test. I have supported users through the online community (http://cafe.naver.com/korlucene). so some users improved dictionaries and source codes and then posted it on the website. and I merged it and opened it again. This is the wohle process how I developed the code. If anybody has something to recommend, Please let me know it. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798678#comment-13798678 ] SooMyung Lee commented on LUCENE-4956: -- [~bmargulies] WordSpaceAnalyzer has the feature. If fail to analyze Korean Morphology, KoreanFilter makes WordSpaceAnalyzer try to split eojeol. I'll add some test case. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798686#comment-13798686 ] SooMyung Lee commented on LUCENE-4956: -- Hi [~rcmuir], Thank you for your comment. I can reconstitute the hanja-hangul mappings file by myself if we cannot find other sources with clear licenses. I can easily get hanja list that often appear in Korean sentence. after then I'll look up online dictionary. I can start with 3,000~4,000 hanjas that is most often appeared Korean sentences. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798686#comment-13798686 ] SooMyung Lee edited comment on LUCENE-4956 at 10/18/13 1:06 AM: Hi [~rcmuir], Thank you for your comment. I can reconstitute the hanja-hangul mappings file by myself if we cannot find other sources with clear licenses. I can easily get hanja list that often appear in Korean sentence. after then I'll look up online dictionary. I can start with 3,000~4,000 hanjas that is most often appeared in Korean sentences. was (Author: soomyung): Hi [~rcmuir], Thank you for your comment. I can reconstitute the hanja-hangul mappings file by myself if we cannot find other sources with clear licenses. I can easily get hanja list that often appear in Korean sentence. after then I'll look up online dictionary. I can start with 3,000~4,000 hanjas that is most often appeared Korean sentences. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798796#comment-13798796 ] SooMyung Lee commented on LUCENE-4956: -- Great, thanks a lot ! the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794683#comment-13794683 ] SooMyung Lee commented on LUCENE-4956: -- [~thetaphi] Thank you for your advice, I have opened this source code in sourceforege since 2009 and have many users. but, nobody told me the bugs and I also didn't know that. Christian and myself will fix the bugs soon. Thank you again. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789512#comment-13789512 ] SooMyung Lee commented on LUCENE-4956: -- Christian, Yes, I worked my last patch against on lucene4956. I'll check the problem and inform you how to solve it within today. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786899#comment-13786899 ] SooMyung Lee commented on LUCENE-4956: -- Hi Christian, I didn't hear any news from you since last August. Do you have any problem with moving to next step? I run a Korean developers community for the Korean Analyzer. I announced that Arirang analyzer will be incorporated into lucene and solr soon. So, many developers are waiting for that. I want we go to next step quickly. If you need any help, Please let me know. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SooMyung Lee updated LUCENE-4956: - Attachment: lucene-4956.patch Hi Christian, I have sync up and I made some modification. I'm attaching the patch. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739548#comment-13739548 ] SooMyung Lee commented on LUCENE-4956: -- Hi, Christian Thank you for your effort. I'll review the changes what you made. I've been also made some improvements. I'll upload the patch soon. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704145#comment-13704145 ] SooMyung Lee commented on LUCENE-4956: -- Hi, Steve I see you created 4.4 branch for releasing. After I looked over it, I found that the Korean analyzer(Arirang) is missing. Can you tell me when the korean analyzer can be incorporated into the release. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704200#comment-13704200 ] SooMyung Lee commented on LUCENE-4956: -- Hi, Christian I can understand your situation. I know you run the company. I was just wondering if there is any problem with integrating it. If you need any help, please let me know it. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663622#comment-13663622 ] SooMyung Lee commented on LUCENE-4956: -- Hi, Christian. I have named the korean analyzer package as kr but recently I found that It is incorrect. kr is the country code of the south korean and kp is the country code of the north korea. I think ko is more suitable for the name of the korean anzlyzer package. ko is the korean language code. So, I want you to rename the korean analyzer package from kr to ko. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SooMyung Lee updated LUCENE-4956: - Attachment: lucene4956.patch I have changed the source code relating to keyword extraction. I have removed the properties relating to keyword extraction and changed the keyword extraction logic. I've also added a test case that describe how for the korean analyzer work. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661327#comment-13661327 ] SooMyung Lee edited comment on LUCENE-4956 at 5/18/13 10:06 AM: I have changed the source code relating to keyword extraction. I have removed the properties relating to keyword extraction and changed the keyword extraction logic. I've also added a test case that describe how for the korean analyzer to work. was (Author: soomyung): I have changed the source code relating to keyword extraction. I have removed the properties relating to keyword extraction and changed the keyword extraction logic. I've also added a test case that describe how for the korean analyzer work. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661327#comment-13661327 ] SooMyung Lee edited comment on LUCENE-4956 at 5/18/13 10:46 AM: Hi, christian. I have made some changes of the source code and uploaded that. I have changed the source code relating to keyword extraction. I have removed the properties relating to keyword extraction and changed the keyword extraction logic. I've also added a test case that describe how for the korean analyzer to work. I hope this is of some help to you! was (Author: soomyung): I have changed the source code relating to keyword extraction. I have removed the properties relating to keyword extraction and changed the keyword extraction logic. I've also added a test case that describe how for the korean analyzer to work. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661327#comment-13661327 ] SooMyung Lee edited comment on LUCENE-4956 at 5/18/13 10:46 AM: Hi, christian. I have made some changes of the source code and uploaded that. I have changed the source code relating to keyword extraction. I have removed the properties relating to keyword extraction and changed the keyword extraction logic. I've also added a test case that describe how the korean analyzer works. I hope this is of some help to you! was (Author: soomyung): Hi, christian. I have made some changes of the source code and uploaded that. I have changed the source code relating to keyword extraction. I have removed the properties relating to keyword extraction and changed the keyword extraction logic. I've also added a test case that describe how for the korean analyzer to work. I hope this is of some help to you! the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657105#comment-13657105 ] SooMyung Lee commented on LUCENE-4956: -- Hi, Christian. Until this week, I'll prepare some test cases and documents that explain how the options work and why those are needed. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655122#comment-13655122 ] SooMyung Lee commented on LUCENE-4956: -- Cool! Thanks, Steve the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13653453#comment-13653453 ] SooMyung Lee commented on LUCENE-4956: -- Hi Christian, Thanks for your great work. I'd like to ask you to modify the text_kr field type definition in schema.xml as follows {noformat} fieldType name=text_kr class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.KoreanTokenizerFactory/ filter class=solr.KoreanFilterFactory hasOrigin=true hasCNoun=true bigrammable=true/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=lang/stopwords_kr.txt/ /analyzer analyzer type=query tokenizer class=solr.KoreanTokenizerFactory/ filter class=solr.KoreanFilterFactory hasOrigin=false hasCNoun=false bigrammable=false/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=lang/stopwords_kr.txt/ /analyzer /fieldType {noformat} the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649633#comment-13649633 ] SooMyung Lee commented on LUCENE-4956: -- [~cm] I'm sorry that I didn't reply to your comment on the last weekend! I'm seeing that [~steve_rowe] solved your problem. am I right? [~steve_rowe] I checked the method. isNounPart() is no more necessary. Spaces should be inserted between phrases in a korean sentence, but many people are confused in where inserting spaces. The isNounPart() method examine if spaces should be inserted at a specific position only when a noun existing in the dictionary precede it. After testing, I found that the method is superfluous. I'm sorry not to correct the source code before contributing. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644282#comment-13644282 ] SooMyung Lee edited comment on LUCENE-4956 at 4/29/13 8:07 AM: --- Hi Steve, What should I do in the present situation, Do I need to make a correction to all issues and submit new tarball? Please let me know what I have to do to move forward! was (Author: soomyung): Hi Steve and Christian, What should I do in the present situation, Do I need to make a correction to all issues and submit new tarball? Please let me know what I have to do to move forward! the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644282#comment-13644282 ] SooMyung Lee commented on LUCENE-4956: -- Hi Steve and Christian, What should I do in the present situation, Do I need to make a correction to all issues and submit new tarball? Please let me know what I have to do to move forward! the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643527#comment-13643527 ] SooMyung Lee commented on LUCENE-4956: -- Hi Steve, I think arirang is the best name for the korean analysis modules. arirang is the name of traditional korean song. So, I think arirang can represent korean analysis modules well. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
SooMyung Lee created LUCENE-4956: Summary: the korean analyzer that has a korean morphological analyzer and dictionaries Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SooMyung Lee updated LUCENE-4956: - Description: Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. (was: Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is best idea to choose the korean analyzer.) the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Labels: newbie Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SooMyung Lee updated LUCENE-4956: - Attachment: kr.analyzer.4x.tar 8edffacb15b3964f25054c82c0d4ea92 the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org