[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895699#comment-13895699 ] Dai Deqi commented on LUCENE-4956: -- Dear Lucene Korean Team, I posted the following at sourceforge too. Thank you for your time. Would appreciate any inputs or assistance you can provide. Respectfully, Deqi Dear Lucene Korean Team, Hi, I'm a translator working with OmegaT and the OmegaT developers (see Yahoo! OmegaT group). Thank you all very much for the hard work you've put into this analyzer. I was so excited when I came across it! As a result, I asked the OmegaT developers if they could include your Korean analyzer into OmegaT and they did. The unfortunate part is that the analyzer does not appear to be working. See the e-mails pasted below for more information. And I would respectfully like to ask a few questions. Would you happen to know why this is happening? If there's a problem, do you know if it will be fixed in future releases? Finally, may I ask how this analyzer and the one here are related: https://issues.apache.org/jira/browse/LUCENE-4956 Thank you all in advance for your time. Respectfully, Deqi Dear Colleagues, RE: http://groups.yahoo.com/neo/groups/OmegaT/conversations/messages/20023 I'm interested in adding a Korean-specific analyzer/tokenizer to OmT 3.0.8 because of the simplicity of the CJK tokenizer described in the RE. To that end, I downloaded KoreanAnalyzer-20100302.jar and, since I'm using a Mac, put in the .app lib folder and updated the Info.plist file to point to the new jar file. Does anyone else know what needs to be done? How do I make OmT aware of the new analyzer and use it by default? I'd be very grateful for any assistance and apologize in advance if I don't know the difference between an analyzer and a tokenizer. For those working in Korean, there's another apparently related analyzer, but I have no idea of how to work with it: https://issues.apache.org/jira/browse/LUCENE-4956 V/R, Dai Deqi Hi Aaron, Good news and bad news. I built OmT with the new Korean analyzer that you so graciously added with no problems at all. However, the new Korean-only analyzer doesn't appear to be working as well as the CJK analyzer. I'm assuming analyzer/tokenizer differences will show up most noticeably in the Glossary pane. And that's where I'm seeing big differences. For example, the simple sentence below 그 전문은 다음과 같다. produces Transtips and Glossary hits using the CJK analyzer, but nothing with the new Korean-only analyzer. That was quite disappointing. If there are any other tests you or anyone else can suggest or would like me to try, please let me know. I've never done this kind of testing before. All the Best, Dai Deqi Hello. I just did a quick test of the KoreanAnalyzer lib and found that while the tokenizer seems to work fine, the analyzer part (which is used for glossary and Transtips, etc.) doesn't seem to work at all. Input: 그 전문은 다음과 같다. Tokenizer output: [ 그, 전문은, 다음과, 같다 ] Analyzer output: [ ] In other words the analyzer simply does not output anything, which means that no matches will be found. I'm not sure what to make of this, as we are using the library in the same way as any other Lucene analyzer. This suggests to me that the code is broken; if there's some workaround then perhaps the author of the library can help us, but otherwise we will just have to wait until the standalone library is fixed or a final version is integrated into Lucene. -Aaron Actually, sorry, I was wrong; the analyzer's output is empty for the example sentence you supplied, but that is not true for the general case. For a sentence I took from Wikipedia: Input: 위키백과는 전 세계 여러 언어로 만들어 나가는 자유 백과사전으로, 누구나 참여하실 수 있습니다. Tokenization: [ 위키백과는, 전, 세계, 여러, 언어로, 만들어, 나가는, 자유, 백과사전으로, 누구나, 참여하실, 수, 있습니다 ] Analysis: [ 위키백과는, 위키백, 위키, 키백 ] I thought at first this was the result of a very aggressive stopwords filter or something, but the result is the same even when supplying an empty stopwords set. Plus, Google Translate tells me that the analysis result is basically: [ Wikipedia, Wikipedia, Wiki, pedia ] (all substrings of the first token) So it seems the conclusion is the same: The analysis is broken, or at least behaves completely differently from all standard Lucene analyzers. -Aaron the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar,
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895706#comment-13895706 ] Benson Margulies commented on LUCENE-4956: -- This is a patch, not an accepted component of Apache Lucene. There's no guarantee that anyone will work on it. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895754#comment-13895754 ] SooMyung Lee commented on LUCENE-4956: -- Hi [~daedeqi], I created SourceForge project and contribute it to Apache Lucene project. I'm working on fixing some problems mentioned in this Jira issue. But the problem you mentioned about 위키백과는 is to be solved if you add the word 위키 into the dictionary. There is no perfect way to analyze Korean sentences according to only a algorithm such as Porter Stemming in English sentence. So, we use both algorithm and dictionary to analyze Korean sentence. The dictionary for the Korean analyzer has around 40,000 words. I think most of the basic Korean word is included in it. But about many loan words such as 위키 is not included. Usually the user of Korean Analyzer in Korea builds his own dictionary for his purpose. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895773#comment-13895773 ] Aaron Madlon-Kay commented on LUCENE-4956: -- Hello. I am involved with the OmegaT project; I am the Aaron referenced in [~daedeqi]'s quoted emails. We are using the version of this tokenizer that is hosted on SourceForge. This issue should not have been brought up here at all. I apologize for the intrusion. I will follow up with [~soomyung] privately. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895808#comment-13895808 ] Dai Deqi commented on LUCENE-4956: -- Dear Aaron and All, I humbly apologize if I breached an unspoken internet/project protocol. Such was not my intention. In fact, I thought I was helping. In any event, I'll be much more careful in the future. Very Respectfully, Dai Deqi On Saturday, February 8, 2014 6:50 PM, Aaron Madlon-Kay (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895773#comment-13895773] Aaron Madlon-Kay edited comment on LUCENE-4956 at 2/8/14 11:43 PM: --- Hello. I am involved with the OmegaT project; I am the Aaron referenced in [~daedeqi]'s quoted emails. We are using the version of this tokenizer that is hosted on SourceForge. Our issue should not have been brought up here at all. I apologize for the intrusion. I will follow up with [~soomyung] privately. was (Author: amake): Hello. I am involved with the OmegaT project; I am the Aaron referenced in [~daedeqi]'s quoted emails. We are using the version of this tokenizer that is hosted on SourceForge. This issue should not have been brought up here at all. I apologize for the intrusion. I will follow up with [~soomyung] privately. -- This message was sent by Atlassian JIRA (v6.1.5#6160) the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868971#comment-13868971 ] SooMyung Lee commented on LUCENE-4956: -- [~thetaphi] I'm trying to change the code to use StandardTokenizer but I found a problem. when a text with Chinese characters is passed into the StandardTokenizer, It tokenizes Chinese characters into each character. That makes it difficult to extract index keywords and map Chinese character to Hangul Character. So, to use StandardTokenizer for KoreanAnalyzer, consecutive Chinese characters should not be tokenized. Can you change the StandardTokenizer as I mentioned? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856878#comment-13856878 ] SooMyung Lee commented on LUCENE-4956: -- Hi Robert, Thank you for your effort. Yes, I also found some problems in some case like WordSpaceAnalyzer and offset/posincs, and I made some improvement. so, I'll post the patch soon. I want to help you for the progress. but, I'm feeling strange when I work with you in this project. It is too difficult for me to figure out how I can help you. Please, let me know something that I can help you more detail. Is it helpful to you If I make some test cases ? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857081#comment-13857081 ] Uwe Schindler commented on LUCENE-4956: --- Hi, I have the same problems like Robert with some code parts. Partly the code is un-understandable and it looks like some places just have workarounds around silly bugs in the original code (like the catch ArrayIndexOutOfBoundsException and resuming with completely different code paths). Also the code generates like 5 completely new java.util.Collections (Lists, Maps,...) per token, without even reusing the previous ones! The code has lots of problems with offsets and positions (sometimes we workarounded using Math.max(0, positionFromCrazyCode). The code as it is will not pass TestRandomChains! Robert and I already rewrote lots of the code and also removed the GPL code. At this point it is still not in a state that can be committed to Lucene trunk or even backporting it. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857097#comment-13857097 ] Uwe Schindler commented on LUCENE-4956: --- There is one other problem we have to solve: The code ships with a slightly modified version of a very old version of StandardTokenizer, compiled with the not JVM invariant version of JFlex (the one which uses unicode tables from jvm crafting the source code). We should use the default StandardTokenizer and modify the filter to use the newly added types. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857107#comment-13857107 ] SooMyung Lee commented on LUCENE-4956: -- [~thetapi] Thank you for your comment, Uwe. I think I can make some improvements with above the problem like WordSpaceAnalyzer, posinc/offset and KoreanTokenizer. I'll upload the patch by this weekend. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856699#comment-13856699 ] Steve Rowe commented on LUCENE-4956: [~rcmuir], [~thetaphi], [~cm]: is there anything blocking merging the branch into trunk and branch_4x? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856704#comment-13856704 ] Robert Muir commented on LUCENE-4956: - Yes, the nocommits. In general there are not enough tests to proceed fixing more things. I took it as far as I could with TestCoverageHack, but the stuff like the AIOOBE-catching is just the tip of the iceberg to some problems in WSOutput/WordSpaceAnalyzer. The main challenge here is just that, there are many many many special cases happening in the analysis logic. There needs to be good tests for these, rather than just testing that the analysis does not change because currently the analysis does really funky things in some situations and needs to change. TokenStream logic needs cleanup too: offsets/posincs and so on need to work and BaseTokenStreamTestCase.checkRandomData etc should pass. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812034#comment-13812034 ] Benson Margulies commented on LUCENE-4956: -- Looks like mapHanja.dic needs some adjustment of the legal notice? Or was this going to be replaced? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812039#comment-13812039 ] Robert Muir commented on LUCENE-4956: - please point to specific files in svn that you have concerns about. I recreated this file myself from clearly attributed sources, from scratch. It has *MORE THAN ENOUGH* legal notice. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812044#comment-13812044 ] Benson Margulies commented on LUCENE-4956: -- My point is that it might have a bit too much legal notice. Generally, when someone grants a license, the headers all move up to some global NOTICE file, and the file is left with just an Apache license. I also noted the following: ! Except as contained in this notice, the name of a copyright holder shall not be ! used in advertising or otherwise to promote the sale, use or other dealings in ! these Data Files or Software without prior written authorization of the copyright holder. and then noticed: that http://www.apache.org/legal/resolved.html says that it approves of * BSD (without advertising clause). So that Unicode license is possibly an issue. Right now I'm using the git clone, but I just did a pull, and the pathname is lucene/analysis/arirang/src/data/mapHanja.dic the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812047#comment-13812047 ] Robert Muir commented on LUCENE-4956: - {quote} So that Unicode license is possibly an issue. {quote} No, its not. https://issues.apache.org/jira/browse/LEGAL-108 the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812050#comment-13812050 ] Benson Margulies commented on LUCENE-4956: -- That jira concerns a different license. The license on the file pointed-to there has no advertising clause that I can spot. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812051#comment-13812051 ] Robert Muir commented on LUCENE-4956: - This is the unicode license that all of their data and code comes from. There is only one. Please, don't waste my time here, if you want to waste the legal team's time, thats ok :) the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812052#comment-13812052 ] Robert Muir commented on LUCENE-4956: - The exact question about using unicode data tables has been answered explicitly already: http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200903.mbox/%3c3d4032300903030415w4831f6e4u65c12881cbb86...@mail.gmail.com%3E I don't think it needs any further discussion the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812053#comment-13812053 ] Benson Margulies commented on LUCENE-4956: -- Rob, I got shat on at great length over this for merely test data over at the WS project. I had to make the build pull the data over the network to get certain directors off of my back. I'm trying to spare you the experience. That's all. As a low-intensity member of the UTC, I would also expect there to be only one license. However, I compare: {noformat} # Copyright (c) 1991-2011 Unicode, Inc. All Rights reserved. # # This file is provided as-is by Unicode, Inc. (The Unicode Consortium). No # claims are made as to fitness for any particular purpose. No warranties of # any kind are expressed or implied. The recipient agrees to determine # applicability of information provided. If this file has been provided on # magnetic media by Unicode, Inc., the sole remedy for any claim will be # exchange of defective media within 90 days of receipt. # # Unicode, Inc. hereby grants the right to freely use the information # supplied in this file in the creation of products supporting the # Unicode Standard, and to make copies of this file in any form for # internal or external distribution as long as this notice remains # attached. {noformat} with {noformat} ! Copyright (c) 1991-2013 Unicode, Inc. ! All rights reserved. ! Distributed under the Terms of Use in http://www.unicode.org/copyright.html. ! ! Permission is hereby granted, free of charge, to any person obtaining a copy ! of the Unicode data files and any associated documentation (the Data Files) ! or Unicode software and any associated documentation (the Software) to deal ! in the Data Files or Software without restriction, including without limitation ! the rights to use, copy, modify, merge, publish, distribute, and/or sell copies ! of the Data Files or Software, and to permit persons to whom the Data Files or ! Software are furnished to do so, provided that (a) the above copyright notice(s) ! and this permission notice appear with all copies of the Data Files or Software, ! (b) both the above copyright notice(s) and this permission notice appear in ! associated documentation, and (c) there is clear notice in each modified Data ! File or in the Software as well as in the documentation associated with the Data ! File(s) or Software that the data or software has been modified. ! ! THE DATA FILES AND SOFTWARE ARE PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, ! EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ! FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO ! EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ! ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES ! WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF ! CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION ! WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE. ! ! Except as contained in this notice, the name of a copyright holder shall not be ! used in advertising or otherwise to promote the sale, use or other dealings in ! these Data Files or Software without prior written authorization of the copyright holder. {noformat} They look pretty different to me. Go figure? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812056#comment-13812056 ] Robert Muir commented on LUCENE-4956: - {quote} Rob, I got shat on at great length over this for merely test data over at the WS project. I had to make the build pull the data over the network to get certain directors off of my back. I'm trying to spare you the experience. That's all. {quote} Then perhaps you should push back hard when people don't know what they are talking about, like I do. As i said, the question about using unicode data tables has already been directly answered. {quote} As a low-intensity member of the UTC, I would also expect there to be only one license. However, I compare: {quote} I am also one. this means nothing. {quote} They look pretty different to me. Go figure? {quote} There is only one license from the terms of use page http://www.unicode.org/copyright.html That is what I include. Whoever created your other license decided to omit some of the information, which I did not. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812059#comment-13812059 ] Benson Margulies commented on LUCENE-4956: -- OK, I see, the email thread about Unicode data in general does certainly cover this. Sometimes the workings of Legal are pretty perplexing. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812065#comment-13812065 ] Robert Muir commented on LUCENE-4956: - Just so anyone reading the thread knows: the clause Benson mentioned is not an advertising clause: {quote} Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder. {quote} The BSD advertising clause reads like this: {quote} All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by the organization. {quote} These are very different. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807877#comment-13807877 ] Benson Margulies commented on LUCENE-4956: -- Hmm. When I followed the link, I found a .tar.gz. I guess the zip was further down the page. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807918#comment-13807918 ] Benson Margulies commented on LUCENE-4956: -- Something's funny here. On this page (http://www.kristalinfo.com/TestCollections/), the zip file has directories like HANTEC-2.0/relevance_file/+�++/L2.rel The code in the patch expects the word 'full' in latin-alphabet, no funny full-width, in the that intermediate directory. On the other hand, I can't seem to find an unzip with a documented -O option on Linux. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807987#comment-13807987 ] Robert Muir commented on LUCENE-4956: - nothing is funny, i renamed them locally, sorry. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807567#comment-13807567 ] Benson Margulies commented on LUCENE-4956: -- Could you share the trick of unpacking the big tarball, locale-wise? I ended up with: [benson] /data/HANTEC-2.0 % ls relevance_file %B0%FA%C7б%E2%BC%FA%BAо%DF %C0%FCü which does not work so well. Did you set LOCALE to something before unpacking? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807568#comment-13807568 ] Robert Muir commented on LUCENE-4956: - unzip -O cp949 HANTEC-2.0.zip the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806405#comment-13806405 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1536174 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1536174 ] LUCENE-4956: move this out of dictionaryutil the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806422#comment-13806422 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1536184 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1536184 ] LUCENE-4956: replace some getWord != null with hasWord the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806475#comment-13806475 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1536214 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1536214 ] LUCENE-4956: more speedup,style,refactoring the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806497#comment-13806497 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1536231 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1536231 ] LUCENE-4956: more refactoring of decompounding the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806498#comment-13806498 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1536233 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1536233 ] LUCENE-4956: move all the list creation out of compoundnounanalyzer the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806519#comment-13806519 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1536235 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1536235 ] LUCENE-4956: improve the runtime of maxWord the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806556#comment-13806556 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1536244 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1536244 ] LUCENE-4956: more cleanups and remove n^2 in filterIncorrect the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801543#comment-13801543 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534514 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534514 ] LUCENE-4956: More optimization on captureState the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800391#comment-13800391 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534030 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534030 ] LUCENE-4956: move dictionary entry classes to dictionary package the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800398#comment-13800398 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534032 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534032 ] LUCENE-4956: don't use wordentry for uncompound processing the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800411#comment-13800411 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534040 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534040 ] LUCENE-4956: don't hold thousands of arrays in dictionary the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800577#comment-13800577 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534115 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534115 ] LUCENE-4956: commit working state the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800602#comment-13800602 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534128 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534128 ] LUCENE-4956: remove trie the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800611#comment-13800611 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534135 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534135 ] LUCENE-4956: Fix file not found case, add close on finally the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800634#comment-13800634 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534141 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534141 ] LUCENE-4956: add some cleanups, remove packing, add missing close, lazy-load compound data until you ask for it the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801042#comment-13801042 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534364 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534364 ] LUCENE-4956: use a byte1 jamo FST, smaller and much faster the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801444#comment-13801444 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534472 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534472 ] LUCENE-4956: do this simpler/faster like kuromoji the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801445#comment-13801445 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534473 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534473 ] LUCENE-4956: pull out broken acronym/etc handling, user can just use classicfilter for that the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801446#comment-13801446 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534477 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534477 ] LUCENE-4956: don't captureState unless we have to the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800098#comment-13800098 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533857 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533857 ] LUCENE-4956: Partially fix posIncrAtt to preserve increment of first token. The morphQueue still has a bug, added nocommit! the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800099#comment-13800099 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533858 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533858 ] LUCENE-4956: Rename IndexWord to Token like in Kuromoji! the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800108#comment-13800108 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533862 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533862 ] LUCENE-4956: Simplier fix for the broken posIncr. I also cleaned up the Token class and made private to the Filter the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800110#comment-13800110 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533863 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533863 ] LUCENE-4956: Remove useless getters in private class the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800113#comment-13800113 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533865 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533865 ] LUCENE-4956: fix malformed entries the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800117#comment-13800117 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533872 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533872 ] LUCENE-4956: remove slow caseless match in trie, don't read headers as actual entries the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800123#comment-13800123 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533877 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533877 ] LUCENE-4956: ban dictionary corrumption the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800156#comment-13800156 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533923 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533923 ] LUCENE-4956: Rewrite iterator-consumer; make stupid Exception more selective. I have no idea how to fix! the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800344#comment-13800344 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534021 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534021 ] LUCENE-4956: clean up compound / feature processing a bit (more coming) the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800382#comment-13800382 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1534029 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1534029 ] LUCENE-4956: more morph cleanups the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799794#comment-13799794 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533695 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533695 ] LUCENE-4956: move data to src/data and setup regeneration (for now simple copy) the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799835#comment-13799835 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533709 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533709 ] LUCENE-4956: Remove unused file constants the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799937#comment-13799937 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533781 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533781 ] LUCENE-4956: allow use of these with datainput the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799989#comment-13799989 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533813 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533813 ] LUCENE-4956: refactor syllable handling to not be a list of thousands of arrays the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1371#comment-1371 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533815 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533815 ] LUCENE-4956: Fix error handling in HanjaUtil to prevent NPE on broken classpath the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1372#comment-1372 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533817 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533817 ] LUCENE-4956: Fix error handling in HanjaUtil to prevent NPE on broken classpath the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1375#comment-1375 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533821 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533821 ] LUCENE-4956: Use IOUtils.decodingReader to load data files the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1381#comment-1381 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533835 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533835 ] LUCENE-4956: more cleanups and visibility fixes the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800011#comment-13800011 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533838 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533838 ] LUCENE-4956: Move/Rename some files and make pkg-private the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800034#comment-13800034 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533842 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533842 ] LUCENE-4956: Fix stopwords file, Cleanup analyzer (load stopwords file, no hardcoded stops), and filter (fix broken incrementToken, implement reset), remove unused varaibles in CompoundNounAnalyzer the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800035#comment-13800035 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533843 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533843 ] LUCENE-4956: Make filter final, add one more nocommit the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800049#comment-13800049 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533846 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533846 ] LUCENE-4956: Fix factory to check for incorrect parameter keys, remove bogus parameters, remove bogus matchVersion on KoreanTokenizer the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798832#comment-13798832 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533329 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533329 ] LUCENE-4956: generate new mapHanja.dic from sources with clear license the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798836#comment-13798836 ] Robert Muir commented on LUCENE-4956: - OK, see new file here: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4956/lucene/analysis/arirang/src/resources/org/apache/lucene/analysis/ko/dic/mapHanja.dic?revision=1533329view=markup Generated with: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4956/lucene/analysis/arirang/src/tools/java/org/apache/lucene/analysis/ko/GenerateHanjaMap.java?revision=1533329view=markup hanja keys: 27784 hanja/hangul mappings: 28861 the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798887#comment-13798887 ] SooMyung Lee commented on LUCENE-4956: -- [~rcmuir], Thanks again. I have run a test case with new hanja-hangul mapping files. It works very well. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798894#comment-13798894 ] Uwe Schindler commented on LUCENE-4956: --- Hi, I reviewed the Trie.java code and its usage yesterday. Trie.java is only used at 2 places with same usage pattern: - DictionaryUtil#dictionary - Tagger#occurences In both cases there are only 2 types of matches: - DictionaryUtil#findWithPrefix: returns an Iterator of all entries with a given prefix - DictionaryUtil#getWord: returns WordEntry for an exact match - Tagger#getGR: returns iterator of all entries with a given prefix These use cases are not really the ones a Trie is made for, so the ideal and most performant would be to USE Lucene's FST implementation. We would also get an Iterator-like interface to look up prefixes. So I would suggest to replace these 3 methods by an FST backing them. The dictionary would then (like for kuromoji) be preprocessed and saved as serialized FST in the resource file. The original dictionary as text file would only be available in the Lucene source distribution to regenerate the FST. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798897#comment-13798897 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533355 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533355 ] LUCENE-4956: Remove useless synchronization (no lazy loading anymore) the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798903#comment-13798903 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533358 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533358 ] LUCENE-4956: Reorder loading the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798918#comment-13798918 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533362 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533362 ] LUCENE-4956: Make WordEntry components final the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798959#comment-13798959 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533371 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533371 ] LUCENE-4956: Quick fix to remove the Trie for the Tagger. The file is very slow and a TreeMap is perfectly fine. Can still be improved, but the primary concern is to remove Trie.java the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798980#comment-13798980 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533378 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533378 ] LUCENE-4956: Remove debug output, make map unmodifiable the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798997#comment-13798997 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533382 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533382 ] LUCENE-4956: Don't load full files into big List, instead process them line by line (the current code uses iterator anyway). the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799043#comment-13799043 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533403 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533403 ] LUCENE-4956: Move empty line removal up the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799140#comment-13799140 ] Robert Muir commented on LUCENE-4956: - {quote} So I would suggest to replace these 3 methods by an FST backing them. {quote} I'm starting on this today. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799158#comment-13799158 ] Uwe Schindler commented on LUCENE-4956: --- bq. So I would suggest to replace these 3 methods by an FST backing them. This one is already gone: Tagger#getGR(prefix) So the big dictionary is the thing to do. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799205#comment-13799205 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533517 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533517 ] LUCENE-4956: Make parser more strict, remove bullshit from data files the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799214#comment-13799214 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533521 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533521 ] LUCENE-4956: Unify Exceptions the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799276#comment-13799276 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533550 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533550 ] LUCENE-4956: Tagger is completely dead code! Why did I put work into it? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799274#comment-13799274 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533549 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533549 ] LUCENE-4956: add TestCoverageHack the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799290#comment-13799290 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533557 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533557 ] LUCENE-4956: remove empty dir the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799314#comment-13799314 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533562 from [~rcmuir] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533562 ] LUCENE-4956: remove some dead code the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797876#comment-13797876 ] Benson Margulies commented on LUCENE-4956: -- As a potential user of this technology, I'd like to ask for it to have documentation of its linguistic approach. * What is the goal of the tokenizer? Is it to deliver eojeol or hyung-tae-so? If eojeol, does it split up the case where Korean writers are sometimes relaxed about whitespace between them? * Similarly, what does it set out to index? Does it index eojeol and them also their contained eumjeol or hyung-tae-so, using position-increment / position-length to indicate compound relationships. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798230#comment-13798230 ] SooMyung Lee commented on LUCENE-4956: -- [~bmargulies] Korean Tokenizer has the feature that identify language (Korean, English or Chinese) in Korean sentence. Usually, eojeol in Korean sentence has some different cases. First, eojeol consists of only Korean letters, Second, eojeol can be a combination of Korean letter and alphanumeric letter. Third, eojeol consists of only all alphanumeric letters. Fourth eojeol consists of Chinese letters. Tokinizer treat first and second case as Korean so Korean Morphological analysis is processed in Korean-filter. In second case, I copied code from standard-filter for korean-filter. In third case, Korean-filter map Chinese letter to Korean sound and then if it is a compound noun, decompounding is processed based on dictionary. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798319#comment-13798319 ] SooMyung Lee commented on LUCENE-4956: -- Hi, all. I' going to explain how I develop this code as Christian recommended because of license and legal problem that [~jkrupan] mentioned in previous comment. I started to write this code and dictionary in 2006 based on a book which author is Seung-Shik, Kang who is a professor of Kookmin university now. the dictionary consist of several files but major files are total.dic, josa.dic, eomi.dic and syllable.dic. in first step of developing dictionary, I collected basic stem words for total.dic and particles for josa.dic and eomi.dic from book and various websites. and then I surveyed how basic stem words can be used on online dictionaries. and I only referred to the book to make syllable.dic. the rest of files is created by myself during developing except for mapHanja.dic. I added this file two years ago. I'm not sure that this file has not legal problem because many data came from projects result so it is better to remove that data. to make source code, I referred to the book so major logic was based on the book except for some utilities classes such as String, File and Trie.java. I copied most of utilities classes from apache common project but Trie.java from other website. I cannot remember the exact website now because it was happend long time ago. but I remember that I read the license that was Apache license. I finished first version in 2008 and created an online community on a website (called Naver) and uploaded the source code. the number of community members are over 3700 currently. I attended an opensource contest held by Korean government organization in 2009. During the contest, I uploaded the source code to the Sourceforge and got a BlackDuck license test with this code and passed the test. I have supported users through the online community (http://cafe.naver.com/korlucene). so some users improved dictionaries and source codes and then posted it on the website. and I merged it and opened it again. This is the wohle process how I developed the code. If anybody has something to recommend, Please let me know it. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798393#comment-13798393 ] Benson Margulies commented on LUCENE-4956: -- I am told (I don't read Korean myself) that people often leave out the white space between eojeol that are made up entirely of Hangul letters (Korean letters). Are you just defining these very long things to be single eojeol? Prof Kang in his own work has a module that splits these using some rules. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798434#comment-13798434 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533264 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533264 ] LUCENE-4956: Remove StringEscapeUtils by unescaping the mapHanja.dic the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798463#comment-13798463 ] Uwe Schindler commented on LUCENE-4956: --- Hi [~soomyung], thanks for the clarification. It was not [~jkrupan] that mentioned the GPL violation, it was Robert and me. I am glad that you are aware of this and you are trying to clarify this. Indeed the License of this file is hard to find out, because the Gnutella one (which is the original) has no license header. But the whole Gnutella project is GPL licensed. Those people also started to donate this code to Google Guava and wanted to relicense to ASF2, but this is not yet done. So we cannot use this code. The missing License header may be the reason for the Blackduck test to be happy. [~cm] offered to donate a PatriciaTrie he wrote himself. Maybe we can replace the gnutella one by this one. I would prefer the solution to not use a trie at all. Instead we should use Lucene's FTS feature and bundle the whole dictionary as a serialized FST (like kuromoji does). About the other copypasted code: I already removed all commons-io and commons-lang stuff. Commons-io was completely unneeded, because the resource handling to load resources from JAR files was not very good and can be done much easier by a simple Class#getResourceAsStream. I already implemented that and moved some class around, so be sure to update your svn before working more on the module. I also removed the \u-escaping from the mapHanja.dic file, so I was able to remove the StringEscapeUtil class, which did too much unescaping (not only \u, also \n, \t,...)! But we should really check the license of this file or create a new one from Unicode tables. I left the file in SVN (converted to plain UTF-8) for now. I am currently working on rewriting some code that creates too many small objects like strings all the time, because this slows down indexing! E.g. HanjaUtils should not use a String just to lookup a single char in a map. There are better data structures to hold the mapHanja table. We should also not use readLines() to load all dictionaries into heap, then use an iterator over it and convert them to something else. We should use a BufferedReader and read it line by line and do the processing directly. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798498#comment-13798498 ] Uwe Schindler commented on LUCENE-4956: --- bq. the rest of files is created by myself during developing except for mapHanja.dic. I added this file two years ago. I'm not sure that this file has not legal problem because many data came from projects result so it is better to remove that data. Do you have some documentation who gave the file to you or where you downloaded it? Some university? Some CD-ROM? the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798530#comment-13798530 ] Robert Muir commented on LUCENE-4956: - Maybe we can reconstitute this file from other hanja-hangul mappings with clear licenses? I have not done any processing, I will investigate sources such as https://code.google.com/p/google-input-tools/source/browse/src/chrome/os/nacl-hangul/misc/symbol.txt and unihan and see what it looks like. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798531#comment-13798531 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533277 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533277 ] LUCENE-4956: Remove lazy dictionary loading, don't convert to string all the time. This may be improved further if we use an array and substract the smallest codepoint value the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798536#comment-13798536 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533278 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533278 ] LUCENE-4956: More improvements the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798566#comment-13798566 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533282 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533282 ] LUCENE-4956: Remove thread-unsafe lazy loading. Initialize in static ctor the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798582#comment-13798582 ] ASF subversion and git services commented on LUCENE-4956: - Commit 1533286 from [~thetaphi] in branch 'dev/branches/lucene4956' [ https://svn.apache.org/r1533286 ] LUCENE-4956: Remove thread-unsafe lazy loading. Initialize in static ctor the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch Korean language has specific characteristic. When developing search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyer solved the problems with a korean morphological anlyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean anlyzer is made for lucene and solr. If you develop a search service with lucene in korean, It is the best idea to choose the korean analyzer. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org