[
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797876#comment-13797876
]
Benson Margulies commented on LUCENE-4956:
------------------------------------------
As a potential user of this technology, I'd like to ask for it to have
documentation of its linguistic approach.
* What is the goal of the tokenizer? Is it to deliver eojeol or hyung-tae-so?
If eojeol, does it split them apart in cases where Korean writers are relaxed
about the whitespace between them?
* Similarly, what does it set out to index? Does it index eojeol and then also
their contained eumjeol or hyung-tae-so, using position-increment /
position-length to indicate compound relationships?
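
To make the second question concrete, here is a minimal sketch of the kind of
token graph I have in mind. It is plain Python, not Lucene code: the `Token`
class and the `to_arcs` helper are hypothetical names made up for illustration,
and the segmentation of the example eojeol is my assumption, not the output of
the analyzer in this patch. It only mimics how position increment and position
length resolve into start and end positions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a token carrying position attributes."""
    term: str
    pos_inc: int      # position increment relative to the previous token
    pos_len: int = 1  # number of positions this token spans


def to_arcs(tokens):
    """Resolve increments into (term, start_position, end_position) arcs."""
    arcs = []
    pos = -1  # the first token with increment 1 lands on position 0
    for tok in tokens:
        pos += tok.pos_inc
        arcs.append((tok.term, pos, pos + tok.pos_len))
    return arcs


# Assumed analysis of one eojeol: the whole unit is emitted first and spans
# two positions, then its parts are stacked underneath it.
tokens = [
    Token("학교에서", pos_inc=1, pos_len=2),  # whole eojeol
    Token("학교", pos_inc=0),                 # noun, same start position
    Token("에서", pos_inc=1),                 # particle, next position
]

arcs = to_arcs(tokens)
for term, start, end in arcs:
    print(f"{term}: {start} -> {end}")
```

In this sketch the whole eojeol and its first part share a start position, and
the eojeol ends where its last part ends, so a phrase query could match either
the whole unit or its pieces. Whether the analyzer emits anything like this is
exactly what I would like the documentation to say.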
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.2
> Reporter: SooMyung Lee
> Assignee: Christian Moen
> Labels: newbie
> Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch,
> lucene4956.patch, LUCENE-4956.patch
>
>
> The Korean language has specific characteristics. When developing a search
> service with Lucene & Solr in Korean, there are some problems in searching
> and indexing. The Korean analyzer solves these problems with a Korean
> morphological analyzer. It consists of a Korean morphological analyzer,
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is
> made for Lucene and Solr. If you develop a search service with Lucene in
> Korean, the Korean analyzer is the best choice.