[ 
https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495157#comment-17495157
 ] 

Uwe Schindler commented on LUCENE-10416:
----------------------------------------

I have one question: if you have already indexed text with the Korean 
analyzer, do you need to reindex after this update, or is it "mostly fine"?

The problem is that if tokens are generated with different rules or 
normalization, you won't find them in the index anymore.
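
To illustrate the concern, here is a toy sketch in plain Java. It is not Nori 
or Lucene code: the greedy longest-match tokenizer, the tiny in-memory index, 
and the placeholder "words" (ab, cd, abcd) are all assumptions made up for the 
example. It only shows how a document indexed with one dictionary can become 
unfindable once queries are tokenized with another.

```java
import java.util.*;

public class DictionaryMismatchDemo {
    // Toy segmenter: greedy longest match against a word list.
    // Text not covered by the dictionary falls back to single characters.
    static List<String> tokenize(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = pos + 1; // fallback: one character
            for (int candidate = text.length(); candidate > pos; candidate--) {
                if (dictionary.contains(text.substring(pos, candidate))) {
                    end = candidate;
                    break;
                }
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    // Toy inverted index: token -> ids of the documents containing it.
    static Map<String, Set<Integer>> index(List<String> docs, Set<String> dictionary) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : tokenize(docs.get(docId), dictionary)) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return postings;
    }

    // A document matches only if every query token has a posting for it.
    static Set<Integer> search(Map<String, Set<Integer>> postings,
                               String query, Set<String> dictionary) {
        Set<Integer> result = null;
        for (String token : tokenize(query, dictionary)) {
            Set<Integer> hits = postings.getOrDefault(token, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        Set<String> oldDict = Set.of("ab", "cd");          // hypothetical old dictionary
        Set<String> newDict = Set.of("ab", "cd", "abcd");  // hypothetical update adds a compound

        // Index built with the old dictionary: "abcd" is stored as [ab, cd].
        Map<String, Set<Integer>> postings = index(List.of("abcd"), oldDict);

        System.out.println(search(postings, "abcd", oldDict)); // [0]
        System.out.println(search(postings, "abcd", newDict)); // []
    }
}
```

In this toy setup the second search misses because the new dictionary emits 
the single token "abcd", which was never written to the index. Whether the 
real dictionary update causes mismatches of this kind is exactly the open 
question.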

In older Lucene versions we had the "matchVersion" parameter for this, but 
that would require shipping both dictionaries.
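
As a rough sketch of why a version switch implies bundling both dictionaries: 
the class, the integer version numbers, and the placeholder word lists below 
are hypothetical and do not correspond to any existing Lucene or Nori API. The 
point is only that each selectable version needs its own dictionary present 
at runtime.

```java
import java.util.*;

public class VersionedDictionaryDemo {
    // Hypothetical: one bundled dictionary per supported compatibility version.
    static final TreeMap<Integer, Set<String>> BUNDLED = new TreeMap<>(Map.of(
        9, Set.of("ab", "cd"),
        10, Set.of("ab", "cd", "abcd")));

    // Pick the newest bundled dictionary that is not newer than matchVersion.
    static Set<String> dictionaryFor(int matchVersion) {
        Map.Entry<Integer, Set<String>> entry = BUNDLED.floorEntry(matchVersion);
        if (entry == null) {
            throw new IllegalArgumentException("No dictionary for version " + matchVersion);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        // An index created under version 9 keeps seeing the old word list.
        System.out.println(dictionaryFor(9).contains("abcd"));  // false
        // A freshly created index can opt into the updated word list.
        System.out.println(dictionaryFor(10).contains("abcd")); // true
    }
}
```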

If there are significant changes, we should ship this only in 10.0, not in 
version 9.1.

> Update Korean Dictionary for Nori
> ---------------------------------
>
>                 Key: LUCENE-10416
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10416
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Uihyun Kim
>            Priority: Minor
>             Fix For: 9.1, 10.0 (main)
>
>         Attachments: LUCENE-10416.patch
>
>
> For Nori, the Korean analyzer, there is a Korean dictionary named 
> mecab-ko-dic, which is available under an Apache license here: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic]
>  
> The dictionary hasn't been updated in Nori although it has some updates to 
> provide better analysis results. Downloading is available here: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads]
>  * Currently used in Nori: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.3-20170922.tar.gz]
>  * Latest: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz]
>  
> There are changes between the currently used version and the latest release 
> version (change log: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/src/master/CHANGES.md]):
>  * New feature: added semantic class for NNG - 장소, 행위, 상태변화, 정적상태
>  * Fix: correct unexpectedly huge cost on NNG/장소
>  * New words
>  
> There's no issue with testing :lucene:analysis:nori:test and building a new 
> binary.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

