[ 
https://issues.apache.org/jira/browse/LUCENE-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860128#comment-16860128
 ] 

Namgyu Kim commented on LUCENE-8817:
------------------------------------

Thank you for your replies, [~tomoko] and [~cm] :D

I was impressed by how deeply you have thought about this.
{code:java}
analysis
└── ???
         ├── common (module: analyzers-???-common)
         │       ├── build.xml
         │       └── src
         ├── kuromoji (module: analyzers-???-kuromoji)
         │       ├── build.xml
         │       └── src
         ├── nori (module: analyzers-???-nori)
         │       ├── build.xml
         │       └── src
         └── tools  (module: analyzers-???-tools)
                 ├── build.xml
                 └── src
{code}
I agree with the module structure proposed by Tomoko (shown above).
In my opinion, "analysis" is a better top-level name than "analyzers".
{quote}In terms of naming, what about using "statistical" instead of "mecab" 
for this class of analyzers?
 I'm thinking "Viterbi" could be good to refer to in shared tokenizer code.
 This said, I think it could be good to refer to "mecab" in the dictionary 
compiler code, documentation, etc. to make sure users understand that we can 
read this model format.
 Any thoughts?
{quote}
As for the name, the folder name "viterbi" looks much better than "statistical".
But to be perfectly honest, I'm not sure it is right to use an algorithm name 
as a folder name.
Most users probably don't know what Viterbi is.
The folder name also tends to carry over into the package name, and 
"org.apache.lucene.analysis.viterbi.ja" or "~.viterbi.ko" would confuse users.
Alternatively, we could simply keep "org.apache.lucene.analysis.ja".
analysis-common already works this way: its packages omit the module name 
(org.apache.lucene.analysis.cjk, not org.apache.lucene.common.cjk).
If the name is used only for administrative purposes it doesn't matter much, 
but I would also like to hear opinions from others.
{quote}how about using "kuromoji" in the top level module name for both of 
Japanese and Korean analyzers, and changing current module names "kuromoji" and 
"nori" to "kuromoji-ja" and "kuromoji-ko"?
{quote}
Personally, I don't agree with using kuromoji-ko instead of nori.
nori is already a familiar name to users, and renaming it may confuse them.

> Combine Nori and Kuromoji DictionaryBuilder
> -------------------------------------------
>
>                 Key: LUCENE-8817
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8817
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Namgyu Kim
>            Priority: Major
>
> This issue is related to LUCENE-8816.
> Currently, the Nori and Kuromoji analyzers use the same dictionary structure 
> (MeCab).
> If we combine their DictionaryBuilders, we can reduce the code size.
> However, this task may have language-specific dependencies 
> (such as the HEADER string in BinaryDictionary and CharacterDefinition, 
> methods in BinaryDictionaryWriter, ...).
> On the other hand, many classes overlap.
> The purpose of this patch is to provide users of Nori and Kuromoji with the 
> same system dictionary generator.
> It may take some time because there is some work involved.
> The work will be based on the latest master, and if LUCENE-8816 is 
> finished first, I will pull the latest code and proceed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

