[ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239223#comment-13239223
 ] 

Kazuaki Hiraga commented on LUCENE-3921:
----------------------------------------

Christian, thank you for your comments and for giving us the details of your
experiments.

I think the length restriction is an acceptable option for splitting
Katakana compound tokens.  I tried DictionaryCompoundWordTokenFilter for
this purpose and we got results similar to what we expected. But we don't
want to rely on another dictionary in addition to the dictionaries for the
Japanese tokenizer.  So, if possible, we would like to have this
functionality in Kuromoji.
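
For illustration only, here is a minimal sketch of the kind of
dictionary-based split with a length restriction discussed above. It is
plain Java and is neither Kuromoji code nor the
DictionaryCompoundWordTokenFilter implementation; the class name, the
split method, the minLen parameter and the three-word dictionary are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: not part of Lucene or Kuromoji.
class KatakanaSplitSketch {

    /**
     * Splits a token into dictionary words that are each at least minLen
     * characters long. Returns the token unchanged (as a one-element list)
     * when it cannot be covered completely by such words.
     */
    static List<String> split(String token, Set<String> dict, int minLen) {
        int n = token.length();
        // prev[i] is the start index of the last piece in a valid split of
        // token[0..i), or -1 if no valid split ends at i.
        int[] prev = new int[n + 1];
        Arrays.fill(prev, -1);
        prev[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start + minLen <= end; start++) {
                if (prev[start] >= 0 && dict.contains(token.substring(start, end))) {
                    prev[end] = start;
                    break; // earliest start wins, so the last piece is as long as possible
                }
            }
        }
        if (prev[n] < 0) {
            return Collections.singletonList(token);
        }
        // Walk back from the end of the token to recover the pieces.
        List<String> parts = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            parts.add(token.substring(prev[end], end));
        }
        Collections.reverse(parts);
        return parts;
    }

    public static void main(String[] args) {
        // Assumed dictionary entries, chosen only for this example.
        Set<String> dict = Set.of("トート", "バッグ", "ショルダー");
        System.out.println(split("トートバッグ", dict, 2));
        System.out.println(split("ショルダーバッグ", dict, 2));
    }
}
```

With this toy dictionary both example tokens split into two parts when
minLen is 2, while a minLen of 4 leaves them whole because "バッグ" is
only three characters long. A sketch like this still needs its own word
list, which is the dependency we would like to avoid by having the
feature inside Kuromoji.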

                
> Add decompose compound Japanese Katakana token capability to Kuromoji
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-3921
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3921
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.0
>         Environment: CentOS 5, IPA Dictionary, run with "Search mode"
>            Reporter: Kazuaki Hiraga
>              Labels: features
>
> The Japanese morphological analyzer Kuromoji cannot decompose every
> Japanese Katakana compound token into sub-tokens. Some Katakana tokens can
> be decomposed, but this does not apply to every Katakana compound token.
> For instance, "トートバッグ" (tote bag) and "ショルダーバッグ" (shoulder bag) are
> not decomposed into "トート バッグ" and "ショルダー バッグ", although the IPA
> dictionary has an entry for "バッグ".  I would like the decompose feature to
> apply to every Katakana token whose sub-tokens are in the dictionary, or to
> have an option that forces decomposition of every Katakana token.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
