[
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164954#comment-14164954
]
Christian Moen commented on LUCENE-3922:
----------------------------------------
I've attached a new patch.
The {{checkRandomData}} issues were caused by improper handling of token
composition for graphs (bug found by [~gaute]). Tokens preceded by a token with
position increment zero are left untouched, and so are stacked/synonym tokens.
We'll do some more testing and add some documentation before we move forward to
commit this.
> Add Japanese Kanji number normalization to Kuromoji
> ---------------------------------------------------
>
> Key: LUCENE-3922
> URL: https://issues.apache.org/jira/browse/LUCENE-3922
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.0-ALPHA
> Reporter: Kazuaki Hiraga
> Assignee: Christian Moen
> Labels: features
> Attachments: LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch
>
>
> Japanese people use Kanji numerals instead of Arabic numerals when writing
> prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho)
> and 十二月 (December). We would therefore like to normalize those Kanji numerals
> to Arabic numerals (I don't think we need the capability to normalize to Kanji
> numerals).
>
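To make the requested normalization concrete, here is a minimal standalone sketch of converting a Kanji (or mixed Kanji/Arabic) number string to its value. It is not the code in the attached patch and does not use any Lucene API; the class name {{KanjiNumberSketch}} and its {{parse}} method are hypothetical, chosen only for illustration. It assumes plain non-negative integers, and ignores formal numerals (e.g. 壱), decimals, thousands separators and overflow.

```java
import java.util.Map;

// Hypothetical illustration only; not the implementation in LUCENE-3922.patch.
final class KanjiNumberSketch {

    // Kanji digits 0-9.
    private static final Map<Character, Integer> DIGITS = Map.of(
        '〇', 0, '一', 1, '二', 2, '三', 3, '四', 4,
        '五', 5, '六', 6, '七', 7, '八', 8, '九', 9);

    // Multipliers applied within a group of four digits.
    private static final Map<Character, Long> SMALL = Map.of(
        '十', 10L, '百', 100L, '千', 1_000L);

    // Multipliers that close a group of four digits.
    private static final Map<Character, Long> LARGE = Map.of(
        '万', 10_000L, '億', 100_000_000L, '兆', 1_000_000_000_000L);

    static long parse(String s) {
        long total = 0;      // sum of all completed groups
        long section = 0;    // value of the group being built (below 万)
        long number = 0;     // digits read since the last multiplier
        boolean hasNumber = false;

        for (char c : s.toCharArray()) {
            int arabic = Character.digit(c, 10); // also accepts full-width digits
            if (arabic >= 0 || DIGITS.containsKey(c)) {
                int d = arabic >= 0 ? arabic : DIGITS.get(c);
                number = number * 10 + d;
                hasNumber = true;
            } else if (SMALL.containsKey(c)) {
                // A bare 十/百/千 means 1 x multiplier, as in 十二 = 12.
                section += (hasNumber ? number : 1) * SMALL.get(c);
                number = 0;
                hasNumber = false;
            } else if (LARGE.containsKey(c)) {
                section += number;
                total += section * LARGE.get(c);
                section = 0;
                number = 0;
                hasNumber = false;
            } else {
                throw new IllegalArgumentException("Not a numeral: " + c);
            }
        }
        return total + section + number;
    }

    public static void main(String[] args) {
        System.out.println(parse("12万4800")); // 124800
        System.out.println(parse("十二"));      // 12
    }
}
```

Note that this only covers the arithmetic. Deciding which characters of an input such as 二番町三ノ二 or 十二月 form a number, and leaving units like 円 alone, is a tokenization concern that the sketch does not address.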