[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2018-11-20 Thread Kazuaki Hiraga (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694286#comment-16694286
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


I have confirmed that there are still some issues that cause Kanji numerals to 
be normalized incorrectly. However, the implementation itself has been finished 
and merged into the main branch. I will therefore close this ticket and file a 
separate ticket to report the normalization issues and submit patches. 

 

> Add Japanese Kanji number normalization to Kuromoji
> ---
>
> Key: LUCENE-3922
> URL: https://issues.apache.org/jira/browse/LUCENE-3922
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.0-ALPHA
> Reporter: Kazuaki Hiraga
> Assignee: Christian Moen
> Priority: Major
> Labels: features
> Attachments: LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch, 
> LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch, 
> LUCENE-3922.patch
>
>
> Japanese people use Kanji numerals instead of Arabic numerals when writing 
> prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 
> Nibancho) and 十二月 (December). So, we would like to normalize those Kanji 
> numerals to Arabic numerals (I don't think we need the capability to 
> normalize to Kanji numerals).
>  
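
To make the requested behaviour concrete, here is a minimal sketch of the conversion in Python. It is not the Lucene implementation; the function name `kanji_to_int` and the three lookup tables are assumptions made up for illustration, and the caller is assumed to have already stripped suffixes such as 円 or 月.

```python
# Hypothetical helper for illustration only; not the Kuromoji/Lucene code.
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def kanji_to_int(text: str) -> int:
    """Convert a run of Kanji and/or ASCII digits to an integer."""
    total = 0    # sum of completed 万/億/兆 groups
    group = 0    # value of the current group below 10,000
    number = 0   # digits read since the last unit character
    for ch in text:
        if ch.isascii() and ch.isdigit():
            number = number * 10 + int(ch)
        elif ch in DIGITS:
            number = number * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十 means 10, so a missing digit counts as 1.
            group += (number or 1) * SMALL_UNITS[ch]
            number = 0
        elif ch in LARGE_UNITS:
            total += ((group + number) or 1) * LARGE_UNITS[ch]
            group = number = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + group + number
```

With this sketch, `kanji_to_int("12万4800")` gives 124800 and `kanji_to_int("十二")` gives 12. An address such as 二番町三ノ二 would need each numeral run converted separately, which this helper does not attempt.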



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2018-11-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689577#comment-16689577
 ] 

Mike Sokolov commented on LUCENE-3922:
--

+1 - this was merged ages ago (2015); would be nice to clean up the Jira so 
folks looking for interesting projects don't get diverted :)




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2018-10-04 Thread ankush jhalani (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638360#comment-16638360
 ] 

ankush jhalani commented on LUCENE-3922:


I noticed the changes are available in master/branch_7x 
([Test]JapaneseNumberFilter[Factory].java)

Should we mark this closed?




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2015-04-02 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392489#comment-14392489
 ] 

Ramkumar Aiyengar commented on LUCENE-3922:
---

[~cm], just got interested in this patch.. Any reason this hasn't gone to 
branch_5x as yet?




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2015-02-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303062#comment-14303062
 ] 

ASF subversion and git services commented on LUCENE-3922:
-

Commit 1656670 from [~cm] in branch 'dev/trunk'
[ https://svn.apache.org/r1656670 ]

Added JapaneseNumberFilter (LUCENE-3922)




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2015-01-28 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296379#comment-14296379
 ] 

Christian Moen commented on LUCENE-3922:


Please feel free to test it.  Feedback is very welcome.

The patch is against {{trunk}} and this should make it into 5.1.




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2015-01-21 Thread Kazuaki Hiraga (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285565#comment-14285565
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


[~cm] , sounds great! Can I test this feature? If yes, what version should I 
use?




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2014-10-16 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173567#comment-14173567
 ] 

Christian Moen commented on LUCENE-3922:


Gaute and I have done testing on real-world data and we've uncovered and 
fixed a couple of corner-case issues.

Our todo items are as follows:

# Do additional testing and possibly add additional number formats
# Document some unsupported cases in unit-tests
# Add class-level javadoc
# Add a Solr factory






[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2014-10-09 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164954#comment-14164954
 ] 

Christian Moen commented on LUCENE-3922:


I've attached a new patch.

The {{checkRandomData}} issues were caused by improper handling of token 
composition for graphs (bug found by [~gaute]). Tokens preceded by a token with 
position increment zero are left untouched, and so are stacked/synonym tokens.

We'll do some more testing and add some documentation before we move forward to 
commit this.
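
One possible reading of the graph rule described above, sketched in Python rather than as a Lucene {{TokenFilter}}. The `Token` type, `normalize_stream` and the `normalize` callback are hypothetical names, and a real filter would work on attributes and captured state instead of a list.

```python
from typing import Callable, List, NamedTuple


class Token(NamedTuple):
    term: str
    pos_inc: int  # 0 means the token is stacked on the previous position


def normalize_stream(tokens: List[Token],
                     normalize: Callable[[str], str]) -> List[Token]:
    """Normalize terms, skipping stacked tokens and the token after them."""
    out = []
    prev_stacked = False
    for tok in tokens:
        stacked = tok.pos_inc == 0
        if stacked or prev_stacked:
            out.append(tok)  # left untouched, as described in the comment
        else:
            out.append(Token(normalize(tok.term), tok.pos_inc))
        prev_stacked = stacked
    return out
```

This is only a simplification: it decides token by token and does not look ahead, so the first token of a stack is still normalized here.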




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2014-08-05 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085909#comment-14085909
 ] 

Christian Moen commented on LUCENE-3922:


Gaute and I have been doing some work on this and we have rewritten it 
as a {{TokenFilter}}.

A few comments:

* We have added support for numbers such as 3.2兆円 as you requested, Kazu.
* We could potentially use a POS-tag attribute from Kuromoji to identify the 
numbers that we are composing, but not relying on POS-tags perhaps makes this 
filter useful in the case of n-gramming as well.
* We haven't implemented any of the anchoring logic discussed above, i.e. whether 
to restrict normalization to prices, etc. Is this useful to have?
* Input such as {{1,5}} becomes {{15}} after normalization, which could be 
undesired. Is this bad input or do we want anchoring to retain these numbers?

One thing though, in order to support some of this number parsing, i.e. cases 
such as 3.2兆円, we need to use Kuromoji in a mode that retains punctuation 
characters.
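
As a rough illustration of the two cases above (a decimal number followed by a large unit, and a comma inside a digit run), here is a small Python sketch. The function names and the unit table are assumptions for this example and do not come from the patch.

```python
from decimal import Decimal

# Hypothetical unit table for illustration.
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def normalize_decimal_with_unit(text: str) -> str:
    """Expand e.g. '3.2兆' into plain digits; the 円 suffix is assumed stripped."""
    number, unit = text[:-1], text[-1]
    if unit not in LARGE_UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    value = Decimal(number) * LARGE_UNITS[unit]
    if value == value.to_integral_value():
        return str(int(value))
    return format(value, "f")


def strip_separators(text: str) -> str:
    """Drop commas from a digit run, which is why '1,5' ends up as '15'."""
    return text.replace(",", "")
```

The decimal point only survives to this stage if the tokenizer keeps punctuation, which is the reason for the punctuation-retaining mode mentioned above. The comma handling shows the open question: it is right for 124,800 but questionable for 1,5.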

There's also an unresolved issue found by {{checkRandomData}} that we haven't 
tracked down and fixed, yet.

This is a work in progress and feedback is welcome.




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-12 Thread Kazuaki Hiraga (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475016#comment-13475016
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


It would be nice if we could choose whether to expand them or normalize them.

I am concerned that Solr's query-side synonym expansion doesn't work well if 
the number of tokens differs between the original tokens and the synonym 
tokens; phrase matching with query-side synonym expansion in particular would 
be a disaster. (Of course, reduction or index-side expansion would be better, 
but we sometimes need to use a TokenFilter that provides such a capability on 
the query side.) So, I would like to be able to configure whether Kanji 
numerals are normalized to Arabic numerals or Arabic numerals are stored along 
with the Kanji numerals. 
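
A small sketch of the two modes being asked for, written in Python with tokens as `(term, position_increment)` pairs. The `emit` function and its `expand` flag are hypothetical; no such option is confirmed anywhere in this thread.

```python
from typing import List, Optional, Tuple


def emit(term: str, arabic: Optional[str], expand: bool) -> List[Tuple[str, int]]:
    """Return the tokens to emit for one input token.

    arabic is the normalized form, or None if the token is not a numeral.
    """
    if arabic is None or arabic == term:
        return [(term, 1)]
    if expand:
        # Keep the original and stack the Arabic form at the same position.
        return [(term, 1), (arabic, 0)]
    # Normalize: replace the original with the Arabic form.
    return [(arabic, 1)]
```

In both modes each input token still occupies exactly one position, which is the property that keeps phrase matching workable.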


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-11 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474287#comment-13474287
 ] 

Christian Moen commented on LUCENE-3922:


Ohtani-san,

I saw your tweet about this earlier and it sounds like a very good idea.  
Thanks.

I will try to set aside some time to work on this.




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-11 Thread Jun Ohtani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474268#comment-13474268
 ] 

Jun Ohtani commented on LUCENE-3922:


Hi Christian, Kazuaki

+1 for a TokenFilter implementation.
I also think it would be helpful if this TokenFilter could expand a token into 
both its Arabic and Kanji number forms, like a synonym filter.




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-11 Thread Kazuaki Hiraga (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474257#comment-13474257
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


Hi Christian,

That is what I am thinking. I think a TokenFilter would be a good choice for 
implementing that feature. We can use the POS tag to recognize what a token is. 
We can apply normalization if a token is a numeral prefix/suffix accompanied by 
numerals. 




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-11 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474224#comment-13474224
 ] 

Christian Moen commented on LUCENE-3922:


Thanks, Kazu.

I'm aware of the issue and the thinking is to rework this as a {{TokenFilter}} 
and use anchoring options with surrounding tokens to decide if normalisation 
should take place, i.e. if the preceding token is ¥ or the following token is 円 
in the case of normalising prices.

It might also be helpful to look into using POS-info for this to benefit from 
what we actually know about the token, i.e. to not apply normalisation if the 
POS tag is a person name.

Other suggestions and ideas are of course most welcome.
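
The anchoring idea could look roughly like the following Python sketch. Tokens are `(surface, pos_tag)` pairs, and the tag string `"person-name"`, the two anchor sets and the function name are all placeholders for illustration, not Kuromoji's actual tag names or API.

```python
from typing import List, Tuple

# Hypothetical anchors for the price case described above.
PRICE_PREFIXES = {"¥"}
PRICE_SUFFIXES = {"円"}


def should_normalize(tokens: List[Tuple[str, str]], i: int) -> bool:
    """Decide whether the numeral token at index i should be normalized."""
    _surface, pos = tokens[i]
    if pos == "person-name":
        return False  # never touch numerals inside person names
    prev_surface = tokens[i - 1][0] if i > 0 else None
    next_surface = tokens[i + 1][0] if i + 1 < len(tokens) else None
    return prev_surface in PRICE_PREFIXES or next_surface in PRICE_SUFFIXES
```

Other anchor sets (for dates, addresses and so on) would follow the same pattern.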





[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-11 Thread Kazuaki Hiraga (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474210#comment-13474210
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


The following examples are false positives:
"姿三四郎" became "姿", "34", "郎"
"小林一茶" became "小林", "1", "茶"
"鈴木一郎" became "鈴木", "1", "郎"

Can we prevent this behavior?
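
For reference, a context-free replacement like the following Python sketch reproduces exactly these results. It is a guess at the kind of logic that causes the problem, not the code in the attached patch; `naive_normalize` is a made-up name.

```python
import re

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
# Capturing group so that re.split keeps the numeral runs.
NUMERAL_RUN = re.compile("([" + "".join(DIGITS) + "]+)")


def naive_normalize(text: str) -> list:
    """Split on Kanji digit runs and convert each run, ignoring context."""
    parts = []
    for piece in NUMERAL_RUN.split(text):
        if not piece:
            continue
        if NUMERAL_RUN.fullmatch(piece):
            parts.append("".join(str(DIGITS[c]) for c in piece))
        else:
            parts.append(piece)
    return parts
```

Because nothing here knows that 三四郎 or 一郎 is part of a name, every digit run is rewritten.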




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471132#comment-13471132
 ] 

Christian Moen commented on LUCENE-3922:


{quote}
Is it difficult to support numbers with a decimal point, such as the following?
3.2兆円
5.2億円
{quote}

Supporting this is no problem and a good idea.

{quote}
I think it would be helpful if this charfilter supported old Kanji numeric 
characters ("KYU-KANJI" or "DAIJI") such as 壱, 壹, 弌 (one), 弐, 貳, 弍 (two), 
参, 參 (three), or if this were configurable.
{quote}

This is also easy to support.
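
A minimal Python sketch of how such a mapping might be made configurable. The table covers only the characters listed in the request, and `fold_daiji` with its `enabled` flag is a hypothetical name, not an option of the patch.

```python
# Old-style (daiji / kyu-kanji) numerals folded to their common forms.
DAIJI_TO_KANJI = str.maketrans({
    "壱": "一", "壹": "一", "弌": "一",
    "弐": "二", "貳": "二", "弍": "二",
    "参": "三", "參": "三",
})


def fold_daiji(text: str, enabled: bool = True) -> str:
    """Replace old-style numerals with common Kanji numerals if enabled."""
    return text.translate(DAIJI_TO_KANJI) if enabled else text
```

Note that 参 also appears in ordinary words, so applying this without context has the same false-positive risk as any character-level normalization.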

As for making preserving zeros configurable, that's also possible, of course.

It's great to get more feedback on what sort of functionality we need and what 
should be configurable options. Hopefully, we can find a good balance without 
adding too much complexity.

Thanks for the feedback.




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Kazuaki Hiraga (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471123#comment-13471123
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


Lance, you may be right.  Although I have never seen Japanese people use 
Kanji numbers for James Bond movies :-), I can't say that we never use Kanji 
for that kind of expression.

Christian, is it possible to choose whether or not to preserve leading zeros?




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471117#comment-13471117
 ] 

Lance Norskog commented on LUCENE-3922:
---

bq. On the other hand, I agree with Christian about not preserving leading 
zeros. So, "◯◯七" doesn't need to become "007".
This example shows why leading zeros should be preserved :)

There are different kinds of text search. Searching for media titles like James 
Bond movies is a very different thing from searching newspaper articles. You 
might want to find "◯◯七" as the Japanese-language release and "007" as the 
English-language release. These numbers are brands, not numbers. 
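
The two behaviours under discussion can be sketched in Python like this. The `preserve_leading_zeros` parameter is a hypothetical option used only to show the difference.

```python
# Positional digits only; both zero glyphs seen in practice are accepted.
DIGITS = {"〇": 0, "◯": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}


def positional_to_arabic(text: str, preserve_leading_zeros: bool) -> str:
    """Convert a positional Kanji digit string such as '◯◯七'."""
    digits = "".join(str(DIGITS[c]) for c in text)
    if preserve_leading_zeros:
        return digits
    # Treat the string as a number: drop leading zeros but keep a lone zero.
    return digits.lstrip("0") or "0"
```

With `preserve_leading_zeros=True`, "◯◯七" becomes "007" and matches the English-language title; with `False` it becomes "7".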




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Kazuaki Hiraga (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471068#comment-13471068
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


Sorry for this late reply.

Although I have some request to improve capability, this is very helpful and 
nice charfilter for me.
Thank you! Christian!!

My requests are the following:

Is it difficult to support numbers with period as the following?
3.2兆円
5.2億円

On the other hand, I agree with Christian to not preserving leading zeros. So, 
"◯◯七" doesn't need to become "007".

I think it would be helpful if this charfilter supported old Kanji numeric 
characters ("KYU-KANJI" or "DAIJI") such as 壱, 壹, 弌 (one), 弐, 貳, 弍 (two), 
and 参, 參 (three), or if that support were configurable.
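
A minimal sketch of how the two requests above might be prototyped, assuming a decimal mantissa directly followed by a single large unit (万, 億 or 兆) and an optional trailing 円. The class and method names are hypothetical and are not part of the attached patch. Kanji are written as Unicode escapes so the source stays ASCII-only.

```java
import java.math.BigDecimal;

/** Illustrative sketch only; names are hypothetical, not from the patch. */
final class KanjiRequestSketch {

    /** Folds formal/old numerals for one, two and three to the common forms. */
    static String foldDaiji(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u58f1': case '\u58f9': case '\u5f0c':
                    out.append('\u4e00'); // variants of ichi (1)
                    break;
                case '\u5f10': case '\u8cb3': case '\u5f0d':
                    out.append('\u4e8c'); // variants of ni (2)
                    break;
                case '\u53c2': case '\u53c3':
                    out.append('\u4e09'); // variants of san (3)
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    /** Expands e.g. "3.2" + cho (+ optional yen sign) to a plain digit string. */
    static String expandDecimal(String s) {
        if (s.endsWith("\u5186")) { // drop a trailing en (yen) character
            s = s.substring(0, s.length() - 1);
        }
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        BigDecimal multiplier;
        switch (s.charAt(s.length() - 1)) {
            case '\u4e07': multiplier = new BigDecimal("10000"); break;         // man
            case '\u5104': multiplier = new BigDecimal("100000000"); break;     // oku
            case '\u5146': multiplier = new BigDecimal("1000000000000"); break; // cho
            default: throw new IllegalArgumentException("no large unit at end: " + s);
        }
        BigDecimal mantissa = new BigDecimal(s.substring(0, s.length() - 1));
        // stripTrailingZeros + toPlainString avoids output such as "3200000000000.0"
        return mantissa.multiply(multiplier).stripTrailingZeros().toPlainString();
    }
}
```

With this sketch, expandDecimal("3.2兆円") yields "3200000000000" and expandDecimal("5.2億円") yields "520000000". Using BigDecimal rather than double keeps the mantissa exact.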




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-04 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469936#comment-13469936
 ] 

Lance Norskog commented on LUCENE-3922:
---

Kazuaki, do you have any comments on this fix?




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-07-31 Thread Kazuaki Hiraga (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426340#comment-13426340
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


Hi Christian,

Great! I will test your patch and get back to you!!

Thanks,
Kazu




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-07-30 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425488#comment-13425488
 ] 

Christian Moen commented on LUCENE-3922:


I've attached a work-in-progress patch for {{trunk}} that implements a 
{{CharFilter}} that normalizes Japanese numbers.

These are some TODOs and implementation considerations I have that I'd be 
thankful to get feedback on:

* Buffering the entire input on the first read should be avoided.  The primary 
reason this is done is that I was thinking of adding some regexps before and 
after kanji numeric strings to qualify their normalization, e.g. to only 
normalize strings that start with ¥ or JPY, or end with 円, so that only 
monetary amounts in Japanese yen are normalized.  However, buffering everything 
probably isn't necessary, as we can probably use {{Matcher.requireEnd()}} and 
{{Matcher.hitEnd()}} to decide if we need to read more input. (Thanks, Robert!)

* Is qualifying the numbers to be normalized with prefix and suffix regexps 
useful, e.g. to only normalize monetary amounts?

* How do we deal with leading zeros?  Currently, "007" and "◯◯七" both become 
"7".  Do we want an option to preserve leading zeros?

* How large are the numbers we care about supporting?  Some of the characters 
for the larger numbers are surrogate pairs in UTF-16, which complicates 
implementation, but they're certainly possible.  If we don't care about really 
large numbers, we can probably be fine working with {{long}} instead of 
{{BigInteger}}.

* Formal numerals and some other variants aren't supported, e.g. 壱, 弐, 参, 
but they can easily be added.  We can also add the obsolete variants if that's 
useful somehow.  Are these useful?  Do we want them available via an option?

* Number formats such as "1億2,345万6,789" aren't supported - we don't deal with 
the comma today, but this can be added.  The same applies to "12 345" where 
there's a space that separates thousands like in French.  Numbers like "2・2兆" 
aren't supported, but can be added.

* Only integers are supported today, so we can't parse "〇・一二三四", which becomes 
"0" and "1234" as separate tokens instead of "0.1234"

There are probably other considerations, too, that don't immediately come 
to mind.

Numbers are fairly complicated and feedback on direction for further 
implementation is most appreciated.  Thanks.
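
The {{Matcher.hitEnd()}} idea from the first bullet could look roughly like the sketch below. It assumes a simplified number pattern (ASCII digits plus the kanji digits and the units 十, 百, 千, 万, 億, 兆, written as Unicode escapes) and a hypothetical helper name. It is not the code in the attached patch, and {{Matcher.requireEnd()}} is left out for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class HitEndSketch {

    // ASCII digits, kanji digits zero through nine, and the units
    // ju, hyaku, sen, man, oku, cho (as Unicode escapes).
    private static final Pattern NUMBER = Pattern.compile(
            "[0-9\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d"
            + "\u5341\u767e\u5343\u4e07\u5104\u5146]+");

    /**
     * Returns true if the last number found in the buffered text runs up to
     * the end of the buffer, so that more input could still extend it.
     */
    static boolean needsMoreInput(CharSequence buffered) {
        Matcher m = NUMBER.matcher(buffered);
        boolean lastMatchHitEnd = false;
        while (m.find()) {
            // hitEnd() reports whether the engine reached the end of the
            // input while producing this particular match.
            lastMatchHitEnd = m.hitEnd();
        }
        // Only successful matches are recorded: a failed find() also hits
        // the end of input, but then there is no number left to extend.
        return lastMatchHitEnd;
    }
}
```

A reader loop could call needsMoreInput on the characters buffered so far and only emit the normalized text once it returns false or the underlying reader is exhausted, which avoids buffering the entire input up front.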




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Kazuaki Hiraga (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239122#comment-13239122
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


Koji, thank you for your comment. I am very interested in the normalizer you 
have mentioned. Is it possible to choose whether a suffix or prefix (年/月/円, 
etc.) is concatenated to the Arabic numbers?

> Add Japanese Kanji number normalization to Kuromoji
> ---
>
> Key: LUCENE-3922
> URL: https://issues.apache.org/jira/browse/LUCENE-3922
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 4.0
>Reporter: Kazuaki Hiraga
>  Labels: features
>
> Japanese people use Kanji numerals instead of Arabic numerals for writing 
> price, address and so on. i.e 12万4800円(124,800JPY), 二番町三ノ二(3-2 Nibancho) and 
> 十二月(December).  So, we would like to normalize those Kanji numerals to Arabic 
> numerals (I don't think we need to have a capability to normalize to Kanji 
> numerals).
>  




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238334#comment-13238334
 ] 

Christian Moen commented on LUCENE-3922:


Koji, this is very nice.

Does the kanji number normalizer ({{KanjiNumberCharFilter}}) also deal with 
combinations of kanji and Arabic numbers like Kazu's price example?

Is the code you refer to above something that can go into Lucene, or is it 
non-free software?




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Koji Sekiguchi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238329#comment-13238329
 ] 

Koji Sekiguchi commented on LUCENE-3922:


We, RONDHUIT, have done this kind of normalization (and more!). You may be 
interested in:

http://www.rondhuit-demo.com/RCSS/api/overview-summary.html#featured-japanese

||Summary||Normalization sample||
|Kanji numerals => Arabic numerals normalization|四七=>47, 四十七=>47, 四拾七=>47, 四〇七=>407|
|Japanese era year => Western calendar year normalization|昭和四七年, 昭和四十七年, 昭和四拾七年=>1972年; 昭和六十四年, 平成元年=>1989年|
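
The numeral rows of the table can be reproduced with a small parser. The following is a minimal sketch, not the RONDHUIT implementation and not the Lucene patch. It assumes the value fits in a {{long}}, handles only the units 十/拾, 百, 千, 万, 億 and 兆, and uses hypothetical names throughout. Kanji are written as Unicode escapes so the source stays ASCII-only.

```java
/** Illustrative sketch only; class and method names are hypothetical. */
final class KanjiNumeralSketch {

    // Kanji digits zero through nine, as Unicode escapes.
    private static final String KANJI_DIGITS =
            "\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d";

    /** Converts a run of kanji and/or ASCII digits with unit characters to a long. */
    static long parse(String s) {
        long total = 0;   // sum of completed man/oku/cho groups
        long section = 0; // value below 10,000 built from ju/hyaku/sen
        long digits = 0;  // pending positional digits, e.g. 4, 0, 7 -> 407
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int d = (c >= '0' && c <= '9') ? c - '0' : KANJI_DIGITS.indexOf(c);
            long small = smallUnit(c);
            long large = largeUnit(c);
            if (d >= 0) {
                digits = digits * 10 + d;
            } else if (small > 0) {
                // A unit with no digit before it, such as a bare ju, means 1 x unit.
                section += (digits == 0 ? 1 : digits) * small;
                digits = 0;
            } else if (large > 0) {
                total += (section + digits) * large;
                section = 0;
                digits = 0;
            } else {
                throw new IllegalArgumentException("Not a numeral: " + c);
            }
        }
        return total + section + digits;
    }

    private static long smallUnit(char c) {
        switch (c) {
            case '\u5341': // ju (10)
            case '\u62fe': // ju, formal variant (10)
                return 10L;
            case '\u767e': return 100L;   // hyaku
            case '\u5343': return 1_000L; // sen
            default: return 0L;
        }
    }

    private static long largeUnit(char c) {
        switch (c) {
            case '\u4e07': return 10_000L;            // man
            case '\u5104': return 100_000_000L;       // oku
            case '\u5146': return 1_000_000_000_000L; // cho
            default: return 0L;
        }
    }
}
```

With this sketch, parse("四七"), parse("四十七") and parse("四拾七") all yield 47, parse("四〇七") yields 407, and the mixed form parse("12万4800") yields 124800. Leading zeros are dropped, so parse("〇〇七") yields 7.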





[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238303#comment-13238303
 ] 

Christian Moen commented on LUCENE-3922:


Thanks a lot, Kazu.

This is a good idea to add.  Patches are of course also very welcome! :)
