[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-05-30 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851778#comment-16851778
 ] 

Jim Ferenczi commented on LUCENE-8812:
--

The patch looks good [~danmuzi], I wonder if it would be difficult to have a 
base class for the Japanese and Korean number filter since they share a large 
amount of code. However I think it's ok to merge this first and we can tackle 
the merge in a follow up, wdyt ?

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-05-30 Thread Namgyu Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852014#comment-16852014
 ] 

Namgyu Kim commented on LUCENE-8812:


Thank you for your reply, [~jim.ferenczi] :D
{quote}I wonder if it would be difficult to have a base class for the Japanese 
and Korean number filter since they share a large amount of code. However I 
think it's ok to merge this first and we can tackle the merge in a follow up, 
wdyt ?
{quote}
I think it is an awesome refactoring.
 If the refactoring is done, we can also share this TokenFilter in 
SmartChineseAnalyzer. (Chinese and Japanese use the same numeric characters)
 The amount of code will also be reduced.

I think the NumberFilter (new abstract class) can be in the 
org.apache.lucene.analysis.core(analysis-common) or 
org.apache.lucene.analysis(lucene-core) package, what do you think?
 In my personal opinion, analysis-common seems to be correct, but it is a 
little bit ambiguous.

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-06 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857911#comment-16857911
 ] 

Jim Ferenczi commented on LUCENE-8812:
--

Sorry I didn't see your reply. I agree with you that it is ambiguous to put it 
in analysis-common so +1 to add it in the nori module for now and revisit 
if/when we create a separate module for the mecab tokenizer. 

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-07 Thread Namgyu Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858839#comment-16858839
 ] 

Namgyu Kim commented on LUCENE-8812:


Thank you for your reply. [~jim.ferenczi] :D

Awesome. I'll submit this patch.

This is the first time that I submit a patch manually.
I will look for the manual and proceed, but it can take a little time.

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-09 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859467#comment-16859467
 ] 

ASF subversion and git services commented on LUCENE-8812:
-

Commit 5a75b8a0808e577f9f1a737f411104034cc23a6b in lucene-solr's branch 
refs/heads/master from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5a75b8a ]

LUCENE-8812: Add new KoreanNumberFilter that can change Hangul character to 
number and process decimal point

Signed-off-by: Namgyu Kim 


> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-09 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859469#comment-16859469
 ] 

ASF subversion and git services commented on LUCENE-8812:
-

Commit 36b90c6056932c9cf2e4dfae3d500cab5d03ba3b in lucene-solr's branch 
refs/heads/branch_8x from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=36b90c6 ]

LUCENE-8812: Add new KoreanNumberFilter that can change Hangul character to 
number and process decimal point.

Signed-off-by: Namgyu Kim 


> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-09 Thread Namgyu Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859470#comment-16859470
 ] 

Namgyu Kim commented on LUCENE-8812:


Hi [~jim.ferenczi] :D

I pushed my commit to *branch_8x* and *master* branch.
 I checked and it seems to be reflected fine.
 So I'll resolve this issue.
Let me know if there are some problems.

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-09 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859502#comment-16859502
 ] 

ASF subversion and git services commented on LUCENE-8812:
-

Commit 619e47b791a3cea475923a61113adf0fe39404c6 in lucene-solr's branch 
refs/heads/branch_8x from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=619e47b ]

LUCENE-8812: disable Java 9 try-with-resources style in TestKoreanNumberFilter

Signed-off-by: Namgyu Kim 


> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-09 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859504#comment-16859504
 ] 

ASF subversion and git services commented on LUCENE-8812:
-

Commit fe58b6f3a27e1520054f74f06c3d7c2042ff4f79 in lucene-solr's branch 
refs/heads/master from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fe58b6f ]

LUCENE-8812: disable Java 9 try-with-resources style in TestKoreanNumberFilter

Signed-off-by: Namgyu Kim 


> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-09 Thread Namgyu Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859507#comment-16859507
 ] 

Namgyu Kim commented on LUCENE-8812:


I re-opened this issue because the error was occurred in branch_8x branch.
 ([https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Windows/298/])
 After the error was found, the commit to fix the error was pushed.
 The cause of the error was that when I wrote the test(TestKoreanNumberFilter),
 I used Java 9 try-with-resources style, and the error was occurred in 
branch_8x because it is based on Java 8.
 So I disable it including the master branch, and when the 9.0 version is 
officially released, I will rework it at that time.

And I made a slight mistake.
 I did not mention the issue number when committing to the branch_8x branch.
 I wanted to change the commit message, but it could not be changed because it 
is a protected branch. (force push is blocked)
 So I was forced to revert.

Anyway, both problems(error + wrong commit message) are all my fault, and I 
will be careful.

I will resolve this issue when the Jenkins build is completed.

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-10 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859778#comment-16859778
 ] 

Jim Ferenczi commented on LUCENE-8812:
--

Thanks [~danmuzi]!

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-06-12 Thread Namgyu Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862270#comment-16862270
 ] 

Namgyu Kim commented on LUCENE-8812:


You're welcome. [~jim.ferenczi]!
I checked that the build was completed fine at Jenkins.
I'll resolve this issue.

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8812) add KoreanNumberFilter to Nori(Korean) Analyzer

2019-05-29 Thread Namgyu Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851033#comment-16851033
 ] 

Namgyu Kim commented on LUCENE-8812:


Hi [~jim.ferenczi] :D
Thank you for applying the LUCENE-8784 patch!
I checked that this patch can be applied into latest code. (no conflict)

What do you think about this patch?

> add KoreanNumberFilter to Nori(Korean) Analyzer
> ---
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to 
> regular Arabic decimal numbers in half-width characters.
> Logic is similar to JapaneseNumberFilter.
> It should be able to cover the following test cases.
> 1) Korean Word to Number
> 십만이천오백 => 102500
> 2) 1 character conversion
> 일영영영 => 1000
> 3) Decimal Point Calculation
> 3.2천 => 3200
> 4) Comma between three digits
> 4,647.0010 => 4647.001



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org