[ https://issues.apache.org/jira/browse/LUCENE-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828848#comment-15828848 ]

Hoss Man commented on LUCENE-7635:
----------------------------------

I'm not very familiar with Kuromoji, but I believe the lines you're deleting in 
this patch are intended to catch comments at the _end_ of a line, not just at 
the beginning, ie...

{noformat}
# comment at start of line

朝青龍,朝青龍,アサショウリュウ,カスタム人名 # end line comment, has a comma in it
                                   # spans more than one line
abcd,a b cd,foo1 foo2 foo3,bar     # Another end line comment
{noformat}

Since the intent of the UserDict format seems to be "CSV with '#' comments", 
the comment stripping should probably be moved to 
o.a.l.analysis.ja.util.CSVUtil, where it can be done if-and-only-if the '#' is 
not part of a quoted value...

{noformat}
朝青龍,朝青龍,アサショウリュウ,カスタム人名  # end line comment, has a comma in it
                                    # spans more than one line
abcd,a b cd,foo1 foo2 foo3,bar      # Another end line comment
"quoted#sharp",other,"quoted,stuff" # yet another end line comment
{noformat}

ie: add an {{if (c == '#' && !insideQuote)}} block (similar to the existing 
{{COMMA}} conditional) to CSVUtil.parse() that would (trim and) add the final 
value to the result and break out of the for loop.
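A minimal sketch of that idea, for illustration only: the class CommentAwareCsv and its helpers are hypothetical and this is not Lucene's actual CSVUtil code. It splits a line on commas, keeps commas and '#' that appear inside double quotes, and treats the first unquoted '#' as the start of an end-of-line comment (trimming and emitting the final value before stopping).

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's CSVUtil): splits one user-dictionary line
 * on commas, honoring double-quoted values, and treats an unquoted '#' as the
 * start of an end-of-line comment.
 */
final class CommentAwareCsv {
    private static final char QUOTE = '"';
    private static final char COMMA = ',';
    private static final char SHARP = '#';

    static String[] parse(String line) {
        List<String> result = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean insideQuote = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == QUOTE) {
                // Toggle quote state; keep the quote so unquote() can strip it later.
                insideQuote = !insideQuote;
                field.append(c);
            } else if (c == COMMA && !insideQuote) {
                // Field separator outside quotes: emit the current value.
                result.add(unquote(field.toString().trim()));
                field.setLength(0);
            } else if (c == SHARP && !insideQuote) {
                // Unquoted '#': the rest of the line is a comment.
                break;
            } else {
                field.append(c);
            }
        }
        String last = field.toString().trim();
        // A blank or comment-only line yields no fields at all.
        if (!result.isEmpty() || !last.isEmpty()) {
            result.add(unquote(last));
        }
        return result.toArray(new String[0]);
    }

    // Strips surrounding double quotes and collapses "" escapes to ".
    private static String unquote(String s) {
        if (s.length() >= 2 && s.charAt(0) == QUOTE && s.charAt(s.length() - 1) == QUOTE) {
            return s.substring(1, s.length() - 1).replace("\"\"", "\"");
        }
        return s;
    }

    public static void main(String[] args) {
        String[] lines = {
            "# comment at start of line",
            "朝青龍,朝青龍,アサショウリュウ,カスタム人名  # end line comment, has a comma in it",
            "                                    # spans more than one line",
            "abcd,a b cd,foo1 foo2 foo3,bar      # Another end line comment",
            "\"quoted#sharp\",other,\"quoted,stuff\" # yet another end line comment"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parse(line)));
        }
    }
}
{code}

Note that under this scheme an unquoted value like the reporter's {{withsharp#}} would still be cut at the '#'; such values would need quoting, e.g. {{"withsharp#","withsharp#","withsharp#",カスタム名詞}}.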

?

> Kuromoji fails if user dictionary contains #
> --------------------------------------------
>
>                 Key: LUCENE-7635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7635
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Masaru Hasegawa
>         Attachments: LUCENE-7635.patch
>
>
> If user dictionary contains entries like:
> {code}
> withsharp#,withsharp#,withsharp#,カスタム名詞
> {code}
> It fails to create the dictionary, throwing a 
> java.lang.ArrayIndexOutOfBoundsException.
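To illustrate how such an entry can end in an ArrayIndexOutOfBoundsException, here is a minimal sketch that assumes (this is not confirmed in the thread) the loader strips everything from the first '#' to the end of the line before splitting it into fields. NaiveCommentStrip and naiveFields are hypothetical names, and the plain comma split stands in for the real CSV parser.

{code:java}
import java.util.Arrays;

/** Hypothetical reproduction of the failure mode; not Lucene's UserDictionary code. */
final class NaiveCommentStrip {
    // Assumed pre-processing: drop everything from the first '#' to end of line,
    // then split on commas (a stand-in for the real CSV parser).
    static String[] naiveFields(String line) {
        String stripped = line.replaceAll("#.*$", "");
        return stripped.split(",");
    }

    public static void main(String[] args) {
        String entry = "withsharp#,withsharp#,withsharp#,カスタム名詞";
        String[] values = naiveFields(entry);
        System.out.println(Arrays.toString(values));
        try {
            // An entry is expected to have four fields; only one survives the strip.
            System.out.println(values[1]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
{code}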



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)