Jim Ferenczi created LUCENE-8959:
------------------------------------

             Summary: JapaneseNumberFilter does not take whitespaces into 
account when concatenating numbers
                 Key: LUCENE-8959
                 URL: https://issues.apache.org/jira/browse/LUCENE-8959
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Jim Ferenczi


Today the JapaneseNumberFilter concatenates numbers even when they are 
separated by whitespace. For instance, "10 100" is rewritten to "10100" 
even if the tokenizer does not discard punctuation. In practice this is not an 
issue, but it can lead to giant tokens when the text contains many numbers 
separated by spaces. The number of concatenations should be configurable, with 
a sane default limit, to avoid creating big tokens that slow down the 
analysis.
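
To illustrate the proposal, here is a minimal standalone sketch of capping how 
many adjacent number tokens get merged into one. It does not use or mirror the 
Lucene API: the class NumberConcatSketch, the method concatNumbers, and the 
parameter maxTokensPerNumber are all hypothetical names, and the input is 
assumed to be a token list from which whitespace has already been dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not JapaneseNumberFilter.
public class NumberConcatSketch {

    // Assumption: a "number" token is a non-empty run of ASCII digits.
    static boolean isNumeric(String token) {
        if (token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Merges runs of adjacent numeric tokens, but never more than
    // maxTokensPerNumber input tokens into a single output token.
    static List<String> concatNumbers(List<String> tokens, int maxTokensPerNumber) {
        if (maxTokensPerNumber < 1) {
            throw new IllegalArgumentException("maxTokensPerNumber must be >= 1");
        }
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int merged = 0; // number of input tokens merged into 'current'
        for (String token : tokens) {
            if (isNumeric(token)) {
                if (merged == maxTokensPerNumber) {
                    // Limit reached: emit what we have and start a new number.
                    out.add(current.toString());
                    current.setLength(0);
                    merged = 0;
                }
                current.append(token);
                merged++;
            } else {
                if (merged > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                    merged = 0;
                }
                out.add(token);
            }
        }
        if (merged > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("10", "100", "1000", "yen");
        // No effective limit: every adjacent number is merged.
        System.out.println(concatNumbers(tokens, Integer.MAX_VALUE)); // [101001000, yen]
        // Limit of 2: at most two input tokens per merged number.
        System.out.println(concatNumbers(tokens, 2)); // [10100, 1000, yen]
    }
}
```

With a limit in place, the size of any merged token is bounded by the limit 
rather than by the length of the run of numbers in the input.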



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
