Jim Ferenczi created LUCENE-8959:
------------------------------------
Summary: JapaneseNumberFilter does not take whitespaces into
account when concatenating numbers
Key: LUCENE-8959
URL: https://issues.apache.org/jira/browse/LUCENE-8959
Project: Lucene - Core
Issue Type: Improvement
Reporter: Jim Ferenczi
Today the JapaneseNumberFilter tries to concatenate numbers even if they are
separated by whitespace. For instance, "10 100" is rewritten into "10100"
even if the tokenizer doesn't discard punctuation. In practice this is usually
not an issue, but it can produce giant number tokens when the text contains many
numbers separated by spaces. The number of concatenations should be configurable,
with a sane default limit, to avoid creating big tokens that slow down the
analysis.
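The sketch below illustrates the proposed behaviour. It is minimal and rests on several assumptions. The tokenizer is assumed to emit whitespace as its own token, as it does when punctuation is not discarded. A run of adjacent digit tokens is merged only while no whitespace or other non-digit token sits between them. Each merge is capped at a configurable number of input tokens. The class `NumberConcatSketch`, the method `concatenate`, and the parameter `maxParts` are hypothetical. They are not part of Lucene's API, and the sketch ignores the kanji-numeral normalization that the real JapaneseNumberFilter performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, NOT Lucene's JapaneseNumberFilter. It merges adjacent
 * digit-only tokens into one number token, never across a whitespace (or any
 * other non-digit) token, and never merges more than maxParts input tokens.
 */
public class NumberConcatSketch {

    /** True if the token is non-empty and every char is a Unicode decimal digit. */
    static boolean isDigits(String token) {
        if (token.isEmpty()) return false;
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isDigit(token.charAt(i))) return false;
        }
        return true;
    }

    /** Hypothetical limit: maxParts is the max number of input tokens per merged number. */
    static List<String> concatenate(List<String> tokens, int maxParts) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null; // digits accumulated for the current run
        int parts = 0;                // input tokens merged into current so far
        for (String token : tokens) {
            if (isDigits(token) && current != null && parts < maxParts) {
                current.append(token); // adjacent digits: extend the run
                parts++;
                continue;
            }
            if (current != null) {     // limit reached or non-digit token: flush
                out.add(current.toString());
                current = null;
            }
            if (isDigits(token)) {
                current = new StringBuilder(token); // start a new run
                parts = 1;
            } else {
                out.add(token);        // whitespace/other token breaks the run
            }
        }
        if (current != null) out.add(current.toString());
        return out;
    }

    public static void main(String[] args) {
        // "10 100" with the space kept as a token: no longer merged into "10100"
        System.out.println(concatenate(Arrays.asList("10", " ", "100"), 4));
        // Adjacent digit tokens are still merged
        System.out.println(concatenate(Arrays.asList("10", "100"), 4));
        // A small limit caps how large a merged token can grow
        System.out.println(concatenate(Arrays.asList("1", "2", "3", "4", "5"), 2));
    }
}
```

With a limit of 2, the last call splits the run into `[12, 34, 5]` instead of building one ever-growing token. A sane default for such a limit would bound the token size regardless of how many numbers appear in a row.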
--
This message was sent by Atlassian Jira
(v8.3.2#803003)