Alan Woodward created LUCENE-8497:
-------------------------------------
Summary: Rethink multi-term analysis handling
Key: LUCENE-8497
URL: https://issues.apache.org/jira/browse/LUCENE-8497
Project: Lucene - Core
Issue Type: New Feature
Reporter: Alan Woodward

The current framework for handling term normalization relies on instanceof
checks for MultiTermAwareComponent, followed by casts. MultiTermAwareComponent
itself deals in AbstractAnalysisFactory instances, so callers must cast to the
correct component type before use, which is ripe for misuse.
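
To make the problem concrete, here is a minimal sketch of the instanceof-and-cast
pattern described above. It does not use the real Lucene classes: every type in
it (SketchAnalysisFactory, SketchMultiTermAware and so on) is a hypothetical
stand-in, and a "filter" is reduced to a String-to-String function.

```java
import java.util.Locale;

final class CastSketch {

    /** Hypothetical stand-in for the common base type of all analysis factories. */
    static class SketchAnalysisFactory {}

    /** Hypothetical stand-in for MultiTermAwareComponent: it only promises the base type. */
    interface SketchMultiTermAware {
        SketchAnalysisFactory getMultiTermComponent();
    }

    /** Hypothetical token filter factory; apply() stands in for wrapping a token stream. */
    static class SketchTokenFilterFactory extends SketchAnalysisFactory {
        String apply(String term) {
            return term;
        }
    }

    /** Hypothetical char filter factory, unrelated to the token filter type. */
    static class SketchCharFilterFactory extends SketchAnalysisFactory {}

    /** A well-behaved component: it returns itself, which really is a token filter factory. */
    static final class LowerCaseSketchFactory extends SketchTokenFilterFactory
            implements SketchMultiTermAware {
        @Override
        String apply(String term) {
            return term.toLowerCase(Locale.ROOT);
        }

        @Override
        public SketchAnalysisFactory getMultiTermComponent() {
            return this;
        }
    }

    /** A component that returns a different factory type; the compiler cannot object. */
    static final class MismatchedSketchFactory extends SketchTokenFilterFactory
            implements SketchMultiTermAware {
        @Override
        public SketchAnalysisFactory getMultiTermComponent() {
            return new SketchCharFilterFactory();
        }
    }

    /** The caller has to test with instanceof and then guess the concrete type. */
    static String normalize(SketchAnalysisFactory factory, String term) {
        if (factory instanceof SketchMultiTermAware) {
            SketchAnalysisFactory component =
                    ((SketchMultiTermAware) factory).getMultiTermComponent();
            // Unchecked assumption: a wrong guess only fails at runtime.
            return ((SketchTokenFilterFactory) component).apply(term);
        }
        return term;
    }
}
```

In this sketch the well-behaved factory normalizes as expected, while the
mismatched one compiles cleanly and then fails with a ClassCastException when
normalize is called.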

We should re-organise all this to be type-safe and usable without casts. One
possibility is to add `normalize` methods to CharFilterFactory and
TokenFilterFactory that mirror their existing `create` methods. The default
implementation would return the input unchanged, while filters that should
apply at normalization time can delegate to `create`.
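
A rough sketch of how the proposed `normalize` method might look is given below.
It is an illustration under simplifying assumptions, not the eventual Lucene
API: a token stream is modelled as a List<String>, and SketchFilterFactory,
LowerCaseSketchFactory, StopWordSketchFactory, analyze and normalizeTerm are all
hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class NormalizeSketch {

    /** Hypothetical stand-in for TokenFilterFactory. */
    abstract static class SketchFilterFactory {
        /** Mirrors the existing create method: wraps the stream for full analysis. */
        abstract List<String> create(List<String> input);

        /** Proposed default: normalization leaves the input unchanged. */
        List<String> normalize(List<String> input) {
            return input;
        }
    }

    /** Lower-casing should also apply at normalization time, so it delegates to create. */
    static final class LowerCaseSketchFactory extends SketchFilterFactory {
        @Override
        List<String> create(List<String> input) {
            List<String> out = new ArrayList<>();
            for (String token : input) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
            return out;
        }

        @Override
        List<String> normalize(List<String> input) {
            return create(input);
        }
    }

    /** Stop-word removal is not a normalization step, so it keeps the default. */
    static final class StopWordSketchFactory extends SketchFilterFactory {
        @Override
        List<String> create(List<String> input) {
            List<String> out = new ArrayList<>();
            for (String token : input) {
                if (!token.equals("the")) {
                    out.add(token);
                }
            }
            return out;
        }
    }

    /** Full analysis: every factory in the chain contributes via create. */
    static List<String> analyze(List<SketchFilterFactory> chain, List<String> tokens) {
        List<String> stream = tokens;
        for (SketchFilterFactory factory : chain) {
            stream = factory.create(stream);
        }
        return stream;
    }

    /** Normalization of a single term: no instanceof checks and no casts. */
    static String normalizeTerm(List<SketchFilterFactory> chain, String term) {
        List<String> stream = Arrays.asList(term);
        for (SketchFilterFactory factory : chain) {
            stream = factory.normalize(stream);
        }
        return stream.get(0);
    }
}
```

With a chain of the lower-case and stop-word factories, analyze would drop "the"
from the stream, whereas normalizeTerm only lower-cases the term, because the
stop-word factory keeps the pass-through default.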

Related to this, we should deprecate and remove LowerCaseTokenizer, which
combines tokenization and normalization in a way that will break this API.