Alan Woodward created LUCENE-8497:
-------------------------------------

             Summary: Rethink multi-term analysis handling
                 Key: LUCENE-8497
                 URL: https://issues.apache.org/jira/browse/LUCENE-8497
             Project: Lucene - Core
          Issue Type: New Feature
            Reporter: Alan Woodward


The current framework for handling term normalization works via instanceof 
checks for MultiTermAwareComponent, followed by casts.  MultiTermAwareComponent 
itself returns an AbstractAnalysisFactory, so callers need to cast to the 
correct factory type before use, which is error-prone.
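
To illustrate the problem, here is a minimal sketch of the instanceof-and-cast 
pattern.  The types below are simplified, hypothetical stand-ins rather than the 
real Lucene classes (they operate on plain strings instead of TokenStreams and 
Readers), and `BadFactory` is an invented example of a component that returns 
the wrong kind of factory.

```java
import java.util.Locale;

final class CastSketch {
    // Hypothetical stand-ins for the Lucene types; simplified to work on Strings.
    static abstract class AbstractAnalysisFactory {}

    interface MultiTermAwareComponent {
        AbstractAnalysisFactory getMultiTermComponent();
    }

    static abstract class TokenFilterFactory extends AbstractAnalysisFactory {
        abstract String create(String term);
    }

    static abstract class CharFilterFactory extends AbstractAnalysisFactory {
        abstract String create(String text);
    }

    static class LowerCaseFilterFactory extends TokenFilterFactory
            implements MultiTermAwareComponent {
        @Override
        String create(String term) {
            return term.toLowerCase(Locale.ROOT);
        }

        @Override
        public AbstractAnalysisFactory getMultiTermComponent() {
            return this;
        }
    }

    // Invented misuse: a token filter factory whose multi-term component
    // is a char filter factory. Nothing in the types forbids this.
    static class BadFactory extends TokenFilterFactory
            implements MultiTermAwareComponent {
        @Override
        String create(String term) {
            return term;
        }

        @Override
        public AbstractAnalysisFactory getMultiTermComponent() {
            return new CharFilterFactory() {
                @Override
                String create(String text) {
                    return text;
                }
            };
        }
    }

    // The caller has to check the marker interface and then cast the result.
    static String normalize(TokenFilterFactory factory, String term) {
        if (factory instanceof MultiTermAwareComponent) {
            AbstractAnalysisFactory component =
                ((MultiTermAwareComponent) factory).getMultiTermComponent();
            // The compiler cannot verify this cast; a wrong component type
            // only shows up as a ClassCastException at runtime.
            return ((TokenFilterFactory) component).create(term);
        }
        return term;
    }
}
```

In this sketch, `CastSketch.normalize(new CastSketch.BadFactory(), "Foo")` 
compiles cleanly but throws a ClassCastException when run.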

We should re-organise all this to be type-safe and usable without casts.  One 
possibility is to add `normalize` methods to CharFilterFactory and 
TokenFilterFactory that mirror their existing `create` methods.  The default 
implementation would return the input unchanged, while filters that should 
apply at normalization time can delegate to `create`.
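
A rough sketch of how this could look for TokenFilterFactory is below.  It is 
not a proposed patch: the token stream is modelled as a plain 
`Stream<String>`, the factory classes are simplified stand-ins for the real 
ones, and the `normalizeTerm` helper is a hypothetical name used only for 
illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NormalizeSketch {
    // Stand-in for TokenFilterFactory; Stream<String> stands in for TokenStream.
    static abstract class TokenFilterFactory {
        /** Wraps the input for full analysis. */
        abstract Stream<String> create(Stream<String> input);

        /** Wraps the input for normalization; by default returns it unchanged. */
        Stream<String> normalize(Stream<String> input) {
            return input;
        }
    }

    // A filter that should apply at normalization time delegates to create.
    static class LowerCaseFilterFactory extends TokenFilterFactory {
        @Override
        Stream<String> create(Stream<String> input) {
            return input.map(term -> term.toLowerCase(Locale.ROOT));
        }

        @Override
        Stream<String> normalize(Stream<String> input) {
            return create(input);
        }
    }

    // A filter that drops tokens keeps the default no-op normalize.
    static class StopFilterFactory extends TokenFilterFactory {
        @Override
        Stream<String> create(Stream<String> input) {
            return input.filter(term -> !term.equals("the"));
        }
    }

    // Hypothetical caller: no instanceof checks and no casts are needed.
    static String normalizeTerm(List<TokenFilterFactory> chain, String term) {
        Stream<String> stream = Stream.of(term);
        for (TokenFilterFactory factory : chain) {
            stream = factory.normalize(stream);
        }
        return stream.collect(Collectors.joining());
    }
}
```

With a chain of a StopFilterFactory followed by a LowerCaseFilterFactory, 
`normalizeTerm` lower-cases the term and skips the stop filter, and the caller 
never inspects the concrete factory types.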

Related to this, we should deprecate and remove LowerCaseTokenizer, which 
combines tokenization and normalization in a single component and so will not 
fit an API that treats normalization as a separate step.


