[ https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676578#comment-16676578 ]
Alan Woodward commented on LUCENE-8497:
---------------------------------------

I plan on committing this soon - any objections, speak up now...

> Rethink multi-term analysis handling
> ------------------------------------
>
>                 Key: LUCENE-8497
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8497
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8497.patch, LUCENE-8497.patch, LUCENE-8497.patch, LUCENE-8497.patch
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof
> checks for MultiTermAwareComponent and casts. MultiTermAwareComponent itself
> deals in AbstractAnalysisComponents, and so callers need to cast to the
> correct component type before use, which is ripe for misuse.
>
> We should re-organise all this to be type-safe and usable without casts. One
> possibility is to add `normalize` methods to CharFilterFactory and
> TokenFilterFactory that mirror their existing `create` methods. The default
> implementation would return the input unchanged, while filters that should
> apply at normalization time can delegate to `create`.
>
> Related to this, we should deprecate and remove LowerCaseTokenizer, which
> combines tokenization and normalization in a way that will break this API.
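For readers skimming the thread, the `normalize` idea from the issue description can be sketched roughly as below. This is not the attached patch and not Lucene's API: every class name here (SketchFilterFactory, SketchLowerCaseFactory, SketchTruncateFactory, SketchAnalyzer) is hypothetical, and the sketch works on plain strings instead of TokenStream or Reader so that it stays self-contained. It only illustrates the shape of the proposal: a default `normalize` that returns its input unchanged, and an override that delegates to `create` for filters that should also run at normalization time.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical, string-based stand-in for a filter factory.
// Real factories wrap a TokenStream or Reader; this only shows the API shape.
abstract class SketchFilterFactory {
    // Behaviour applied during full analysis.
    abstract String create(String input);

    // Behaviour applied when normalizing a multi-term query term.
    // Default: return the input unchanged, so no cast or instanceof check is needed.
    String normalize(String input) {
        return input;
    }
}

// Lowercasing should also apply to wildcard/prefix terms,
// so normalize delegates to create.
class SketchLowerCaseFactory extends SketchFilterFactory {
    @Override
    String create(String input) {
        return input.toLowerCase(Locale.ROOT);
    }

    @Override
    String normalize(String input) {
        return create(input);
    }
}

// Hypothetical filter that keeps only the first three characters.
// It would mangle a wildcard term, so it keeps the default normalize.
class SketchTruncateFactory extends SketchFilterFactory {
    @Override
    String create(String input) {
        return input.length() > 3 ? input.substring(0, 3) : input;
    }
}

// Hypothetical analyzer that runs a chain of factories in order.
class SketchAnalyzer {
    private final List<SketchFilterFactory> chain;

    SketchAnalyzer(List<SketchFilterFactory> chain) {
        this.chain = chain;
    }

    // Full analysis: every filter's create step runs.
    String analyze(String text) {
        String current = text;
        for (SketchFilterFactory factory : chain) {
            current = factory.create(current);
        }
        return current;
    }

    // Normalization: only filters that override normalize change the term.
    String normalize(String term) {
        String current = term;
        for (SketchFilterFactory factory : chain) {
            current = factory.normalize(current);
        }
        return current;
    }
}
```

With a chain of SketchLowerCaseFactory followed by SketchTruncateFactory, analyze("LUCENE") lowercases and truncates the text, while normalize("LUCE*") only lowercases it and leaves the wildcard in place. The caller never inspects or casts the components, which is the type-safety point the description makes.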