[ https://issues.apache.org/jira/browse/LUCENE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509726#comment-16509726 ]
Mike Sokolov edited comment on LUCENE-8352 at 6/12/18 3:05 PM:
---------------------------------------------------------------

{quote}So maybe we could remove this setReader method, make TokenStreamComponents final, and replace the tokenizer field with a Consumer<Reader> that would be tokenizer::setReader by default?{quote}

I think that would work for me, yes, and not too difficult either :)

was (Author: sokolov):
bq So maybe we could remove this setReader method, make TokenStreamComponents final, and replace the tokenizer field with a Consumer<Reader> that would be tokenizer::setReader by default?

I think that would work for me, yes, and not too difficult either :)

> Make TokenStreamComponents final
> --------------------------------
>
>                 Key: LUCENE-8352
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8352
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mark Harwood
>            Priority: Minor
>
> The current design is a little trappy. Any specialised subclasses of
> TokenStreamComponents _(see StandardAnalyzer, ClassicAnalyzer,
> UAX29URLEmailAnalyzer)_ are discarded by any subsequent Analyzers that wrap
> them _(see LimitTokenCountAnalyzer, QueryAutoStopWordAnalyzer,
> ShingleAnalyzerWrapper and other examples in elasticsearch)_.
> The current design means each AnalyzerWrapper.wrapComponents() implementation
> discards any custom TokenStreamComponents and replaces it with one of its own
> choosing (a vanilla TokenStreamComponents class from examples I've seen).
> This is a trap I fell into when writing a custom TokenStreamComponents with a
> custom setReader() and I wondered why it was not being triggered when wrapped
> by other analyzers.
> If AnalyzerWrapper is designed to encourage composition it's arguably a
> mistake to also permit custom TokenStreamComponent subclasses - the
> composition process does not preserve the choice of custom classes and any
> behaviours they might add.
> For this reason we should not encourage extensions
> to TokenStreamComponents (or if TSC extensions are required we should somehow
> mark an Analyzer as "unwrappable" to prevent lossy compositions).
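
For anyone following along, here is a rough, self-contained sketch of the idea quoted in the comment: a final components holder that takes a Consumer<Reader> instead of relying on an overridable setReader(). It does not use Lucene itself. SketchTokenizer, SketchComponents, wrapComponents and demo are hypothetical stand-ins invented for illustration, and the sink is a plain String rather than a TokenStream. It only aims to show that a wrapper which reuses the inner source keeps any custom reader handling.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ComponentsSketch {

    /** Hypothetical stand-in for a tokenizer: only the reader hand-off is modeled. */
    static class SketchTokenizer {
        Reader input;

        void setReader(Reader reader) {
            this.input = reader;
        }
    }

    /** Final holder: custom reader handling is passed in, not added by subclassing. */
    static final class SketchComponents {
        private final Consumer<Reader> source;
        private final String sink; // stand-in for the token stream at the end of the chain

        SketchComponents(Consumer<Reader> source, String sink) {
            this.source = source;
            this.sink = sink;
        }

        /** Default case from the comment: the source is simply tokenizer::setReader. */
        static SketchComponents of(SketchTokenizer tokenizer, String sink) {
            return new SketchComponents(tokenizer::setReader, sink);
        }

        Consumer<Reader> getSource() {
            return source;
        }

        String getSink() {
            return sink;
        }
    }

    /** Wrapper step: decorates the sink but reuses the inner source unchanged. */
    static SketchComponents wrapComponents(SketchComponents inner) {
        return new SketchComponents(inner.getSource(), "limit(" + inner.getSink() + ")");
    }

    /** Builds components with custom reader handling, wraps them, and feeds a reader. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        SketchTokenizer tokenizer = new SketchTokenizer();
        Consumer<Reader> custom = reader -> {
            log.add("custom setReader"); // the behaviour a subclass would have added
            tokenizer.setReader(reader);
        };
        SketchComponents wrapped = wrapComponents(new SketchComponents(custom, "standard"));
        wrapped.getSource().accept(new StringReader("some text"));
        log.add(wrapped.getSink());
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the holder is final, the only place custom behaviour can live is the Consumer, so a wrapper that passes getSource() through cannot silently drop it the way replacing a subclass with a vanilla instance does.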