[ https://issues.apache.org/jira/browse/LUCENE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509726#comment-16509726 ]

Mike Sokolov edited comment on LUCENE-8352 at 6/12/18 3:05 PM:
---------------------------------------------------------------

{quote}So maybe we could remove this setReader method, make 
TokenStreamComponents final, and replace the tokenizer field with a 
Consumer<Reader> that would be tokenizer::setReader by default?{quote}

I think that would work for me, yes, and not too difficult either :) 
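A minimal sketch of the proposal quoted above. It uses hypothetical, simplified stand-ins (`ProposalSketch`, `SimpleTokenizer`, `Components`, `wrap`), not Lucene's real `Tokenizer`, `TokenStreamComponents` or `AnalyzerWrapper`. The components class is final, and the reader hook is a `Consumer<Reader>` that defaults to `tokenizer::setReader`:

```java
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical, simplified model of the proposal; these are not Lucene's classes.
final class ProposalSketch {

  // Stand-in for a Tokenizer: it only remembers the last Reader it was given.
  static class SimpleTokenizer {
    Reader input;

    void setReader(Reader reader) {
      this.input = reader;
    }
  }

  // Final components class: custom behaviour is supplied as a Consumer<Reader>
  // instead of by subclassing and overriding setReader().
  static final class Components {
    private final SimpleTokenizer tokenizer;
    private final Consumer<Reader> source;

    // Default hook: hand the reader straight to the tokenizer (tokenizer::setReader).
    Components(SimpleTokenizer tokenizer) {
      this(tokenizer, tokenizer::setReader);
    }

    Components(SimpleTokenizer tokenizer, Consumer<Reader> source) {
      this.tokenizer = tokenizer;
      this.source = source;
    }

    void setReader(Reader reader) {
      source.accept(reader);
    }

    SimpleTokenizer getTokenizer() {
      return tokenizer;
    }

    Consumer<Reader> getSource() {
      return source;
    }
  }

  // Hypothetical wrapper: it still builds a new Components object, but it reuses
  // the wrapped source, so a custom reader hook survives the wrapping.
  static Components wrap(Components inner) {
    return new Components(inner.getTokenizer(), inner.getSource());
  }

  private ProposalSketch() {}
}
```

Because the custom behaviour travels with the `Consumer<Reader>` rather than with a subclass, a wrapper that rebuilds the components around `getSource()` keeps it. Whether real wrappers would be written exactly this way is an assumption of the sketch.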


> Make TokenStreamComponents final
> --------------------------------
>
>                 Key: LUCENE-8352
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8352
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mark Harwood
>            Priority: Minor
>
> The current design is a little trappy. Any specialised subclasses of 
> TokenStreamComponents _(see StandardAnalyzer, ClassicAnalyzer, 
> UAX29URLEmailAnalyzer)_ are discarded by any subsequent Analyzers that wrap 
> them _(see LimitTokenCountAnalyzer, QueryAutoStopWordAnalyzer, 
> ShingleAnalyzerWrapper and other examples in Elasticsearch)_. 
> The current design means each AnalyzerWrapper.wrapComponents() implementation 
> discards any custom TokenStreamComponents and replaces it with one of its own 
> choosing (a plain TokenStreamComponents instance, in the examples I've seen).
> I fell into this trap when I wrote a TokenStreamComponents subclass with a 
> custom setReader() and could not work out why it was never called once the 
> analyzer was wrapped by other analyzers.
> If AnalyzerWrapper is designed to encourage composition, it's arguably a 
> mistake to also permit custom TokenStreamComponents subclasses: the 
> composition process preserves neither the choice of custom class nor any 
> behaviour it adds. For this reason we should not encourage extensions to 
> TokenStreamComponents (or, if TSC extensions are required, we should somehow 
> mark an Analyzer as "unwrappable" to prevent lossy compositions).
>  
>  
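To make the trap in the issue description concrete, here is a minimal sketch using hypothetical stand-in classes (`TrapSketch`, `SimpleTokenizer`, `Components`, `CountingComponents`, `wrap`), not Lucene's real `TokenStreamComponents` or `AnalyzerWrapper`. The `wrap` method plays the role of a `wrapComponents()` override that rebuilds a vanilla components object from the wrapped tokenizer, which silently bypasses a subclass's overridden `setReader()`:

```java
import java.io.Reader;

// Hypothetical, simplified model of the subclassing design described in the
// issue; these are not Lucene's classes.
final class TrapSketch {

  // Stand-in for a Tokenizer: it only remembers the last Reader it was given.
  static class SimpleTokenizer {
    Reader input;

    void setReader(Reader reader) {
      this.input = reader;
    }
  }

  // Non-final components class: subclasses may override setReader().
  static class Components {
    private final SimpleTokenizer tokenizer;

    Components(SimpleTokenizer tokenizer) {
      this.tokenizer = tokenizer;
    }

    void setReader(Reader reader) {
      tokenizer.setReader(reader);
    }

    SimpleTokenizer getTokenizer() {
      return tokenizer;
    }
  }

  // Custom subclass that adds behaviour: it counts setReader() calls.
  static class CountingComponents extends Components {
    int calls;

    CountingComponents(SimpleTokenizer tokenizer) {
      super(tokenizer);
    }

    @Override
    void setReader(Reader reader) {
      calls++;
      super.setReader(reader);
    }
  }

  // Wrapper in the style the issue describes: it builds a vanilla Components
  // around the wrapped tokenizer, so the subclass and its override are dropped.
  static Components wrap(Components inner) {
    return new Components(inner.getTokenizer());
  }

  private TrapSketch() {}
}
```

The tokenizer still receives the reader, so nothing fails loudly; only the subclass's extra behaviour disappears, which is what makes the problem easy to miss.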



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

