[
https://issues.apache.org/jira/browse/LUCENE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027171#comment-13027171
]
Robert Muir commented on LUCENE-3055:
-------------------------------------
Hi Ian, you are right the justifications don't totally explain the reasoning
behind this change.
From my perspective the most important reason is to avoid a huge performance
trap: previously, if you subclassed one of these analyzers, overrode
tokenStream(), and added SpecialFilter for example, most of the time you
would actually slow down indexing, because the indexer could no longer use
reusableTokenStream().
This created worst-case situations like LUCENE-2279.
Instead, the recommended approach is to treat analyzers as tokenstream
factories (which is all they are). They aren't really "extendable", only
"overridable", since they are just factories for tokenstreams, and overriding
tokenStream() creates the worst-case performance trap where new objects are
created for every document. I would recommend writing your analyzer by
extending ReusableAnalyzerBase, which is easy and safe:
{noformat}
Analyzer analyzer = new ReusableAnalyzerBase() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // Source of tokens; the base class reuses it across documents.
    Tokenizer tokenizer = new WhitespaceTokenizer(...);
    // Chain whatever filters you need on top of the tokenizer.
    TokenStream filteredStream = new FooTokenFilter(tokenizer, ...);
    filteredStream = new BarTokenFilter(filteredStream, ...);
    return new TokenStreamComponents(tokenizer, filteredStream);
  }
};
{noformat}
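To make the cost difference concrete, here is a rough, library-free sketch of the two patterns. None of these classes are Lucene types: Components, NonReusingAnalyzer, ReusingAnalyzer and allocationsFor are hypothetical stand-ins, and the per-thread cache is only an assumption about how a reusable analyzer might hold on to its chain.

```java
// Toy model only: hypothetical stand-ins, not Lucene classes.
class ReuseSketch {

    /** Stand-in for a tokenizer-plus-filters chain; counts how many get built. */
    static final class Components {
        static int created = 0;   // single-threaded demo counter
        private String input = "";

        Components() { created++; }

        /** Point an existing chain at new input instead of rebuilding it. */
        void reset(String newInput) { this.input = newInput; }

        /** Whitespace token count, just so the chain does some work. */
        int tokenCount() {
            String trimmed = input.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Models overriding tokenStream(): a fresh chain for every document. */
    static final class NonReusingAnalyzer {
        Components analyze(String text) {
            Components c = new Components();
            c.reset(text);
            return c;
        }
    }

    /** Models the reusable pattern: build once per thread, then reset. */
    static final class ReusingAnalyzer {
        private final ThreadLocal<Components> cached =
                ThreadLocal.withInitial(Components::new);

        Components analyze(String text) {
            Components c = cached.get();   // created lazily on first use
            c.reset(text);
            return c;
        }
    }

    /** Returns how many chains were built while analyzing all docs. */
    static int allocationsFor(boolean reuse, String[] docs) {
        Components.created = 0;
        NonReusingAnalyzer fresh = new NonReusingAnalyzer();
        ReusingAnalyzer reusing = new ReusingAnalyzer();
        for (String doc : docs) {
            Components c = reuse ? reusing.analyze(doc) : fresh.analyze(doc);
            c.tokenCount();
        }
        return Components.created;
    }

    public static void main(String[] args) {
        String[] docs = {"one two", "three four five", "six"};
        System.out.println("fresh chain per document: " + allocationsFor(false, docs));
        System.out.println("reused chain: " + allocationsFor(true, docs));
    }
}
```

In this model the non-reusing analyzer builds one chain per document, while the reusing one builds a single chain and only resets it, which is the saving the indexer loses when reusableTokenStream() cannot be used.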
> LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
> ----------------------------------------------------------------------
>
> Key: LUCENE-3055
> URL: https://issues.apache.org/jira/browse/LUCENE-3055
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 3.1
> Reporter: Ian Soboroff
>
> LUCENE-2372 and LUCENE-2389 marked all analyzers as final. This makes
> ReusableAnalyzerBase useless, and makes it impossible to subclass e.g.
> StandardAnalyzer to make a small modification e.g. to tokenStream(). These
> issues don't indicate a new method of doing this. The issues don't give a
> reason except for design considerations, which seems a poor reason to make a
> backward-incompatible change.