[jira] [Issue Comment Edited] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers

Uwe Schindler (JIRA) Fri, 29 Apr 2011 13:53:43 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027179#comment-13027179 ]


Uwe Schindler edited comment on LUCENE-3055 at 4/29/11 8:51 PM:
----------------------------------------------------------------

{quote}
From my perspective the most important reason is to avoid a huge performance
trap: previously, if you subclassed one of these analyzers, overrode
tokenStream(), and added SpecialFilter, for example, most of the time users
would actually slow down indexing, because reusableTokenStream() could then no
longer be used by the indexer.
{quote}
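To make the performance point in the quote concrete, here is a toy model, not Lucene code. The classes ToyChain and ToyAnalyzer are hypothetical and use plain strings instead of real TokenStreams. tokenStream() builds a fresh chain on every call, while reusableTokenStream() builds one chain per thread and afterwards only resets it with the new input. An indexer that is forced back onto tokenStream() pays one allocation per document and field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy "analysis chain" (not Lucene's API): splits on whitespace
// and lowercases. Counts how many chains were allocated.
class ToyChain {
    static int instancesCreated = 0;
    private String input;

    ToyChain() { instancesCreated++; }

    // Point the existing chain at new input instead of building a new one.
    void reset(String newInput) { this.input = newInput; }

    List<String> tokens() {
        List<String> out = new ArrayList<>();
        for (String t : input.split("\\s+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase());
        }
        return out;
    }
}

// Hypothetical toy analyzer exposing the two Lucene 3.x-style entry points.
class ToyAnalyzer {
    private final ThreadLocal<ToyChain> cached = new ThreadLocal<>();

    // Allocates a new chain for every call (every document/field).
    List<String> tokenStream(String text) {
        ToyChain chain = new ToyChain();
        chain.reset(text);
        return chain.tokens();
    }

    // Allocates at most one chain per thread, then only resets it.
    List<String> reusableTokenStream(String text) {
        ToyChain chain = cached.get();
        if (chain == null) {
            chain = new ToyChain();
            cached.set(chain);
        }
        chain.reset(text);
        return chain.tokens();
    }
}
```

Analyzing 1000 documents on one thread allocates 1000 chains through tokenStream() but only one through reusableTokenStream(); in real Lucene each chain is a tokenizer plus several filter objects, so the gap is larger.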

Additionally, exactly this special case (overriding only one of the methods) was
the biggest problem, and it led to ugly reflection-based checks in Lucene 3.0. In
3.0, StandardAnalyzer correctly implemented both tokenStream() and
reusableTokenStream(). As soon as a subclass overrode only tokenStream() while
the indexer kept calling reusableTokenStream(), the subclass's changes were
silently ignored, which led to lots of bug reports. Because of this, a
reflection-based backwards-compatibility hack was added in 3.0 (see the
o.a.l.util.VirtualMethod class, which makes such checks easier) that prevented
the indexer from calling reusableTokenStream() if a subclass overrode only one
of the methods. Moving forward to 3.1, these backwards hacks got even heavier
(e.g. changes in TokenStreams, the new base class ReusableAnalyzerBase, ...), so
the only solution was to enforce the decorator pattern.
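The failure mode and the kind of reflection check described above can be sketched with a toy model. BaseAnalyzer, ShoutingAnalyzer and OverrideCheck are hypothetical names, and strings stand in for token streams. OverrideCheck only illustrates the idea behind o.a.l.util.VirtualMethod (detecting which methods a subclass declares); it is not that class's API or implementation.

```java
// Hypothetical 3.0-style base analyzer implementing both entry points.
class BaseAnalyzer {
    String tokenStream(String text) {
        return text.toLowerCase();
    }

    String reusableTokenStream(String text) {
        // Real analyzers cached components here; the toy just applies the
        // same logic as tokenStream().
        return text.toLowerCase();
    }
}

// Hypothetical user subclass that customizes only tokenStream().
class ShoutingAnalyzer extends BaseAnalyzer {
    @Override
    String tokenStream(String text) {
        return text.toUpperCase();
    }
}

final class OverrideCheck {
    private OverrideCheck() {}

    // True if 'cls' or any class between it and 'base' (exclusive) declares
    // a method with this name and parameter types.
    static boolean overrides(Class<?> cls, Class<?> base, String name, Class<?>... params) {
        for (Class<?> c = cls; c != null && c != base; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod(name, params);
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return false;
    }

    // What a defensive indexer might do: fall back to tokenStream() when a
    // subclass overrode it but not reusableTokenStream().
    static String analyze(BaseAnalyzer a, String text) {
        Class<?> cls = a.getClass();
        boolean ts = overrides(cls, BaseAnalyzer.class, "tokenStream", String.class);
        boolean rts = overrides(cls, BaseAnalyzer.class, "reusableTokenStream", String.class);
        return (ts && !rts) ? a.tokenStream(text) : a.reusableTokenStream(text);
    }
}
```

Calling reusableTokenStream("Hello") directly on a ShoutingAnalyzer returns "hello", so the customization is lost; OverrideCheck.analyze() returns "HELLO", but only at the cost of a reflective check that every new entry point makes more complicated.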

The example above by Robert is the correct way to implement your "factory" of
TokenStreams. Anything else, like subclassing StandardAnalyzer, is ugly because
it hides what you are really doing. The pattern above does exactly what Solr's
schema does: you have to list all your components explicitly, making it clear
what your TokenStreams are doing.
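Robert's example is not reproduced here, so the following is only a toy sketch of the same idea in plain Java. ExplicitAnalyzer, lowerCase() and stop() are hypothetical names, not Lucene classes. The analyzer is final, and every stage (a whitespace "tokenizer" followed by an explicit list of filters, each wrapping the output of the previous one) is visible at the construction site, much like a Solr schema field type. In Lucene 3.1 itself, the corresponding hook is, as far as I know, ReusableAnalyzerBase's createComponents() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical toy pipeline: a final class whose behavior is defined only
// by the filters passed in, in order. No subclassing needed or possible.
final class ExplicitAnalyzer {
    private final List<UnaryOperator<List<String>>> filters;

    @SafeVarargs
    ExplicitAnalyzer(UnaryOperator<List<String>>... filters) {
        this.filters = Arrays.asList(filters);
    }

    List<String> analyze(String text) {
        // "Tokenizer": split on whitespace, dropping empty tokens.
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        // Each filter decorates the output of the previous stage.
        for (UnaryOperator<List<String>> f : filters) {
            tokens = f.apply(tokens);
        }
        return tokens;
    }

    // Hypothetical filter: lowercases every token.
    static UnaryOperator<List<String>> lowerCase() {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        };
    }

    // Hypothetical filter: drops tokens contained in the stop set.
    static UnaryOperator<List<String>> stop(Set<String> stopWords) {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) if (!stopWords.contains(t)) out.add(t);
            return out;
        };
    }
}
```

Adding something like SpecialFilter then means adding one more argument to the constructor call rather than overriding a superclass method whose interaction with reuse is hidden.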

Trust me, the example above is shorter than completely subclassing the old
StandardAnalyzer (both tokenStream() and reusableTokenStream()), and, like
Solr's schema.xml, it shows exactly what your Analyzer looks like (no hidden
stuff in super-factories, ...).

> LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-3055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3055
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 3.1
>            Reporter: Ian Soboroff
>
> LUCENE-2372 and LUCENE-2389 marked all analyzers as final.  This makes 
> ReusableAnalyzerBase useless, and makes it impossible to subclass e.g. 
> StandardAnalyzer to make a small modification e.g. to tokenStream().  These 
> issues don't indicate a new method of doing this.  The issues don't give a 
> reason except for design considerations, which seems a poor reason to make a 
> backward-incompatible change.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
