Re: Seemingly very difficult to wrap an Analyzer with CharFilter

Steven Schlansker Fri, 14 Jun 2013 10:01:51 -0700

On Jun 12, 2013, at 5:26 PM, Michael Sokolov <msoko...@safaribooksonline.com> 
wrote:


> On 6/12/2013 7:02 PM, Steven Schlansker wrote:
>> On Jun 12, 2013, at 3:44 PM, Michael Sokolov 
>> <msoko...@safaribooksonline.com> wrote:
>> 
>>> You may not have noticed that CharFilter extends Reader.  The expected 
>>> pattern here is that you chain instances together -- your CharFilter should 
>>> act as *input* to the Analyzer, I think.  Don't think in terms of extending 
>>> these analysis classes (except the base ones designed for it): compose them 
>>> so that each consumes the one before it
>>> 
>> Hi Mike,
>> 
>> Hm, that may work out.  I am a little surprised because I thought the 
>> intention is that you set the Analyzer up as part of the configuration, and 
>> when you add documents, the analyzer takes care of all text processing.  In 
>> particular this means that now I have to ensure that the same transformation 
>> is done at query time, and I thought the analyzer abstraction was supposed 
>> to avoid this.
>> 
>> But if this is how it should be done, it could work.  Thanks for the pointer.
>> 
>> Steven
>> 
>> 
> Um I'm sorry I was in a hurry and forgot to think... I went back and looked 
> at my code and found the pattern was different from what I was thinking.  I 
> have:
> 
> public final class DefaultAnalyzer extends Analyzer {
> 
>    @Override
>    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>        Tokenizer tokenizer = new StandardTokenizer(IndexConfiguration.LUCENE_VERSION, reader);
>        TokenStream tokenStream = new LowerCaseFilter(IndexConfiguration.LUCENE_VERSION, tokenizer);
>        // ASCIIFoldingFilter
>        // Stemming
>        return new TokenStreamComponents(tokenizer, tokenStream);
>    }
> 
> }
> 
> You were exactly right that subclassing Analyzer and overriding 
> initReader is the way to go.
> The composition I was talking about can happen among filters.  I guess you 
> have to duplicate the internals of StandardAnalyzer, but I don't think 
> there's all that much in there?

You are right, it is not that hard.  It is only that my goal was to have "a 
StandardAnalyzer with a CharFilter" and I hate unnecessarily duplicating code 
:-)

But it seems that this is my only course of action.
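
For anyone finding this thread later, here is a rough sketch of the shape of that approach. It deliberately uses only plain JDK classes, so SimpleAnalyzer, FilteringAnalyzer and UnderscoreToSpaceReader are made-up stand-ins, not Lucene classes: the first imitates an analyzer that exposes an initReader hook, the last imitates a CharFilter (which, as noted above, is just a Reader). A real version would subclass Analyzer and build its chain in createComponents as in the quoted DefaultAnalyzer.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class InitReaderSketch {

    /** Hypothetical stand-in for a CharFilter: rewrites '_' to ' ' as text is read. */
    static final class UnderscoreToSpaceReader extends FilterReader {
        UnderscoreToSpaceReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == '_' ? ' ' : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // n is -1 at end of input, in which case the loop body never runs.
            for (int i = off; i < off + n; i++) {
                if (buf[i] == '_') {
                    buf[i] = ' ';
                }
            }
            return n;
        }
    }

    /** Hypothetical stand-in for an analyzer with an overridable reader hook. */
    static class SimpleAnalyzer {
        /** Hook: subclasses may wrap the reader; the default passes it through. */
        protected Reader initReader(String fieldName, Reader reader) {
            return reader;
        }

        /** Lowercases and splits on whitespace; all input goes through initReader. */
        public final List<String> analyze(String fieldName, String text) {
            StringBuilder sb = new StringBuilder();
            try (Reader r = initReader(fieldName, new StringReader(text))) {
                char[] buf = new char[256];
                int n;
                while ((n = r.read(buf, 0, buf.length)) != -1) {
                    sb.append(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            List<String> tokens = new ArrayList<>();
            for (String t : sb.toString().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    /** Same token chain as the base class, plus a char-level filter on the input. */
    static final class FilteringAnalyzer extends SimpleAnalyzer {
        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            return new UnderscoreToSpaceReader(reader);
        }
    }

    public static void main(String[] args) {
        System.out.println(new SimpleAnalyzer().analyze("body", "Foo_Bar baz"));    // [foo_bar, baz]
        System.out.println(new FilteringAnalyzer().analyze("body", "Foo_Bar baz")); // [foo, bar, baz]
    }
}
```

Because the filter lives inside the analyzer, the same character rewriting happens at index time and at query time with no extra work by the caller, which is what I was after in the first place.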

> 
> I used AnalyzerWrapper for something -- um switching between multiple 
> analyzers based on the input.  But it doesn't allow you to do anything with 
> the internals of the analyzer(s) it wraps.

Yeah, this is a little unfortunate.  Just being able to override initReader 
would be nice.

Thanks for the pointers,
Steven


