On Jun 12, 2013, at 5:26 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> On 6/12/2013 7:02 PM, Steven Schlansker wrote:
>> On Jun 12, 2013, at 3:44 PM, Michael Sokolov
>> <msoko...@safaribooksonline.com> wrote:
>>
>>> You may not have noticed that CharFilter extends Reader. The expected
>>> pattern here is that you chain instances together -- your CharFilter should
>>> act as *input* to the Analyzer, I think. Don't think in terms of extending
>>> these analysis classes (except the base ones designed for it): compose them
>>> so that each consumes the one before it
>>>
>> Hi Mike,
>>
>> Hm, that may work out. I am a little surprised because I thought the
>> intention is that you set the Analyzer up as part of the configuration, and
>> when you add documents, the analyzer takes care of all text processing. In
>> particular this means that now I have to ensure that the same transformation
>> is done at query time, and I thought the analyzer abstraction was supposed
>> to avoid this.
>>
>> But if this is how it should be done, it could work. Thanks for the pointer.
>>
>> Steven
>>
>
> Um I'm sorry I was in a hurry and forgot to think... I went back and looked
> at my code and found the pattern was different from what I was thinking. I
> have:
>
> public final class DefaultAnalyzer extends Analyzer {
>
>     @Override
>     protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>         Tokenizer tokenizer = new StandardTokenizer(IndexConfiguration.LUCENE_VERSION, reader);
>         TokenStream tokenStream = new LowerCaseFilter(IndexConfiguration.LUCENE_VERSION, tokenizer);
>         // ASCIIFoldingFilter
>         // Stemming
>         return new TokenStreamComponents(tokenizer, tokenStream);
>     }
>
> }
>
> You were exactly right that subclassing Analyzer and overriding the
> initReader is the way to go.
> The composition I was talking about can happen among filters. I guess you
> have to duplicate the internals of StandardAnalyzer, but I don't think
> there's all that much in there?

You are right, it is not that hard.
It is only that my goal was to have "a StandardAnalyzer with a CharFilter" and I hate unnecessarily duplicating code :-) But it seems that this is my only course of action.

> I used AnalyzerWrapper for something -- um switching between multiple
> analyzers based on the input. But it doesn't allow you to do anything with
> the internals of the analyzer(s) it wraps.

Yeah, this is a little unfortunate. Just being able to override initReader would be nice.

Thanks for the pointers,
Steven
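
For readers following the thread, here is a minimal sketch of the pattern being discussed: character-level preprocessing is a Reader wrapped around another Reader, and an overridable initReader hook keeps that preprocessing inside the analyzer so index-time and query-time text go through the same path. This is plain JDK code, not Lucene. The names MiniAnalyzer, DashToSpaceReader, DashSplittingAnalyzer and InitReaderSketch are hypothetical stand-ins, and the whitespace "tokenizer" is an assumption made only to keep the example short.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a CharFilter: a Reader that wraps another Reader
// and rewrites characters as they stream through.
class DashToSpaceReader extends FilterReader {
    DashToSpaceReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == '-' ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of input, in which case the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (buf[i] == '-') {
                buf[i] = ' ';
            }
        }
        return n;
    }
}

// Hypothetical miniature analyzer (not Lucene's Analyzer): a fixed token
// pipeline plus an overridable hook that sees the raw characters first.
class MiniAnalyzer {
    // Default hook: no character-level preprocessing.
    protected Reader initReader(Reader reader) {
        return reader;
    }

    // Lowercases the text and splits it on whitespace (assumed tokenizer).
    public final List<String> tokens(String text) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = initReader(new StringReader(text))) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> out = new ArrayList<>();
        for (String t : sb.toString().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}

// Only the hook is overridden; the token pipeline is inherited unchanged.
class DashSplittingAnalyzer extends MiniAnalyzer {
    @Override
    protected Reader initReader(Reader reader) {
        return new DashToSpaceReader(reader);
    }
}

class InitReaderSketch {
    public static void main(String[] args) {
        MiniAnalyzer analyzer = new DashSplittingAnalyzer();
        System.out.println(analyzer.tokens("Full-Text Search")); // index-time text
        System.out.println(analyzer.tokens("full-text"));        // query-time text
    }
}
```

Because the same analyzer instance handles both calls in main, the caller never has to remember to apply the character filter separately at query time, which is the concern raised earlier in the thread. In this toy version the subclass inherits the pipeline for free; the limitation discussed above is that the stock analyzer cannot be extended this way, so its pipeline has to be rebuilt in createComponents.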