Re: TokenStreamComponents in Lucene 4.0

Carsten Schnober Tue, 20 Nov 2012 03:27:37 -0800

On 20.11.2012 10:22, Uwe Schindler wrote:

Hi,


> The createComponents() method of Analyzers is only called *once* for each 
> thread and the Tokenstream is *reused* for later documents. The Analyzer will 
> call the final method Tokenizer#setReader() to notify the Tokenizer of a new 
> Reader (this method will update the protected "input" field in the Tokenizer 
> base class) and then it will reset() the whole tokenization chain. The custom 
> TokenStream components must "initialize" themselves with the new settings on 
> the reset() method.

Thanks, Uwe!
I think what changed compared to Lucene 3.6 is that reset() is now also
called upon initialization, i.e. before the first document, rather than
only after the first document has been processed, right? Apart from the
fact that it was not obligatory before to make all components reusable,
I suppose.
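
To make the lifecycle from the quoted explanation concrete, here is a
minimal plain-Java sketch. It does not use Lucene: SketchTokenizer,
SketchAnalyzer and ReuseSketch are hypothetical stand-ins I made up for
illustration, and the real classes do considerably more. It only models
the three points above: the component is created once per thread, each
new document arrives via setReader(), and all per-document state is
rebuilt in reset(), including for the very first document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {

    /** Hypothetical stand-in for a tokenizer: splits its input on whitespace. */
    static class SketchTokenizer {
        protected Reader input;               // replaced by setReader()
        private String[] tokens = new String[0];
        private int pos = 0;

        SketchTokenizer(Reader input) {
            this.input = input;               // no per-document setup here
        }

        /** Hands the reused instance a new document. */
        final void setReader(Reader input) {
            this.input = input;
        }

        /** Rebuilds all per-document state from the current input. */
        void reset() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                sb.append((char) c);
            }
            String text = sb.toString().trim();
            tokens = text.isEmpty() ? new String[0] : text.split("\\s+");
            pos = 0;
        }

        /** Returns the next token, or null when the document is exhausted. */
        String next() {
            return pos < tokens.length ? tokens[pos++] : null;
        }
    }

    /** Hypothetical stand-in for an analyzer that reuses one tokenizer per thread. */
    static class SketchAnalyzer {
        private final ThreadLocal<SketchTokenizer> perThread = new ThreadLocal<>();
        int created = 0;                      // counts creations (single-threaded demo only)

        SketchTokenizer tokenStream(Reader reader) throws IOException {
            SketchTokenizer t = perThread.get();
            if (t == null) {
                t = new SketchTokenizer(reader);  // happens once per thread
                perThread.set(t);
                created++;
            } else {
                t.setReader(reader);              // later documents reuse the instance
            }
            t.reset();                            // called for the first document, too
            return t;
        }
    }

    /** Tokenizes two documents and records how many tokenizers were created. */
    static List<String> run() {
        SketchAnalyzer analyzer = new SketchAnalyzer();
        List<String> out = new ArrayList<>();
        try {
            for (String doc : new String[] {"first document", "second one"}) {
                SketchTokenizer ts = analyzer.tokenStream(new StringReader(doc));
                for (String tok = ts.next(); tok != null; tok = ts.next()) {
                    out.add(tok);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.add("created=" + analyzer.created);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model, a component that set up its state in the constructor
instead of in reset() would still return tokens of the first document
when handed the second one, which is why reusability matters.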
Best,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform

