Hello,
I'd like to understand how TeeTokenFilter and SinkTokenizer work together.
The API doc for TeeTokenFilter
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/analysis/TeeTokenFilter.html

has this sample code:
SinkTokenizer sink1 = new SinkTokenizer(null);
SinkTokenizer sink2 = new SinkTokenizer(null);

TokenStream source1 = new TeeTokenFilter(
    new TeeTokenFilter(new WhitespaceTokenizer(reader1), sink1), sink2);
TokenStream source2 = new TeeTokenFilter(
    new TeeTokenFilter(new WhitespaceTokenizer(reader2), sink1), sink2);

TokenStream final3 = new EntityDetect(sink1);
TokenStream final4 = new URLDetect(sink2);

with an explanation that reads "sink1 and sink2 will both get tokens from both
reader1 and reader2 after whitespace tokenizer",
but I don't understand how the tokens from reader1 and reader2 are combined
in each sink.
Will sink1 first return all the tokens from reader1, and then those from reader2?
Or are they interleaved in some unpredictable order?
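
To make the question concrete, here is a toy model of what I guess is going on.
It does not use Lucene at all: Sink and Tee are hypothetical stand-ins I wrote
for illustration, and I am assuming that a sink simply appends each token in
the order in which the tee streams happen to be consumed. Whether the real
classes behave this way is exactly what I am asking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TeeSinkSketch {
    // Hypothetical stand-in for a sink: remembers tokens in arrival order.
    static class Sink {
        final List<String> tokens = new ArrayList<>();
        void add(String token) { tokens.add(token); }
    }

    // Hypothetical stand-in for a tee: passes each token through
    // unchanged and copies it into the given sink on the way.
    static class Tee implements Iterator<String> {
        private final Iterator<String> input;
        private final Sink sink;
        Tee(Iterator<String> input, Sink sink) {
            this.input = input;
            this.sink = sink;
        }
        public boolean hasNext() { return input.hasNext(); }
        public String next() {
            String token = input.next();
            sink.add(token);
            return token;
        }
    }

    // Split on whitespace, standing in for a whitespace tokenizer.
    static Iterator<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    // Consume a stream to the end, as an indexer would.
    static void drain(Iterator<String> stream) {
        while (stream.hasNext()) {
            stream.next();
        }
    }

    // Same wiring as the sample code, with source1 drained before source2.
    static List<String> demo() {
        Sink sink1 = new Sink();
        Sink sink2 = new Sink();
        Iterator<String> source1 = new Tee(new Tee(whitespaceTokens("a b"), sink1), sink2);
        Iterator<String> source2 = new Tee(new Tee(whitespaceTokens("c d"), sink1), sink2);
        drain(source1);
        drain(source2);
        return sink1.tokens;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c, d]
    }
}
```

In this toy model the order in sink1 depends only on which source is drained
first, so nothing is random. Is that also true of the real classes?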

-Kuro
 
