[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546058 ]
Yonik Seeley commented on LUCENE-1058:
--------------------------------------
I think having the "tee" solves the many-to-many case: many fields can
contribute tokens to a new field.
{code}
ListTokenizer sink1 = new ListTokenizer(null);
ListTokenizer sink2 = new ListTokenizer(null);
TokenStream source1 = new TeeTokenFilter(
    new TeeTokenFilter(new WhitespaceTokenizer(reader1), sink1), sink2);
TokenStream source2 = new TeeTokenFilter(
    new TeeTokenFilter(new WhitespaceTokenizer(reader2), sink1), sink2);
// now sink1 and sink2 will both get tokens from both reader1 and reader2 after
// whitespace tokenizer
// now we can further wrap any of these in extra analysis, and more "tees" can
// be inserted if desired.
TokenStream final1 = new LowerCaseFilter(source1);
TokenStream final2 = source2;
TokenStream final3 = new EntityDetect(sink1);
TokenStream final4 = new URLDetect(sink2);
d.add(new Field("f1", final1));
d.add(new Field("f2", final2));
d.add(new Field("f3", final3));
d.add(new Field("f4", final4));
{code}
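To make the data flow concrete, here is a rough, self-contained sketch that does not use Lucene at all. Sink, Tee, whitespace and drain are hypothetical stand-ins for ListTokenizer, TeeTokenFilter, WhitespaceTokenizer and the indexer consuming a field; they are assumptions for illustration only, not the API in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of the tee/sink data flow; none of these types are Lucene classes.
final class TeeSketch {

    /** Hypothetical sink: buffers every token handed to it for later replay. */
    static final class Sink {
        private final List<String> buffer = new ArrayList<>();

        void add(String token) {
            buffer.add(token);
        }

        /** Returns a copy of the tokens collected so far, in arrival order. */
        List<String> tokens() {
            return new ArrayList<>(buffer);
        }
    }

    /** Hypothetical tee: passes tokens through unchanged and copies each one to a sink. */
    static final class Tee implements Iterator<String> {
        private final Iterator<String> input;
        private final Sink sink;

        Tee(Iterator<String> input, Sink sink) {
            this.input = input;
            this.sink = sink;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            String token = input.next();
            sink.add(token); // side effect: the sink sees the token as it flows past
            return token;
        }
    }

    /** Stand-in for a whitespace tokenizer. */
    static Iterator<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    /** Consumes a stream to the end, as indexing a field would. */
    static List<String> drain(Iterator<String> stream) {
        List<String> out = new ArrayList<>();
        while (stream.hasNext()) {
            out.add(stream.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Sink sink1 = new Sink();
        Sink sink2 = new Sink();
        Iterator<String> source1 = new Tee(new Tee(whitespace("Hello World"), sink1), sink2);
        Iterator<String> source2 = new Tee(new Tee(whitespace("see example.org"), sink1), sink2);

        // The sinks fill only as the sources are consumed, so drain the sources first.
        System.out.println("f1: " + drain(source1));
        System.out.println("f2: " + drain(source2));
        System.out.println("f3: " + sink1.tokens());
        System.out.println("f4: " + sink2.tokens());
    }
}
```

In this sketch both sinks end up holding the tokens of both inputs, in the order the sources were drained. It also shows an ordering constraint: a sink is empty until its sources have been consumed, which is why f1 and f2 come before f3 and f4 in the example above.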
> New Analyzer for buffering tokens
> ---------------------------------
>
> Key: LUCENE-1058
> URL: https://issues.apache.org/jira/browse/LUCENE-1058
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 2.3
>
> Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch,
> LUCENE-1058.patch, LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that
> could siphon off certain tokens and store them in a buffer to be used later
> in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but
> all the other analysis is the same, then you could save off the tokens to be
> output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how
> it plays with the new reuse API.
> See
> http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397