[
https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546052
]
Yonik Seeley commented on LUCENE-1058:
--------------------------------------
Very similar to what I came up with, I think (all untested):
{code}
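import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

class ListTokenizer extends Tokenizer {
  protected List<Token> lst;
  protected Iterator<Token> iter;

  public ListTokenizer(List<Token> input) {
    // Fall back to an empty list so add() and next() never see null.
    this.lst = (input == null) ? new ArrayList<Token>() : input;
  }

  /** Only valid if tokens have not been consumed,
   *  i.e. if this tokenizer is not part of another TokenStream.
   */
  public List<Token> getTokens() {
    return lst;
  }

  public Token next(Token result) throws IOException {
    if (iter == null) iter = lst.iterator();
    // Signal end of stream with null instead of letting
    // Iterator.next() throw NoSuchElementException.
    return iter.hasNext() ? iter.next() : null;
  }

  /** Override this method to cache only certain tokens, or new tokens based
   *  on the old tokens.
   */
  public void add(Token t) {
    if (t == null) return;
    // Clone because the caller may reuse the Token instance.
    lst.add((Token) t.clone());
  }

  public void reset() throws IOException {
    iter = lst.iterator();
  }
}

class TeeTokenFilter extends TokenFilter {
  ListTokenizer sink;

  public TeeTokenFilter(TokenStream input, ListTokenizer sink) {
    super(input);
    this.sink = sink;
  }

  public Token next(Token result) throws IOException {
    Token t = input.next(result);
    sink.add(t);  // add() ignores the null that marks end of stream
    return t;
  }
}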
{code}
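To show how the two classes are meant to fit together, here is a rough, library-free sketch of the same tee/sink idea applied to the "one field lowercased, one not" case from the issue description. It is only an illustration: SimpleStream, ListStream, TeeStream, LowerCaseStream and TeeSketch are hypothetical stand-ins invented for this example (plain strings instead of Token, no Lucene classes), not part of any patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token stream: next() returns null when exhausted.
interface SimpleStream {
    String next();
}

// Replays a list of tokens; plays the role of ListTokenizer above.
class ListStream implements SimpleStream {
    private final List<String> tokens = new ArrayList<>();
    private Iterator<String> iter;

    void add(String t) {
        if (t != null) tokens.add(t);
    }

    public String next() {
        // The iterator is created lazily, so all add() calls must
        // happen before the first next() call.
        if (iter == null) iter = tokens.iterator();
        return iter.hasNext() ? iter.next() : null;
    }
}

// Passes tokens through unchanged while copying each one into the sink;
// plays the role of TeeTokenFilter above.
class TeeStream implements SimpleStream {
    private final SimpleStream input;
    private final ListStream sink;

    TeeStream(SimpleStream input, ListStream sink) {
        this.input = input;
        this.sink = sink;
    }

    public String next() {
        String t = input.next();
        sink.add(t); // add() ignores the null end-of-stream marker
        return t;
    }
}

// Lowercases every token coming from the wrapped stream.
class LowerCaseStream implements SimpleStream {
    private final SimpleStream input;

    LowerCaseStream(SimpleStream input) {
        this.input = input;
    }

    public String next() {
        String t = input.next();
        return (t == null) ? null : t.toLowerCase(Locale.ROOT);
    }
}

class TeeSketch {
    // Consumes a stream fully and returns its tokens as a list.
    static List<String> drain(SimpleStream s) {
        List<String> out = new ArrayList<>();
        for (String t = s.next(); t != null; t = s.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        ListStream source = new ListStream();
        source.add("Quick");
        source.add("Brown");
        source.add("Fox");

        ListStream sink = new ListStream();
        // The tee sits before the lowercasing step, so the sink
        // keeps the original case.
        List<String> lowercased = drain(new LowerCaseStream(new TeeStream(source, sink)));
        List<String> original = drain(sink);

        System.out.println(lowercased); // [quick, brown, fox]
        System.out.println(original);   // [Quick, Brown, Fox]
    }
}
```

The ordering matters in this sketch: the sink is only read after the first stream has been fully consumed, which matches the caveat on getTokens() above.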
> New Analyzer for buffering tokens
> ---------------------------------
>
> Key: LUCENE-1058
> URL: https://issues.apache.org/jira/browse/LUCENE-1058
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 2.3
>
> Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch,
> LUCENE-1058.patch, LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that
> could siphon off certain tokens and store them in a buffer to be used later
> in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but
> all the other analysis is the same, then you could save off the tokens to be
> output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how
> it plays with the new reuse API.
> See
> http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397