[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546052 ]
Yonik Seeley commented on LUCENE-1058:
--------------------------------------

Very similar to what I came up with, I think... (all untested, etc.)

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

class ListTokenizer extends Tokenizer {
  protected List<Token> lst = new ArrayList<Token>();
  protected Iterator<Token> iter;

  public ListTokenizer(List<Token> input) {
    this.lst = input;
    if (this.lst==null) this.lst = new ArrayList<Token>();
  }

  /** only valid if tokens have not been consumed,
   * i.e. if this tokenizer is not part of another tokenstream
   */
  public List<Token> getTokens() {
    return lst;
  }

  public Token next(Token result) throws IOException {
    if (iter==null) iter = lst.iterator();
    // return null at end of stream instead of letting
    // Iterator.next() throw NoSuchElementException
    return iter.hasNext() ? iter.next() : null;
  }

  /** Override this method to cache only certain tokens, or new tokens based
   * on the old tokens.
   */
  public void add(Token t) {
    if (t==null) return;
    lst.add((Token)t.clone());
  }

  public void reset() throws IOException {
    iter = lst.iterator();
  }
}

class TeeTokenFilter extends TokenFilter {
  ListTokenizer sink;

  protected TeeTokenFilter(TokenStream input, ListTokenizer sink) {
    super(input);
    this.sink = sink;
  }

  public Token next(Token result) throws IOException {
    Token t = input.next(result);
    sink.add(t);  // add() ignores the null returned at end of stream
    return t;
  }
}
{code}

> New Analyzer for buffering tokens
> ---------------------------------
>
>                 Key: LUCENE-1058
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1058
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch, 
> LUCENE-1058.patch, LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that 
> could siphon off certain tokens and store them in a buffer to be used later 
> in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but 
> all the other analysis is the same, then you could save off the tokens to be 
> output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how 
> it plays with the new reuse API.
> See 
> http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397
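
For readers without a Lucene checkout handy, here is a rough, dependency-free sketch of the "two fields, one lowercased and one not" use case from the issue description. It is only an illustration of the tee/sink idea: plain strings stand in for Token, and the names TeeSketch, Tee and lowercased are made up for this example and are not part of any patch on this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class TeeSketch {

    /** Hypothetical stand-in for a tee filter: passes each item through
     *  unchanged and also copies it into the sink list. */
    static final class Tee implements Iterator<String> {
        private final Iterator<String> input;
        private final List<String> sink;

        Tee(Iterator<String> input, List<String> sink) {
            this.input = input;
            this.sink = sink;
        }

        public boolean hasNext() {
            return input.hasNext();
        }

        public String next() {
            String t = input.next();
            sink.add(t); // siphon off the token before later stages change it
            return t;
        }
    }

    /** Downstream stage for the lowercased field; consumes the whole stream. */
    static List<String> lowercased(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Iterator<String> source = Arrays.asList("Apache", "Lucene", "Rocks").iterator();

        // Field 1: lowercased output, produced by consuming the tee.
        List<String> lower = lowercased(new Tee(source, sink));

        // Field 2: the original-case tokens buffered in the sink.
        System.out.println(lower); // [apache, lucene, rocks]
        System.out.println(sink);  // [Apache, Lucene, Rocks]
    }
}
```

Note that the sink only fills up as the tee is consumed, so the field fed from the buffered tokens has to be processed after the field that drives the tee.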