[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546050 ]
Michael Busch commented on LUCENE-1058:
---------------------------------------

We need to change the CachingTokenFilter a bit (untested code):

{code:java}
public class CachingTokenFilter extends TokenFilter {
  private List cache;
  private Iterator iterator;

  public CachingTokenFilter(TokenStream input) {
    super(input);
    this.cache = new LinkedList();
  }

  public Token next() throws IOException {
    if (iterator != null) {
      if (!iterator.hasNext()) {
        // the cache is exhausted, return null
        return null;
      }
      return (Token) iterator.next();
    } else {
      // first pass: pull from the input and cache as we go
      Token token = input.next();
      addTokenToCache(token);
      return token;
    }
  }

  public void reset() throws IOException {
    if (cache != null) {
      // replay the cached tokens from the beginning
      iterator = cache.iterator();
    }
  }

  protected void addTokenToCache(Token token) {
    if (token != null) {
      cache.add(token);
    }
  }
}
{code}

Then you can implement the ProperNounTF:

{code:java}
class ProperNounTF extends CachingTokenFilter {
  ProperNounTF(TokenStream input) {
    super(input);
  }

  // only proper nouns make it into the cache; everything
  // still passes through to the downstream filters
  protected void addTokenToCache(Token token) {
    if (token != null && isProperNoun(token)) {
      super.addTokenToCache(token);
    }
  }

  private boolean isProperNoun(Token token) {...}
}
{code}

And then you add everything to Document:

{code:java}
Document d = new Document();
TokenStream properNounTf = new ProperNounTF(new StandardTokenizer(reader));
TokenStream stdTf = new CachingTokenFilter(new StopTokenFilter(properNounTf));
TokenStream lowerCaseTf = new LowerCaseTokenFilter(stdTf);
d.add(new Field("std", stdTf));
d.add(new Field("nouns", properNounTf));
d.add(new Field("lowerCase", lowerCaseTf));
{code}

Again, this is untested, but I believe it should work.
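To make the caching pattern above concrete without depending on Lucene's 2.3 API, here is a minimal, self-contained sketch in plain Java. The `TokenSource`, `CachingSource`, and `CapitalizedCachingSource` types are hypothetical stand-ins (not Lucene classes); the point is the same mechanism: the first pass returns every token while selectively filling a cache, and after `reset()` the stream replays only the cached subset.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's TokenStream: null signals end of stream.
interface TokenSource {
    String next();
}

// Caches tokens on the first pass; after reset(), replays from the cache.
class CachingSource implements TokenSource {
    private final TokenSource input;
    private final List<String> cache = new ArrayList<>();
    private Iterator<String> iterator; // non-null once reset() has been called

    CachingSource(TokenSource input) { this.input = input; }

    public String next() {
        if (iterator != null) {
            // replay mode: serve tokens from the cache
            return iterator.hasNext() ? iterator.next() : null;
        }
        // first pass: pull from the input, cache as we go
        String token = input.next();
        addTokenToCache(token);
        return token;
    }

    public void reset() { iterator = cache.iterator(); }

    // Subclasses override this to cache only selected tokens.
    protected void addTokenToCache(String token) {
        if (token != null) cache.add(token);
    }
}

// Analogue of ProperNounTF: passes everything through, caches only
// capitalized tokens (a crude stand-in for a proper-noun test).
class CapitalizedCachingSource extends CachingSource {
    CapitalizedCachingSource(TokenSource input) { super(input); }

    @Override
    protected void addTokenToCache(String token) {
        if (token != null && Character.isUpperCase(token.charAt(0))) {
            super.addTokenToCache(token);
        }
    }
}
{code}

A downstream consumer would drain the stream once (so the whole pipeline sees every token), then call `reset()` and read back just the cached subset for the second field, which mirrors how the "nouns" field above reuses `properNounTf`.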
> New Analyzer for buffering tokens
> ---------------------------------
>
>                 Key: LUCENE-1058
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1058
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that could siphon off certain tokens and store them in a buffer to be used later in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but all the other analysis is the same, then you could save off the tokens to be output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how it plays with the new reuse API.
> See http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.