In the longer term, I think we should do something more automatic and correct, but for now I think adding this brute-force option is best.
David Kaelbling wrote:
> Uwe,
>
> I kind of like the idea of changing WeightedSpanTermExtractor to test for
> "!(tokenStream instanceof RandomAccess)" :-)
>
> - David
>
> --
> David Kaelbling
> Senior Software Engineer
> Black Duck Software, Inc.
>
> dkaelbl...@blackducksoftware.com
> T +1.781.810.2041
> F +1.781.891.5145
>
> http://www.blackducksoftware.com
> ________________________________________
> From: David Kaelbling
> Sent: Friday, August 28, 2009 11:54 AM
> To: Uwe Schindler; java-dev@lucene.apache.org
> Subject: RE: CachingTokenFilter extensibility and LUCENE-1685
>
> Hi Uwe,
>
> The problem is that I need to have a random access token stream for other
> reasons, and don't want CachingTokenFilter to buffer up a redundant copy of
> it. In existing releases I subclass it to override all the methods to use my
> store, and ignore the LinkedList cache member. The old internal structures
> were still present, but were never used. In 2.9 I can't do that any more,
> and without a subclassed object I have no way to prevent
> WeightedSpanTermExtractor from wrapping the stream.
>
> If there were some way to tell WeightedSpanTermExtractor not to wrap the
> stream (a new TokenStream.isCachingTokens() method, checking for a new
> "CachedTokenStream" interface rather than for CachingTokenFilter, some
> attribute, anything! :-) then I could still work with the public API.
>
> - David
> ________________________________________
> From: Uwe Schindler [...@thetaphi.de]
> Sent: Friday, August 28, 2009 4:03 AM
> To: java-dev@lucene.apache.org
> Subject: RE: CachingTokenFilter extensibility and LUCENE-1685
>
> Hi David,
>
> What exactly is your problem? Even the old 2.4 CachingTokenFilter did not
> expose its internal structures, so overriding would not change its internal
> implementation. The only change now is that *all* TokenFilters in core have
> final implementations, which is a consequence of the new TokenStream API and
> the migration path to it. So it should not be possible to override
> next()/next(Token)/incrementToken() in any TokenStream, as the extensibility
> of the whole API comes from simply adding new TokenFilters to the chain that
> do what you want. Letting users override incrementToken() could break a lot
> of things (see LUCENE-1753).
>
> To fix your specific problem, it may be an idea to add a method
> (isCachingTokens) to TokenStream in future that defaults to false and is
> true for CachingTokenFilter and TeeSinkTokenStream.SinkTokenStream.
> Highlighter would then be able to detect whether it can reset() (a better
> name would be rewind) the TokenStream. In this case you could simply provide
> another TokenFilter subclass with isCachingTokens=true and random access to
> the AttributeSource.States.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: David Kaelbling [mailto:dkaelbl...@blackducksoftware.com]
>> Sent: Thursday, August 27, 2009 10:40 PM
>> To: java-dev@lucene.apache.org
>> Subject: CachingTokenFilter extensibility and LUCENE-1685
>>
>> Hi,
>>
>> Looking at Lucene 2.9 trunk, CachingTokenFilter seems much less extensible
>> than before. In previous releases I subclassed it so I could back the
>> cache with an array and provide random access to the stream. I can't see
>> how to do this any more, and
>> WeightedSpanTermExtractor.getReaderForField() is still hardwired to
>> require a CachingTokenFilter-derived object.
>>
>> Am I missing something? Having two copies of the token stream, one for
>> random access and one hidden inside the CachingTokenFilter, does not sound
>> efficient :-)
>>
>> Thanks,
>> David
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>

--
- Mark

http://www.lucidimagination.com
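
For illustration only, the marker-interface idea raised in the thread can be sketched without touching Lucene itself. Everything below is hypothetical: TokenSource stands in for TokenStream, CachingWrapper for CachingTokenFilter, CachedTokenStream is the interface name floated in the thread, and prepare() is an assumed stand-in for the wrapping decision, not the actual WeightedSpanTermExtractor logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class WrapDecisionSketch {

    /** Hypothetical stand-in for a token stream; NOT Lucene's TokenStream. */
    interface TokenSource {
        String next();   // next token, or null when exhausted
        void reset();    // rewind, if the implementation supports replay
    }

    /** Hypothetical marker: the stream can already be replayed after reset(). */
    interface CachedTokenStream extends TokenSource {}

    /** Stand-in for a caching wrapper: buffers every token on first use. */
    static final class CachingWrapper implements CachedTokenStream {
        private final TokenSource input;
        private List<String> cache;
        private int pos;

        CachingWrapper(TokenSource input) { this.input = input; }

        @Override public String next() {
            if (cache == null) {
                cache = new ArrayList<>();
                for (String t = input.next(); t != null; t = input.next()) {
                    cache.add(t);
                }
            }
            return pos < cache.size() ? cache.get(pos++) : null;
        }

        @Override public void reset() { pos = 0; }
    }

    /** A user-supplied random-access stream backed by its own array. */
    static final class ArrayTokenStream implements CachedTokenStream {
        private final String[] tokens;
        private int pos;

        ArrayTokenStream(String... tokens) { this.tokens = tokens; }

        @Override public String next() { return pos < tokens.length ? tokens[pos++] : null; }
        @Override public void reset() { pos = 0; }
        String get(int i) { return tokens[i]; }   // random access
    }

    /** A stream that can only be consumed once. */
    static final class OneShotStream implements TokenSource {
        private final Iterator<String> it;

        OneShotStream(String... tokens) { it = Arrays.asList(tokens).iterator(); }

        @Override public String next() { return it.hasNext() ? it.next() : null; }
        @Override public void reset() { /* cannot rewind */ }
    }

    /** Wrap only when the stream does not already declare itself replayable. */
    static TokenSource prepare(TokenSource stream) {
        return (stream instanceof CachedTokenStream) ? stream : new CachingWrapper(stream);
    }

    public static void main(String[] args) {
        TokenSource mine = new ArrayTokenStream("quick", "brown", "fox");
        System.out.println("array-backed stream wrapped: " + (prepare(mine) != mine));
        TokenSource once = new OneShotStream("quick", "brown", "fox");
        System.out.println("one-shot stream wrapped: " + (prepare(once) != once));
    }
}
```

With a check like this, an array-backed stream is passed through untouched, so no second copy of the tokens is buffered, while a plain one-shot stream still gets wrapped so it can be replayed.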