I do not know, how this could affect Solr, but it could be the case. Currently most Tokenizers do not use CharStreams at all. After committing LUCENE-1906, I think there is also some additional work in Solr's custom Tokenizers needed (changed the correctOffset method).
----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: [email protected] > From: Jason Rutherglen [mailto:[email protected]] > Sent: Friday, September 11, 2009 12:28 AM > To: [email protected] > Subject: Re: Problem with CharStream and Tokenizers with custom > reset(Reader) method > > I've been seeing strange behavior perhaps related to this? Where > sometimes a query is parsed and analyzed using Solr analyzers to > it's first clause fairly randomly, and other times the same > exact query is parsed and analyzed to the full correct query with all > clauses. It's so baffling I haven't really figured out an > approach to debugging it. I wonder if it's related to this > stream resetting issue. > > On Thu, Sep 10, 2009 at 7:54 AM, Uwe Schindler <[email protected]> wrote: > > When reviewing the new CharStream code added to Tokenizers, I found a > > serious problem with backwards compatibility and other Tokenizers, that > do > > not override reset(CharStream). > > > > The problem is, that e.g. CharTokenizer only overrides reset(Reader): > > > > public void reset(Reader input) throws IOException { > > super.reset(input); > > bufferIndex = 0; > > offset = 0; > > dataLen = 0; > > } > > > > If you reset such a Tokenizer with another CharStream (not a Reader), > this > > method will never be called and breaking the whole Tokenizer. > > > > As CharStream extends Reader, I propose to remove this reset(CharStream > > method) and simply do an instanceof check to detect if the supplied > Reader > > is no CharStream and wrap it. We could also remove the extra ctor > (because > > most Tokenizers have no support for passing CharStreams). If the ctor > also > > checks with instanceof and warps as needed the code is backwards > compatible > > and we do not need to add additional ctors in subclasses. > > > > As this instanceof check is always done in CharReader.get() why not > remove > > ctor(CharStream) and reset(CharStream) completely? > > > > Any thoughts? > > > > I would like to fix this somehow before RC4, I', sorry :( > > > > Uwe > > > > ----- > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: [email protected] > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [email protected] > > For additional commands, e-mail: [email protected] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
