I tested the attached patch, all tests still compile and work as exspected
(as CharStream extends Reader). 

I think I should open an issue?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Thursday, September 10, 2009 4:54 PM
> To: java-dev@lucene.apache.org
> Subject: Problem with CharStream and Tokenizers with custom reset(Reader)
> method
> 
> When reviewing the new CharStream code added to Tokenizers, I found a
> serious problem with backwards compatibility and other Tokenizers, that do
> not override reset(CharStream).
> 
> The problem is, that e.g. CharTokenizer only overrides reset(Reader):
> 
>   public void reset(Reader input) throws IOException {
>     super.reset(input);
>     bufferIndex = 0;
>     offset = 0;
>     dataLen = 0;
>   }
> 
> If you reset such a Tokenizer with another CharStream (not a Reader), this
> method will never be called and breaking the whole Tokenizer.
> 
> As CharStream extends Reader, I propose to remove this reset(CharStream
> method) and simply do an instanceof check to detect if the supplied Reader
> is no CharStream and wrap it. We could also remove the extra ctor (because
> most Tokenizers have no support for passing CharStreams). If the ctor also
> checks with instanceof and warps as needed the code is backwards
> compatible
> and we do not need to add additional ctors in subclasses.
> 
> As this instanceof check is always done in CharReader.get() why not remove
> ctor(CharStream) and reset(CharStream) completely?
> 
> Any thoughts?
> 
> I would like to fix this somehow before RC4, I', sorry :(
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org

Attachment: Tokenizer-CharStream-fix.patch
Description: Binary data

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to