Problem with CharStream and Tokenizers with custom reset(Reader) method
-----------------------------------------------------------------------

                 Key: LUCENE-1906
                 URL: https://issues.apache.org/jira/browse/LUCENE-1906
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 2.9
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler
            Priority: Blocker
             Fix For: 2.9

While reviewing the new CharStream code added to Tokenizers, I found a serious backwards-compatibility problem affecting Tokenizers that do not override reset(CharStream). For example, CharTokenizer only overrides reset(Reader):

    public void reset(Reader input) throws IOException {
      super.reset(input);
      bufferIndex = 0;
      offset = 0;
      dataLen = 0;
    }

If you reset such a Tokenizer with a CharStream (rather than a plain Reader), this method is never called, which breaks the whole Tokenizer.

As CharStream extends Reader, I propose to remove the reset(CharStream) method and simply do an instanceof check to detect whether the supplied Reader is not a CharStream, wrapping it in that case. We could also remove the extra ctor (because most Tokenizers have no support for passing CharStreams). If the ctor also checks with instanceof and wraps as needed, the code is backwards compatible and we do not need to add additional ctors in subclasses.

As this instanceof check is always done in CharReader.get(), why not remove ctor(CharStream) and reset(CharStream) completely?

Any thoughts? I would like to fix this somehow before RC4. I'm sorry :(
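
To make the proposal concrete, here is a minimal sketch of how a single reset(Reader) entry point with an instanceof-based wrap could look. It does not use the real Lucene classes: SketchCharStream, SketchCharReader, OffsetShiftStream, SketchTokenizer, SketchCharTokenizer and ResetDemo are hypothetical stand-ins invented for this illustration, and only the general shape (CharStream extends Reader, a CharReader.get()-style wrapper, a subclass overriding reset(Reader)) is taken from the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for CharStream: a Reader that can correct offsets.
abstract class SketchCharStream extends Reader {
    abstract int correctOffset(int currentOff);
}

// Hypothetical stand-in for CharReader: wraps a plain Reader as a char stream.
final class SketchCharReader extends SketchCharStream {
    private final Reader in;

    private SketchCharReader(Reader in) { this.in = in; }

    // The instanceof check: pass char streams through, wrap everything else.
    static SketchCharStream get(Reader input) {
        return (input instanceof SketchCharStream)
                ? (SketchCharStream) input
                : new SketchCharReader(input);
    }

    @Override int correctOffset(int currentOff) { return currentOff; }
    @Override public int read(char[] cbuf, int off, int len) throws IOException {
        return in.read(cbuf, off, len);
    }
    @Override public void close() throws IOException { in.close(); }
}

// An example char stream that is not a SketchCharReader (illustration only).
final class OffsetShiftStream extends SketchCharStream {
    private final Reader in;
    private final int shift;

    OffsetShiftStream(Reader in, int shift) { this.in = in; this.shift = shift; }

    @Override int correctOffset(int currentOff) { return currentOff + shift; }
    @Override public int read(char[] cbuf, int off, int len) throws IOException {
        return in.read(cbuf, off, len);
    }
    @Override public void close() throws IOException { in.close(); }
}

// Base tokenizer with one ctor and one reset method, both taking a Reader.
class SketchTokenizer {
    protected SketchCharStream input;

    SketchTokenizer(Reader input) { this.input = SketchCharReader.get(input); }

    public void reset(Reader input) throws IOException {
        this.input = SketchCharReader.get(input);
    }
}

// Subclass that, like CharTokenizer, only overrides reset(Reader).
class SketchCharTokenizer extends SketchTokenizer {
    int bufferIndex, offset, dataLen;

    SketchCharTokenizer(Reader input) { super(input); }

    // Pretend some input has been consumed so the state is non-zero.
    void simulateConsumption() { bufferIndex = 3; offset = 7; dataLen = 16; }

    @Override public void reset(Reader input) throws IOException {
        super.reset(input);
        bufferIndex = 0;
        offset = 0;
        dataLen = 0;
    }
}

class ResetDemo {
    // Returns true if the subclass state was cleared by reset(newInput).
    static boolean stateClearedAfterReset(Reader newInput) {
        SketchCharTokenizer t = new SketchCharTokenizer(new StringReader("first"));
        t.simulateConsumption();
        try {
            t.reset(newInput);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return t.bufferIndex == 0 && t.offset == 0 && t.dataLen == 0;
    }

    public static void main(String[] args) {
        System.out.println("plain Reader: "
                + stateClearedAfterReset(new StringReader("second")));
        System.out.println("char stream:  "
                + stateClearedAfterReset(new OffsetShiftStream(new StringReader("second"), 3)));
    }
}
```

Because there is no separate reset(CharStream) overload in this sketch, the compiler cannot bind a call with a char stream argument to a method the subclass never overrode, so the subclass's reset(Reader) runs for both kinds of input. Whether the real fix should take exactly this shape is of course open for discussion in the issue.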