The TokenStream created by SmartChineseAnalyzer cannot be reset
---------------------------------------------------------------

                 Key: LUCENE-3834
                 URL: https://issues.apache.org/jira/browse/LUCENE-3834
             Project: Lucene - Java
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 3.5
            Reporter: dingjin


The cause is that the input field of class SentenceTokenizer is not reset when 
the method reset() is called.

There are two input fields: one is inherited from Tokenizer and the other from 
TokenFilter. To reset a TokenStream created by SmartChineseAnalyzer, both of 
them need to be reset. The bug arises because the reset() method of class 
SentenceTokenizer does not reset its input field.

Class: org.apache.lucene.analysis.cn.smart.SentenceTokenizer

Original code:

public final class SentenceTokenizer extends Tokenizer {
  ....
  @Override
  public void reset() throws IOException {
    super.reset();
    tokenStart = tokenEnd = 0;
  }

 ...
}

The method should be changed as follows:

 
  @Override
  public void reset() throws IOException {
    super.reset();
    // also reset the underlying input
    input.reset();
    tokenStart = tokenEnd = 0;
  }
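
The effect can be illustrated without Lucene. The following is only a minimal 
sketch under stated assumptions: ResetSketch, SketchTokenizer, secondPass and 
the resetInput flag are hypothetical names, not part of Lucene, and a 
java.io.StringReader stands in for the real input. It relies on 
StringReader.reset() rewinding to the start of the string when no mark was set; 
other Reader implementations may throw IOException from reset(), so the 
proposed change may not be safe for every input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Lucene.
class ResetSketch {

    // Minimal stand-in for a tokenizer that splits its input at '.'.
    static final class SketchTokenizer {
        private final Reader input;
        private final boolean resetInput; // toggles the proposed fix
        private int tokenStart, tokenEnd;

        SketchTokenizer(Reader input, boolean resetInput) {
            this.input = input;
            this.resetInput = resetInput;
        }

        // Returns the next sentence, or null once the reader is exhausted.
        String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            tokenStart = tokenEnd;
            int c;
            while ((c = input.read()) != -1) {
                sb.append((char) c);
                tokenEnd++;
                if (c == '.') break;
            }
            return sb.length() == 0 ? null : sb.toString();
        }

        void reset() throws IOException {
            if (resetInput) {
                input.reset(); // rewinds a StringReader to its start
            }
            tokenStart = tokenEnd = 0;
        }
    }

    // Consumes the tokenizer once, calls reset(), and returns the second pass.
    static List<String> secondPass(String text, boolean resetInput) {
        try {
            SketchTokenizer t = new SketchTokenizer(new StringReader(text), resetInput);
            while (t.next() != null) { }
            t.reset();
            List<String> tokens = new ArrayList<>();
            for (String s = t.next(); s != null; s = t.next()) tokens.add(s);
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(secondPass("One.Two.", false));
        System.out.println(secondPass("One.Two.", true));
    }
}
```

With resetInput set to false the second pass returns no tokens, because only 
the offsets were cleared while the reader stayed at its end; with true, both 
sentences are produced again.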




