All,

I realize that we are expected to consume all tokens from a stream. Even so, I'd like to wrap a client's Analyzer in a LimitTokenCountAnalyzer with consumeAllTokens=false. With the analyzers I've used so far, this has caused no problems. With MockTokenizer, however, I run into this assertion error: "end() called before incrementToken()".

The comment in MockTokenizer reads:
// some tokenizers, such as limiting tokenizers, call end() before incrementToken() returns false.
// these tests should disable this check (in general you should consume the entire stream)

Disabling the check gives me pause, as does departing from the documented TokenStream workflow (http://lucene.apache.org/core/4_5_1/core/index.html). I assume from the warnings that there are Analyzers and use cases that will fail unless the stream is consumed entirely.

Is there a safe way to wrap a client Analyzer and read only the first x tokens? Or should I let the client decide whether or not to consume the whole stream?

Thank you!

Best,
Tim