On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> a quick test tokenizing all of Wikipedia w/
> SimpleAnalyzer showed 6-8% overall slowdown if I call token.clear() in
> ReadTokensTask.java.

We could slim down clear() a little by only resetting certain things... startOffset and endOffset need to be set each time if anyone cares about offsets, so they don't really need to be reset. The only tokenizer to use "type" sets it every time AFAIK, so one could argue for skipping that as well. Not sure if the small performance gain would be worth it though.

-Yonik
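
To make the idea concrete, here is a rough sketch of what a slimmer reset might look like. SketchToken, its fields, its defaults, and clearSlim() are all hypothetical stand-ins made up for illustration; this is not the actual Token source, and which fields the real clear() touches is an assumption here.

```java
// Hypothetical stand-in for a reusable token; NOT the real Token class.
class SketchToken {
    static final String DEFAULT_TYPE = "word"; // assumed default type

    char[] termBuffer = new char[16];
    int termLength;
    int startOffset;
    int endOffset;
    String type = DEFAULT_TYPE;
    int positionIncrement = 1;
    Object payload;

    /** Full reset: every per-token field goes back to its default. */
    void clear() {
        termLength = 0;
        startOffset = 0;
        endOffset = 0;
        type = DEFAULT_TYPE;
        positionIncrement = 1;
        payload = null;
    }

    /**
     * Slimmer reset: leaves startOffset, endOffset and type alone, on the
     * assumption that the tokenizer overwrites them for every token anyway.
     */
    void clearSlim() {
        termLength = 0;
        positionIncrement = 1;
        payload = null;
    }
}
```

The catch is that a tokenizer which does not set offsets or type on every token would see stale values from the previous token after clearSlim(), so the saving only holds if that assumption is true for every tokenizer in use.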