Regarding the recent changes in Token (reusability, and using char[] instead of String):

1) If we are deprecating some methods like String termText(), how about deprecating "String type" at the same time? If we want lightweight per-token metadata for communication between filters, an int or a long used as a bitvector (32 or 64 independent boolean vars per token) would be much more useful than a single String.

2) I think we need to clarify who needs to "clean up" a token's state when it's being reused (or whether it needs to be cleaned up at all). For example, in CharTokenizer, the token type, token payload, and positionIncrement are not reset, so they carry over the previous token's values. Is this
   a) a bug,
   b) guaranteed behavior one can depend on, or
   c) undefined?
Since this includes positionIncrement, I'm inclined to say that this is a bug. There is a Token.clear(); should it be called by the caller or by the Tokenizer?

-Yonik
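
P.S. To make both points concrete, here is a rough sketch in Java. SketchToken and TokenReuseSketch are made-up stand-ins, not Lucene's actual Token or CharTokenizer, and the FLAG_* names are hypothetical. It only illustrates the idea of an int used as a bitvector for per-token flags, and how state leaks across tokens when a reused instance is not cleared first.

```java
// Hypothetical stand-in for a reusable token; NOT Lucene's actual Token class.
final class SketchToken {
    // Hypothetical flag bits: each one occupies a single bit, so an int holds up to 32.
    static final int FLAG_KEYWORD = 1 << 0;
    static final int FLAG_ACRONYM = 1 << 1;
    static final int FLAG_STEMMED = 1 << 2;

    private char[] termBuffer = new char[16];
    private int termLength = 0;
    private int positionIncrement = 1;
    private int flags = 0;

    // Copy the term text into the reusable char[] buffer, growing it if needed.
    void setTerm(String s) {
        if (s.length() > termBuffer.length) {
            termBuffer = new char[s.length()];
        }
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }

    String term() { return new String(termBuffer, 0, termLength); }

    void setPositionIncrement(int inc) { positionIncrement = inc; }
    int getPositionIncrement() { return positionIncrement; }

    void setFlag(int mask)    { flags |= mask; }
    void clearFlag(int mask)  { flags &= ~mask; }
    boolean hasFlag(int flag) { return (flags & flag) != 0; }

    // Reset per-token state to defaults; the buffer itself is kept for reuse.
    void clear() {
        termLength = 0;
        positionIncrement = 1;
        flags = 0;
    }
}

final class TokenReuseSketch {
    // Mimics a tokenizer that only overwrites the term text on reuse.
    static SketchToken nextWithoutClear(SketchToken reusable, String text) {
        reusable.setTerm(text);
        return reusable;
    }

    // Mimics a tokenizer that resets the token before filling it.
    static SketchToken nextWithClear(SketchToken reusable, String text) {
        reusable.clear();
        reusable.setTerm(text);
        return reusable;
    }

    public static void main(String[] args) {
        SketchToken t = new SketchToken();
        t.setTerm("foo");
        t.setPositionIncrement(3);
        t.setFlag(SketchToken.FLAG_KEYWORD | SketchToken.FLAG_STEMMED);

        // Stale state survives: prints "bar inc=3 keyword=true"
        nextWithoutClear(t, "bar");
        System.out.println(t.term() + " inc=" + t.getPositionIncrement()
                + " keyword=" + t.hasFlag(SketchToken.FLAG_KEYWORD));

        // State is reset: prints "baz inc=1 keyword=false"
        nextWithClear(t, "baz");
        System.out.println(t.term() + " inc=" + t.getPositionIncrement()
                + " keyword=" + t.hasFlag(SketchToken.FLAG_KEYWORD));
    }
}
```

In this sketch the producer calls clear(), which is only one of the two options raised in point 2; having the caller clear the token before passing it in would work just as well, as long as the contract says which side is responsible.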