Regarding the recent changes in Token (reusability and the use of
char[] instead of String):

1) If we are deprecating some methods like String termText(), how
about at the same time deprecating "String type"?  If we want
lightweight per-token metadata for communication between filters, an
int or a long used as a bitvector (32 or 64 independent boolean vars
per token) would be much more useful than a single String.
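
To make the bitvector idea concrete, here is a minimal sketch. It is
not the Lucene Token class: FlaggedToken and the flag names KEYWORD,
STEMMED and SYNONYM are made up for illustration. Each filter would own
one bit and could set or test it without touching the others.

```java
// Hypothetical stand-in for a token; NOT the real Lucene Token API.
final class FlaggedToken {
    // Made-up flag names, one bit each. An int leaves room for 32.
    static final int KEYWORD = 1 << 0;
    static final int STEMMED = 1 << 1;
    static final int SYNONYM = 1 << 2;

    private int flags; // all bits are clear on a fresh token

    // Turn on every bit in mask, leaving the other bits alone.
    void setFlag(int mask) { flags |= mask; }

    // Turn off every bit in mask, leaving the other bits alone.
    void clearFlag(int mask) { flags &= ~mask; }

    // True only if every bit in mask is currently set.
    boolean hasFlag(int mask) { return (flags & mask) == mask; }

    int getFlags() { return flags; }
}
```

An upstream filter could call setFlag(FlaggedToken.KEYWORD) and a later
stemming filter could skip any token where hasFlag(FlaggedToken.KEYWORD)
is true, with no String comparison involved.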

2) I think we need to clarify who is responsible for "cleaning up" a
token's state when it's being reused (or whether it needs to be
cleaned up at all). For example, CharTokenizer does not reset the
token type, payload, or positionIncrement, so a reused token carries
over the previous token's values. Is this a) a bug, b) guaranteed
behavior one can depend on, or c) undefined?  Since this includes
positionIncrement, I'm inclined to say that this is a bug.  There is
a Token.clear(); should the caller or the Tokenizer be the one to
call it?
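
Here is a small self-contained sketch of the problem, using stand-in
classes rather than the real Lucene ones. SketchToken, SketchTokenizer
and the clearFirst switch are all hypothetical, and the defaults that
clear() restores ("word", increment 1, no payload) are my assumption
about what a reset should mean, not a statement of what Token.clear()
currently does.

```java
// Hypothetical stand-in for a reusable token; NOT the Lucene Token class.
final class SketchToken {
    static final String DEFAULT_TYPE = "word";

    char[] termBuffer = new char[16];
    int termLength;
    String type = DEFAULT_TYPE;
    int positionIncrement = 1;
    byte[] payload;

    // Assumed reset semantics: restore every per-token field to its default.
    void clear() {
        termLength = 0;
        type = DEFAULT_TYPE;
        positionIncrement = 1;
        payload = null;
    }

    // Copy the term into the reusable buffer, growing it only if needed.
    void setTerm(String s) {
        if (s.length() > termBuffer.length) {
            termBuffer = new char[s.length()];
        }
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }
}

// Hypothetical tokenizer that fills in a caller-supplied token.
final class SketchTokenizer {
    private final boolean clearFirst;
    private final String[] words;
    private int index;

    SketchTokenizer(boolean clearFirst, String... words) {
        this.clearFirst = clearFirst;
        this.words = words;
    }

    // Returns the reused token, or null when the input is exhausted.
    SketchToken next(SketchToken reusable) {
        if (index >= words.length) {
            return null;
        }
        if (clearFirst) {
            reusable.clear(); // the tokenizer takes responsibility for cleanup
        }
        reusable.setTerm(words[index++]); // only the term text is overwritten
        return reusable;
    }
}
```

With clearFirst set to false, a positionIncrement that a downstream
filter sets on the first token is still there when the second token
comes back. With clearFirst set to true, the second token starts from
the defaults again.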

-Yonik
