Re: String representation

Philip Newton Mon, 18 Dec 2000 07:22:42 -0800

On Sun, 17 Dec 2000, Dan Sugalski wrote:

> I'm thinking for speed that binary and UTF-32 should be our internal 
> representations, at least for the data that gets handed to the regex 
> engine. Or at least we use a constant-width character that's 8 and 32 bits, 
> if I'm misusing UTF-32. (UTF-8 is variable-width--is UTF-32?)

No. UTF-32 is always 4 bytes per character, AIUI. UTF-8 is variable-width
(1 to 4 bytes), and so is UTF-16 (either 2 or 4 bytes; 4 are needed only
for characters above U+FFFF, i.e. outside the BMP, the Basic Multilingual
Plane).
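
For anyone who wants to check those widths, here is a minimal sketch in
Python (not Perl, and not anything from the proposed internals). The helper
name encoded_widths and the sample characters are my own picks for
illustration. It encodes one character at a time and counts the bytes,
using the big-endian codec names so that no byte-order mark is added to
the count.

```python
def encoded_widths(ch):
    """Return how many bytes a single character takes in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-be" variants emit no byte-order mark, so only the
        # character itself is counted.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Sample characters (illustrative choices): ASCII, Latin-1 range,
# elsewhere in the BMP, and one outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_widths(ch))
```

Only the last sample, U+1D11E, lies outside the BMP, so it is the only one
that should need 4 bytes (a surrogate pair) in UTF-16, while the UTF-32
width should stay at 4 throughout.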

Cheers,
Philip
-- 
Philip Newton <[EMAIL PROTECTED]>
I appreciate copies of replies to this message
