On Sun, 17 Dec 2000, Dan Sugalski wrote:

> I'm thinking for speed that binary and UTF-32 should be our internal
> representations, at least for the data that gets handed to the regex
> engine. Or at least we use a constant-width character that's 8 and 32 bits,
> if I'm misusing UTF-32. (UTF-8 is variable-width--is UTF-32?)

No. UTF-32 is always 4 bytes AIUI. UTF-8 is variable (1..4) and so is
UTF-16 (either 2 or 4, though 4 bytes are needed only for characters
> U+FFFF, i.e. outside the BMP or Basic Multilingual Plane).

Cheers,
Philip
-- 
Philip Newton <[EMAIL PROTECTED]>
I appreciate copies of replies to this message
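
As a rough illustration of the widths described above, here is a minimal Python sketch that encodes a few sample characters and counts the resulting bytes. The helper name `encoded_widths` and the sample characters are assumptions made for this example and are not part of the original thread; the little-endian codec variants are used only so that no byte-order mark is added to the counts.

```python
def encoded_widths(ch: str) -> dict:
    """Return the number of bytes one character occupies in each encoding.

    Hypothetical helper for illustration. The "-le" codecs are chosen so
    that Python does not prepend a byte-order mark to the output.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


# Sample characters: ASCII, Latin-1 range, elsewhere in the BMP, outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_widths(ch))
```

Under these assumptions, UTF-32 reports 4 bytes for every sample, UTF-8 ranges from 1 to 4 bytes, and UTF-16 reports 2 bytes for the BMP characters and 4 bytes only for U+1F600, which lies above U+FFFF.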