Dan Sugalski <[EMAIL PROTECTED]> writes:
> At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote:

>> (As an aside, UTF-8 also is not an X-byte encoding; UTF-8 is a variable
>> byte encoding, with each character taking up anywhere from one to six
>> bytes in the encoded form depending on where in Unicode the character
>> falls.)

> Have they changed that again? Last I checked, UTF-8 was capped at 4
> bytes, but that's in the Unicode 3.0 standard.

Yes, it changed with Unicode 3.1 when they started allocating characters
from higher planes.

Far and away the best reference for UTF-8 that I've found is RFC 2279.
It's much more concise and readable than the version in the Unicode
standard, and is more aimed at implementors and practical considerations.
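
For reference, here is a minimal sketch of the length table given in
RFC 2279, in Python. The function name utf8_length is only illustrative
and is not taken from the RFC or from any library; it reports how many
bytes a value would take under that table and does not do any encoding.

```python
# Upper bound of each range in the RFC 2279 table, paired with the
# number of bytes used to encode values up to that bound.
RFC2279_RANGES = [
    (0x0000007F, 1),
    (0x000007FF, 2),
    (0x0000FFFF, 3),
    (0x001FFFFF, 4),
    (0x03FFFFFF, 5),
    (0x7FFFFFFF, 6),
]


def utf8_length(code_point: int) -> int:
    """Return the encoded length in bytes under the RFC 2279 table.

    Hypothetical helper for illustration only.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for upper_bound, length in RFC2279_RANGES:
        if code_point <= upper_bound:
            return length
    raise ValueError("value is beyond the 31-bit range covered by RFC 2279")


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x10000, 0x10FFFF, 0x200000, 0x4000000):
        print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Note that everything up to U+10FFFF falls in the four-byte row, so the
five- and six-byte forms only come into play for values beyond that.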

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
