At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote:
>Bart Lateur <[EMAIL PROTECTED]> writes:
> > UTF-8 is NOT limited to 16 bits (3 bytes).
>
>That's an odd definition of byte you have there.  :)

Maybe it's RAD50. :) Still, a character that takes 2 bytes in UTF-16 may 
take 3 bytes to represent in UTF-8.
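For concreteness, here's a small Python sketch (my own illustration, not 
from the thread) comparing byte counts. The Euro sign U+20AC is a typical 
case: it sits above U+0800 in the BMP, so it needs 3 bytes in UTF-8 but 
only 2 in UTF-16:

```python
# Compare UTF-8 and UTF-16 byte counts for a few sample characters.
# The characters chosen here are purely illustrative.
for ch in ["A", "\u00e9", "\u20ac", "\U00010348"]:
    cp = ord(ch)
    u8 = len(ch.encode("utf-8"))
    u16 = len(ch.encode("utf-16-le"))  # -le so the BOM isn't counted
    print(f"U+{cp:04X}: UTF-8 = {u8} bytes, UTF-16 = {u16} bytes")
```

Running this shows 1-byte ASCII and 2-byte Latin characters come out even 
or ahead in UTF-8, while U+0800 through U+FFFF cost 3 bytes in UTF-8 
against UTF-16's 2; characters above the BMP cost 4 bytes in both.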

> > With 4 bytes, UTF-8 can represent 20-bit characters, i.e. 6 times more
> > than the "desired number" of 170000.
>
>UTF-8 is a mapping from a 31-bit (yes, not 32, interestingly enough)
>character numbering, and as such can represent over two billion
>characters.  For some reason that I've never understood, the Unicode folks
>are limiting that to only a subset of what one can do with 31 bits by
>putting an artificial limit on how high of character values they're
>willing to assign, but even with that as soon as they started using the
>higher planes, there's easily enough space to add every character the
>author mentioned and then some.

Yeah, the limitations are kind of odd. I'm presuming they're in there so 
the technical folks have at least some sort of a stick to smack the 
crankier non-technical folks with.

>(As an aside, UTF-8 also is not an X-byte encoding; UTF-8 is a variable
>byte encoding, with each character taking up anywhere from one to six
>bytes in the encoded form depending on where in Unicode the character
>falls.)

Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes, 
but that's in the Unicode 3.0 standard.
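Both views can be written down in one place. This sketch (mine, not from 
either message) gives the UTF-8 length rule as a function of the code 
point, keeping the 5- and 6-byte forms from the original 31-bit design 
alongside the 4-byte cap that applies once code points stop at U+10FFFF:

```python
def utf8_len(cp: int) -> int:
    """UTF-8 byte count for a code point, including the 5- and 6-byte
    forms of the original 31-bit scheme.  With code points capped at
    U+10FFFF, only the 1- to 4-byte forms are ever used."""
    if cp < 0x80:
        return 1        # ASCII
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3        # rest of the Basic Multilingual Plane
    if cp < 0x200000:
        return 4        # covers everything up to U+10FFFF
    if cp < 0x4000000:
        return 5        # 31-bit design only
    return 6            # up to 0x7FFFFFFF, 31-bit design only
```

The 4-byte form already reaches past U+10FFFF, which is why the cap costs 
nothing in encoded length: every assignable character fits in at most 4 
bytes either way.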

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
