At 04:44 PM 6/5/2001 -0700, Larry Wall wrote:
>Dan Sugalski writes:
>: Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes,
>: but that's in the Unicode 3.0 standard.
>
>Doesn't really matter where they install the artificial cap, because
>for philosophical reasons Perl is gonna support larger values anyway.
>It's just that 4 bytes of UTF-8 happens to be large enough to represent
>anything UTF-16 can represent with surrogates.  So they refuse to
>believe in anything longer than 4 bytes, even though the representation
>can be extended much further.  (Perl 5 extends it all the way to 64-bit
>values, represented in 13 bytes!)
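
To make the length arithmetic concrete, here is a small sketch in Python (used only for illustration; `from_surrogates` and `utf8_len` are made-up helper names). It assumes the original 1-to-6-byte UTF-8 layout, which tops out at 31 bits. The 13-byte form mentioned above is Perl's own extension and is not modelled here.

```python
def from_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def utf8_len(cp: int) -> int:
    """Bytes needed under the original, uncapped 1..6-byte UTF-8 layout."""
    # Each limit is the first value that no longer fits in that many bytes.
    limits = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)
    for nbytes, limit in enumerate(limits, start=1):
        if cp < limit:
            return nbytes
    raise ValueError("needs an extension beyond the 6-byte layout")

# Largest value UTF-16 can reach with a surrogate pair.
top = from_surrogates(0xDBFF, 0xDFFF)
```

Here `top` comes out as 0x10ffff and `utf8_len(top)` is 4, which is where the 4-byte cap comes from.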

I know we can, but is it really a good idea? 32 bits is really stretching 
it for character encoding, and 64 seems rather excessive. Really 
space-wasteful as well, if we maintain a character type with a fixed width 
large enough to hold the largest decoded variable-width character. And I 
really, *really* want to do as little as possible internally with 
variable-width encodings. Yech.
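
A rough sketch of the space cost, again in Python for illustration only. The sample string and the name `footprint` are hypothetical, and the 64-bit row is plain multiplication rather than any real encoding.

```python
def footprint(text: str) -> dict:
    """Bytes needed to store text under three storage choices."""
    return {
        "utf8": len(text.encode("utf-8")),         # variable width
        "fixed32": len(text.encode("utf-32-le")),  # 4 bytes per character
        "fixed64": 8 * len(text),                  # hypothetical 64-bit cells
    }

# Ten characters, two of them outside ASCII.
sizes = footprint("na\u00efve caf\u00e9")
```

For that ten-character sample the three figures are 12, 40 and 80 bytes.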

>They also arbitrarily define UTF-32 to not use higher values than
>0x10ffff, but that doesn't mean we're gonna send in the high-bit Nazis
>if people want higher values for their own purposes.

Well, that'd be inappropriate since a good chunk of the rest of the set's 
been dedicated to future expansion. I think it might be a reasonable idea 
for -w to grumble if someone's used a character in the unassigned range, 
though. (IIRC there's a piece set aside for folks to do whatever they want 
with.)
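
A minimal sketch of the kind of check such a warning could lean on, in Python purely for illustration (this is not how -w works; `classify` is a hypothetical name). It assumes the Unicode general categories "Cn" (unassigned) and "Co" (private use), so the answers depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def classify(cp: int) -> str:
    """Roughly sort a code point into the buckets discussed above."""
    if cp > 0x10FFFF:
        return "beyond-unicode"   # chr() would reject this value
    cat = unicodedata.category(chr(cp))
    if cat == "Co":
        return "private-use"      # set aside for local use
    if cat == "Cn":
        return "unassigned"       # includes noncharacters
    return "assigned"
```

A warning would presumably fire on "unassigned" and stay quiet on "private-use".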

>But since the names UTF-8 and UTF-32 are becoming associated with those
>arbitrary restrictions, it's getting even more important to refer to
>Perl's looser style as utf8 (and, potentially, utf32).  I don't know
>if Perl will have a utf16 that is distinguished from UTF-16.

I'd as soon not do UTF-16 at all, or at least no more than we need to 
convert to UTF-32 or UTF-8.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
