On Sat, 16 Dec 2000, Jarkko Hietaniemi wrote:
> On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote:
> > At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote:
> > >
> > >As painful as it may sound (codingwise) I would urge to spare some
> > >thought to using (internally) UTF-32 for those encodings for which
> > >UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts).
> >
> > If we can manage it, I'd prefer to not have a preferred internal
>
> I didn't mean 'preferred', I meant that if UTF-8 would be longer for
> some encodings, both for space *and* speed using straight honest UTF-32
> would make much more sense.
Are you confusing UTF-32 with UTF-16 here? As I understand it, UTF-32
always uses four bytes per character, while UTF-8 needs at most three
bytes for characters from U+0000 to U+FFFF. However, UTF-8 is longer than
UTF-16 for characters from U+0800 to U+FFFF (three bytes versus two). The
two draw level again for U+10000 to U+10FFFF: both encodings need four
bytes for characters in that range, UTF-16 because of its surrogate
encoding.
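
To make the byte counts concrete, here is a rough sketch in Python (picked
only for illustration; the helper name encoded_lengths and the sample code
points are my own, not anything proposed in this thread). It asks the
standard codecs how many bytes one character takes in each encoding form,
using the big-endian variants so that no byte order mark is counted:

```python
def encoded_lengths(code_point):
    """Return the number of bytes one code point takes in each encoding form.

    Hypothetical helper for illustration. Surrogate code points
    (U+D800 to U+DFFF) are not characters and cannot be encoded this way.
    """
    ch = chr(code_point)
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-be" codecs do not prepend a byte order mark.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Sample code points on either side of each boundary discussed above.
SAMPLES = [0x0041, 0x07FF, 0x0800, 0xFFFF, 0x10000, 0x10FFFF]

for cp in SAMPLES:
    sizes = encoded_lengths(cp)
    print("U+%04X  utf-8=%d  utf-16=%d  utf-32=%d"
          % (cp, sizes["utf-8"], sizes["utf-16"], sizes["utf-32"]))
```

If I have this right, UTF-32 never comes out shorter than UTF-8 for any of
these samples; it is only UTF-16 that beats UTF-8, and only in the U+0800
to U+FFFF range.
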
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>