Re: String representation

Jarkko Hietaniemi Mon, 18 Dec 2000 07:26:40 -0800
On Mon, Dec 18, 2000 at 10:30:53AM -0500, Philip Newton wrote:
> On Sat, 16 Dec 2000, Jarkko Hietaniemi wrote:
> 
> > On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote:
> > > At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote:
> > > >
> > > >As painful as it may sound (codingwise) I would urge to spare some
> > > >thought to using (internally) UTF-32 for those encodings for which
> > > >UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts).
> > > 
> > > If we can manage it, I'd prefer to not have a preferred internal 
> > 
> > I didn't mean 'preferred', I meant that if UTF-8 would be longer for
> > some encodings, both for space *and* speed using straight honest UTF-32
> > would make much more sense.
> 
> Are you confusing UTF-32 and UTF-16 here? As I understand it, UTF-32
> always uses four bytes, while UTF-8 only needs three bytes max for
> characters from U+0000 to U+FFFF. However, UTF-8 is longer than UTF-16 for
> characters above U+07FF (but catches up again for U+10000 to U+10FFFF: both
> encodings need four bytes for characters in that range because of
> UTF-16's surrogate encoding).

Darn, foiled again in my evil plan to confuse everybody about Unicode issues.
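
For anyone who wants to check the byte counts quoted above, here is a
minimal sketch in Python. It is only an illustration, not anything from
the Perl internals; the helper name encoded_lengths is made up for this
example. It uses the big-endian codec variants so that no byte order mark
is counted, and it sticks to code points outside the surrogate range.

```python
def encoded_lengths(code_point):
    """Return the encoded size in bytes of one code point in each UTF form.

    Hypothetical helper for illustration only. The '-be' codecs are used
    so that no byte order mark is included in the count.
    """
    ch = chr(code_point)
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Boundary code points from the ranges discussed in the quoted message.
for cp in (0x0041, 0x07FF, 0x0800, 0xFFFF, 0x10000, 0x10FFFF):
    print(f"U+{cp:04X}", encoded_lengths(cp))
```

The boundaries should show UTF-8 at three bytes against UTF-16's two from
U+0800 through U+FFFF, and both at four bytes from U+10000 upward, while
UTF-32 stays at four bytes throughout.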

> Cheers,
> Philip
> -- 
> Philip Newton <[EMAIL PROTECTED]>

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen