Re: String representation

Nick Ing-Simmons Mon, 18 Dec 2000 11:05:19 -0800
Nicholas Clark <[EMAIL PROTECTED]> writes:
>On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote:
>
>> As painful as it may sound (codingwise) I would urge to spare some
>> thought to using (internally) UTF-32 for those encodings for which
>> UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts).
>
>most CPUs can load a 32 bit quantity in 1 machine instruction
>most CPUs would take 2 or 3 machine instructions to load 2 or 3 bytes of
>variable length encoding, and I'd guess that on most RISC CPUs those
>three instructions take three times the space, 

Okay so far.

>(and take 3 times the
>single load instruction)

Almost certainly more than the single load, but much less than 3 times
as long, due to cache effects.

>And that's ignoring the code to bit shuffle those bytes that make up the
>character.
>
>So it may be more total space efficient to use 32 bits for data.
>And although it feels like we'll be shifting 32 bits of data round per
>character instead of 8-40 with an average less than 32, it might still take
>longer because we're doing it less efficiently.
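
For concreteness, here is a rough sketch of the "bit shuffle" being discussed, next to the fixed-width case. It is illustrative Python rather than anything from the Perl core; the function names are made up for this example, and it handles only the 3-byte UTF-8 form with no validation.

```python
def decode_utf8_3byte(b0, b1, b2):
    """Combine one 3-byte UTF-8 sequence into a code point.

    Hypothetical helper: assumes b0 is a 1110xxxx lead byte and b1, b2 are
    10xxxxxx continuation bytes; no error checking is done.
    """
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def load_utf32le(buf, index):
    """Fetch code point number `index` from a little-endian UTF-32 buffer."""
    return int.from_bytes(buf[4 * index:4 * index + 4], "little")


ch = "\u6587"  # a CJK character in the Basic Multilingual Plane
assert decode_utf8_3byte(*ch.encode("utf-8")) == ord(ch)
assert load_utf32le(ch.encode("utf-32-le"), 0) == ord(ch)
```

The variable-length path needs three byte loads plus masks, shifts and ORs, while the fixed-width path is a single aligned fetch.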

My big worry is that UTF-32 "strings" would fill the data cache much more quickly.
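
To put rough numbers on the footprint, a small sketch along these lines compares the encoded size of the same text in both forms. It is Python rather than Perl, the sample strings and the `encoded_sizes` name are invented for illustration, and it uses the standard codecs (not any Perl-internal extended UTF-8).

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf32_bytes) for text, excluding any byte-order mark."""
    utf8 = len(text.encode("utf-8"))
    utf32 = len(text.encode("utf-32-le"))  # the -le codec emits no BOM
    return utf8, utf32


# Illustrative samples only: plain ASCII, Latin-1 with one accent, BMP CJK.
samples = {
    "ascii": "string representation",
    "latin": "repr\u00e9sentation",
    "cjk": "\u6587\u5b57\u5217\u8868\u73fe",
}

for name, text in samples.items():
    u8, u32 = encoded_sizes(text)
    print(f"{name}: utf-8={u8} bytes, utf-32={u32} bytes")
```

For ASCII-heavy data the 32-bit form is four times larger, so the same cache holds a quarter as many characters; for BMP CJK text the gap narrows to 3 bytes versus 4.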


>
>Just a passing thought. Extrapolated up from 1 RISC CPU I know quite well.
>
>Nicholas Clark
-- 
Nick Ing-Simmons