On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:

At 12:30 PM +0100 5/25/04, Nicholas Clark wrote:

I may be misremembering what I've read here, but I thought that Dan said
that for variable-length encodings (such as shift-JIS) parrot would store
the byte(s) in memory in constant-size 16- or 32-bit integers, rather than
the (external) variable-length byte sequence, as this gives O(1) random
access and avoids much coding pain.
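To illustrate the O(1) point, here is a minimal Python sketch. It is not Parrot code, and the function names are hypothetical. It uses UTF-8 as the variable-length encoding because Python decodes it natively, and it assumes valid input. Finding the n-th character in the raw bytes means walking every earlier sequence, while an array of fixed-size code points is indexed directly.

```python
from typing import List


def utf8_seq_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence, read from its lead byte (valid input assumed)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3  # 1110xxxx
    return 4      # 11110xxx


def char_at_variable(data: bytes, n: int) -> str:
    """n-th character of UTF-8 bytes: every earlier sequence must be skipped, O(n)."""
    i = 0
    for _ in range(n):
        i += utf8_seq_len(data[i])
    return data[i:i + utf8_seq_len(data[i])].decode("utf-8")


def char_at_fixed(code_points: List[int], n: int) -> str:
    """n-th character of a fixed-width code point array: direct index, O(1)."""
    return chr(code_points[n])


if __name__ == "__main__":
    text = "na\u00efve \u65e5\u672c\u8a9e"  # "naïve 日本語"
    utf8 = text.encode("utf-8")
    fixed = [ord(c) for c in text]
    # Same answer either way; only the cost of finding it differs.
    assert char_at_variable(utf8, 7) == char_at_fixed(fixed, 7) == "\u672c"
```

The same trade-off applies to shift-JIS or any other encoding whose characters vary in byte length.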


However, he made no explicit comment about UTF8 (just another variable-length
encoding), which would imply that parrot will be storing UTF8 in
this way.

Yup. UTF8 is just another variable-width encoding. Do anything with it and we convert it to a fixed-width encoding, in this case UTF32.
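A rough sketch of what that conversion buys, again in Python and not taken from Parrot. The function names are hypothetical, Python's codecs stand in for whatever transcoding Parrot actually does, and little-endian byte order is an arbitrary choice. Once the text is UTF32, code point n always starts at byte offset 4*n.

```python
def to_utf32(utf8: bytes) -> bytes:
    """Re-encode UTF-8 input as UTF-32 (little-endian, no BOM): 4 bytes per code point."""
    return utf8.decode("utf-8").encode("utf-32-le")


def code_point_at(utf32: bytes, n: int) -> int:
    """The n-th code point lives at byte offset 4*n, so lookup needs no scan."""
    return int.from_bytes(utf32[4 * n:4 * n + 4], "little")


if __name__ == "__main__":
    text = "na\u00efve \u65e5\u672c\u8a9e"  # "naïve 日本語"
    buf = to_utf32(text.encode("utf-8"))
    assert len(buf) == 4 * len(text)        # fixed width: 4 bytes each
    assert code_point_at(buf, 7) == 0x672C  # U+672C, found by offset arithmetic
```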

This has the unfortunate side-effect of wasting 50-75% of the storage space in the common cases, of course.
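The 50-75% figure follows from simple arithmetic. UTF32 always spends 4 bytes per code point, while UTF8 spends 1 byte for ASCII and 2 bytes for scripts such as Greek or Cyrillic, so those cases waste 75% and 50% respectively. Three-byte characters, which include most CJK, come out at 25%. A small Python sketch (the helper name is hypothetical) that computes the ratio:

```python
def utf32_waste(text: str) -> float:
    """Fraction of a UTF-32 buffer that UTF-8 would not need for the same text."""
    if not text:
        return 0.0
    utf8_bytes = len(text.encode("utf-8"))  # 1 to 4 bytes per code point
    utf32_bytes = 4 * len(text)             # always 4 bytes per code point
    return 1 - utf8_bytes / utf32_bytes


if __name__ == "__main__":
    print(utf32_waste("hello"))               # ASCII: 0.75
    print(utf32_waste("\u03b1\u03b2\u03b3"))  # Greek, 2 bytes each in UTF-8: 0.5
    print(utf32_waste("\u65e5\u672c\u8a9e"))  # CJK, 3 bytes each in UTF-8: 0.25
```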


Jeff


