Rainer Deyke wrote:
Andrei Alexandrescu wrote:
One idea I've had for a while was to have a universal string type:
struct UString {
    union {
        char[] utf8;
        wchar[] utf16;
        dchar[] utf32;
    }
    enum Discriminator { utf8, utf16, utf32 }
    Discriminator kind;
    IntervalTree!(size_t) skip;
    ...
}
The IntervalTree stores the skip amounts that must be added to a given
character index to reach the corresponding code unit in the string. For
ASCII strings the tree would be null. Its size grows with the number of
multibyte characters; beyond a threshold, the representation is
transparently switched to utf16 or utf32 as needed and the tree becomes
smaller or null again.
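For illustration, the index-to-byte-offset lookup could be sketched as
follows (in Python rather than D, and with a sorted array plus binary
search standing in for the IntervalTree; both are assumptions for the
sketch, not the actual implementation):

```python
import bisect

def build_skip_table(s: str):
    """For each multibyte character, record (char_index, cumulative extra
    bytes contributed by all multibyte characters up to and including it).
    An ASCII-only string produces an empty table."""
    table = []
    extra = 0
    for i, ch in enumerate(s):
        nbytes = len(ch.encode('utf-8'))
        if nbytes > 1:
            extra += nbytes - 1
            table.append((i, extra))
    return table

def byte_offset(table, char_index):
    """Byte offset of char_index in the UTF-8 encoding: the index itself
    plus the extra bytes of every multibyte character strictly before it.
    With an empty table (pure ASCII) the offset equals the index."""
    pos = bisect.bisect_left(table, (char_index, 0))
    extra = table[pos - 1][1] if pos else 0
    return char_index + extra

s = "héllo"
tab = build_skip_table(s)        # [(1, 1)]: 'é' at index 1 adds one extra byte
assert byte_offset(tab, 2) == 3  # first 'l' starts at byte 3, not byte 2
```

The point of the structure is that random access stays O(1) for ASCII
(empty table) and degrades only with the count of multibyte characters,
which is what motivates switching representations past a threshold.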
Although I see some potential in a universal string type, I don't think
this is the right implementation strategy. I'd rather have my short
strings in utf-32 (optimized for speed) and my long strings in
utf-8/utf-16 (optimized for memory usage).
The definition I outlined does not specify or constrain the strategy of
changing the discriminator.
Andrei