Andrei Alexandrescu wrote:
Rainer Deyke wrote:
Don wrote:
I suspect that string and wstring should have been the primary types and
had a .codepoints property, which returned a ubyte[] or ushort[]
reference to the data, respectively. It's too late, of course. The extra
value you get by having a specific type for 'this is a code point for a
UTF-8 string' seems to be very minor, compared to just using a ubyte.

If it's not too late to completely change the semantics of char[], then
it's also not too late to dump 'char' completely.  If it /is/ too late
to remove 'char', then 'char[]' should retain the current semantics and
a new string type should be added for the new semantics.

One idea I've had for a while was to have a universal string type:

struct UString {
    union {
        char[] utf8;
        wchar[] utf16;
        dchar[] utf32;
    }
    enum Discriminator { utf8, utf16, utf32 };
    Discriminator kind;
    IntervalTree!(size_t) skip;
    ...
}

You mean like this?
http://www.dprogramming.com/mtext.php
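
For concreteness, here is a minimal sketch of how the tagged-union part of
the UString idea above might be filled in. Everything beyond the three array
members and the discriminator is an assumption made for illustration: the
constructors, the length and toString members, and the use of immutable
element types are hypothetical, and the IntervalTree "skip" index is left
out because the original elides how it would work. This sketch simply walks
the data to count code points.

```d
import std.range : walkLength;
import std.utf : toUTF8;

// Hypothetical sketch, not a finished design: one string type that can
// hold UTF-8, UTF-16 or UTF-32 data and remembers which one it holds.
struct UString
{
    enum Discriminator { utf8, utf16, utf32 }

    union
    {
        immutable(char)[]  utf8;
        immutable(wchar)[] utf16;
        immutable(dchar)[] utf32;
    }
    Discriminator kind;

    this(string s)  { utf8 = s;  kind = Discriminator.utf8; }
    this(wstring s) { utf16 = s; kind = Discriminator.utf16; }
    this(dstring s) { utf32 = s; kind = Discriminator.utf32; }

    // Number of code points, whatever the stored encoding.
    // O(n) for UTF-8 and UTF-16, since the data has to be decoded.
    size_t length() const
    {
        final switch (kind)
        {
            case Discriminator.utf8:  return utf8.walkLength;
            case Discriminator.utf16: return utf16.walkLength;
            case Discriminator.utf32: return utf32.length;
        }
    }

    // Convert to UTF-8 on demand.
    string toString() const
    {
        final switch (kind)
        {
            case Discriminator.utf8:  return utf8;
            case Discriminator.utf16: return toUTF8(utf16);
            case Discriminator.utf32: return toUTF8(utf32);
        }
    }
}

unittest
{
    // "h\u00E9llo" is 6 UTF-8 code units but 5 code points.
    auto a = UString("h\u00E9llo"c);
    auto b = UString("h\u00E9llo"w);
    assert(a.length == 5);
    assert(b.length == 5);
    assert(a.toString() == b.toString());
}
```

The final switch makes the compiler reject the code if a fourth encoding is
ever added to the enum without handling it everywhere.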
