Re: Making all strings UTF ranges has some risk of WTF

grauzone Thu, 04 Feb 2010 17:20:28 -0800

Andrei Alexandrescu wrote:

Rainer Deyke wrote:

Don wrote:

I suspect that string and wstring should have been the primary types, with
a .codepoints property returning a ubyte[] or ushort[] (respectively)
reference to the data. It's too late, of course. The extra value you get
from having a dedicated type meaning 'this is a code unit of a UTF-8 string'
seems very minor compared to just using a ubyte.
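A short sketch in present-day D (Phobos with its default auto-decoding) of the distinction being discussed: `.length` on a `string` counts UTF-8 code units, iterating the string as a range decodes code points, and `std.string.representation` exposes the underlying `immutable(ubyte)[]`, roughly the raw view a `.codepoints` property would hand back. The helper names `codeUnits`, `codePoints` and `rawBytes` are illustrative only.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.string : representation;

// Number of UTF-8 code units (bytes) stored in the string.
size_t codeUnits(string s)
{
    // .length counts char elements, i.e. code units, not characters
    return s.length;
}

// Number of code points, obtained by iterating the string as a range.
size_t codePoints(string s)
{
    // narrow strings are decoded to dchar when used as ranges
    return s.walkLength;
}

// The raw bytes behind the string, typed as ubyte rather than char.
immutable(ubyte)[] rawBytes(string s)
{
    return s.representation;
}

void main()
{
    string s = "naïve";            // 'ï' (U+00EF) takes two bytes in UTF-8
    writeln(codeUnits(s));         // 6
    writeln(codePoints(s));        // 5
    writeln(rawBytes(s)[2 .. 4]);  // [195, 175]
}
```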


If it's not too late to completely change the semantics of char[], then
it's also not too late to dump 'char' completely.  If it /is/ too late
to remove 'char', then 'char[]' should retain the current semantics and
a new string type should be added for the new semantics.


One idea I've had for a while was to have a universal string type:

struct UString {
    union {
        char[] utf8;
        wchar[] utf16;
        dchar[] utf32;
    }
    enum Discriminator { utf8, utf16, utf32 };
    Discriminator kind;
    IntervalTree!(size_t) skip; // IntervalTree is not defined in this sketch
    // ...
}
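A simplified, compilable take on the tagged-union idea, offered as an illustration rather than as the design proposed above: it stores immutable `string`/`wstring`/`dstring` slices instead of mutable arrays, and it omits the `IntervalTree` `skip` index entirely. The method names `codeUnitCount` and `codePointCount` are hypothetical.

```d
import std.range : walkLength;
import std.stdio : writeln;

// One value holding UTF-8, UTF-16 or UTF-32 text, tagged with the
// encoding currently in use. No skip index: code point counts are O(n).
struct UString
{
    enum Discriminator { utf8, utf16, utf32 }

    union
    {
        string utf8;
        wstring utf16;
        dstring utf32;
    }
    Discriminator kind;

    this(string s)  { utf8 = s;  kind = Discriminator.utf8; }
    this(wstring s) { utf16 = s; kind = Discriminator.utf16; }
    this(dstring s) { utf32 = s; kind = Discriminator.utf32; }

    // Length in code units of whichever encoding is active.
    size_t codeUnitCount()
    {
        final switch (kind)
        {
            case Discriminator.utf8:  return utf8.length;
            case Discriminator.utf16: return utf16.length;
            case Discriminator.utf32: return utf32.length;
        }
    }

    // Number of code points, independent of the storage encoding.
    size_t codePointCount()
    {
        final switch (kind)
        {
            case Discriminator.utf8:  return utf8.walkLength;  // decodes UTF-8
            case Discriminator.utf16: return utf16.walkLength; // decodes UTF-16
            case Discriminator.utf32: return utf32.length;     // one dchar per code point
        }
    }
}

void main()
{
    auto a = UString("naïve");   // stored as UTF-8
    auto b = UString("naïve"w);  // stored as UTF-16
    writeln(a.codeUnitCount, " ", a.codePointCount); // 6 5
    writeln(b.codeUnitCount, " ", b.codePointCount); // 5 5
}
```

The tag lets callers keep text in whatever encoding they received without transcoding, at the cost of a branch on `kind` in every operation; an index like the `skip` member would presumably be what makes code-point access cheaper than a linear scan, which this sketch does not attempt.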


You mean like this?
http://www.dprogramming.com/mtext.php
