Don wrote:
I suspect that string, wstring should have been the primary types and
had a .codepoints property, which returned a ubyte[] resp. ushort[]
reference to the data. It's too late, of course. The extra value you get
by having a specific type for 'this is a code point for a UTF8 string'
seems to be very minor, compared to just using a ubyte.

Rainer Deyke wrote:
If it's not too late to completely change the semantics of char[], then
it's also not too late to dump 'char' completely. If it /is/ too late
to remove 'char', then 'char[]' should retain the current semantics and
a new string type should be added for the new semantics.

Andrei Alexandrescu wrote:
One idea I've had for a while was to have a universal string type:

struct UString {
    union {
        char[] utf8;
        wchar[] utf16;
        dchar[] utf32;
    }
    enum Discriminator { utf8, utf16, utf32 };
    Discriminator kind;
    IntervalTree!(size_t) skip;
    ...
}

You mean like this?
http://www.dprogramming.com/mtext.php
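
For anyone who wants to play with the idea, here is a minimal sketch of the tagged-union layout from the quoted struct. It is not the mtext implementation linked above and not necessarily the design Andrei had in mind. The name UStringSketch, the constructors, and the codePoints/toString helpers are my own assumptions for illustration, and the skip index (IntervalTree) is left out because its definition isn't given in the post.

```d
import std.conv : to;
import std.utf : count;

// Hypothetical illustration only: a string that remembers which
// UTF encoding its payload uses and dispatches on that tag.
struct UStringSketch
{
    enum Kind { utf8, utf16, utf32 }

    private union
    {
        const(char)[]  utf8;
        const(wchar)[] utf16;
        const(dchar)[] utf32;
    }
    private Kind kind;

    this(const(char)[] s)  { utf8 = s;  kind = Kind.utf8; }
    this(const(wchar)[] s) { utf16 = s; kind = Kind.utf16; }
    this(const(dchar)[] s) { utf32 = s; kind = Kind.utf32; }

    // Number of code points, independent of the stored encoding.
    // This is a linear scan for UTF-8 and UTF-16; presumably the
    // skip index in the quoted struct is meant to speed this up.
    size_t codePoints() const
    {
        final switch (kind)
        {
            case Kind.utf8:  return count(utf8);
            case Kind.utf16: return count(utf16);
            case Kind.utf32: return utf32.length;
        }
    }

    // Transcode whatever is stored to a UTF-8 string.
    string toString() const
    {
        final switch (kind)
        {
            case Kind.utf8:  return utf8.to!string;
            case Kind.utf16: return utf16.to!string;
            case Kind.utf32: return utf32.to!string;
        }
    }
}

void main()
{
    string  a = "h\u00E9llo";   // 6 UTF-8 code units
    wstring b = "h\u00E9llo"w;  // 5 UTF-16 code units
    dstring c = "h\u00E9llo"d;  // 5 UTF-32 code units

    auto ua = UStringSketch(a);
    auto ub = UStringSketch(b);
    auto uc = UStringSketch(c);

    // All three agree on code points and on the UTF-8 rendering.
    assert(ua.codePoints == 5 && ub.codePoints == 5 && uc.codePoints == 5);
    assert(ua.toString() == a && ub.toString() == a && uc.toString() == a);
}
```

The point of the tag is that callers never see code units unless they ask for them; only the methods switch on the encoding.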