On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
> Ok, I understand, that "length" is - obviously - used in
> analogy to any array's length value.
> Still, this seems to be inconsistent. D elaborates on
> implementing "char"s as UTF-8 which means that a "char" in D
> can be of any length between 1 and 4 bytes for an arbitrary
> Unicode code point. Shouldn't then this (i.e. the character's
> length) be the "unit of measurement" for "char"s - like e.g.
> the size of the underlying struct in an array of "struct"s? The
> story continues with indexing "string"s: In a consistent
> implementation, shouldn't
> writeln("säд"[2])
> return "д" instead of the trailing surrogate of this cyrillic
> letter?

This is impossible given the current design. At runtime, "säд" is
viewed as struct { void *ptr; size_t length; }: ptr points to
memory holding at least five bytes, and length has the value 5,
so "säд"[2] simply reads the byte at offset 2 without any decoding.
Druntime hasn't taken a UTF course.
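
To make the layout concrete, here is a small sketch that dumps what the slice holds. It assumes the source file is saved as UTF-8 (the D default); the byte values in the comments follow from the UTF-8 encodings of U+00E4 and U+0434. Note that, by this count, offset 2 lands on the second byte of 'ä' rather than inside 'д'.

```d
import std.stdio : writefln, writeln;

void main()
{
    string s = "säд";

    // A D slice is a (pointer, length) pair; length counts UTF-8 code units.
    writeln(s.length);                       // 5

    // Dump the five code units that ptr points to:
    // 0x73 ('s'), 0xC3 0xA4 ('ä'), 0xD0 0xB4 ('д')
    foreach (i; 0 .. s.length)
        writefln("s[%s] = 0x%02X", i, cast(ubyte) s.ptr[i]);

    // Indexing is plain offset arithmetic on ptr, with no decoding step.
    assert(s[2] == s.ptr[2]);
    assert(cast(ubyte) s[2] == 0xA4);        // second byte of 'ä'
}
```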
One option would be to add support in druntime so that it handles
such strings correctly, or to implement a separate string type
that does not default to char[]. But of course the easiest way
is to convince everybody that everything is OK and advise them to
use some library function that does the job correctly, essentially
implying that the language does the job wrong (pardon me, some D
skepticism: the deeper I am in it, the more critically I view it).
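
For reference, the library route mentioned above looks roughly like this. It is only a sketch of the usual Phobos workarounds (std.range.walkLength, conversion to dstring, and a decoding foreach), not a claim about how the language itself should behave.

```d
import std.conv : to;
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string s = "säд";

    writeln(s.length);        // counts code units: 5
    writeln(s.walkLength);    // counts code points: 3

    // Converting to dstring (UTF-32) makes indexing work per code point.
    dstring d = s.to!dstring;
    writeln(d[2]);            // д

    // A foreach with an explicit dchar loop variable decodes on the fly.
    foreach (dchar c; s)
        writeln(c);
}
```

The cost is that the conversion allocates a new array and walkLength is O(n), which is part of why the default stays at the code-unit level.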