On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
OK, I understand that "length" is, obviously, used by analogy with any array's length value.

Still, this seems to be inconsistent. D elaborates on implementing "char"s as UTF-8, which means that a "char" in D can be of any length between 1 and 4 bytes for an arbitrary Unicode code point. Shouldn't this (i.e. the character's length) then be the "unit of measurement" for "char"s, like e.g. the size of the underlying struct in an array of "struct"s? The story continues with indexing "string"s: in a consistent implementation, shouldn't

   writeln("säд"[2])

return "д" instead of the trailing surrogate of this Cyrillic letter?

This is impossible given the current design. At runtime, the string "säд" is viewed as struct { void *ptr; size_t length; }, where ptr points to memory holding at least five bytes and length has the value 5, so "säд"[2] simply reads the byte at offset 2. Druntime hasn't taken a UTF course.
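
To make the byte-versus-code-point distinction concrete, here is a minimal sketch. It assumes "ä" is stored precomposed (U+00E4), which matches the length of 5 above. The helpers codePointCount and codePointAt are names made up for illustration, not part of druntime or Phobos; only walkLength and the decoding foreach are existing facilities.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: number of code points rather than UTF-8 code units.
size_t codePointCount(string s)
{
    return s.walkLength; // Phobos decodes a string into dchars while walking it
}

// Hypothetical helper: the code point at a given index, found by decoding
// from the start. This is O(n), unlike the O(1) built-in s[i].
dchar codePointAt(string s, size_t index)
{
    size_t i = 0;
    foreach (dchar c; s) // foreach over dchar decodes UTF-8 on the fly
    {
        if (i == index)
            return c;
        ++i;
    }
    throw new Exception("code point index out of range");
}

void main()
{
    string s = "säд"; // 's' is 1 byte, 'ä' is 2 bytes, 'д' is 2 bytes

    writeln(s.length);           // 5: counts UTF-8 code units (bytes)
    writeln(cast(ubyte) s[2]);   // 164 (0xA4): the second byte of 'ä'
    writeln(codePointCount(s));  // 3
    writeln(codePointAt(s, 2));  // д
}
```

Note that the byte at offset 2 belongs to "ä", not to "д": the Cyrillic letter occupies offsets 3 and 4. The sketch also shows the trade-off: the built-in index is constant time because it knows nothing about UTF-8, while indexing by code point has to decode everything before the requested position.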

One option would be to add support in druntime so it can correctly handle such strings, or to implement a separate string type which does not default to char[]. But of course the easiest way is to convince everybody that everything is OK and advise using some library function which does the job correctly, essentially implying that the language does the job wrong (pardon me, some D skepticism: the deeper I am in it, the more critically I view it).
