OK, I understand that "length" is, obviously, used by analogy with the length of any other array.

Still, this seems inconsistent. D makes a point of implementing "char"s as UTF-8, which means that a single Unicode code point is encoded as anywhere from 1 to 4 bytes (i.e. 1 to 4 "char"s). Shouldn't that (i.e. the length of one character) then be the "unit of measurement" for "char"s, just as the size of the underlying struct is for an array of "struct"s? The same question applies to indexing "string"s: in a consistent implementation, shouldn't

   writeln("säд"[2])

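return "д" instead of a lone UTF-8 continuation byte? (Strictly, "säд"[2] is 0xA4, the second byte of "ä", because "s" takes one byte and "ä" takes two.) By the way, how do YOU implement this for "string"? For "dstring" it works, logically; for "wstring" the same problem arises for code points above U+FFFF, which are stored as surrogate pairs.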
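For illustration, here is a minimal D sketch, assuming a current Phobos. The helpers `utf8Widths` and `codePointAt` are hypothetical names, not library functions; `std.utf.codeLength`, `std.utf.toUTFindex` and `std.utf.decode` are the Phobos calls used. It shows that the code points of "säд" occupy 1, 2 and 2 chars, that "säд"[2] yields the code unit 0xA4 rather than a code point, and one way to fetch the n-th code point of a `string` at O(n) cost.

```d
import std.stdio : writefln;
import std.utf : codeLength, decode, toUTFindex;

/// Hypothetical helper: number of UTF-8 code units (chars) used by each
/// code point of `s`.
ubyte[] utf8Widths(string s)
{
    ubyte[] widths;
    foreach (dchar c; s)              // a dchar loop variable decodes UTF-8
        widths ~= codeLength!char(c);
    return widths;
}

/// Hypothetical helper: returns the n-th code point of `s`. Unlike s[n],
/// this has to walk the string from the start, so it costs O(n), not O(1).
dchar codePointAt(string s, size_t n)
{
    size_t i = toUTFindex(s, n);      // code-unit offset of the n-th code point
    return decode(s, i);              // decode the code point starting there
}

void main()
{
    string s = "säд";                 // bytes: 73 | C3 A4 | D0 B4
    writefln("widths: %s", utf8Widths(s));

    // Indexing a string yields a single code unit (a char), not a code point.
    writefln("s[2] = 0x%02X", cast(ubyte) s[2]);
    writefln("codePointAt(s, 2) = %s", codePointAt(s, 2));

    // A dstring stores one code point per element, so plain indexing works.
    writefln("\"säд\"d[2] = %s", "säд"d[2]);

    // A wstring needs a surrogate pair only for code points above U+FFFF.
    writefln("wchars for U+1F600: %s", codeLength!wchar('\U0001F600'));
}
```

The sketch keeps `s[i]` as a constant-time code-unit access and makes the code-point lookup an explicit linear walk.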

Also, I understand that there is the std.utf.count() function, which returns the length I was looking for. However, if D is so UTF-8-centric, why isn't this built into the core language, like ".length"?
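A minimal sketch, assuming current Phobos, of how three plausible notions of "length" differ; the struct `Lengths` and the function `lengthsOf` are hypothetical helpers for illustration. `.length` is read directly from the array, while `std.utf.count` and grapheme counting via `std.uni.byGrapheme` must scan the whole string. The second sample, "a\u0308" ('a' followed by a combining diaeresis), renders like "ä" but counts as two code points.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;
import std.utf : count;

/// Hypothetical struct: three different answers to "how long is this string?"
struct Lengths
{
    size_t codeUnits;   // s.length: O(1), stored in the array itself
    size_t codePoints;  // std.utf.count: O(n), must scan the whole string
    size_t graphemes;   // user-perceived characters: O(n), uses Unicode tables
}

/// Hypothetical helper collecting all three lengths for `s`.
Lengths lengthsOf(string s)
{
    return Lengths(s.length, count(s), s.byGrapheme.walkLength);
}

void main()
{
    // "a\u0308" is 'a' followed by U+0308 COMBINING DIAERESIS.
    foreach (s; ["säд", "a\u0308"])
    {
        auto l = lengthsOf(s);
        writefln("%s: %s code units, %s code points, %s graphemes",
                 s, l.codeUnits, l.codePoints, l.graphemes);
    }
}
```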
