On Sunday, 13 October 2013 at 16:31:58 UTC, nickles wrote:
> Well, that's a point; on the other hand, D is constantly creating and throwing away new strings, so this isn't quite an argument. The current solution puts the programmer in charge of dealing with UTF-x, whereas a more consistent implementation would put the burden on the implementors of the libraries/core, i.e. the ones who usually have a better understanding of Unicode than the average programmer.

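Ironically, the reason is consistency. `string` is just `immutable(char)[]`, and it follows the usual array behavior rules. In UTF-8 a code point takes 1 to 4 bytes, so replacing one decoded character with another could change the array's length. Making array element assignment potentially allocate is hardly a good option.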
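A minimal sketch of the point above, assuming a reasonably recent D compiler and Phobos: `string` really is just an array of UTF-8 code units, and different code points need different numbers of them, so in-place "character" replacement cannot in general be a plain element write.

```d
import std.stdio : writeln;
import std.utf : codeLength;

void main()
{
    // `string` is only an alias for an array of immutable UTF-8 code units.
    static assert(is(string == immutable(char)[]));

    // A code point may need 1 to 4 UTF-8 code units. Swapping one decoded
    // character for another could therefore grow or shrink the array.
    assert(codeLength!char('a') == 1);
    assert(codeLength!char('\u00E4') == 2);     // 'ä'
    assert(codeLength!char('\u20AC') == 3);     // '€'
    assert(codeLength!char('\U0001D11E') == 4); // musical G clef

    // Element assignment on a mutable char[] writes exactly one code unit,
    // the same as for any other array type.
    char[] buf = "abc".dup;
    buf[1] = 'x';
    assert(buf == "axc");
    writeln(buf);
}
```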

> So, how do you guys handle UTF-8 strings in D? What are your solutions to the problems described? Does it all come down to converting `string`s and `wstring`s to `dstring`s, manipulating them, and converting them back to `string`s? By the way, what would this mean in terms of speed?

If you need single-element access, `str.front` yields a decoded `dchar`. Or use a simple `foreach (dchar d; str)`; at least it doesn't hide the fact that this is an O(n) operation. Since `str.front` yields `dchar`, most `std.algorithm` and `std.range` utilities also work correctly on plain UTF-8 strings, with no conversion to `dstring` needed.
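A small sketch of working directly on a UTF-8 `string`, assuming current Phobos module names (`std.range.primitives`, `std.algorithm.searching`). The helper `countCodePoints` is hypothetical and exists only to show the `foreach (dchar d; ...)` form.

```d
import std.algorithm.searching : canFind, count;
import std.range.primitives : front, walkLength;
import std.stdio : writeln;

/// Hypothetical helper: counts code points by decoding with foreach.
/// This is O(n) in the length of the string.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar d; s) // each iteration decodes one UTF-8 sequence
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo"; // "héllo": 5 code points, 6 UTF-8 code units

    // `front` decodes the first code point and returns a `dchar`.
    assert(s.front == 'h');
    assert(s[1 .. $].front == '\u00E9'); // the 2-byte sequence decodes to one dchar

    // Range algorithms see the string as a range of dchar.
    assert(s.walkLength == 5);
    assert(countCodePoints(s) == 5);
    assert(s.canFind('\u00E9'));
    assert(s.count('l') == 2);

    writeln(s.front);
}
```

Nothing here converts to `dstring`; decoding happens lazily, one code point at a time, as the range is consumed.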

Slicing and `.length` are probably the only operations that do not respect the UTF-8 encoding, because they behave exactly as they do for all other arrays: they index and count code units (bytes), not code points.
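A sketch of the difference, assuming Phobos's `std.utf` (`toUTFindex`, `validate`). It shows `.length` and slicing working on code units, and one way to slice by code point by first converting a code-point index into a code-unit index.

```d
import std.exception : assertThrown;
import std.range.primitives : walkLength;
import std.stdio : writeln;
import std.utf : toUTFindex, UTFException, validate;

void main()
{
    string s = "\u00E9t\u00E9"; // "été": 3 code points, 5 UTF-8 code units

    // .length counts code units (bytes), not characters.
    assert(s.length == 5);
    assert(s.walkLength == 3);

    // Slicing also indexes code units: s[0 .. 1] cuts the 2-byte 'é' in half,
    // leaving an invalid UTF-8 fragment.
    assertThrown!UTFException(validate(s[0 .. 1]));

    // To slice by code point, translate the code-point index first.
    size_t end = toUTFindex(s, 2); // code-unit offset where the 3rd code point starts
    assert(end == 3);
    assert(s[0 .. end] == "\u00E9t");

    writeln(s[0 .. end]);
}
```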
