Re: Inconsistency

Dicebot Sun, 13 Oct 2013 10:06:42 -0700

On Sunday, 13 October 2013 at 16:31:58 UTC, nickles wrote:

> Well, that's a point; on the other hand, D is constantly creating and throwing away new strings, so this isn't quite an argument. The current solution puts the programmer in charge of dealing with UTF-x, where a more consistent implementation would put the burden on the implementors of the libraries/core, i.e. the ones who usually have a better understanding of Unicode than the average programmer.

Ironically, the reason is consistency. `string` is just `immutable(char)[]`, and it conforms to the usual array behavior rules. Saying that array element value assignment may allocate is hardly a good option.
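A minimal sketch of why "assign one character" cannot be a plain array store for UTF-8, assuming a current D compiler and Phobos (`std.utf.codeLength` is a real Phobos function; the strings are made up for illustration). Code points have different UTF-8 widths, so swapping one for another may change the array length:

```d
import std.utf : codeLength;

void main()
{
    // `string` is an alias for an array of immutable UTF-8 code units.
    static assert(is(string == immutable(char)[]));

    // 'e' needs one UTF-8 code unit, U+00E9 ('é') needs two.
    assert(codeLength!char('e') == 1);
    assert(codeLength!char('\u00E9') == 2);

    string s = "cafe";
    assert(s.length == 4);

    // "Replacing the last character" with a wider code point means
    // building a new, longer array: an allocation, not an element store.
    string t = s[0 .. $ - 1] ~ "\u00E9";
    assert(t == "caf\u00E9");
    assert(t.length == 5); // one more code unit than `s`
}
```

If `str[i] = x` were defined in terms of code points, it would have to hide exactly this reallocation behind what looks like an ordinary array assignment.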

> So, how do you guys handle UTF-8 strings in D? What are your solutions to the problems described? Does it all come down to converting "string"s and "wstring"s to "dstring"s, manipulating them, and re-converting them to "string"s? Btw, what would this mean in terms of speed?

If single-element access is needed, `str.front` yields a decoded `dchar`. Or use a simple `foreach (dchar d; str)`; at least it won't hide the fact that it is an O(n) operation. Because `str.front` yields `dchar`, most `std.algorithm` and `std.range` utilities also work correctly on default UTF-8 strings.
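A small sketch of that approach, assuming current Phobos module paths (`std.range.primitives`, `std.algorithm.searching`); the sample string is made up. It works directly on a plain UTF-8 `string`, with no conversion to `dstring`:

```d
import std.range.primitives : front, walkLength;
import std.algorithm.searching : canFind, count;

void main()
{
    string s = "na\u00EFve"; // "naïve": 5 code points, 6 UTF-8 code units

    // Range primitives decode: `front` of a string is a whole `dchar`.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == 'n');

    // An explicit `dchar` loop variable decodes as it iterates.
    // This is an O(n) walk, and the loop makes that visible.
    dchar[] decoded;
    foreach (dchar d; s)
        decoded ~= d;
    assert(decoded.length == 5);
    assert(decoded[2] == '\u00EF');

    // Because `front` yields dchar, range-based algorithms see code points.
    assert(s.walkLength == 5);
    assert(s.canFind('\u00EF'));
    assert(s.count('\u00EF') == 1);
}
```

Note that `walkLength` has to traverse the string, which is the same O(n) cost the `foreach` loop makes explicit.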

Slicing and `.length` are probably the only operations that do not respect UTF-8 encoding (because they are exactly the same for all arrays).
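A sketch of what that means in practice, assuming current Phobos (`std.utf.toUTFindex` converts a code-point index into a code-unit index); the sample string is made up. `.length`, indexing and slicing all count UTF-8 code units:

```d
import std.utf : toUTFindex;
import std.range.primitives : walkLength;

void main()
{
    string s = "na\u00EFve"; // "naïve"

    // `.length` counts UTF-8 code units, not code points.
    assert(s.length == 6);
    assert(s.walkLength == 5);

    // Indexing and slicing also work on code units. Index 2 is only the
    // first byte of the two-byte encoding of U+00EF (0xC3 0xAF).
    assert(s[2] == 0xC3);
    assert(s[0 .. 3] != "na\u00EF"); // cuts the code point in half

    // To slice by code point, first translate a code-point index into a
    // code-unit index (an O(n) scan from the start of the string).
    size_t end = s.toUTFindex(3);
    assert(end == 4);
    assert(s[0 .. end] == "na\u00EF");
}
```

Keeping `.length` and slicing O(1) on code units is the array-consistent choice; the code-point view is available, but you pay for it explicitly.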
