On Friday, 7 March 2014 at 16:43:30 UTC, Dicebot wrote:
On Friday, 7 March 2014 at 16:18:06 UTC, Vladimir Panteleev wrote:
Can we look at some example situations that this will break?

Any code that relies on countUntil to count dchars? Or, to generalize, almost any code that uses std.algorithm functions with string?

This is a pretty fragile design in the first place, since we use the same basic type (integers) to count two different things (code units / code points). Code that relies on this behavior would need to be explicitly tested with Unicode data to be sure that it works correctly; if it is only tested with ASCII, it will merely appear to work.
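To make the ASCII-only failure mode concrete, here is a minimal sketch. It assumes the current Phobos behavior, where a string iterated as a range is auto-decoded into code points; the variable names are only for illustration.

```d
import std.algorithm : countUntil;
import std.range : walkLength;

void main()
{
    string ascii = "abc";
    string kanji = "日本語";

    // ASCII: one code unit per code point, so the two counts agree.
    assert(ascii.length == 3 && ascii.walkLength == 3);

    // Non-ASCII: 9 UTF-8 code units, but only 3 code points.
    assert(kanji.length == 9 && kanji.walkLength == 3);

    // A code point count misused as a code unit (slice) index:
    auto x = kanji.countUntil("本語");   // 1 with auto-decoding
    assert(kanji[x .. $] != "本語");     // the slice starts mid-character

    // The same mistake goes unnoticed with ASCII data.
    assert(ascii[ascii.countUntil("bc") .. $] == "bc");
}
```

Both integers have the same type, so nothing in the type system stops the second use.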

Correct code where these indices never left the equation will not be affected, e.g.:

import std.algorithm, std.range;

auto s = "日本語";
auto x = s.countUntil("本語"); // was 1, will be 3
s = s.drop(x);
assert(s == "本語"); // still OK

Thinking about dstrings as character arrays is less flawed only to a certain extent.

Sure. But I find this extent practical enough to make the difference. It is a good compromise between perfectly correct (and very slow) string processing and having your program unusable with anything but the basic Latin symbol set.

I think that if we are to draw a line somewhere on what to support and what not, the decision should not be embedded so deeply in the language. Ideally, it would be clearly visible in the code that you are counting code points.
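As a rough sketch of what "clearly visible" could look like with functions that exist in Phobos today: the helper names codeUnitOffset and codePointOffset below are hypothetical, made up for this example, and only std.string.indexOf and std.utf.count are real library calls.

```d
import std.string : indexOf;
import std.utf : count;

// Hypothetical helper: offset of `needle` in UTF-8 code units, or -1.
ptrdiff_t codeUnitOffset(string haystack, string needle)
{
    return haystack.indexOf(needle);
}

// Hypothetical helper: offset of `needle` in code points, or -1.
ptrdiff_t codePointOffset(string haystack, string needle)
{
    immutable units = haystack.indexOf(needle);
    // Count the code points in the prefix that precedes the match.
    return units < 0 ? -1 : cast(ptrdiff_t) haystack[0 .. units].count;
}

void main()
{
    string s = "日本語";
    assert(s.codeUnitOffset("本語") == 3);   // safe to use as a slice index
    assert(s.codePointOffset("本語") == 1);  // number of characters skipped
    assert(s[s.codeUnitOffset("本語") .. $] == "本語");
}
```

The name at the call site then says which of the two things the integer counts, instead of leaving it to the range's element type.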
