On Friday, 7 March 2014 at 16:43:30 UTC, Dicebot wrote:
On Friday, 7 March 2014 at 16:18:06 UTC, Vladimir Panteleev
wrote:
Can we look at some example situations that this will break?
Any code that relies on countUntil to count dchars? Or, to
generalize, almost any code that uses std.algorithm functions
with string?
This is a pretty fragile design in the first place, since we use
the same basic type (integers) to count two different things
(code units / code points). Code that relies on this behavior
would need to be explicitly tested with Unicode data to be sure
that it works correctly. Otherwise, if it is only tested with
ASCII, it will merely appear at a glance to work right.
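To make the two kinds of count concrete, here is a small sketch. It assumes Phobos as it behaves today, with auto-decoding, where countUntil on a string counts code points. The helper names codeUnits and codePoints are made up for illustration and are not part of any library.

```d
import std.algorithm.searching : countUntil;
import std.range : walkLength;

// Hypothetical helper: number of UTF-8 code units (bytes) in the string.
size_t codeUnits(string s) { return s.length; }

// Hypothetical helper: number of code points, found by decoding front to back.
size_t codePoints(string s) { return s.walkLength; }

void main()
{
    // ASCII-only input: both counts agree, so mixing them up goes unnoticed.
    assert(codeUnits("abc") == 3);
    assert(codePoints("abc") == 3);

    // Non-ASCII input: each of these characters takes three UTF-8 code units.
    string s = "日本語";
    assert(codeUnits(s) == 9);
    assert(codePoints(s) == 3);

    // With auto-decoding, countUntil reports a code point count...
    auto x = s.countUntil("本語");
    assert(x == 1);
    // ...but slice indices are code units, so using x as one cuts
    // through the middle of the first character (8 bytes remain, not 6).
    assert(s[x .. $].length == 8);
}
```

Both helpers return a plain size_t, so nothing in the type system stops one from being used where the other was meant.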
Correct code in which these indices are only fed back into other
range operations on the same string will not be affected, e.g.:
    auto s = "日本語";
    auto x = s.countUntil("本語"); // was 1, will be 3
    s = s.drop(x);
    assert(s == "本語"); // still OK
Thinking about dstrings as character arrays is less flawed
only to a certain extent.
Sure. But I find this extent practical enough to make the
difference. It is a good compromise between perfectly correct
(and very slow) string processing and having your program
unusable with anything but the basic Latin character set.
I think that if we are to draw a line somewhere on what to
support and what not, the decision should not be embedded this
deep in the language. Ideally, it would be clearly visible in the
code that you are counting code points.
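As a rough illustration of what "clearly visible" could look like, the sketch below names the unit at each call site. It assumes a Phobos release that ships std.utf.byCodeUnit and std.utf.byDchar. The helper names are hypothetical, and this is only one possible spelling, not a proposal for the actual API.

```d
import std.algorithm.searching : countUntil;
import std.range : walkLength;
import std.string : indexOf;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: the call site says the count is in code units.
size_t countCodeUnits(string s) { return s.byCodeUnit.walkLength; }

// Hypothetical helper: the call site says the count is in code points.
size_t countCodePoints(string s) { return s.byDchar.walkLength; }

void main()
{
    string s = "日本語";
    assert(countCodeUnits(s) == 9);
    assert(countCodePoints(s) == 3);

    // Explicitly counting code points up to the first '本'.
    assert(s.byDchar.countUntil('本') == 1);

    // std.string.indexOf returns a code unit index, so it is safe for slicing.
    auto i = s.indexOf("本語");
    assert(i == 3);
    assert(s[i .. $] == "本語");
}
```

The cost is a little more typing, but a reader no longer has to know the library's defaults to tell which unit an integer is measured in.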