On Friday, 30 December 2011 at 19:55:45 UTC, Timon Gehr wrote:
I think the way we have it now is optimal. The only reason we
are discussing this is because of fear that uneducated users
will write code that does not take into account Unicode
characters at or above code point 0x80. But what is the worst thing
that can happen?
1. They don't notice. Then it is not a problem, because they
are obviously only using ASCII characters and it is perfectly
reasonable to assume that code units and characters are the
same thing.
2. They get screwed up string output, look for the reason,
patch up their code with some functions from std.utf and will
never make the same mistakes again.
I have *never* seen a user in D.learn complain about it. There
might have been some I missed, but it is certainly not a
prevalent problem. Also, just because a user can type .rep
does not mean he understands Unicode: He is able to make just
the same mistakes as before, even more so, as the array he is
getting back has the _wrong element type_.
I strongly agree with this. It would be nice to have everything
be simple, work correctly *and* efficiently at the same time, but
I don't believe the proposed changes make a definite improvement.
In the end, if you don't want to use the standard library or
other UTF-aware string libraries, you'll have to know the basics
of UTF to write correct code. I too wish it were harder to
write it incorrectly, but the current solution is simply the best
one to appear yet.
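
To make the code unit versus character distinction concrete, here is a minimal sketch of the kind of thing a user runs into and how std.utf patches it up. The helper names codeUnits and codePoints are made up for illustration and are not part of Phobos; count and decode are the std.utf functions.

```d
import std.utf : count, decode;

// Hypothetical helpers, named only for this example.
size_t codeUnits(string s)  { return s.length; }  // UTF-8 code units (bytes)
size_t codePoints(string s) { return count(s); }  // decoded code points

void main()
{
    // U+00E9 (e with acute accent) takes two UTF-8 code units.
    string s = "h\u00E9llo";

    // .length reports code units, so it is one larger than the
    // number of characters a user would count on screen.
    assert(codeUnits(s) == 6);
    assert(codePoints(s) == 5);

    // Indexing yields a single code unit: here the lead byte of
    // the two-byte sequence, not the character itself.
    assert(s[1] == 0xC3);

    // std.utf.decode reads a whole code point starting at index i
    // and advances i past all of its code units.
    size_t i = 1;
    dchar c = decode(s, i);
    assert(c == '\u00E9');
    assert(i == 3);

    // foreach with a dchar loop variable decodes on the fly.
    size_t n;
    foreach (dchar d; s)
        ++n;
    assert(n == 5);
}
```

With pure ASCII input all of these counts agree, which is case 1 above: the difference only shows up once non-ASCII text appears.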