On Friday, 30 December 2011 at 19:55:45 UTC, Timon Gehr wrote:
I think the way we have it now is optimal. The only reason we are discussing this is fear that uneducated users will write code that does not take into account Unicode characters at or above code point 0x80. But what is the worst thing that can happen?

1. They don't notice. Then it is not a problem, because they are obviously only using ASCII characters and it is perfectly reasonable to assume that code units and characters are the same thing.

2. They get screwed-up string output, look for the reason, patch up their code with some functions from std.utf, and never make the same mistake again.


I have *never* seen a user in D.learn complain about it. There might have been some I missed, but it is certainly not a prevalent problem. Also, just because a user can type .rep does not mean he understands Unicode: he is able to make exactly the same mistakes as before, even more so, since the array he gets back has the _wrong element type_.

I strongly agree with this. It would be nice for everything to be simple, correct *and* efficient at the same time, but I don't believe the proposed changes are a clear improvement.

In the end, if you don't want to use the standard library or other UTF-aware string libraries, you'll have to know the basics of UTF to write correct code. I too wish it were harder to write it incorrectly, but the current solution is simply the best one to appear yet.
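For readers unsure what "the basics of UTF" amounts to here, the following is a minimal sketch of the two cases from the quoted message. It assumes Phobos's treatment of `string` as a range of `dchar` (so `std.range.walkLength` counts code points) and uses `std.utf.validate`. The helpers `codeUnits`, `codePoints` and `isValidUtf8` are hypothetical names chosen for illustration and are not part of the discussion.

```d
import std.range : walkLength;
import std.utf : validate, UTFException;

// Hypothetical helper: number of UTF-8 code units, which is what .length reports.
size_t codeUnits(string s) { return s.length; }

// Hypothetical helper: number of code points; walkLength decodes the string as dchar.
size_t codePoints(string s) { return s.walkLength; }

// Hypothetical helper: true if s is well-formed UTF-8 (validate throws otherwise).
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    import std.stdio : writeln;

    string ascii = "hello";
    string accented = "h\u00E9llo"; // "héllo": U+00E9 takes two UTF-8 code units

    // Case 1: pure ASCII, so code units and characters coincide.
    assert(codeUnits(ascii) == codePoints(ascii));

    // Case 2: a single non-ASCII character makes the counts diverge.
    assert(codeUnits(accented) == 6);
    assert(codePoints(accented) == 5);

    // Slicing by code-unit index can split a multi-byte sequence.
    string cut = accented[0 .. 2]; // "h" plus only the first byte of é
    assert(!isValidUtf8(cut));

    writeln(codeUnits(accented), " code units, ", codePoints(accented), " code points");
}
```

The ASCII string behaves identically under either count, which is why case 1 goes unnoticed; the slice in case 2 is the kind of bug that produces garbled output until the code is rewritten with UTF-aware functions.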
