On Friday, 7 March 2014 at 22:27:35 UTC, H. S. Teoh wrote:
This illustrates one of my objections to Andrei's post: by auto-decoding behind the user's back and hiding the intricacies of Unicode from him, it has masked the fact that codepoint-for-codepoint comparison of a Unicode string is not guaranteed to always return the correct results, due to the possibility of non-normalized strings.

Basically, to have correct behaviour in all cases, the user must be aware of, and use, the Unicode collation / normalization algorithms prescribed by the Unicode standard. What we have in std.algorithm right now is an incomplete implementation with non-working edge cases (like Vladimir's example) that has poor performance to start with. Its only redeeming factor is that the auto-decoding hack has given it the illusion of being correct, when actually it's not correct according to the Unicode standard. I don't see how this is necessarily superior to Walter's proposal.


T

Yes, I realised too late.

Would it not be beneficial to have different types of literals, one type which is implicitly normalized and one which is "raw" (like today)? Since you would typically want to normalize most string literals at compile time, you would then only have to normalize external input at run time.
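
As a rough sketch of what I mean (assuming std.uni.normalize can be evaluated under CTFE, which I have not verified, and using a hypothetical normalized!() helper in place of an actual literal type):

import std.uni;

// Hypothetical helper: a manifest-constant template that forces NFC
// normalization at compile time via CTFE. This assumes std.uni.normalize
// is CTFE-able, which may not hold for every compiler/Phobos version.
enum normalized(string s) = normalize!NFC(s);

// External input still has to be normalized at run time.
string fromOutside(string input)
{
    return normalize!NFC(input);
}

void main()
{
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    enum raw = "e\u0301";
    // The same text, normalized once at compile time.
    enum norm = normalized!raw;

    assert(raw != "\u00E9");   // compares unequal, although both render as "é"
    assert(norm == "\u00E9");  // equal after NFC normalization
    assert(fromOutside(raw) == norm);
}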
