On Friday, 7 March 2014 at 22:27:35 UTC, H. S. Teoh wrote:
This illustrates one of my objections to Andrei's post: by auto-decoding behind the user's back and hiding the intricacies of Unicode from him, it has masked the fact that codepoint-for-codepoint comparison of a Unicode string is not guaranteed to always return the correct results, due to the possibility of non-normalized strings.

Basically, to have correct behaviour in all cases, the user must be aware of, and use, the Unicode collation / normalization algorithms prescribed by the Unicode standard. What we have in std.algorithm right now is an incomplete implementation with non-working edge cases (like Vladimir's example) that has poor performance to start with. Its only redeeming factor is that the auto-decoding hack has given it the illusion of being correct, when actually it's not correct according to the Unicode standard. I don't see how this is necessarily superior to Walter's proposal.
T
Yes, I realised too late.
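
To make the problem concrete, here is a small sketch (the strings and variable names are only an illustration): "é" can be spelled either as one precomposed code point or as "e" plus a combining accent, and a plain element-wise comparison treats the two as different until both sides go through std.uni.normalize.

import std.stdio : writeln;
import std.uni : normalize;

void main()
{
    string precomposed = "\u00E9";   // "é" as a single precomposed code point
    string decomposed  = "e\u0301";  // "e" followed by U+0301 combining acute accent

    // Comparing code unit by code unit (or code point by code point) says
    // the two strings differ, even though they are canonically equivalent.
    writeln(precomposed == decomposed);                        // false

    // Only after normalization (NFC by default) does the comparison agree.
    writeln(normalize(precomposed) == normalize(decomposed));  // true
}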
Would it not be beneficial to have different types of literals: one that is implicitly normalized and one that is "raw" (like today)? Since most string literals could be normalized at compile time, you would then only have to normalize external input at run time.
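
Roughly the division of labour I have in mind, as a sketch only: no such "normalized" literal type exists today, so the literal below is simply written in NFC by hand, while data from outside the program is normalized once at the boundary.

import std.stdio : writeln;
import std.uni : normalize;

void main()
{
    // A literal under the programmer's control, kept in NFC by convention;
    // a "normalized" literal type would guarantee this form at compile time.
    enum greeting = "caf\u00E9";

    // External input (file, socket, command line) can arrive in any form,
    // so it is normalized once on entry, at run time.
    string external = "cafe\u0301";        // decomposed spelling of "café"
    auto cleaned = normalize(external);    // NFC by default

    writeln(greeting == external);         // false: the forms differ
    writeln(greeting == cleaned);          // true: both are NFC now
}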