On Fri, Mar 07, 2014 at 09:58:39PM +0000, Vladimir Panteleev wrote:
> On Friday, 7 March 2014 at 21:56:45 UTC, Eyrk wrote:
> >On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote:
> >>No, it doesn't.
> >>
> >>import std.algorithm;
> >>
> >>void main()
> >>{
> >>   auto s = "cassé";
> >>   assert(s.canFind('é'));
> >>}
> >>
> >
> >Hm, I'm not following? Works perfectly fine on my system?

Probably because your browser is normalizing the Unicode string when you
copy-n-paste Vladimir's message? See below:

> Something's messing with your Unicode. Try downloading and compiling
> this file:
> http://dump.thecybershadow.net/6f82ea151c1a00835cbcf5baaace2801/test.d

I downloaded the file and looked at it through `od -ctx1`: the first é
is encoded as the byte sequence 65 cc 81, that is, [U+0065, U+0301]
(small letter e + combining acute accent), whereas the second é is
encoded as c3 a9, that is, U+00E9 (precomposed small letter e with
acute accent).

This illustrates one of my objections to Andrei's post: by auto-decoding
behind the user's back and hiding the intricacies of Unicode from him,
the library masks the fact that code-point-by-code-point comparison of
Unicode strings is not guaranteed to return correct results, because
the strings being compared may not be normalized.
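
To make the failure reproducible without depending on what a browser or
mail client does to the text, here is a minimal sketch that spells both
encodings out with escapes. The variable names are mine, not from
Vladimir's test.d, and I'm assuming a reasonably current Phobos:

```d
import std.algorithm : canFind, equal;
import std.stdio : writefln;
import std.string : representation;

void main()
{
    // Escapes are used so that nothing between author and compiler
    // can silently normalize the literals.
    string decomposed  = "casse\u0301"; // e followed by U+0301 combining acute
    string precomposed = "cass\u00E9";  // single code point U+00E9

    // Dump the UTF-8 bytes, much like od -tx1 would show them.
    writefln("%(%02x %)", decomposed.representation);  // 63 61 73 73 65 cc 81
    writefln("%(%02x %)", precomposed.representation); // 63 61 73 73 c3 a9

    dchar eAcute = '\u00E9';

    // Auto-decoding compares code point by code point, so only the
    // precomposed string contains U+00E9.
    assert(precomposed.canFind(eAcute));
    assert(!decomposed.canFind(eAcute));

    // The two strings render identically but do not compare equal.
    assert(!equal(decomposed, precomposed));
}
```

Both strings display as the same word, yet every assertion above holds,
which is exactly the situation in the quoted example.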

Basically, to have correct behaviour in all cases, the user must be
aware of, and use, the Unicode collation / normalization algorithms
prescribed by the Unicode standard. What we have in std.algorithm right
now is an incomplete implementation with non-working edge cases (like
Vladimir's example) that has poor performance to start with. Its only
redeeming factor is that the auto-decoding hack has given it the
illusion of being correct, when actually it's not correct according to
the Unicode standard. I don't see how this is necessarily superior to
Walter's proposal.
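
As a rough sketch of what "being aware of normalization" looks like in
user code, the following uses std.uni.normalize and std.uni.byGrapheme.
The helper canFindNormalized is hypothetical (it is not part of Phobos),
and normalizing both sides only addresses canonical equivalence; it is
not a substitute for full Unicode collation or boundary-aware matching:

```d
import std.algorithm : canFind;
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

/// Hypothetical helper: bring both strings to NFC before searching,
/// so canonically equivalent sequences compare equal.
bool canFindNormalized(string haystack, string needle)
{
    return normalize!NFC(haystack).canFind(normalize!NFC(needle));
}

void main()
{
    string decomposed  = "casse\u0301"; // e + U+0301 combining acute
    string precomposed = "cass\u00E9";  // precomposed U+00E9

    // The plain search misses the decomposed form...
    assert(!decomposed.canFind("\u00E9"));

    // ...but succeeds once both sides are normalized, in either direction.
    assert(canFindNormalized(decomposed, "\u00E9"));
    assert(canFindNormalized(precomposed, "e\u0301"));
    assert(normalize!NFC(decomposed) == precomposed);

    // Code point counts differ; grapheme counts agree.
    assert(decomposed.walkLength == 6);
    assert(precomposed.walkLength == 5);
    assert(decomposed.byGrapheme.walkLength == 5);
    assert(precomposed.byGrapheme.walkLength == 5);
}
```

Note that the normalization step is something the user has to ask for
explicitly; auto-decoding by itself does none of this.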


Just because you survived after you did it, doesn't mean it wasn't stupid!
