On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote:
On Friday, 7 March 2014 at 19:57:38 UTC, Andrei Alexandrescu wrote:
Allow me to enumerate the functions of std.algorithm and how they work today and how they'd work with the proposed change. Let s be a variable of some string type.

s.canFind('é') currently works as expected.

No, it doesn't.

import std.algorithm;

void main()
{
    auto s = "cassé";
    assert(s.canFind('é'));
}

That's the whole problem - all this hot steam and it still does not work properly. Because it can't - not without pulling in all of the Unicode algorithms implicitly, and that would be much worse.
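A minimal sketch of why the assertion above can fail, assuming the 'é' in the string literal is stored in decomposed form ('e' followed by U+0301). The code points are written as explicit escapes so the example does not depend on how an editor or mail client encoded the character. The helper name containsAfterNFC is hypothetical, and normalizing before searching is only one possible workaround, not something the thread proposes:

```d
import std.algorithm : canFind;
import std.uni : normalize, NFC;

// Hypothetical helper: normalize the haystack to NFC, then search
// for a single code point.
bool containsAfterNFC(string haystack, dchar needle)
{
    return normalize!NFC(haystack).canFind(needle);
}

void main()
{
    dchar eAcute = '\u00E9';           // precomposed 'é'
    string precomposed = "cass\u00E9"; // ends in one code point, U+00E9
    string decomposed = "casse\u0301"; // 'e' + COMBINING ACUTE ACCENT

    // A code-point search finds the precomposed form...
    assert(precomposed.canFind(eAcute));
    // ...but not the decomposed one, although both render as "cassé".
    assert(!decomposed.canFind(eAcute));

    // After NFC normalization the pair is recomposed into U+00E9.
    assert(containsAfterNFC(decomposed, eAcute));
}
```

This only helps for characters that have a precomposed form; it does not make a single dchar sufficient in general.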

I went down std.algorithm in the order listed in its documentation and found pernicious issues with almost every single algorithm.

All of your examples are variations of one and the same case: searching for a non-ASCII dchar or dchar literal.

How often does this pattern occur in real programs? I think the only real metric is to try the change and find out.

Clearly one might argue that their app has no business dealing with diacriticals or Asian characters. But that's the typical provincial view that marred many languages' approach to UTF and internationalization.

So is yours, if you think that making everything magically a dchar is going to solve all problems.

The TDPL example only showcases the problem. Yes, it works with Swedish. Now try it again with Sanskrit.

+1
In Indian languages, a character can consist of one or more Unicode code points. For example, the Sanskrit conjunct "ddhrya" http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg consists of 7 Unicode code points. So to search for this character I have to use a string search.

- Sarath
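The Sanskrit example above can be sketched as follows. The spelling of "ddhrya" as da, virama, dha, virama, ra, virama, ya is an assumption based on the usual Devanagari encoding of that conjunct, and the helper name codePointCount is hypothetical:

```d
import std.algorithm : canFind;
import std.range : walkLength;

// Hypothetical helper: number of code points in a UTF-8 string.
// Iterating a string as a range decodes it code point by code point.
size_t codePointCount(string s)
{
    return s.walkLength;
}

void main()
{
    // Assumed spelling of "ddhrya": da, virama, dha, virama, ra, virama, ya.
    string ddhrya = "\u0926\u094D\u0927\u094D\u0930\u094D\u092F";

    assert(codePointCount(ddhrya) == 7); // seven code points
    assert(ddhrya.length == 21);         // each takes three UTF-8 code units

    string text = "abc " ~ ddhrya ~ " xyz";

    // No single dchar can stand for the whole conjunct, so the search
    // has to be a substring search.
    assert(text.canFind(ddhrya));

    // Searching for the first code point alone also matches unrelated
    // text such as "da" + vowel sign aa.
    string other = "\u0926\u093E";
    assert(other.canFind('\u0926'));
    assert(!other.canFind(ddhrya));
}
```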
