On Thursday, June 02, 2016 15:05:44 Andrei Alexandrescu via Digitalmars-d wrote:
> The intent of autodecoding was to make std.algorithm work meaningfully
> with strings. As it's easy to see I just went through
> std.algorithm.searching alphabetically and found issues literally with
> every primitive in there. It's an easy exercise to go forth with the others.
It comes down to a question. Is it better to fail quickly when Unicode is handled incorrectly, so that it's obvious you're doing it wrong? Or is it better for the code to work in a large number of cases? In the second case, a lot of code "just works" but is still wrong in the general case. Because that's far less obvious, many folks won't realize that they need to do more to make their string handling Unicode-correct.

With code units, especially UTF-8, it becomes obvious very quickly that treating each element of the string/range as a character is wrong. With code points, you have to work far harder to find examples that are incorrect. One such example is a base character followed by a combining mark, which the user sees as one character but which is two code points. So it's not at all obvious, especially to the lay programmer, that the Unicode handling is incorrect and that their code is wrong. Yet their code will end up working a large percentage of the time in spite of being wrong in the general case.

So, yes, treating code units as characters produces incorrect results far more readily than treating code points as characters does, and that's trivial to show. But operating on code points as if they were characters is still going to give incorrect results in the general case.

Regardless of auto-decoding, the answer is that the programmer needs to understand the Unicode issues. They need to use ranges of code units or code points where appropriate, and ranges of graphemes where appropriate. It's just that if we default to handling code points, a lot of code will be written that treats code points as characters. That code will give the correct result more often than it would if it treated code units as characters.

In any case, I've probably posted too much in this thread already.
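Here's a small illustration in D of the three levels. It's my own sketch, not code from Phobos or from this thread, and it assumes a Phobos recent enough to have std.utf.byCodeUnit and std.uni.byGrapheme. It counts the same user-perceived character, "é", at each level. It does this twice: once for the precomposed form (U+00E9) and once for the decomposed form ('e' followed by U+0301 COMBINING ACUTE ACCENT). It then shows that a code-point search finds a bare 'e' inside the decomposed form.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "é" as a single precomposed code point, U+00E9
    string precomposed = "\u00E9";
    // "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT
    string decomposed = "e\u0301";

    foreach (s; [precomposed, decomposed])
    {
        // byCodeUnit: UTF-8 code units; plain string: auto-decoded code points;
        // byGrapheme: user-perceived characters
        writefln("code units: %s, code points: %s, graphemes: %s",
                 s.byCodeUnit.walkLength, s.walkLength, s.byGrapheme.walkLength);
    }

    // A code-point search finds a bare 'e' inside the decomposed "é",
    // even though the user sees no standalone 'e' there.
    assert(decomposed.canFind('e'));
    assert(!precomposed.canFind('e'));
}
```

This should print:

    code units: 2, code points: 1, graphemes: 1
    code units: 3, code points: 2, graphemes: 1

The precomposed string gives the "right" answer at the code point level, which is exactly why code-point handling looks correct most of the time. Only the decomposed string exposes the bug.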
It's clear that the first step to solving this problem is to improve Phobos so that it handles ranges of code units, code points, and graphemes correctly, whether auto-decoding is involved or not. Only then can we consider removing auto-decoding. Even then, the answer may still be that we're stuck, because we consider the resulting code breakage to be too great. But whether Phobos retains auto-decoding or not, the Unicode handling in general is the same, and what we need to do to improve the situation is the same.

So, clearly, I need to do a much better job of finding time to work on D so that I can create some PRs to help the situation. Unfortunately, while I'm waiting on other things, it's far easier to find a few minutes to shoot off a post or two in the newsgroup than it is to find time to substantively work on code. :|

- Jonathan M Davis