On 3/6/14, 6:37 PM, Walter Bright wrote:
> In "Lots of low hanging fruit in Phobos" the issue came up about the
> automatic encoding and decoding of char ranges.
> [snip]
> Is there any hope of fixing this?
There's nothing to fix.
Allow me to enumerate the functions of std.algorithm and how they work
today and how they'd work with the proposed change. Let s be a variable
of some string type.
1. s.all!(x => x == 'é') currently works as expected. Proposed: fails
silently.
2. s.any!(x => x == 'é') currently works as expected. Proposed: fails
silently.
3. s.canFind!(x => x == 'é') currently works as expected. Proposed: fails
silently.
4. s.canFind('é') currently works as expected. Proposed: fails silently.
5. s.count() currently works as expected. Proposed: fails silently.
6. s.count!((a, b) => std.uni.toLower(a) == std.uni.toLower(b))("é")
currently works as expected (with the known issues of lowercase
conversion). Proposed: fails silently.
7. s.count('é') currently works as expected. Proposed: fails silently.
8. s.countUntil("a") currently works as expected. Proposed: fails silently.
This applies to all variations of countUntil.
9. s.endsWith('é') currently works as expected. Proposed: fails silently.
10. s.find('é') currently works as expected. Proposed: fails silently. This
applies to other variations of find that include custom predicates.
11. ...
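To make "fails silently" concrete, here is a minimal sketch contrasting
the two behaviors for items 4 and 5. The helper names are hypothetical,
and the "proposed" side is only modeled here by iterating the raw UTF-8
code units through std.string.representation, which is an assumption
about how the change would behave rather than its actual implementation.

```d
import std.algorithm : any, canFind, count;
import std.string : representation;

// Hypothetical helpers, named for this sketch only.

// Today's behavior: the string is iterated as decoded code points (dchar).
bool hasEAcuteDecoded(string s) { return s.canFind('é'); }
size_t codePointCount(string s) { return s.count(); }

// Assumed model of the proposal: iterate raw UTF-8 code units instead.
bool hasEAcuteCodeUnits(string s)
{
    // No single code unit equals U+00E9, so this compiles and never matches.
    return s.representation.any!(u => u == 'é');
}
size_t codeUnitCount(string s) { return s.representation.length; }

void main()
{
    string s = "café"; // 'é' is U+00E9, encoded in UTF-8 as 0xC3 0xA9
    assert(hasEAcuteDecoded(s));    // found when decoding
    assert(!hasEAcuteCodeUnits(s)); // silently missed per code unit
    assert(codePointCount(s) == 4); // c, a, f, é
    assert(codeUnitCount(s) == 5);  // é occupies two code units
}
```

Nothing in the code-unit versions errors out at compile time or at run
time; they just return a different answer, which is the failure mode the
list describes.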
I went down std.algorithm in the order listed in its documentation and
found pernicious issues with almost every single algorithm.
I designed the range behavior of strings after much thinking and
consideration back in the day when I designed std.algorithm. It was
painfully obvious (but it seems to have been forgotten now that it's
working so well) that approaching strings as arrays of char would
break almost every single algorithm, leaving us essentially in the
pre-UTF C++aveman era.
Making strings bidirectional ranges has been a very good choice within
the constraints. There was already a string type, and that was
immutable(char)[], and a bunch of code depended on that definition.
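As a rough illustration of what "bidirectional range" means for the
existing string type, the sketch below checks the range traits Phobos
reports for string and walks one from both ends. It assumes a Phobos
version with autodecoding in place; the sample text is arbitrary.

```d
import std.algorithm : equal;
import std.range : ElementType, back, front, hasLength,
    isBidirectionalRange, isRandomAccessRange, retro, walkLength;

// string is still immutable(char)[] underneath...
static assert(is(string == immutable(char)[]));
// ...but as a range it yields decoded code points from either end,
static assert(isBidirectionalRange!string);
static assert(is(ElementType!string == dchar));
// and it deliberately offers neither random access nor a range length.
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);

void main()
{
    string s = "été";
    assert(s.front == 'é');        // decoded from the first two code units
    assert(s.back == 'é');         // decoded from the last two code units
    assert(s.retro.equal("été"));  // reversal happens per code point
    assert(s.walkLength == 3);     // three code points
    assert(s.length == 5);         // five UTF-8 code units in the array
}
```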
Clearly one might argue that their app has no business dealing with
diacriticals or Asian characters. But that's the typical provincial view
that marred many languages' approach to UTF and internationalization. If
you know your string is ASCII, the remedy is simple - don't use char[]
and friends. From day 1, the type "char" was meant to mean "code unit of
UTF characters".
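One way to read that advice, sketched below under the assumption that
ubyte[] is the intended alternative: keep known-ASCII data in a byte
array, which Phobos treats as a plain random-access range with no
decoding step. AsciiBytes and countSpaces are hypothetical names used
only for this example.

```d
import std.algorithm : count;
import std.range : ElementType, isRandomAccessRange;
import std.string : representation;

// Hypothetical alias for data the program knows to be ASCII-only.
alias AsciiBytes = immutable(ubyte)[];

// Byte arrays are ordinary random-access ranges of bytes.
static assert(isRandomAccessRange!AsciiBytes);
static assert(is(ElementType!AsciiBytes : ubyte));

// Hypothetical helper: counts spaces byte by byte, with no UTF decoding.
size_t countSpaces(AsciiBytes data)
{
    return data.count(' ');
}

void main()
{
    // representation reinterprets the string's code units as bytes
    // without copying them.
    AsciiBytes request = "GET /index.html HTTP/1.1".representation;
    assert(countSpaces(request) == 2);
    assert(request[0] == 'G'); // indexing is per byte
}
```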
So please ponder the above before performing surgery that's going to
kill the patient.
Andrei