On 3/6/14, 6:37 PM, Walter Bright wrote:
> In "Lots of low hanging fruit in Phobos" the issue came up about the
> automatic encoding and decoding of char ranges.
> [snip]
> Is there any hope of fixing this?

There's nothing to fix.

Allow me to enumerate the functions of std.algorithm and how they work today and how they'd work with the proposed change. Let s be a variable of some string type.

1. s.all!(x => x == 'é') currently works as expected. Proposed: fails silently.

2. s.any!(x => x == 'é') currently works as expected. Proposed: fails silently.

3. s.canFind!(x => x == 'é') currently works as expected. Proposed: fails silently.

4. s.canFind('é') currently works as expected. Proposed: fails silently.

5. s.count() currently works as expected. Proposed: fails silently.

6. s.count!((a, b) => std.uni.toLower(a) == std.uni.toLower(b))("é") currently works as expected (with the known issues of lowercase conversion). Proposed: fails silently.

7. s.count('é') currently works as expected. Proposed: fails silently.

8. s.countUntil("a") currently works as expected. Proposed: fails silently. This applies to all variations of countUntil.

9. s.endsWith('é') currently works as expected. Proposed: fails silently.

10. s.find('é') currently works as expected. Proposed: fails silently. This applies to other variations of find that include custom predicates.

11. ...
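A minimal sketch of what several of the items above (canFind, count, countUntil, find) look like in code, assuming a recent Phobos. It uses std.utf.byCodeUnit, which was added to Phobos after this post, as a stand-in for "strings treated as plain char arrays"; that it models the proposal exactly is an assumption. Note that the code-unit versions compile and run without complaint; they just return different answers.

```d
import std.algorithm.searching : canFind, count, countUntil, find;
import std.range.primitives : walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "café"; // 4 characters; 'é' (U+00E9) takes two UTF-8 code units

    // Current behavior: algorithms see decoded code points (dchar).
    assert(s.walkLength == 4);
    assert(s.count() == 4);
    assert(s.canFind('é'));
    assert(s.canFind!(x => x == 'é'));
    assert(s.find('é') == "é");
    assert("éa".countUntil('a') == 1); // index in characters

    // Code-unit view (assumed stand-in for the proposed behavior).
    auto units = s.byCodeUnit;
    assert(units.count() == 5);              // counts code units, not characters
    assert(!units.canFind('é'));             // no single code unit equals U+00E9
    assert(!units.canFind!(x => x == 'é'));  // same silent miss with a predicate
    assert("éa".byCodeUnit.countUntil('a') == 2); // index in code units
}
```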

I went down std.algorithm in the order listed in its documentation and found pernicious issues with almost every single algorithm.

I designed the range behavior of strings after much thinking and consideration back in the day when I designed std.algorithm. It was painfully obvious (but it seems to have been forgotten now that it's working so well) that approaching strings as arrays of char would break almost every single algorithm, leaving us essentially in the pre-UTF C++aveman era.

Making strings bidirectional ranges has been a very good choice within the constraints. There was already a string type, and that was immutable(char)[], and a bunch of code depended on that definition.
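The split between how a string is stored and how it is iterated can be checked with the range traits in std.range.primitives. The sketch below assumes a current Phobos; it only restates the design described above (code units in memory, decoded code points through the range API, bidirectional but not random-access).

```d
import std.range.primitives : ElementEncodingType, ElementType, hasLength,
    isBidirectionalRange, isRandomAccessRange;

// The pre-existing definition: string is an array of UTF-8 code units.
static assert(is(string == immutable(char)[]));

// Stored as code units ...
static assert(is(ElementEncodingType!string == immutable(char)));
// ... but iterated by the range primitives as decoded code points.
static assert(is(ElementType!string == dchar));

// front/back/popFront/popBack decode in both directions ...
static assert(isBidirectionalRange!string);
// ... but the n-th code point cannot be found in O(1), so the range API
// reports neither random access nor a length.
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);

void main()
{
    string s = "é";
    assert(s.length == 2); // the built-in array .length still counts code units
}
```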

Clearly one might argue that their app has no business dealing with diacritics or Asian characters. But that's the typical provincial view that marred many languages' approach to UTF and internationalization. If you know your string is ASCII, the remedy is simple: don't use char[] and friends. From day 1, the type "char" was meant to mean "code unit of UTF characters".
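One possible reading of "don't use char[] and friends" for data known to be ASCII is to handle it as bytes. The sketch below assumes that ubyte is the intended substitute and uses std.string.representation to get a byte view of a string; with it, random access and an O(1) length come back and the algorithms run per byte with no decoding.

```d
import std.algorithm.searching : canFind, count, countUntil;
import std.range.primitives : hasLength, isRandomAccessRange;
import std.string : representation;

void main()
{
    // Assumed pure ASCII, so one byte is one character.
    immutable(ubyte)[] bytes = "hello, world".representation;

    // A plain byte array is a random-access range with a length ...
    static assert(isRandomAccessRange!(typeof(bytes)));
    static assert(hasLength!(typeof(bytes)));

    // ... and algorithms work byte by byte, which is correct only for ASCII.
    assert(bytes.count('l') == 3);
    assert(bytes.countUntil(',') == 5);
    assert(bytes.canFind('w'));
}
```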

So please ponder the above before going to do surgery on the patient that's going to kill him.


Andrei
