On Monday, 30 May 2016 at 21:51:36 UTC, Walter Bright wrote:
> On 5/30/2016 8:34 AM, Marc Schütz wrote:
>> In an ideal world, we'd also want to change the way `length` and `opIndex` work,
>
> Why? strings are arrays of code units.

So, strings are _implemented_ as arrays of code units. But indiscriminately treating them as such in all situations leads to wrong results (just like arrays of code points would).

In an ideal world, the programs someone intuitively writes will do the right thing, and if they can't, they at least refuse to compile. If we agree that it's up to the user whether to iterate over a string by code unit, code point, or grapheme, and that we shouldn't arbitrarily choose one of those (except when we know that it's what the user wants), then the same applies to indexing, slicing and counting.
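To illustrate the mismatch, here is a minimal sketch of how things behave with current Phobos as far as I understand it: the built-in `length` and indexing work on code units, while the range primitives `front` and `walkLength` decode to code points. The string literal is just an example I picked.

```d
import std.range.primitives : front, walkLength;

void main()
{
    string s = "\u00E9"; // precomposed "é", encoded as two UTF-8 code units

    // Built-in operations see code units.
    assert(s.length == 2);
    assert(s[0] == 0xC3); // first byte of the UTF-8 sequence, not a character

    // Range primitives silently decode to code points.
    assert(s.front == '\u00E9');
    assert(s.walkLength == 1);
}
```

So the same string has "length" 2 or 1 depending on whether you ask the array or the range, and nothing in the code says which one was intended.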

On the other hand, changing such low-level things will likely be impractical; that's why I said "In an ideal world".

> All the trouble comes from erratically pretending otherwise.

For me, the trouble comes from pretending otherwise _without being told to_.

To make sure there are no misunderstandings, here is what is suggested as an alternative to the current situation:

* `char[]`, `wchar[]` (and `dchar[]`?) no longer pass `isInputRange`.
* Ranges with element type `char`, `wchar`, and `dchar` do pass `isInputRange`.
* A bunch of rangeifying helpers are added to `std.string` (I believe they are already there): `byCodePoint`, `byCodeUnit`, `byChar`, `byWchar`, `byDchar`, ...
* Algorithms like `find`, `join(er)` get overloads that accept char slices directly.
* Built-in operators and `length` of char slices are unchanged.
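A rough sketch of what user code would look like under this scheme, using helpers that already exist today (`byCodeUnit` and `byDchar`, which I believe live in `std.utf`). The part that can't be shown with current Phobos is the first bullet: today `isInputRange!string` still holds, and the suggestion is that it would stop doing so. The example string is arbitrary.

```d
import std.range.primitives : ElementType, isInputRange, walkLength;
import std.traits : Unqual;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "na\u00EFve"; // "naïve": 5 code points, 6 UTF-8 code units

    // The user states the element type explicitly.
    auto units = s.byCodeUnit;
    auto points = s.byDchar;

    // Both wrappers are proper ranges with the element type you asked for.
    static assert(isInputRange!(typeof(units)));
    static assert(is(Unqual!(ElementType!(typeof(units))) == char));
    static assert(is(Unqual!(ElementType!(typeof(points))) == dchar));

    // True today because of auto-decoding; under the suggestion above
    // this would become false and the bare string would be rejected.
    static assert(isInputRange!string);

    assert(units.walkLength == 6);
    assert(points.walkLength == 5);

    // Built-in length stays as it is: code units.
    assert(s.length == 6);
}
```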

Advantages:

* Algorithms that can work _correctly_ without any kind of decoding will do so.
* Algorithms that would yield incorrect results won't compile, requiring the user to make a decision regarding the desired element type.
* No auto-decoding.
  => Best performance depending on the actual requirements.
  => No results that look correct when tested with only precomposed characters but are wrong in the general case.
* Behaviour of [] and .length is no worse than today.
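As an example of the "looks correct with precomposed characters" trap, here is a small sketch comparing two spellings of the same visible character. It assumes `byGrapheme` from `std.uni`; the two literals are my own choice.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

void main()
{
    string precomposed = "\u00E9";  // "é" as a single code point
    string decomposed  = "e\u0301"; // "e" followed by a combining acute accent

    // Auto-decoding counts code points: fine for the first, off for the second.
    assert(precomposed.walkLength == 1);
    assert(decomposed.walkLength == 2);

    // Asking for graphemes explicitly gives the same answer for both.
    assert(precomposed.byGrapheme.walkLength == 1);
    assert(decomposed.byGrapheme.walkLength == 1);
}
```

A test suite that only ever feeds in the first form would never notice the difference.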
