On 21/10/11 3:26 AM, Walter Bright wrote:
On 10/20/2011 2:49 PM, Peter Alexander wrote:
The whole mess is caused by conflating the idea of an array with a variable
length encoding that happens to use an array for storage. I don't believe there
is any clean and tidy way to fix the problem without breaking compatibility.

There is no 'fixing' it, even to break compatibility. Sometimes you want
to look at an array of utf8 as 8 bit characters, and sometimes as 20 bit
dchars. Someone will be dissatisfied no matter what.

Then separate those ways of viewing strings.

Here's one solution that I believe would satisfy everyone:

1. Remove the string, wstring and dstring aliases. An array of char should be an array of char, i.e. the same as array of byte. Same for arrays of wchar and dchar. This way, arrays of T have no subtle differences for certain kinds of T.

2. Add string, wstring and dstring structs with the following interface:

 a. foreach should iterate as dchar.
 b. @property front() would be dchar.
 c. @property length() would not exist.
 d. @property buffer() returns the underlying immutable array of char, wchar etc.
 e. Remove opIndex and co.
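
To make the interface in (a) to (e) concrete, here is a rough sketch of what such a struct might look like for the UTF-8 case. It is only an illustration: the name String is hypothetical (chosen because the built-in string alias still exists today), the private field name is made up, and the choice of std.utf.decode and std.utf.stride for decoding is my assumption, not something the proposal prescribes.

```d
import std.utf : decode, stride;

// Hypothetical stand-in for the proposed `string` struct (UTF-8 flavour).
struct String
{
    // Underlying UTF-8 code units; the field name is illustrative only.
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // (d) explicit access to the raw code units for those who want them.
    @property immutable(char)[] buffer() { return data; }

    // Input-range primitives: with these, foreach iterates as dchar (a, b).
    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i); // decodes one code point starting at index 0
    }

    void popFront()
    {
        // Skip all code units of the first code point.
        data = data[stride(data, 0) .. $];
    }

    // (c), (e): deliberately no length and no opIndex.
}

unittest
{
    auto s = String("h\u00E9llo"); // U+00E9 takes two UTF-8 code units

    size_t codePoints = 0;
    foreach (c; s)                 // c is a dchar
        ++codePoints;

    assert(codePoints == 5);
    assert(s.buffer.length == 6);  // raw length is only reachable via buffer
    static assert(!__traits(hasMember, String, "length"));
    static assert(!__traits(compiles, s[0]));
}
```

Anyone who needs indexing or a length has to ask for buffer explicitly, which makes it obvious at the call site that they are working in code units.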

What this does:
- Makes all array types consistent and intuitive.
- Makes looping over strings do the expected thing.
- Provides an interface to the underlying 8-bit chars for those that want it.
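
For comparison, this is the kind of inconsistency the proposal is aimed at, as I understand the behaviour of the language and Phobos today. The snippet is just an illustration; it spells the type out as immutable(char)[] to stress that it is an ordinary array.

```d
import std.range : ElementType, walkLength;

unittest
{
    immutable(char)[] s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    // Array view: length and indexing work in code units.
    assert(s.length == 6);
    static assert(is(typeof(s[0]) == immutable(char)));

    // Range view: the same array is treated as a sequence of dchar.
    assert(s.walkLength == 5);
    static assert(is(ElementType!(immutable(char)[]) == dchar));

    // A plain foreach with an inferred type iterates code units, not dchar.
    size_t units = 0;
    foreach (c; s)
        ++units;
    assert(units == 6);

    // An array of ubyte has no such split: its element type is just ubyte.
    static assert(is(ElementType!(immutable(ubyte)[]) == immutable(ubyte)));
}
```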


Of course, people will still need to understand UTF-8. I don't think that's a problem. It's unreasonable to expect the language to do the thinking for you. The problem is that we have people that *do* understand UTF-8 (like the OP), but *don't* understand D's strings.
