Re: Why the hell doesn't foreach decode strings

Peter Alexander Fri, 21 Oct 2011 11:06:09 -0700

On 21/10/11 3:26 AM, Walter Bright wrote:

On 10/20/2011 2:49 PM, Peter Alexander wrote:

The whole mess is caused by conflating the idea of an array with a variable
length encoding that happens to use an array for storage. I don't believe there
is any clean and tidy way to fix the problem without breaking compatibility.


There is no 'fixing' it, even to break compatibility. Sometimes you want
to look at an array of utf8 as 8 bit characters, and sometimes as 20 bit
dchars. Someone will be dissatisfied no matter what.


Then separate those ways of viewing strings.
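
For concreteness, here is roughly how the two views look in D today. This is only a sketch: the function names countCodeUnits and countCodePoints are made up for illustration, while foreach's behaviour and std.range.walkLength are as I understand them from the language and Phobos.

```d
import std.range : walkLength;

// View 1: raw 8-bit code units. With no type given, foreach over a
// string infers char and does not decode.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (c; s)
        ++n;
    return n;
}

// View 2: decoded code points. Asking for dchar makes foreach decode.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";   // U+00E9 takes two UTF-8 code units

    assert(countCodeUnits(s) == 6);
    assert(countCodeUnits(s) == s.length);

    assert(countCodePoints(s) == 5);
    assert(countCodePoints(s) == s.walkLength);
}
```

Both loops run over the same variable of the same type; which view you get depends only on the element type written in the foreach.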

Here's one solution that I believe would satisfy everyone:

1. Remove the string, wstring and dstring aliases. An array of char should be an array of char, i.e. the same as an array of byte. Same for arrays of wchar and dchar. This way, arrays of T have no subtle differences for certain kinds of T.


2. Add string, wstring and dstring structs with the following interface:

 a. foreach should iterate as dchar.
 b. @property front() would be dchar.
 c. @property length() would not exist.
 d. @property buffer() returns the underlying immutable array of char, wchar etc.
 e. Remove opIndex and co.
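
A minimal sketch of what such a struct could look like for the UTF-8 case. The name Utf8String is hypothetical (the proposal would simply call it string), and this covers only points a to e above. It assumes std.utf.decode and std.utf.stride from Phobos for the decoding; everything else is illustrative, not a finished design.

```d
import std.utf : decode, stride;

// Hypothetical wrapper type; the name and layout are assumptions.
struct Utf8String
{
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // (a), (b): range primitives that yield decoded code points,
    // so foreach iterates as dchar.
    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);   // code point starting at data[0]
    }

    void popFront()
    {
        // Skip every code unit of the first code point.
        data = data[stride(data, 0) .. $];
    }

    // (d): explicit access to the underlying code units.
    @property immutable(char)[] buffer() { return data; }

    // (c), (e): length and opIndex are deliberately not defined.
}

void main()
{
    auto s = Utf8String("h\u00E9llo");   // U+00E9 is two code units

    size_t points = 0;
    foreach (c; s)                        // iterates a copy of s
    {
        static assert(is(typeof(c) == dchar));
        ++points;
    }
    assert(points == 5);

    // The 8-bit view is still there, but you have to ask for it.
    assert(s.buffer.length == 6);

    // No accidental code-unit indexing or length.
    static assert(!__traits(compiles, s.length));
    static assert(!__traits(compiles, s[0]));
}
```

Anyone who wants the code units writes s.buffer and gets an ordinary array with ordinary array semantics.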

What this does:
- Makes all array types consistent and intuitive.
- Makes looping over strings do the expected thing.
- Provides an interface to the underlying 8-bit chars for those that want it.
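
To show the kind of inconsistency the first point refers to, here is a small compile-time check against the Phobos range traits as I understand them (ElementType, hasLength and isRandomAccessRange from std.range). It is only an illustration of current behaviour, not part of the proposal.

```d
import std.range : ElementType, hasLength, isRandomAccessRange;

// Ordinary arrays: the range element type is the array element type.
static assert(is(ElementType!(int[]) == int));
static assert(is(ElementType!(ubyte[]) == ubyte));
static assert(hasLength!(ubyte[]));
static assert(isRandomAccessRange!(ubyte[]));

// Arrays of char are special-cased: as ranges they yield dchar and
// are treated as having no length and no random access.
static assert(is(ElementType!(char[]) == dchar));
static assert(!hasLength!(char[]));
static assert(!isRandomAccessRange!(char[]));

void main()
{
    // The built-in array operations disagree with the range view:
    char[] a = ['a', 'b'];
    assert(a.length == 2);   // length exists on the array itself
    assert(a[0] == 'a');     // and so does indexing, by code unit
}
```

So char[] is an array with a length as far as the language is concerned, and a length-less range of dchar as far as the library is concerned.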

Of course, people will still need to understand UTF-8. I don't think that's a problem. It's unreasonable to expect the language to do the thinking for you. The problem is that we have people that *do* understand UTF-8 (like the OP), but *don't* understand D's strings.
