On 12/28/2011 11:12 PM, foobar wrote:
On Wednesday, 28 December 2011 at 21:17:49 UTC, Timon Gehr wrote:

I was educated enough not to make that mistake, because I read the
entire language specification before deciding the language was awesome
and downloading the compiler. I find it strange that the product
should be made less usable because we do not expect users to read the
manual. But it is of course a valid point.


It's awfully optimistic to expect people to read the manual.


Well, if the alternative is slowly butchering the language I will be awfully optimistic about it all day long.

There is nothing wrong with operating at the code unit level.
Efficient slicing is very desirable.


I agree that it's useful. It is, however, the incorrect abstraction level
when you need a "string", which is by far the common case in user code.

I would not go as far as to call it 'incorrect'.

I.e., if I need a name variable in a class:
codeUnit[] name; // bug!
string Name; // correct


From a pragmatic viewpoint it does not matter because if string is used like this, then codeUnit[] does exactly the same thing. Nobody forces anyone to index or slice into a string variable when they don't need that functionality. All engineers have to work with leaky abstractions. Why is it such a big deal?


I expect that most uses of code-unit arrays should be in the standard
library anyway since it provides the string manipulation routines. It
all boils down to making the common case trivial and the rare case
possible.  You can use the underlying data structure (code units) if you
need it but the default "string" is what people expect when thinking
about what such a type does (a string of letters). D's already 80% there
since Phobos already treats strings as bi-directional ranges of
code-points which is much closer to the mental image of a string of
letters, so I think this is about bringing the current design to its
final conclusion.
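
To make the two levels concrete, here is a minimal sketch, assuming a Phobos where narrow strings are decoded by the range primitives, as described above. The helper names codeUnitCount and codePointCount are hypothetical and exist only for illustration.

```d
import std.range : walkLength;

// Hypothetical helpers, named here for illustration only.
size_t codeUnitCount(string s) { return s.length; }       // O(1): counts UTF-8 code units
size_t codePointCount(string s) { return s.walkLength; }  // O(n): decodes code points

unittest
{
    string name = "h\u00E9llo";         // U+00E9 occupies two UTF-8 code units
    assert(codeUnitCount(name) == 6);   // the array view sees six code units
    assert(codePointCount(name) == 5);  // the range view sees five code points
    assert(name[0 .. 1] == "h");        // slicing indexes code units, in O(1)
}
```

The same variable answers differently depending on whether it is used as an array or as a range, which is the mismatch being argued about in this thread.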


Well, that mental image is just not the right one when dealing with Unicode.
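
As a rough illustration of why a "string of letters" is not the same as a string of code points, consider a combining sequence. This sketch assumes a Phobos version that ships std.uni.byGrapheme, which is newer than this thread; graphemeCount is a hypothetical helper.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper for illustration: counts user-perceived characters.
size_t graphemeCount(string s) { return s.byGrapheme.walkLength; }

unittest
{
    string e = "e\u0301";            // 'e' followed by U+0301 COMBINING ACUTE ACCENT
    assert(e.length == 3);           // three UTF-8 code units
    assert(e.walkLength == 2);       // two code points
    assert(graphemeCount(e) == 1);   // one user-perceived letter
}
```

So iterating by code point still splits what a reader would call a single letter.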


Exactly. It is acting less and less like an array of code units. But
it *is* an array of code units. If the general consensus is that we
need a string data type that acts at a different abstraction level by
default (with which I'd disagree, but apparently I don't have a
popular opinion here), then we need a string type in the standard
library to do that. Changing the language so that an array of code
units stops behaving like an array of code units is not a solution.


I agree that we should not break T[] for any T and instead introduce a
library type. While I personally believe that such a change will expose
hidden bugs (certainly when unaware programmers treat string as ASCII
and the product is later localized), it is a big disruption to
people's code, and it is worth considering whether the benefit is worth
the cost. Perhaps some middle ground could be found such that existing
code can rely on existing behavior and the new library type is
opt-in.

What will such a type offer, except that it disallows indexing and slicing?
