On Wednesday, 28 December 2011 at 22:39:15 UTC, Timon Gehr wrote:
On 12/28/2011 11:12 PM, foobar wrote:
On Wednesday, 28 December 2011 at 21:17:49 UTC, Timon Gehr wrote:

I was educated enough not to make that mistake, because I read the entire language specification before deciding the language was awesome and downloading the compiler. I find it strange that the product should be made less usable because we do not expect users to read the manual. But it is of course a valid point.


It's awfully optimistic to expect people to read the manual.


Well, if the alternative is slowly butchering the language I will be awfully optimistic about it all day long.

There is nothing wrong with operating at the code unit level.
Efficient slicing is very desirable.
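To illustrate the trade-off being discussed, here is a minimal sketch of code-unit slicing in D. It assumes a reasonably recent Phobos, where `std.utf.validate` and `std.utf.stride` are available; `isValidUtf8` is a hypothetical helper introduced only for this example.

```d
import std.utf : stride, validate, UTFException;

/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    string s = "h\u00E9llo";      // U+00E9 occupies two UTF-8 code units

    // Slicing only adjusts a pointer and a length; nothing is decoded.
    string whole = s[0 .. 3];     // ends on a code point boundary
    string broken = s[0 .. 2];    // cuts U+00E9 in half

    assert(isValidUtf8(whole));
    assert(!isValidUtf8(broken));

    // stride reports how many code units the code point at an index
    // occupies, which lets code find a safe slicing boundary.
    assert(stride(s, 1) == 2);
    assert(s[0 .. 1 + stride(s, 1)] == whole);
}
```

The slice itself is cheap either way; the cost of working at this level is that the caller is responsible for picking boundaries that fall between code points.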


I agree that it's useful. It is, however, the incorrect abstraction level when you need a "string", which is by far the common case in user code.

I would not go as far as to call it 'incorrect'.

i.e. if I need a name variable in a class:
codeUnit[] name; // bug!
string name; // correct


From a pragmatic viewpoint it does not matter because if string is used like this, then codeUnit[] does exactly the same thing. Nobody forces anyone to index or slice into a string variable when they don't need that functionality. All engineers have to work with leaky abstractions. Why is it such a big deal?


I expect that most uses of code-unit arrays should be in the standard library anyway, since it provides the string manipulation routines. It all boils down to making the common case trivial and the rare case possible. You can use the underlying data structure (code units) if you need it, but the default "string" is what people expect when thinking about what such a type does (a string of letters). D's already 80% there, since Phobos already treats strings as bi-directional ranges of code-points, which is much closer to the mental image of a string of letters, so I think this is about bringing the current design to its final conclusion.


Well, that mental image is just not the right one when dealing with Unicode.
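One way to see why "a string of letters" does not map cleanly onto code points is to count the same text at three levels. This is only a sketch, and it assumes a Phobos version that ships `std.uni.byGrapheme` (which is newer than this thread); `graphemeCount` is a hypothetical helper name.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: number of grapheme clusters in `s`.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: rendered as one letter.
    string s = "e\u0301";

    assert(s.length == 3);          // UTF-8 code units
    assert(s.walkLength == 2);      // code points
    assert(graphemeCount(s) == 1);  // grapheme clusters
}
```

Even a code-point view therefore splits what a reader would call a single letter.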


Exactly. It is acting less and less like an array of code units. But it *is* an array of code units. If the general consensus is that we need a string data type that acts at a different abstraction level by default (with which I'd disagree, but apparently I don't have a popular opinion here), then we need a string type in the standard library to do that. Changing the language so that an array of code units stops behaving like an array of code units is not a solution.
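The split between the two views can be checked directly. The following sketch assumes a Phobos in which the range primitives decode narrow strings (the behaviour described in this thread); it is meant as an illustration of the mismatch, not as a statement about how any particular release must behave.

```d
import std.range : ElementType, front, hasLength, isBidirectionalRange,
    isRandomAccessRange, walkLength;

// The language-level type is a plain array of immutable UTF-8 code units.
static assert(is(string == immutable(char)[]));
static assert(is(typeof(string.init[0]) == immutable(char)));

// The Phobos range primitives present it as a range of code points instead.
static assert(is(ElementType!string == dchar));
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);

void main()
{
    string s = "h\u00E9llo";
    assert(s.length == 6);       // array view: six code units
    assert(s.walkLength == 5);   // range view: five code points
    assert(s[1] == 0xC3);        // indexing yields a single code unit
    assert(s.front == 'h');      // front decodes a whole code point
}
```

So the built-in indexing, slicing, and `.length` still operate on code units, while generic range code sees something without length or random access.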


I agree that we should not break T[] for any T and should instead introduce a library type. While I personally believe that such a change will expose hidden bugs (certainly when unaware programmers treat string as ASCII and the product is later localized), it is a big disruption to people's code, and it is worth considering whether the benefit is worth the cost. Perhaps some middle ground could be found, such that existing code can rely on existing behavior and the new library type is opt-in.

What will such a type offer, except that it disallows indexing and slicing?


From a pragmatic viewpoint, people can also continue programming in C++ instead of investing a lot of effort in learning a new language.

The only difference between programming languages is the human interface aspect. Anything you can program with D you could also do in assembly, yet you prefer D because it's more convenient. In that regard, a code-unit array is definitely worse than a string type.

A programmer can choose to either change his 'naive' mental image or change the programming language. Most will do the latter. Computers need to adapt and be human friendly, not vice-versa.
