On Mon, 19 Jul 2010 16:04:21 -0400, Walter Bright <newshou...@digitalmars.com> wrote:

> bearophile wrote:
>> This odd post comes from reading the nice section on strings in chapter 4 of TDPL. In the last few years I have seen changes in how D strings are meant and managed, changes that make them less and less like arrays (random-access sequences of mutable code units) and more and more like what they are at a high level (immutable bidirectional sequences of code points).

> Strings in D are deliberately meant to be arrays, not special things. Other languages make them special because they have insufficiently powerful arrays.

Andrei is changing that. Already, isRandomAccessRange!(string) == false. I kind of don't like this direction, even though it's clever. What you end up with is Phobos refusing to believe that a string or char[] is an array, while the compiler says it is.
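To see the dichotomy concretely, here is a minimal sketch that assumes a current DMD and Phobos. The module std.range.primitives did not exist in 2010, so details may differ from what the post describes. The compiler treats string as a plain array of immutable code units. Phobos's range traits, in contrast, treat it as a bidirectional range of dchar with no random access.

```d
// Sketch assuming a modern DMD/Phobos; module paths differ from 2010-era Phobos.
import std.range.primitives : ElementType, hasLength, isBidirectionalRange,
    isRandomAccessRange;
import std.stdio : writeln;

// Compiler's view: string is an alias for an array of immutable code units.
static assert(is(string == immutable(char)[]));
static assert(is(typeof("abc"[0]) == immutable(char)));

// Phobos's view: a narrow string is a bidirectional range of code points
// (dchar) with no random access and no range-level length.
static assert(!isRandomAccessRange!string);
static assert(isBidirectionalRange!string);
static assert(is(ElementType!string == dchar));
static assert(!hasLength!string);

void main()
{
    string s = "h\u00E9llo";   // "héllo"
    writeln(s.length);         // array length in UTF-8 code units: 6
    writeln(s[1] == 0xC3);     // indexing yields a raw code unit: true
}
```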

What I'd prefer is for the compiler to type string literals as string, a type defined by Phobos whose first member is an immutable(char)[] (where the compiler puts the literal). Then we can properly limit the other operations.
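The following is only a sketch of how such a type might look. UString is a hypothetical name for a library-defined type whose first member holds the literal's code units and which exposes only code point iteration. The sketch assumes std.utf.decode and std.utf.strideBack from current Phobos. It is not a design Phobos adopted. Because the type has no opIndex or length, Phobos classifies it as bidirectional but not random-access. A plain foreach over it also infers dchar.

```d
// Hypothetical sketch: "UString" is an illustrative name, not a Phobos type.
import std.range.primitives : ElementType, isBidirectionalRange,
    isRandomAccessRange;
import std.stdio : writeln;
import std.utf : decode, strideBack;

struct UString
{
    immutable(char)[] data; // first member: where the compiler would put a literal

    // Input range primitives, all in terms of code points (dchar).
    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);      // decode the leading code point
    }

    void popFront()
    {
        size_t i = 0;
        decode(data, i);             // i now points past the first code point
        data = data[i .. $];
    }

    // Bidirectional primitives: strideBack gives the last code point's width.
    @property dchar back()
    {
        size_t i = data.length - strideBack(data, data.length);
        return decode(data, i);
    }

    void popBack()
    {
        data = data[0 .. $ - strideBack(data, data.length)];
    }

    @property UString save() { return this; }
    // No opIndex, opSlice or length: array-style access is deliberately absent.
}

static assert(isBidirectionalRange!UString);
static assert(!isRandomAccessRange!UString);
static assert(is(ElementType!UString == dchar));

void main()
{
    foreach (c; UString("h\u00E9llo"))   // c is inferred as dchar
        writeln(c);
}
```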

> As for indexing by code point, I also believe this is a mistake. It is proposed often, but it overlooks that:
>
> 1. Most string operations, such as copying and searching, even regular expressions, work just fine using ordinary (code unit) indices.
>
> 2. Doing the operations in (1) with code points, continually decoding the strings, would result in disastrously slow code.
>
> 3. The user can always layer a code point interface over the strings, but going the other way is not so practical.
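These three points can be illustrated with a short sketch that assumes current Phobos (std.string.indexOf, std.utf.count and std.utf.toUTFindex). Searching returns a code unit index that slices correctly. A code point view can be layered on top, but every code point index has to be located by decoding from the start of the string.

```d
// Sketch: code unit indices for search/slice, with a code point view layered on top.
import std.stdio : writeln;
import std.string : indexOf;
import std.utf : count, toUTFindex;

void main()
{
    string s = "na\u00EFve caf\u00E9";     // "naïve café"

    // (1) Searching yields a code unit index that can be used to slice directly.
    auto i = s.indexOf("caf\u00E9");
    writeln(i, " ", s[i .. $]);             // 7 café

    // (3) A code point view can be layered over the same array when needed...
    writeln(s.length, " code units, ", count(s), " code points"); // 12 code units, 10 code points

    // (2) ...but each code point index is translated by a linear decode, so
    // looping over code point indices this way costs O(n^2) overall.
    writeln(toUTFindex(s, 6));              // code point 6 ('c') starts at unit 7
}
```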

I agree here. Anything that uses indexing to perform a linear operation is bound for the scrap heap. But what about this:

foreach(c; str)

which types c as char (or immutable(char)), not dchar. This is the kind of subtle problem that comes from the dichotomy of Phobos refusing to believe a string is an array while the compiler believes it is.

I think the default inference for this should be dchar, and Phobos can make that true as long as it controls the string type.
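A small sketch of the foreach behavior described above follows. It assumes a current D compiler, and the function names are hypothetical. Without an explicit type the loop variable is a code unit. The compiler decodes only when the loop variable is explicitly typed as dchar.

```d
// Sketch: what the compiler infers for foreach over a built-in string.
import std.stdio : writeln;

size_t countInferred(string s)
{
    size_t n;
    foreach (c; s)                   // no type given: c is a code unit
    {
        static assert(is(typeof(c) == immutable(char)));
        ++n;
    }
    return n;
}

size_t countDecoded(string s)
{
    size_t n;
    foreach (dchar c; s)             // explicit dchar: the compiler decodes UTF-8
        ++n;
    return n;
}

void main()
{
    string str = "\u65E5\u672C";     // "日本": two code points, three bytes each
    writeln(countInferred(str), " ", countDecoded(str)); // 6 2
}
```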

There are other points to consider:

1) A string *could be* indexed by code unit position and return the code point being pointed to.
2) Even slicing could be valid, as long as the slice operator jumps back to the start of the dchar being encoded. This might make for very tricky code, but then again, such is the cost of trying to slice something like a UTF-8 string :)
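Here is a rough sketch of both ideas for UTF-8. The hypothetical helpers snapBack, codePointAt and snappedSlice stand in for what operator overloads on a library string type might do. The sketch assumes well-formed UTF-8 and uses std.utf.decode from current Phobos.

```d
// Hypothetical helpers illustrating the two ideas; not an existing Phobos API.
import std.stdio : writeln;
import std.utf : decode;

// Back up from code unit index i to the first unit of the code point
// containing it. UTF-8 continuation bytes have the bit pattern 10xxxxxx.
size_t snapBack(string s, size_t i)
{
    while (i > 0 && i < s.length && (s[i] & 0xC0) == 0x80)
        --i;
    return i;
}

// Idea 1: index by code unit position, returning the whole code point there.
dchar codePointAt(string s, size_t i)
{
    size_t start = snapBack(s, i);
    return decode(s, start);         // decode also advances 'start'; unused here
}

// Idea 2: a slice whose ends both jump back to code point boundaries.
string snappedSlice(string s, size_t from, size_t to)
{
    return s[snapBack(s, from) .. snapBack(s, to)];
}

void main()
{
    string s = "h\u00E9llo";         // bytes: h C3 A9 l l o
    writeln(codePointAt(s, 2));      // unit 2 is inside 'é', so this prints é
    writeln(snappedSlice(s, 2, 4));  // the slice becomes s[1 .. 4] and prints él
}
```

For valid UTF-8 the backward scan takes at most three steps, so both operations stay constant-time, unlike indexing by code point.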

But having the compiler force the string type to be an array, when it clearly isn't, doesn't help. Give the runtime the choice, as is done for AAs, and I think we may have something that is workable and doesn't suck performance-wise.

-Steve
