On Sat, 16 Oct 2010 15:49:56 -0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

> On 10/16/2010 01:39 PM, Steven Schveighoffer wrote:
>> Andrei, I am increasingly seeing people struggling with the decision to
>> make strings bidirectional ranges of dchar instead of what the compiler
>> says they are. This needs a different solution. It's too
>> confusing/difficult to deal with.
>
> I'm not seeing that. I'm seeing strings working automagically with most of std.algorithm without ever destroying a wide string.

I've seen several posts regarding char[] being considered differently by the compiler and std.algorithm.

The most prominent was the fact that:

foreach(x; str)

iterates over individual chars, not dchars.
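
To make the mismatch concrete, here is a minimal sketch. The helper names countUnits and countPoints are made up for illustration, and the example string is an assumption chosen because U+00E9 takes two UTF-8 code units.

```d
import std.range : ElementType;
import std.traits : Unqual;

// Hypothetical helper: counts what a plain foreach visits (code units).
size_t countUnits(string str)
{
    size_t n;
    foreach (x; str)
    {
        // The compiler treats str as an array, so x is a (qualified) char.
        static assert(is(Unqual!(typeof(x)) == char));
        ++n;
    }
    return n;
}

// Hypothetical helper: naming dchar explicitly makes foreach decode.
size_t countPoints(string str)
{
    size_t n;
    foreach (dchar x; str)
        ++n;
    return n;
}

// std.range, and therefore std.algorithm, sees a range of dchar.
static assert(is(ElementType!string == dchar));

void main()
{
    string str = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units
    assert(countUnits(str) == 6);
    assert(countPoints(str) == 5);
}
```

The same variable is therefore six elements long to the compiler's foreach and five elements long to std.algorithm.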

While I agree that a bidirectional range is the only sane way to view utf-8 strings, a char[] is not necessarily a utf-8 string. It's an array of utf-8 code units. At least to the compiler.

You can interpret it as a utf-8 string, or as an array. And the compiler allows both. std.algorithm doesn't. This half-ass attempt to make strings safe just fosters confusion.

My suggestion is to make a range that enforces the correct restrictions on strings. The compiler should treat string literals as a polysemous type that is by default this new type, or could optionally be an array of immutable characters.

So for example if you define:

struct string(T) if (is(T == char) || is(T == wchar))
{
   private immutable(T)[] data;
   // range functions to ensure data is only accessed via dchar
   ...
}

If the compiler then uses this struct to represent string literals, we have control over what a string literal allows without littering std.algorithm (and any external algorithms that might encounter strings) with special cases.
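
One way the elided range functions might look is sketched below. This is only an illustration under assumptions: the type is named Str here (a hypothetical name, to avoid clashing with the existing string alias), and building it on decode, stride and strideBack from std.utf is a choice made for the sketch, not something the proposal specifies.

```d
import std.algorithm : equal;
import std.range : ElementType, isBidirectionalRange, retro, walkLength;
import std.utf : decode, stride, strideBack;

// Hypothetical name; the proposal calls this type `string`.
struct Str(T) if (is(T == char) || is(T == wchar))
{
    private immutable(T)[] data;

    this(immutable(T)[] s) { data = s; }

    @property bool empty() const { return data.length == 0; }

    // Decode the first code point without consuming it.
    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);
    }

    // Drop all code units of the first code point.
    void popFront()
    {
        assert(!empty);
        data = data[stride(data, 0) .. $];
    }

    // Decode the last code point without consuming it.
    @property dchar back()
    {
        size_t i = data.length - strideBack(data, data.length);
        return decode(data, i);
    }

    // Drop all code units of the last code point.
    void popBack()
    {
        assert(!empty);
        data = data[0 .. $ - strideBack(data, data.length)];
    }

    @property Str save() { return this; }
}

// Generic algorithms need no special cases: this is a range of dchar.
static assert(isBidirectionalRange!(Str!char));
static assert(is(ElementType!(Str!char) == dchar));

void main()
{
    auto s = Str!char("h\u00E9llo");
    foreach (x; s)
        static assert(is(typeof(x) == dchar));
    assert(walkLength(s) == 5);
    assert(equal(retro(s), "oll\u00E9h"d));
}
```

Because the underlying array is private, code outside the module can reach the data only through the dchar-based primitives.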

So for example, I'd want something like this:

immutable(char)[] asciiarr = "abcdef";
auto str = "abcdef"; // typed as string

foreach(x; str)
{
   assert(is(typeof(x) == dchar));
}

foreach(ref x; str) // fails

foreach(ref x; asciiarr) // ok, x is of type immutable(char)

The truth is, 100% of the time for me, I want to use string literals to represent ASCII strings, not UTF-8 strings (I speak English, so I care almost nothing for Unicode). And std.algorithm steadfastly refuses to treat them as such. I think it's just too limited. Yes, it would be nice if by default strings were bidirectional ranges of dchar, to be on the safe side, but I also want the ability to have an array of chars, which works as an array, even in std.algorithm, *and* is initializable via string literals.


My requirements for the string struct would be:

1. only access via dchar
2. prevent slicing a code point
3. Indexing returns a dchar as well, which provides pseudo-random access (if you access an index that's in the middle of a code point, you get an exception).
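
Requirements 2 and 3 could be checked roughly as follows for the UTF-8 case. This is a sketch on a plain string rather than on the proposed struct; the function names isContinuation, codePointAt and safeSlice are hypothetical, and throwing a generic Exception via enforce is an assumption about what "you get an exception" might mean.

```d
import std.exception : assertThrown, enforce;
import std.utf : decode;

// A UTF-8 continuation unit has the bit pattern 10xxxxxx.
bool isContinuation(char c)
{
    return (c & 0xC0) == 0x80;
}

// Requirement 3: indexing yields a dchar, or throws when the index
// points into the middle of a code point.
dchar codePointAt(string s, size_t i)
{
    enforce(i < s.length, "index out of range");
    enforce(!isContinuation(s[i]), "index falls inside a code point");
    return decode(s, i);
}

// Requirement 2: a slice may only begin and end on code point boundaries.
string safeSlice(string s, size_t lo, size_t hi)
{
    enforce(lo <= hi && hi <= s.length, "bad slice bounds");
    enforce(lo == s.length || !isContinuation(s[lo]),
            "slice start splits a code point");
    enforce(hi == s.length || !isContinuation(s[hi]),
            "slice end splits a code point");
    return s[lo .. hi];
}

void main()
{
    string s = "h\u00E9llo"; // code units: h, 0xC3, 0xA9, l, l, o
    assert(codePointAt(s, 0) == 'h');
    assert(codePointAt(s, 1) == '\u00E9');
    assertThrown(codePointAt(s, 2)); // middle of U+00E9
    assert(codePointAt(s, 3) == 'l');
    assert(safeSlice(s, 1, 3) == "\u00E9");
    assertThrown(safeSlice(s, 0, 2)); // would cut U+00E9 in half
}
```

Indexing stays O(1) because only the unit at the given index is inspected, which is what makes the access "pseudo-random": not every index is valid, but the valid ones are cheap.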

-Steve
