Re: Proposal for fixing dchar ranges

John Colvin Mon, 10 Mar 2014 14:51:28 -0700

On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote:

I proposed this inside the long "major performance problem with std.array.front" thread, and I've also proposed it before, a long time ago.
But it seems to be getting no attention buried in that thread, not even negative attention :)
An idea to fix the whole set of problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals.
e.g.:

struct string {
   immutable(char)[] representation;
   this(char[] data) { representation = data;}
   ... // dchar range primitives
}

Then, a char[] array is simply an array of char.

points:
1. No more issues with foreach(c; "cassé"), it iterates via dchar.
2. No more issues with "cassé"[4], it is a static compiler error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the compiler.
6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler.
Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Using std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues.
I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier.
-Steve


just to check I understand this fully:

in this new scheme, what would this do?

auto s = "cassé".representation;
foreach(i, c; s) write(i, ':', c, ' ');
writeln(s);

Currently - without the .representation - I get

0:c 1:a 2:s 3:s 4:e 5:̠6:`
cassé

or, to spell it out a bit more:
0:c 1:a 2:s 3:s 4:e 5:xCC 6:x81
cassé
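
For reference, a minimal sketch of how current D separates the two views of the same string. It assumes the literal is 'e' followed by U+0301 (combining acute accent), which is what the xCC x81 bytes above suggest; the hex formatting is only for illustration.

```d
import std.range : walkLength;
import std.stdio : writef, writeln;
import std.string : representation;

void main()
{
    // Assumption: 'e' followed by U+0301 (combining acute accent),
    // matching the xCC x81 bytes in the output above.
    string s = "casse\u0301";

    // Code units: .representation yields immutable(ubyte)[].
    foreach (i, b; s.representation)
        writef("%s:%02X ", i, b);
    writeln();

    // Code points: requesting dchar makes foreach decode the UTF-8.
    foreach (dchar c; s)
        writef("U+%04X ", cast(uint) c);
    writeln();

    assert(s.length == 7);      // seven UTF-8 code units
    assert(s.walkLength == 6);  // six code points
}
```

The question is whether the proposed string type would keep the first loop's behaviour for .representation and make the second the default for the string itself.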
