On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer
wrote:
I proposed this inside the long "major performance problem with
std.array.front" thread, and I've also proposed it before, a long
time ago. But it seems to be getting no attention buried in that
thread, not even negative attention :)
An idea to fix all the problems I see with char[] being
treated specially by Phobos: introduce an actual string type,
with char[] as backing, that is a dchar range and that actually
dictates the rules we want. Then, make the compiler use this
type for literals.
e.g.:
struct string {
    immutable(char)[] representation;
    // take immutable data: a mutable char[] does not convert implicitly
    this(immutable(char)[] data) { representation = data; }
    ... // dchar range primitives
}
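To make the elided "dchar range primitives" concrete, here is a minimal sketch of what they might look like, assuming they are built on std.utf.decode and std.utf.stride. The name `String` and the helper `codePoints` are hypothetical (the proposal names the type `string`, which would clash with the existing alias in today's D), and this is not the proposed implementation, only one way it could work.

```d
import std.stdio : writeln;
import std.utf : decode, stride;

// `String` is a hypothetical name used only for this sketch.
struct String
{
    immutable(char)[] representation;

    this(immutable(char)[] data) { representation = data; }

    // Input range primitives that yield decoded code points (dchar).
    @property bool empty() { return representation.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(representation, i); // decode the code point at index 0
    }

    void popFront()
    {
        // stride gives the number of code units in the leading code point
        representation = representation[stride(representation, 0) .. $];
    }
}

// Hypothetical helper: collects the code points the range yields.
dchar[] codePoints(String s)
{
    dchar[] result;
    foreach (dchar c; s)
        result ~= c;
    return result;
}

void main()
{
    // "casse" followed by U+0301 (combining acute accent)
    auto s = String("casse\u0301");
    writeln(s.representation.length); // number of UTF-8 code units
    writeln(codePoints(s).length);    // number of code points
}
```

With a type like this, indexing and slicing are simply not defined on the wrapper, so any code-unit access has to go through `.representation` explicitly.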
Then, a char[] array is simply an array of char.
points:
1. No more issues with foreach(c; "cassé"), it iterates via
dchar
2. No more issues with "cassé"[4], it is a static compiler
error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the
compiler.
6. Any other special rules we come up with can be dictated by
the library, and not ignored by the compiler.
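For context on points 1 and 2, the following sketch shows how today's built-in string behaves: the element type seen by foreach depends on the declared loop variable, and indexing returns a raw UTF-8 code unit. The helper names `countUnits` and `countPoints` are made up for illustration, and the literal uses the precomposed é (U+00E9) as an assumption about what the points refer to.

```d
import std.stdio : writeln;

// Counts UTF-8 code units: a char loop variable does not decode.
size_t countUnits(string s)
{
    size_t n;
    foreach (char c; s) ++n;
    return n;
}

// Counts code points: a dchar loop variable decodes on the fly.
size_t countPoints(string s)
{
    size_t n;
    foreach (dchar c; s) ++n;
    return n;
}

void main()
{
    string s = "cass\u00E9"; // precomposed é, stored as the two bytes C3 A9
    writeln(countUnits(s));
    writeln(countPoints(s));
    // Indexing yields a raw code unit, here the lead byte of the é sequence.
    writeln(cast(ubyte) s[4]);
}
```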
Note, std.algorithm.copy(string1, mutablestring) will still
decode/encode, but it's more explicit. It's EXPLICITLY a dchar
range. Using std.algorithm.copy(string1.representation,
mutablestring.representation) instead will avoid the issues.
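As a rough illustration of the code-unit copy, here is a sketch using what Phobos offers today: std.string.representation returns a ubyte view of a string, so std.algorithm.copy moves raw code units with no decoding. In the proposed scheme `.representation` would be a field of the new type instead; the helper name `copyUnits` is hypothetical.

```d
import std.algorithm : copy;
import std.stdio : writeln;
import std.string : representation;

// Hypothetical helper: copies the UTF-8 code units of src into a fresh buffer.
ubyte[] copyUnits(string src)
{
    auto dst = new ubyte[](src.length);
    // Both sides are plain ubyte ranges, so no decode/encode step is involved.
    copy(src.representation, dst);
    return dst;
}

void main()
{
    writeln(copyUnits("cass\u00E9"));
}
```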
I imagine only code that is currently UTF ignorant will break,
and that code is easily 'fixed' by adding the 'representation'
qualifier.
-Steve
Just to check I understand this fully:
in this new scheme, what would this do?
auto s = "cassé".representation;
foreach(i, c; s) write(i, ':', c, ' ');
writeln(s);
Currently - without the .representation - I get
0:c 1:a 2:s 3:s 4:e 5:̠6:`
cassé
or, to spell it out a bit more:
0:c 1:a 2:s 3:s 4:e 5:xCC 6:x81
cassé
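The spelled-out line can be reproduced with a short sketch that walks the code units of today's built-in string and prints non-ASCII units in hex. The function name `spellOut` is made up for this example, and the literal assumes the é is the decomposed form (e followed by U+0301), which is what the bytes CC 81 indicate.

```d
import std.stdio : writeln;
import std.string : format;

// Hypothetical helper: renders "index:unit" pairs, with non-ASCII units in hex.
string spellOut(string s)
{
    string result;
    foreach (i, char c; s) // char loop variable: iterate code units, no decoding
    {
        if (i > 0)
            result ~= ' ';
        result ~= c < 0x80
            ? format("%s:%s", i, c)
            : format("%s:x%02X", i, cast(ubyte) c);
    }
    return result;
}

void main()
{
    writeln(spellOut("casse\u0301"));
}
```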