On Wed, 03 Feb 2010 23:41:02 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
> dsimcha wrote:
>> I personally would find this extremely annoying because most of the
>> code I write that involves strings is scientific computing code that
>> will never be internationalized, let alone released to the general
>> public. I basically just use ASCII because it's all I need and if
>> your UTF-8 string contains only ASCII characters, it can be treated
>> as random-access. I don't know how many people out there are in
>> similar situations, but I doubt they'll be too happy.
>>
>> On the other hand, I guess it wouldn't be hard to write a simple
>> wrapper struct on top of immutable(ubyte)[] and call it AsciiString.
>> Once alias this gets fully debugged, I could even make it implicitly
>> convert to immutable(char)[].
>
> It's definitely going to be easy to use all sensible algorithms with
> immutable(ubyte)[]. But even if you go with string, there should be
> no problem at all. Remember, telling ASCII from UTF is one mask and
> one test away, and the way Walter and I wrote virtually all related
> routines was to special-case ASCII. In most cases I don't think
> you'll notice a decrease in performance.
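
For reference, the mask-and-test check Andrei mentions presumably
boils down to something like this (my own sketch, not the actual
Phobos code):

import std.utf : stride;

// Counts code points, taking the cheap path for ASCII code units.
size_t countCodePoints(string s)
{
    size_t n = 0;
    for (size_t i = 0; i < s.length; ++n)
        i += (s[i] & 0x80) ? stride(s, i) : 1;   // one mask, one test
    return n;
}

so the ASCII-only case never pays for decoding.
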
I'm in the same camp as dsimcha; I generally write all my apps
assuming ASCII strings (most are internal tools anyway).
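
The wrapper dsimcha describes probably doesn't need to be much more
than this (untested sketch, the names are mine):

struct AsciiString
{
    immutable(ubyte)[] data;

    // Viewing the bytes as UTF-8 is valid: ASCII is a strict subset.
    @property string toUtf() const { return cast(string) data; }
    alias toUtf this;   // lets an AsciiString go wherever a string is expected

    // Random access per character is safe: one byte per character.
    char opIndex(size_t i) const { return cast(char) data[i]; }
    @property size_t length() const { return data.length; }
}

unittest
{
    auto a = AsciiString(cast(immutable(ubyte)[]) "hello");
    string s = a;           // implicit conversion via alias this
    assert(a[1] == 'e');    // cheap random access
}
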
Can the compiler help make ASCII strings easier to use? I.e., this
already works:

wstring s = "hello"; // converts to immutable(wchar)[]

What about this?

asciistring a = "hello";  // converts to immutable(ubyte)[] (or immutable(ASCIIChar)[])
asciistring a = "\uFBCD"; // error, requires a cast
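
In the meantime a library can at least reject bad literals at compile
time; something like this (sketch only, the names are made up):

// Compile-time check that a literal is pure ASCII.
template ascii(string s)
{
    static assert(isAllAscii(s), "literal contains non-ASCII characters");
    enum immutable(ubyte)[] ascii = toBytes(s);
}

bool isAllAscii(string s)
{
    foreach (c; s)
        if (c & 0x80) return false;
    return true;
}

immutable(ubyte)[] toBytes(string s)
{
    immutable(ubyte)[] r;
    foreach (c; s)
        r ~= cast(ubyte) c;
    return r;
}

enum a = ascii!"hello";      // fine
// enum b = ascii!"\uFBCD";  // rejected at compile time

But real literal support and implicit conversions would need the
compiler's help.
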
The only issue that remains to be resolved then is the upgradability
that ASCII characters currently enjoy with respect to UTF-8: I can
call any UTF-8-accepting function with an ASCII string, but not an
ASCII-accepting function with UTF-8 data.

Ideally, there should be a 7-bit ASCII character type that implicitly
upconverts to char, and can be initialized with a string literal.
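
A rough illustration of what I mean (again just a sketch; a library
version can only check the 7-bit invariant at run time, and it cannot
accept a string literal the way built-in char arrays can, which is
exactly where the compiler would have to step in):

struct ASCIIChar
{
    private char value;

    this(char c)
    {
        assert((c & 0x80) == 0, "not a 7-bit ASCII character");
        value = c;
    }

    @property char toChar() const { return value; }
    alias toChar this;   // implicit upconversion to char
}

void takesChar(char c) {}

unittest
{
    auto c = ASCIIChar('a');
    takesChar(c);                 // fine, upconverts to char
    // ASCIIChar[] s = "hello";   // needs compiler support; a library
                                  // type can't take a string literal here
}
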
In addition, you are putting D's UTF-8 char even further away from
C's ASCII char. It would be nice to separate compatible C strings
from D strings. At some point, I should be able to designate a
function (even a C function) as taking only ASCII data, and the
compiler should disallow passing general UTF-8 data into it. This
involves either renaming D's char to keep source closer to C, or
rewriting C function signatures to reflect the difference.
-Steve