On Wed, 03 Feb 2010 23:41:02 -0500, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

dsimcha wrote:
I personally would find this extremely annoying because most of the code I write that involves strings is scientific computing code that will never be internationalized, let alone released to the general public. I basically just use ASCII because it's all I need, and if your UTF-8 string contains only ASCII characters, it can be treated as random-access. I don't know how many people out there are in similar situations, but I doubt they'll be too happy.

On the other hand, I guess it wouldn't be hard to write a simple wrapper struct on top of immutable(ubyte)[] and call it AsciiString. Once alias this gets fully debugged, I could even make it implicitly convert to immutable(char)[].
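
Roughly what such a wrapper might look like, as a sketch only (the names AsciiString and asUtf8 are invented here, and it assumes alias this to a member property behaves as documented):

struct AsciiString
{
    immutable(ubyte)[] data;

    // ASCII is a strict subset of UTF-8, so reinterpreting the bytes as a
    // D string is always valid
    @property string asUtf8()
    {
        return cast(string) data;
    }

    // one-way implicit conversion: AsciiString -> string, never the reverse
    alias asUtf8 this;
}

void takesUtf8(string s) {}

void example()
{
    auto a = AsciiString(cast(immutable(ubyte)[]) "hello");
    takesUtf8(a);                 // OK, upgrades to immutable(char)[]
    // AsciiString b = "héllo";   // no implicit conversion going the other way
}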

It's definitely going to be easy to use all sensible algorithms with immutable(ubyte)[]. But even if you go with string, there should be no problem at all. Remember, telling ASCII from UTF is one mask and one test away, and the way Walter and I wrote virtually all related routines was to special-case ASCII. In most cases I don't think you'll notice a decrease in performance.
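
For reference, my reading of the "one mask and one test": in UTF-8 a code unit is ASCII exactly when its high bit is clear, so the fast path boils down to something like the following (a sketch, not the actual Phobos code):

bool isAsciiCodeUnit(char c)
{
    return (c & 0x80) == 0;   // one mask, one test
}

// typical shape of the special case: consume ASCII bytes directly and fall
// back to full UTF-8 decoding only when a high bit shows up
size_t asciiPrefixLength(string s)
{
    size_t i = 0;
    while (i < s.length && (s[i] & 0x80) == 0)
        ++i;
    return i;
}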

I'm in the same camp as dsimcha; I generally write all my apps assuming ASCII strings (most are internal tools anyway).

Can the compiler help make ASCII strings easier to use? I.e., this already works:

wstring s = "hello"; // converts to immutable(wchar)[]

What about this?

asciistring a = "hello"; // converts to immutable(ubyte)[] (or immutable(ASCIIChar)[])
asciistring a = "\uFBCD"; // error, requires cast.

The only issue that remains to be resolved then is the upgradability that ASCII characters currently enjoy with respect to UTF-8: I can call any UTF-8-accepting function with an ASCII string, but not an ASCII-string-accepting function with UTF-8 data.

Ideally, there should be a 7-bit ASCII character type that implicitly upconverts to char, and can be initialized with a string literal.
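
Something along those lines can even be mocked up in library code today (a sketch; AsciiChar is a made-up name, and initialization from string literals is the part that still needs compiler help):

struct AsciiChar
{
    private char value;

    this(char c)
    {
        assert((c & 0x80) == 0, "not a 7-bit ASCII character");
        value = c;
    }

    // implicit upconversion AsciiChar -> char; going the other way requires
    // the checked constructor above (or a cast)
    @property char toChar() const { return value; }
    alias toChar this;
}

void takesChar(char c) {}

void example2()
{
    takesChar(AsciiChar('a'));    // fine, upconverts to char
    // AsciiChar[] s = "hello";   // this is where compiler support is needed
}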

In addition, you are putting D's UTF-8 char even further away from C's ASCII char. It would be nice to separate compatible C strings from D strings. At some point, I should be able to designate that a function (even a C function) takes only ASCII data, and the compiler should disallow passing general UTF-8 data into it. This involves either renaming D's char to keep source closer to C, or rewriting C function signatures to reflect the difference.
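
For example (a sketch; legacy_print is a hypothetical C function standing in for any API that assumes single-byte ASCII text, and AsciiChar is the made-up type from the sketch above):

// today: nothing stops general UTF-8 data from reaching a C API that
// really assumes ASCII
extern(C) void legacy_print(const(char)* msg);

// the kind of rewritten signature I mean; passing arbitrary UTF-8 would
// then require an explicit validation or conversion step first
// extern(C) void legacy_print(const(AsciiChar)* msg);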

-Steve
