On Wed, 03 Feb 2010 23:41:02 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
> dsimcha wrote:
>> I personally would find this extremely annoying because most of the
>> code I write that involves strings is scientific computing code that
>> will never be internationalized, let alone released to the general
>> public. I basically just use ASCII because it's all I need and if
>> your UTF-8 string contains only ASCII characters, it can be treated
>> as random-access. I don't know how many people out there are in
>> similar situations, but I doubt they'll be too happy.
>>
>> On the other hand, I guess it wouldn't be hard to write a simple
>> wrapper struct on top of immutable(ubyte)[] and call it AsciiString.
>> Once alias this gets fully debugged, I could even make it implicitly
>> convert to immutable(char)[].
>
> It's definitely going to be easy to use all sensible algorithms with
> immutable(ubyte)[]. But even if you go with string, there should be
> no problem at all. Remember, telling ASCII from UTF is one mask and
> one test away, and the way Walter and I wrote virtually all related
> routines was to special-case ASCII. In most cases I don't think
> you'll notice a decrease in performance.
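
For reference, the mask-and-test check Andrei mentions presumably
boils down to something like this (my own sketch, not the actual
Phobos code):

import std.utf : stride;

// Counts code points, taking the cheap path for ASCII code units.
size_t countCodePoints(string s)
{
    size_t n = 0;
    for (size_t i = 0; i < s.length; ++n)
        i += (s[i] & 0x80) ? stride(s, i) : 1;   // one mask, one test
    return n;
}

so the ASCII-only case never pays for decoding.
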
I'm in the same camp as dsimcha; I generally write all my apps
assuming ASCII strings (most are internal tools anyway).
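
The wrapper dsimcha describes probably doesn't need to be much more
than this (untested sketch, the names are mine):

struct AsciiString
{
    immutable(ubyte)[] data;

    // Viewing the bytes as UTF-8 is valid: ASCII is a strict subset.
    @property string toUtf() const { return cast(string) data; }
    alias toUtf this;   // lets an AsciiString go wherever a string is expected

    // Random access per character is safe: one byte per character.
    char opIndex(size_t i) const { return cast(char) data[i]; }
    @property size_t length() const { return data.length; }
}

unittest
{
    auto a = AsciiString(cast(immutable(ubyte)[]) "hello");
    string s = a;           // implicit conversion via alias this
    assert(a[1] == 'e');    // cheap random access
}
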
Can the compiler help make ASCII strings easier to use? I.e., this
already works:

wstring s = "hello"; // converts to immutable(wchar)[]

What about this?

asciistring a = "hello";  // converts to immutable(ubyte)[] (or immutable(ASCIIChar)[])
asciistring a = "\uFBCD"; // error, requires a cast
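
In the meantime a library can at least reject bad literals at compile
time; something like this (sketch only, the names are made up):

// Compile-time check that a literal is pure ASCII.
template ascii(string s)
{
    static assert(isAllAscii(s), "literal contains non-ASCII characters");
    enum immutable(ubyte)[] ascii = toBytes(s);
}

bool isAllAscii(string s)
{
    foreach (c; s)
        if (c & 0x80) return false;
    return true;
}

immutable(ubyte)[] toBytes(string s)
{
    immutable(ubyte)[] r;
    foreach (c; s)
        r ~= cast(ubyte) c;
    return r;
}

enum a = ascii!"hello";      // fine
// enum b = ascii!"\uFBCD";  // rejected at compile time

But real literal support and implicit conversions would need the
compiler's help.
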
The only issue that remains to be resolved then is the upgradability
that ASCII characters currently enjoy with respect to UTF-8: I can
call any UTF-8-accepting function with an ASCII string, but not an
ASCII-accepting function with UTF-8 data.

Ideally, there should be a 7-bit ASCII character type that implicitly
upconverts to char, and can be initialized with a string literal.
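
A rough illustration of what I mean (again just a sketch; a library
version can only check the 7-bit invariant at run time, and it cannot
accept a string literal the way built-in char arrays can, which is
exactly where the compiler would have to step in):

struct ASCIIChar
{
    private char value;

    this(char c)
    {
        assert((c & 0x80) == 0, "not a 7-bit ASCII character");
        value = c;
    }

    @property char toChar() const { return value; }
    alias toChar this;   // implicit upconversion to char
}

void takesChar(char c) {}

unittest
{
    auto c = ASCIIChar('a');
    takesChar(c);                 // fine, upconverts to char
    // ASCIIChar[] s = "hello";   // needs compiler support; a library
                                  // type can't take a string literal here
}
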
In addition, you are putting D's UTF-8 char even further away from
C's ASCII char. It would be nice to separate compatible C strings
from D strings. At some point, I should be able to designate a
function (even a C function) as taking only ASCII data, and the
compiler should disallow passing general UTF-8 data into it. This
involves either renaming D's char to keep source closer to C, or
rewriting C function signatures to reflect the difference.
-Steve