Re: Making all strings UTF ranges has some risk of WTF

Don, Thu, 04 Feb 2010 12:21:08 -0800

Andrei Alexandrescu wrote:

Michel Fortin wrote:
On 2010-02-04 12:19:42 -0500, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:
bearophile wrote:
Simen kjaeraas:
Of the above, I feel (b) is the correct solution, and I understand
it has already been implemented in svn.
Yes, I presume he was mostly looking for a justification of ideas
he has already accepted and even partially implemented :-)
I am ready to throw away the implementation as soon as a better idea comes along. As at other times, I made the change to see how things feel with the new approach.
Has any thought been given to foreach? Currently all of these work for strings:
    foreach (c; "abc") { } // typeof(c) is 'char'
    foreach (char c; "abc") { }
    foreach (wchar c; "abc") { }
    foreach (dchar c; "abc") { }
I'm concerned about the first case, where the element type is implicit. The implicit element type is (currently) the code unit. If ranges use code points ('dchar') as the element type, then I think foreach needs to be changed so that the default element type is 'dchar' too (in the first line of my example). Having ranges and foreach disagree on this would be very inconsistent. Of course you should still be allowed to iterate using 'char' and 'wchar' too.
I think this would fit nicely. I was surprised at first, when learning D, to notice that foreach didn't do this and that I had to explicitly ask for it.
This is a good point. I'm in favor of changing the language to make the implicit type dchar.
Andrei
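To make Michel's point concrete, here is a small sketch comparing the two iteration styles on one string. As far as I know, D today still infers the code-unit type for `foreach (c; s)`, while Phobos range primitives (used by `std.range.walkLength`) decode narrow strings into dchar; the sketch shows exactly that disagreement. The helper names `countImplicit` and `countDchar` are made up for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: foreach with an inferred element type.
// For a string this iterates UTF-8 code units (chars).
size_t countImplicit(string s)
{
    size_t n;
    foreach (c; s)
        ++n;
    return n;
}

// Hypothetical helper: foreach with an explicit dchar element type,
// which makes foreach decode UTF-8 into code points.
size_t countDchar(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00e9llo"; // U+00E9 takes two UTF-8 code units

    writeln(countImplicit(s)); // 6: code units
    writeln(countDchar(s));    // 5: code points
    writeln(s.walkLength);     // 5: range primitives decode narrow strings
}
```

If the implicit foreach type became dchar, the first count would agree with the other two.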

We seem to be approaching the point where char[], wchar[] and dchar[] are all arrays of dchar, but with different levels of compression.
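The "different levels of compression" view can be checked with a quick sketch: the same three code points stored as string, wstring and dstring occupy different numbers of code units, yet decode to the same dchar sequence. The sample text (including U+1D11E, which lies outside the BMP and so needs a UTF-16 surrogate pair) is an arbitrary choice for illustration; `std.algorithm.comparison.equal` compares the ranges after decoding.

```d
import std.algorithm.comparison : equal;
import std.stdio : writeln;

// The same three code points: 'h', U+00E9, and U+1D11E (outside the BMP).
enum string  s8  = "h\u00e9\U0001D11E";
enum wstring s16 = "h\u00e9\U0001D11E"w;
enum dstring s32 = "h\u00e9\U0001D11E"d;

void main()
{
    // Different numbers of code units per encoding...
    writeln(s8.length);  // 7: 1 + 2 + 4 UTF-8 code units
    writeln(s16.length); // 4: 1 + 1 + 2 UTF-16 code units (one surrogate pair)
    writeln(s32.length); // 3: one UTF-32 code unit per code point

    // ...but the same sequence of dchar once decoded.
    writeln(equal(s8, s32));  // true
    writeln(equal(s16, s32)); // true
}
```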

It makes me wonder whether the char and wchar types actually make any sense.

If char[] is actually a UTF string, then char[] ~ char should be permitted ONLY if the char can be implicitly converted to dchar. Otherwise, you're performing cast(char[])(cast(ubyte[])s ~ cast(ubyte)c), which will not necessarily result in a valid Unicode string.
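A minimal sketch of the hazard: appending an ASCII char keeps the string valid, appending a raw non-ASCII byte as a char leaves a dangling UTF-8 lead byte, and appending the same value as a dchar makes D encode it properly. `isValidUtf8` is a hypothetical helper built on `std.utf.validate`, which throws `UTFException` on malformed input.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char[] ok = "caf".dup;
    ok ~= 'e';                     // ASCII char: a complete code point on its own
    writeln(isValidUtf8(ok));      // true

    char[] bad = "caf".dup;
    bad ~= cast(char) 0xE9;        // raw byte 0xE9: a lead byte with no continuation
    writeln(isValidUtf8(bad));     // false

    char[] encoded = "caf".dup;
    encoded ~= cast(dchar) 0xE9;   // dchar append encodes U+00E9 as 0xC3 0xA9
    writeln(isValidUtf8(encoded)); // true
    writeln(encoded.length);       // 5
}
```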

I suspect that string and wstring should have been the primary types, each with a .codepoints property returning a ubyte[] or ushort[] (respectively) reference to the data. It's too late for that, of course. The extra value you get from having a dedicated type meaning 'this is a code unit of a UTF-8 string' seems very minor compared to just using a ubyte.
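For comparison, current Phobos offers something close to the proposed raw view as `std.string.representation`, which reinterprets a string as ubyte[] and a wstring as ushort[] while keeping the qualifiers. The sketch below only illustrates that view; it does not implement the hypothetical `.codepoints` property itself.

```d
import std.stdio : writeln;
import std.string : representation;

void main()
{
    string  s = "h\u00e9";
    wstring w = "h\u00e9"w;

    // Raw UTF-8 code units, typed as plain bytes.
    immutable(ubyte)[] u8 = s.representation;
    // Raw UTF-16 code units, typed as plain 16-bit integers.
    immutable(ushort)[] u16 = w.representation;

    writeln(u8);  // [104, 195, 169]
    writeln(u16); // [104, 233]
}
```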
