Andrei Alexandrescu wrote:
Michel Fortin wrote:
On 2010-02-04 12:19:42 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> said:
bearophile wrote:
Simen kjaeraas:
Of the above, I feel (b) is the correct solution, and I understand
it has already been implemented in svn.
Yes, I presume he was mostly looking for a justification of ideas
he has already accepted and even partially implemented :-)
I am ready to throw away the implementation as soon as a better idea
comes around. As at other times, I made the change to see how things
feel with the new approach.
Has any thought been given to foreach? Currently all these work for
strings:
    foreach (c; "abc") { } // typeof(c) is 'char'
    foreach (char c; "abc") { }
    foreach (wchar c; "abc") { }
    foreach (dchar c; "abc") { }
I'm concerned about the first case where the element type is implicit.
The implicit element type is (currently) the code unit. If the range
uses code points ('dchar') as the element type, then I think foreach
needs to be changed so that the default element type is 'dchar' too
(in the first line of my example). Having ranges and foreach disagree
on this would be very inconsistent. Of course you should be allowed to
iterate using 'char' and 'wchar' too.
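The mismatch described above can be sketched as follows. This is only an illustration, assuming a current D compiler and Phobos rather than the 2010 svn state discussed in the thread; countImplicit and countDecoded are hypothetical helper names, not anything from the library.

```d
import std.range.primitives : ElementType;

// Iterates with the inferred element type, i.e. one step per UTF-8 code unit.
size_t countImplicit(string s)
{
    size_t n = 0;
    foreach (c; s)
    {
        static assert(c.sizeof == 1); // c is a char-sized code unit
        ++n;
    }
    return n;
}

// Iterates with an explicit dchar, i.e. one step per decoded code point.
size_t countDecoded(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

// The range primitives, by contrast, treat a string as a range of dchar.
static assert(is(ElementType!string == dchar));

unittest
{
    // U+00E9 takes two code units in UTF-8, so the two counts differ.
    assert(countImplicit("h\u00E9llo") == 6);
    assert(countDecoded("h\u00E9llo") == 5);
}
```

Compiled with -unittest, the block shows foreach and the range interface disagreeing about what an "element" of a string is unless the loop names dchar explicitly.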
I think this would fit nicely. I was surprised when, first learning
D, I noticed that foreach didn't do this and that I had to explicitly
ask for it.
This is a good point. I'm in favor of changing the language to make the
implicit type dchar.
Andrei
We seem to be approaching the point where char[], wchar[] and dchar[]
are all arrays of dchar, but with different levels of compression.
It makes me wonder if the char, wchar types actually make any sense.
If char[] is actually a UTF string, then char[] ~ char should be
permitted ONLY if char can be implicitly converted to dchar. Otherwise,
you're performing cast(char[])(cast(ubyte[])s ~ cast(ubyte)c), which will
not necessarily result in a valid Unicode string.
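To make the concern concrete, here is a small sketch, assuming current Phobos, in which appending a single char leaves the array holding invalid UTF-8. isValidUtf8 is a hypothetical wrapper around std.utf.validate, which throws UTFException on malformed input.

```d
import std.utf : validate, UTFException;

// Returns true if s is well-formed UTF-8, false otherwise.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    char[] s = "abc".dup;
    assert(isValidUtf8(s));

    // 0xC3 is the lead byte of a two-unit sequence; on its own it is invalid.
    s ~= cast(char) 0xC3;
    assert(!isValidUtf8(s));

    // Appending the continuation byte completes U+00E9 and restores validity.
    s ~= cast(char) 0xA9;
    assert(isValidUtf8(s));
}
```

The append itself compiles and runs without complaint; only an explicit validation pass notices that the intermediate array is not a UTF-8 string.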
I suspect that string and wstring should have been the primary types and
had a .codepoints property, which returned a ubyte[] or ushort[]
reference (respectively) to the data. It's too late, of course. The extra
value you get by having a specific type for 'this is a code unit of a
UTF-8 string' seems to be very minor, compared to just using a ubyte.
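For comparison, later versions of Phobos offer something close to this raw view through std.string.representation. The sketch below assumes a current Phobos and is not the .codepoints property proposed above.

```d
import std.string : representation;

unittest
{
    // A string viewed as its underlying UTF-8 code units.
    string s = "h\u00E9";
    auto bytes = s.representation;
    static assert(is(typeof(bytes) == immutable(ubyte)[]));
    assert(bytes.length == 3 && bytes[1] == 0xC3 && bytes[2] == 0xA9);

    // A wstring viewed as its underlying UTF-16 code units.
    wstring w = "h\u00E9"w;
    auto units = w.representation;
    static assert(is(typeof(units) == immutable(ushort)[]));
    assert(units.length == 2 && units[1] == 0xE9);
}
```

The view aliases the same memory as the string, so it gives the "reference to the data" described above without copying.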