On Thu, 20 Oct 2011 21:58:20 +0200, Jonathan M Davis <jmdavisp...@gmx.com>
wrote:
On Thursday, October 20, 2011 21:37:56 Martin Nowak wrote:
It just took me over one hour to find out the unthinkable.
foreach(c; str) will deduce c as immutable(char) and doesn't care about
Unicode.
Now there is so much Unicode transcoding happening in the language that it
starts to get annoying, but the most basic string iteration doesn't
support it by default?
Walter won't change it, because it would silently change too much code.
Now, I'm willing to bet that in 99.9999999% of cases, it would _fix_ the
code rather than break it, but still, he won't do it. However, the
behavior _is_ completely consistent with the rest of the language, since
it's the range-based stuff which decodes arrays of chars or wchars as
characters. And it _would_ be inconsistent with all other uses of foreach
on arrays if arrays of char or wchar were iterated over as ranges of
dchar. But still, it's a bug waiting to happen which doesn't really
benefit anyone.
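[Editor's sketch] The split described above, between the array view (code units) and the range view (decoded code points), can be illustrated roughly as follows. This is a minimal example assuming a Phobos version in which narrow strings are auto-decoded by the range primitives; the sample string is made up for illustration.

```d
import std.range : ElementType, front, walkLength;

unittest
{
    // Illustrative input: '∞' (U+221E) occupies three UTF-8 code units.
    string s = "a∞b";

    // Array view: length and indexing operate on code units.
    assert(s.length == 5);
    static assert(is(typeof(s[0]) == immutable(char)));

    // Range view: front decodes, so the element type is dchar.
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 3);
    assert(s.front == 'a');
}
```

The checks run when the file is compiled with -unittest.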
I've suggested that there should be a warning when code uses a foreach
over an array of char or wchar without specifying the iteration type
(http://d.puremagic.com/issues/show_bug.cgi?id=4483). That way, you can
specify char or wchar if you really want it, but anyone who forgets to
explicitly use dchar (or doesn't realize that they should) is warned. But
that hasn't been implemented as of yet, and I don't believe that Walter
has voiced his opinion on it.
- Jonathan M Davis
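[Editor's sketch] To make the difference between the inferred and the explicit iteration type concrete, here is a small example. The function names countCodeUnits and countCodePoints are hypothetical helpers introduced only for this illustration.

```d
/// Hypothetical helper: iterates with the inferred element type.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (c; s)       // c is immutable(char): one UTF-8 code unit per step
        ++n;
    return n;
}

/// Hypothetical helper: iterates with dchar requested explicitly.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s) // the loop decodes one code point per step
        ++n;
    return n;
}

unittest
{
    // '∞' (U+221E) is three code units but a single code point.
    assert(countCodeUnits("a∞b") == 5);
    assert(countCodePoints("a∞b") == 3);
}
```

For pure ASCII input both functions return the same value, which is why the missing dchar tends to go unnoticed until a non-ASCII character shows up.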
At least it was your ∞ that revealed my bug.
Incidentally, this has given me a nice idea.
You need to combine the foreach loop 'bug' with the ability to alter the
index variable (http://d.puremagic.com/issues/show_bug.cgi?id=6652).
Then you can construct a terrifically fast, yet still correct, UTF-8
decoder.
    foreach(i, c; s)
    {
        if (c < 0x80)
            outp.put(c);
        else
            (outp.put(std.utf.decode(s, i)), --i);
    }
But you'd better write foreach(ref i, char c; s), so that the change to i
actually affects the loop index.
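[Editor's sketch] The same ASCII fast path can also be written without touching a foreach index at all. The version below swaps in a plain while loop, so it does not depend on how issue 6652 is resolved. The name toCodePoints and the choice of a dstring appender as the output are assumptions made for this illustration, not part of the original post.

```d
import std.array : appender;
import std.utf : decode;

/// Hypothetical helper: decodes UTF-8 to UTF-32 with an ASCII fast path.
dstring toCodePoints(string s)
{
    auto outp = appender!dstring();
    size_t i = 0;
    while (i < s.length)
    {
        immutable char c = s[i];
        if (c < 0x80)
        {
            // ASCII: the code unit is the code point, no decoding needed.
            outp.put(cast(dchar) c);
            ++i;
        }
        else
        {
            // Multi-byte sequence: decode advances i past the whole
            // sequence and throws a UTFException on invalid input.
            outp.put(decode(s, i));
        }
    }
    return outp.data;
}

unittest
{
    assert(toCodePoints("a∞b") == "a∞b"d);
    assert(toCodePoints("") == ""d);
}
```

Because decode already moves the index forward, no compensating --i is needed here; that decrement is only required in the foreach version, where the loop increments i again after each iteration.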