On Sun, 23 Jun 2013 19:12:21 +0200, Marco Leise <marco.le...@gmx.de> wrote:
> On Sun, 23 Jun 2013 18:37:16 +0200, "bearophile"
> <bearophileh...@lycos.com> wrote:
>
> > Adam D. Ruppe:
> >
> > > char[] a;
> > > int b = 1000;
> > > a ~= b;
> > >
> > > the "a ~= b" is more like "a ~= cast(dchar) b", and then dchar
> > > -> char means it may be multibyte encoded, going from utf-32 to
> > > utf-8.
>
> No no no, this is not what happens. In my case it was:
>
> string a;
> int b = 228; // CP850 value for 'ä'. Note: fits in a single byte!
> a ~= b;
>
> Maybe it goes as follows:
> o compiler sees ~= to a string and becomes "aware" of wchar and dchar
>   conversions to char
> o appended value is only checked for size (type and signedness are lost)
>   and maps int to dchar
> o this dchar value is now checked for Unicode conformity and fails the test
> o the dchar value is now assumed to be Latin-1, Windows-1252 or similar
>   and a conversion routine invoked
> o the dchar value is converted to utf-8 and...
> o appended as a multi-byte string to variable "a".
>
> That still doesn't sound right to me, though. What if the dchar value is
> not valid Unicode AND >= 256?

Actually you were 100% right, Adam. I was distracted by the fact that the
source was CP850. UTF-32 maps all of Latin-1 in a 1:1 correspondence, and
most of CP850 shares its codes with Latin-1. So yes, all the compiler was
doing is appending a dchar value.

And with char/ubyte I do find it convenient to mix them without casting,
e.g. "if (someChar < 0x80)" and similar code. As confusing as it was for
me, I agree with "WONT FIX".

-- 
Marco
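For reference, the observed behavior matches plain UTF-8 encoding of the appended code point: 228 (U+00E4, 'ä' in Latin-1) is one byte in Latin-1/CP850 but two bytes in UTF-8, and 1000 (U+03E8, from Adam's example) is also multi-byte. A quick check of the bytes; this sketch is in Python rather than D, since only the resulting byte sequence matters here:

```python
# Code point 228 is U+00E4 ('ä'): a single byte in Latin-1/CP850,
# but two bytes (0xC3 0xA4) once encoded as UTF-8 -- the "multi-byte
# string" that got appended to the char[] in the thread above.
assert chr(228).encode("utf-8") == b"\xc3\xa4"

# Code point 1000 (U+03E8) from the original "a ~= b" example also
# encodes to two bytes in UTF-8.
assert chr(1000).encode("utf-8") == b"\xcf\xa8"
```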