Re: Implicit encoding conversion on string ~= int ?

Adam D. Ruppe Sun, 23 Jun 2013 10:36:07 -0700

On Sunday, 23 June 2013 at 17:12:41 UTC, Marco Leise wrote:

int b = 228; // CP850 value for 'ä'. Note: fits in a single byte!

228 (e4 in hex) is also the Unicode code point for ä, which is [195, 164] when encoded as UTF-8. See: http://www.utf8-chartable.de/unicode-utf8-table.pl?number=512&utf8=dec

While the number 228 would normally fit in a byte, UTF-8 uses the high bits of each byte as markers that the byte is part of a multibyte sequence (this helps with ASCII compatibility), so any code point > 127 will always be a multibyte sequence in UTF-8. See: http://en.wikipedia.org/wiki/UTF-8#Description
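
To make the marker bits concrete, here is a minimal sketch in D. The helper encodeTwoByte is a made-up name for illustration, not something from Phobos or from this thread, and it only handles the two-byte range (U+0080 to U+07FF). It is checked against std.utf.encode, which I'm assuming is called with a char[4] buffer as in current Phobos.

```d
import std.utf : encode;

// Hypothetical helper (not part of Phobos): hand-encode a code point
// that falls in the two-byte UTF-8 range U+0080 .. U+07FF.
ubyte[2] encodeTwoByte(dchar cp)
{
    assert(cp >= 0x80 && cp <= 0x7FF);
    ubyte[2] result;
    result[0] = cast(ubyte)(0xC0 | (cp >> 6));   // 110xxxxx: marker + top 5 bits
    result[1] = cast(ubyte)(0x80 | (cp & 0x3F)); // 10xxxxxx: marker + low 6 bits
    return result;
}

void main()
{
    immutable dchar cp = 228; // U+00E4, 'ä'

    // By hand: 228 is 00011 100100 in binary (11 bits).
    auto manual = encodeTwoByte(cp);
    assert(manual[0] == 195 && manual[1] == 164);

    // The library agrees with the hand-rolled version.
    char[4] buf;
    immutable n = encode(buf, cp);
    assert(n == 2);
    assert(buf[0] == manual[0] && buf[1] == manual[1]);

    // Appending a dchar to a string encodes it, so it takes two code units.
    string s;
    s ~= cp;
    assert(s.length == 2);
}
```

Both bytes have their high bit set, which is how a decoder tells them apart from plain ASCII.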
