On Sunday, 23 June 2013 at 17:12:41 UTC, Marco Leise wrote:
int b = 228; // CP850 value for 'ä'. Note: fits in a single byte!

228 (e4 in hex) is also the Unicode code point for ä, which is [195, 164] when encoded as UTF-8. see: http://www.utf8-chartable.de/unicode-utf8-table.pl?number=512&utf8=dec

While the number 228 would fit in a byte normally, utf-8 uses the high bits as markers that this is part of a multibyte sequence (this helps with ascii compatibility), so any code point > 127 will always be a multibyte sequence in utf-8. see: http://en.wikipedia.org/wiki/UTF-8#Description

Reply via email to