On Mon, Apr 15, 2013, at 15:16, Strake wrote:
> On 15/04/2013, random...@fastmail.us <random...@fastmail.us> wrote:
> > On Mon, Apr 15, 2013, at 10:58, Martti Kühne wrote:
> >> According to a quick google those chars can become as wide as 6
> >> bytes,
> >
> > No, they can't. I have no idea what your source on this is.
>
> In UTF-8 the maximum encoded character length is 6 bytes [1]
What on earth does that have to do with using an int to store the code point *instead of* the raw UTF-8 bytes (which are used _now_)? Also, this is out of date: current versions of Unicode (since 2003 at the latest) limit code points to 0x10FFFF, and therefore UTF-8 sequences to four bytes. Unless your manpage is much older than mine, it states this clearly and you misread it.
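A minimal sketch in C of why four bytes is the ceiling once code points stop at 0x10FFFF. The function names `utf8_len` and `utf8_encode` are hypothetical, not taken from any existing codebase. Because 0x10FFFF fits in 21 bits, a 32-bit integer can hold any code point, and its UTF-8 form never needs more than four bytes.

```c
#include <stdint.h>

/* Hypothetical helper: number of UTF-8 bytes needed for code point cp,
 * assuming the post-2003 limit of U+10FFFF. Returns 0 for values that
 * are not encodable (UTF-16 surrogates or anything above U+10FFFF). */
int utf8_len(uint32_t cp)
{
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;          /* surrogate halves are not valid scalar values */
	if (cp < 0x80)
		return 1;          /* 7 bits:  0xxxxxxx */
	if (cp < 0x800)
		return 2;          /* 11 bits: 110xxxxx 10xxxxxx */
	if (cp < 0x10000)
		return 3;          /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
	if (cp <= 0x10FFFF)
		return 4;          /* 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
	return 0;                  /* beyond the Unicode range: no 5/6-byte forms */
}

/* Hypothetical helper: encode cp into buf (room for at least 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not encodable. */
int utf8_encode(uint32_t cp, unsigned char *buf)
{
	int n = utf8_len(cp);

	switch (n) {
	case 1:
		buf[0] = (unsigned char)cp;
		break;
	case 2:
		buf[0] = (unsigned char)(0xC0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
		break;
	case 3:
		buf[0] = (unsigned char)(0xE0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
		break;
	case 4:
		buf[0] = (unsigned char)(0xF0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
		break;
	}
	return n;
}
```

For example, `utf8_len(0x10FFFF)` returns 4 (encoded as F4 8F BF BF), while `utf8_len(0x110000)` returns 0. The old 5- and 6-byte forms only existed to cover values up to 0x7FFFFFFF, which are no longer valid code points.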