----- "Graeme Geldenhuys" <graemeg.li...@gmail.com> wrote:
> On 16/09/2011 00:01, Dimitri Smits wrote:
> >
> > errrm, utf-8 can have 6 octets representing one character,
>
> Last time I checked, that was only in the very early stages of
> developing the utf-8 specification. Since then, the maximum size of a
> utf-8 code point is 4 bytes.
>
> If you know otherwise, please post a URL. Here is the information I
> have:
>
> "The original specification allowed for sequences of up to six bytes,
> covering numbers up to 31 bits (the original limit of the Universal
> Character Set). In November 2003 UTF-8 was restricted by RFC 3629 to
> four bytes covering only the range U+0000 to U+10FFFF, in order to
> match the constraints of the UTF-16 character encoding."
>
> http://en.wikipedia.org/wiki/UTF-8#History
>

Good to know. I learned about Unicode/UTF-8 from the following links:

http://www.joelonsoftware.com/articles/Unicode.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html

and never bothered to look into the RFCs or the official Unicode site(s).

When I follow the link to the RFC you mentioned from the second page above, I do indeed see that it is 4 octets according to the RFC. However, when I follow the link to the ISO 10646 UTF-8 appendix (http://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-8.html), which is referenced on that same second page (anchored link: http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8), I see that according to the ISO spec it still is (or was?) 6.

Kind regards,
Dimitri Smits

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
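To make the 6-octet vs 4-octet difference concrete, here is a minimal sketch in Python (the thread itself contains no code; the function and constant names are hypothetical). It assumes the length thresholds of the original 31-bit design (1 to 6 bytes) and applies the RFC 3629 cut-off at U+10FFFF, which is why the longest sequence a conforming encoder can emit is 4 bytes: the old 4-byte form already reaches 0x1FFFFF, so the 5- and 6-byte forms are simply never needed.

```python
# Sketch: UTF-8 sequence length under the original up-to-6-byte design
# versus the RFC 3629 restriction to U+0000..U+10FFFF (max 4 bytes).

# (largest value encodable, sequence length) for the original 31-bit scheme
ORIGINAL_LIMITS = [
    (0x7F, 1),
    (0x7FF, 2),
    (0xFFFF, 3),
    (0x1FFFFF, 4),
    (0x3FFFFFF, 5),
    (0x7FFFFFFF, 6),
]

RFC3629_MAX = 0x10FFFF  # upper bound imposed to match UTF-16's range


def utf8_length_original(value: int) -> int:
    """Number of bytes the original (31-bit) UTF-8 design would use."""
    if value < 0:
        raise ValueError("negative value")
    for limit, length in ORIGINAL_LIMITS:
        if value <= limit:
            return length
    raise ValueError("value exceeds 31 bits")


def utf8_length_rfc3629(cp: int) -> int:
    """Number of bytes under RFC 3629; rejects values it does not allow."""
    if not 0 <= cp <= RFC3629_MAX:
        raise ValueError(f"U+{cp:X} is outside U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable in UTF-8")
    # The length thresholds are unchanged; RFC 3629 only truncates the range,
    # and U+10FFFF still falls inside the old 4-byte band (<= 0x1FFFFF).
    return utf8_length_original(cp)


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
        # Cross-check against Python's own UTF-8 encoder.
        print(f"U+{cp:04X}: {utf8_length_rfc3629(cp)} bytes "
              f"(encoder: {len(chr(cp).encode('utf-8'))})")
    print("0x7FFFFFFF under the original design:",
          utf8_length_original(0x7FFFFFFF), "bytes")
```

In short, both statements in the thread can be true at once: the original ISO 10646 / early-RFC form of UTF-8 defined 5- and 6-byte sequences, while RFC 3629 forbids anything beyond U+10FFFF, so no valid sequence is longer than 4 bytes.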