----- "Graeme Geldenhuys" <graemeg.li...@gmail.com> wrote:

> On 16/09/2011 00:01, Dimitri Smits wrote:
> > 
> > errrm, utf-8 can have 6 octets representing one character,
> 
> Last time I checked, that was only in the very early stages of
> developing the UTF-8 specification. Since then, the maximum size of a
> UTF-8 encoded code point has been 4 bytes.
> 
> If you know otherwise, please post a URL. Here is the information I
> have:
> 
> "The original specification allowed for sequences of up to six bytes,
> covering numbers up to 31 bits (the original limit of the Universal
> Character Set). In November 2003 UTF-8 was restricted by RFC 3629 to
> four bytes covering only the range U+0000 to U+10FFFF, in order to
> match the constraints of the UTF-16 character encoding."
> 
>   http://en.wikipedia.org/wiki/UTF-8#History
> 

Good to know.
I learned about Unicode/UTF-8 from the following links:
http://www.joelonsoftware.com/articles/Unicode.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html

I never bothered to look into the RFCs or the official Unicode site(s).

When I follow the link to the RFC you mentioned from the second page above, I 
indeed see that the limit is 4 octets according to the RFC. However, when I 
follow the link to the Unicode appendix 
(http://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-8.html), referenced from that 
same page (anchored link: http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8), 
I see that according to the ISO spec, it still is (was?) 6.
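
To make the difference between the two limits concrete, here is a minimal 
sketch in Python of how many octets each scheme needs per code point. The 
function names are made up for this illustration, and the range boundaries 
are the ones I understand from RFC 3629 (up to U+10FFFF, 4 octets) and from 
the original 31-bit scheme (up to 6 octets); treat it as a sketch, not as a 
reference implementation.

```python
def utf8_length_rfc3629(code_point: int) -> int:
    """Octets needed for one code point under RFC 3629 (max 4).

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


def utf8_length_original(code_point: int) -> int:
    """Octets needed under the original 31-bit scheme (max 6).

    Hypothetical helper for illustration only.
    """
    if code_point < 0:
        raise ValueError("negative value")
    # Largest value representable with 1, 2, 3, 4, 5 and 6 octets.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for octets, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return octets
    raise ValueError("more than 31 bits")


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
        print(f"U+{cp:04X}: {utf8_length_rfc3629(cp)} octet(s)")
```

Both functions agree for everything up to U+10FFFF; the 5 and 6 octet forms 
only come into play for values above 0x1FFFFF, which the RFC no longer allows.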

kind regards,
Dimitri Smits
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
