On Monday, 30 October 2006 23:14, Joost Verburg wrote:
> Georg Baum wrote:
> > So you say Markus Kuhn is wrong? That would be surprising to me, since he
> > is considered to be an unicode expert.
>
> His information is outdated. RFC 2279 (the old UTF-8 specification) did
> include support for a 31-bit code space. Because the Unicode code space
> was later restricted, the RFC has been updated as RFC 3629 and is
> restricted to the range 0000-10FFFF. There will never be any characters
> outside this range. RFC 2279 is obsolete.
>
> So the _current_ definition of UTF-8 (RFC 3629) does _not_ allow 5 and 6
> byte sequences. See http://www.faqs.org/rfcs/rfc3629.html

Thanks for the clarification. I believe I really understand it now. I'll
update the conversion facet in the places where 6 is used.

Georg
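
For illustration only, the RFC 3629 limit discussed above could be expressed roughly as follows. This is a minimal sketch, not the actual conversion facet code: the names max_utf8_bytes and utf8_length are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constant: the longest UTF-8 sequence allowed by RFC 3629.
// The obsolete RFC 2279 allowed up to 6 bytes per character.
constexpr std::size_t max_utf8_bytes = 4;

// Hypothetical helper: number of bytes needed to encode code point `cp`
// as UTF-8 under RFC 3629, or 0 if `cp` cannot be encoded at all.
constexpr std::size_t utf8_length(std::uint32_t cp)
{
    return cp < 0x80                       ? 1
         : cp < 0x800                      ? 2
         : (cp >= 0xD800 && cp <= 0xDFFF)  ? 0   // surrogates are prohibited
         : cp < 0x10000                    ? 3
         : cp <= 0x10FFFF                  ? 4
         : 0;                                    // beyond the Unicode code space
}

// Compile-time checks of the boundaries.
static_assert(utf8_length(0x7F) == 1, "ASCII takes one byte");
static_assert(utf8_length(0x10FFFF) == max_utf8_bytes, "top of range takes four bytes");
static_assert(utf8_length(0x110000) == 0, "nothing above U+10FFFF is valid");
```

A buffer sized with max_utf8_bytes per character is then sufficient for any valid code point, and a return value of 0 signals input that should be rejected rather than encoded as a 5 or 6 byte sequence.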
