utf-8 and well-formed but illegal chars

Rich Felker Wed, 18 Jan 2006 16:11:25 -0800

hope this isn't too off-topic -- i'm working on a utf-8 implementation
and trying to decide what to do with byte sequences that are
well-formed but represent illegal code positions, i.e. 0xd800-0xdfff,
0xfffe-0xffff, and 0x110000-0x1fffff. should these be treated as
illegal sequences (EILSEQ) or decoded as ordinary characters? is there
a good reference on the precedents? my main reference is the
linux/unix unicode faq (http://www.cl.cam.ac.uk/~mgk25/unicode.html)
which is somewhat ambiguous on the matter.
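
to make the question concrete, here is a rough sketch (not taken from any existing library; the names decode, classify and cp_class are made up for illustration) that keeps the two concerns apart: decode() only rejects sequences that are structurally malformed or overlong, and classify() flags the three ranges asked about above. whether the caller should turn a non-CP_OK result into EILSEQ is exactly the open policy question, so the sketch does not decide it.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical names, for illustration only */
enum cp_class { CP_OK, CP_SURROGATE, CP_NONCHAR, CP_OUT_OF_RANGE };

/* classify a decoded value against the ranges in question */
static enum cp_class classify(uint32_t c)
{
    if (c >= 0xd800 && c <= 0xdfff) return CP_SURROGATE;
    if (c == 0xfffe || c == 0xffff) return CP_NONCHAR;
    if (c >= 0x110000) return CP_OUT_OF_RANGE;
    return CP_OK;
}

/* decode one sequence of up to 4 bytes from s (n bytes available).
 * stores the value in *out and returns the number of bytes consumed,
 * or -1 if the sequence is malformed: bad lead byte, bad continuation
 * byte, truncated input, or overlong encoding. */
static int decode(const unsigned char *s, size_t n, uint32_t *out)
{
    /* smallest value that legitimately needs a sequence of each length */
    static const uint32_t min[] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    int len, i;

    if (n == 0) return -1;
    if (s[0] < 0x80) { *out = s[0]; return 1; }
    else if ((s[0] & 0xe0) == 0xc0) { len = 2; c = s[0] & 0x1f; }
    else if ((s[0] & 0xf0) == 0xe0) { len = 3; c = s[0] & 0x0f; }
    else if ((s[0] & 0xf8) == 0xf0) { len = 4; c = s[0] & 0x07; }
    else return -1;

    if (n < (size_t)len) return -1;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80) return -1;
        c = (c << 6) | (s[i] & 0x3f);
    }
    if (c < min[len]) return -1;   /* overlong form */

    *out = c;
    return len;
}
```

with this split, a strict caller would treat any classify() result other than CP_OK as an illegal sequence, while a lenient one would pass the value through as an ordinary character.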


rich


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
