hope this isn't too off-topic -- i'm working on a utf-8 implementation and trying to decide what to do with byte sequences that are well-formed but decode to illegal code positions, i.e. 0xd800-0xdfff (the utf-16 surrogates), 0xfffe-0xffff, and 0x110000-0x1fffff (past the end of the unicode range, but still encodable in 4 bytes). should these be rejected as illegal sequences (EILSEQ) or decoded as ordinary characters? is there a good reference on the precedents? my main reference is the linux/unix unicode faq (http://www.cl.cam.ac.uk/~mgk25/unicode.html), which is somewhat ambiguous on the matter.
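for concreteness, here's a minimal sketch of the "reject" option, assuming the decoder has already assembled a code point from a well-formed sequence. the names is_illegal_code_position and check_code_position are made up for illustration, and whether rejecting is the right behaviour is exactly the open question:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* hypothetical helper: true if cp falls in one of the three ranges in question */
static bool is_illegal_code_position(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return true;                /* utf-16 surrogates */
    if (cp == 0xFFFE || cp == 0xFFFF)
        return true;                /* 0xfffe-0xffff */
    if (cp >= 0x110000)
        return true;                /* beyond 0x10ffff; 4 bytes can reach 0x1fffff */
    return false;
}

/* hypothetical "strict" policy: returns 0 if cp is acceptable,
 * otherwise sets errno to EILSEQ and returns -1 */
static int check_code_position(uint32_t cp)
{
    if (is_illegal_code_position(cp)) {
        errno = EILSEQ;
        return -1;
    }
    return 0;
}
```

the "decode as ordinary characters" option would simply skip the call to check_code_position and hand the code point back as-is.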
rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/