Re: svn commit: r265095 - head/lib/libc/locale
On 04/30/14 16:10, Jilles Tjoelker wrote:
> On Tue, Apr 29, 2014 at 03:25:57PM +0000, Pedro F. Giffuni wrote:
>> Author: pfg
>> Date: Tue Apr 29 15:25:57 2014
>> New Revision: 265095
>> URL: http://svnweb.freebsd.org/changeset/base/265095
>>
>> Log:
>>   citrus: Avoid invalid code points.
>>
>>   From the OpenBSD log:
>>   The UTF-8 decoder should not accept byte sequences which decode to unicode
>>   code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.
>>
>>   http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
>>   http://unicode.org/faq/utf_bom.html#utf8-4
>>
>>   Reported by:	Stefan Sperling
>>   Obtained from:	OpenBSD
>>   MFC after:	5 days
>>
>> Modified:
>>   head/lib/libc/locale/utf8.c
>>
>> Modified: head/lib/libc/locale/utf8.c
>> ==============================================================================
>> --- head/lib/libc/locale/utf8.c	Tue Apr 29 15:12:23 2014	(r265094)
>> +++ head/lib/libc/locale/utf8.c	Tue Apr 29 15:25:57 2014	(r265095)
>> @@ -203,6 +203,14 @@ _UTF8_mbrtowc(wchar_t * __restrict pwc,
>>  		errno = EILSEQ;
>>  		return ((size_t)-1);
>>  	}
>> +	if ((wch >= 0xd800 && wch <= 0xdfff) ||
>> +	    wch == 0xfffe || wch == 0xffff) {
>> +		/*
>> +		 * Malformed input; invalid code points.
>> +		 */
>> +		errno = EILSEQ;
>> +		return ((size_t)-1);
>> +	}
>>  	if (pwc != NULL)
>>  		*pwc = wch;
>>  	us->want = 0;
>
> Hmm, I think U+FFFE and U+FFFF should be passed through normally.
> According to http://www.unicode.org/faq/private_use.html they are
> "noncharacters" (basically a more private variant of private-use
> characters) and must be mapped through UTFs.
>
> The part that rejects U+D800 to U+DFFF is definitely correct, though.
> http://unicode.org/faq/utf_bom.html#utf8-4 says to do only that. The
> part about U+FFFE and U+FFFF in
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 seems out of date.
> Note the last modified date of that page: 2009-05-11.
>
> On another note, everything above U+10FFFF should perhaps be rejected,
> since those code points, which cannot be encoded in UTF-16, were
> excluded from Unicode and ISO 10646.

Thank you! I will fix the UTF-8 part soon.

Pedro.
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r265095 - head/lib/libc/locale
On Tue, Apr 29, 2014 at 03:25:57PM +0000, Pedro F. Giffuni wrote:
> Author: pfg
> Date: Tue Apr 29 15:25:57 2014
> New Revision: 265095
> URL: http://svnweb.freebsd.org/changeset/base/265095
>
> Log:
>   citrus: Avoid invalid code points.
>
>   From the OpenBSD log:
>   The UTF-8 decoder should not accept byte sequences which decode to unicode
>   code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.
>
>   http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
>   http://unicode.org/faq/utf_bom.html#utf8-4
>
>   Reported by:	Stefan Sperling
>   Obtained from:	OpenBSD
>   MFC after:	5 days
>
> Modified:
>   head/lib/libc/locale/utf8.c
>
> Modified: head/lib/libc/locale/utf8.c
> ==============================================================================
> --- head/lib/libc/locale/utf8.c	Tue Apr 29 15:12:23 2014	(r265094)
> +++ head/lib/libc/locale/utf8.c	Tue Apr 29 15:25:57 2014	(r265095)
> @@ -203,6 +203,14 @@ _UTF8_mbrtowc(wchar_t * __restrict pwc,
>  		errno = EILSEQ;
>  		return ((size_t)-1);
>  	}
> +	if ((wch >= 0xd800 && wch <= 0xdfff) ||
> +	    wch == 0xfffe || wch == 0xffff) {
> +		/*
> +		 * Malformed input; invalid code points.
> +		 */
> +		errno = EILSEQ;
> +		return ((size_t)-1);
> +	}
>  	if (pwc != NULL)
>  		*pwc = wch;
>  	us->want = 0;

Hmm, I think U+FFFE and U+FFFF should be passed through normally.
According to http://www.unicode.org/faq/private_use.html they are
"noncharacters" (basically a more private variant of private-use
characters) and must be mapped through UTFs.

The part that rejects U+D800 to U+DFFF is definitely correct, though.
http://unicode.org/faq/utf_bom.html#utf8-4 says to do only that. The
part about U+FFFE and U+FFFF in
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 seems out of date.
Note the last modified date of that page: 2009-05-11.

On another note, everything above U+10FFFF should perhaps be rejected,
since those code points, which cannot be encoded in UTF-16, were
excluded from Unicode and ISO 10646.
-- 
Jilles Tjoelker
svn commit: r265095 - head/lib/libc/locale
Author: pfg
Date: Tue Apr 29 15:25:57 2014
New Revision: 265095
URL: http://svnweb.freebsd.org/changeset/base/265095

Log:
  citrus: Avoid invalid code points.

  From the OpenBSD log:
  The UTF-8 decoder should not accept byte sequences which decode to unicode
  code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
  http://unicode.org/faq/utf_bom.html#utf8-4

  Reported by:	Stefan Sperling
  Obtained from:	OpenBSD
  MFC after:	5 days

Modified:
  head/lib/libc/locale/utf8.c

Modified: head/lib/libc/locale/utf8.c
==============================================================================
--- head/lib/libc/locale/utf8.c	Tue Apr 29 15:12:23 2014	(r265094)
+++ head/lib/libc/locale/utf8.c	Tue Apr 29 15:25:57 2014	(r265095)
@@ -203,6 +203,14 @@ _UTF8_mbrtowc(wchar_t * __restrict pwc,
 		errno = EILSEQ;
 		return ((size_t)-1);
 	}
+	if ((wch >= 0xd800 && wch <= 0xdfff) ||
+	    wch == 0xfffe || wch == 0xffff) {
+		/*
+		 * Malformed input; invalid code points.
+		 */
+		errno = EILSEQ;
+		return ((size_t)-1);
+	}
 	if (pwc != NULL)
 		*pwc = wch;
 	us->want = 0;