Ezio Melotti <ezio.melo...@gmail.com> added the comment:

I added a test for the 'ignore' error handler. I will commit the patch before 
the RC unless someone has something against it.

To summarize, the patch updates PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629, 
so:
1) Invalid sequences are now handled as described in 
http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long sequences are now invalid (no changes in behavior, I just 
removed the "default:" of the switch/case and marked them with '0' in the first 
table);
3) According to RFC 3629, codepoints in the surrogate range (U+D800-U+DFFF) 
should be considered invalid, but this would not be backward compatible, so I 
added code and tests but left them commented out;
4) I changed the error message "unexpected code byte" to "invalid start byte" 
and "invalid data" to "invalid continuation byte";
5) I added an extensive set of tests in test_unicode;
6) I fixed test_codeccallbacks because it was failing after this change.
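
The points above can be checked from Python with a small sketch like the 
following. It assumes an interpreter that already includes this patch; the 
helper name utf8_error_reason is made up for this illustration and is not part 
of the patch or of test_unicode. Surrogates are left out on purpose because 
their handling depends on the Python version.

```python
def utf8_error_reason(data):
    """Return the UnicodeDecodeError reason for data, or None if it decodes.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# 0xFF can never start a UTF-8 sequence.
print(utf8_error_reason(b"\xff"))

# 0xC3 starts a 2-byte sequence, but "(" (0x28) is not a continuation byte.
print(utf8_error_reason(b"\xc3("))

# 0xF8 and 0xFC introduced 5- and 6-byte sequences in RFC 2279;
# RFC 3629 no longer allows them, so they are rejected as start bytes.
print(utf8_error_reason(b"\xf8\x88\x80\x80\x80"))
print(utf8_error_reason(b"\xfc\x84\x80\x80\x80\x80"))

# The 'ignore' error handler drops the invalid bytes and keeps the rest.
print(b"a\xffb".decode("utf-8", "ignore"))
print(b"\xe2\x82(".decode("utf-8", "ignore"))
```

With the patch applied, the first four calls should report one of the two new 
messages from point 4, and the last two should print only the valid 
characters.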

----------
Added file: http://bugs.python.org/file17552/issue8271v5.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________