Serhiy Storchaka <storch...@gmail.com> added the comment:

> The only issue left was about the number of U+FFFD generated with invalid 
> sequences in some cases.
> My last patch has extensive tests for this, so you could try to apply it (or 
> copy the tests) and see if they all pass.

The tests fail, but I'm not sure that the tests are correct.

b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
continuation byte'. This is a terminological issue.

b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
don't think that is right.
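
A minimal sketch for checking both observations on a given interpreter.
The helper names count_replacements and strict_error_reason are
hypothetical, not part of the patch or its tests, and the printed values
depend on the Python version and on whether the patch is applied, so no
expected output is shown.

```python
REPLACEMENT = '\ufffd'  # U+FFFD REPLACEMENT CHARACTER


def count_replacements(data):
    """Count the U+FFFD characters produced by decoding with 'replace'."""
    return data.decode('utf-8', 'replace').count(REPLACEMENT)


def strict_error_reason(data):
    """Return the reason reported by a strict decode, or None if valid."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# The two sequences discussed above: a 3-byte lead byte followed by a
# non-continuation byte, and by a continuation byte outside A0..BF.
for seq in (b'\xe0\x00', b'\xe0\x80'):
    print(repr(seq), strict_error_reason(seq), count_replacements(seq))
```

Comparing the printed reason and count before and after applying the
patch shows which of the test expectations a build actually meets.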

----------
title: str.decode('utf8',       'replace') -- conformance with Unicode 5.2.0 -> 
str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________