Serhiy Storchaka <storch...@gmail.com> added the comment:

> The only issue left was about the number of U+FFFD generated with invalid
> sequences in some cases.
> My last patch has extensive tests for this, so you could try to apply it (or
> copy the tests) and see if they all pass.
The tests fail, but I'm not sure that the tests are correct.

b'\xe0\x00' raises 'unexpected end of data' and not 'invalid continuation byte'. This is a terminological issue.

b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I don't think that is right.

----------
title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0 -> str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________
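To check the two cases from the comment against a particular build, something like the following sketch may help. The helper name `probe` is made up for illustration and is not part of any patch on the issue. The error message and the number of U+FFFD characters depend on the interpreter version and on which patch is applied, so no expected output is given.

```python
def probe(data: bytes):
    """Return (strict_error_reason, text_decoded_with_replace) for data.

    strict_error_reason is None when data is valid UTF-8.
    """
    try:
        data.decode('utf-8')  # strict mode: raises on the first bad sequence
        reason = None
    except UnicodeDecodeError as exc:
        reason = exc.reason   # e.g. the wording the comment is about
    return reason, data.decode('utf-8', 'replace')


# The two inputs discussed in the comment above.
for sample in (b'\xe0\x00', b'\xe0\x80'):
    reason, text = probe(sample)
    # Print the strict-mode reason and how many U+FFFD the 'replace'
    # handler produced for this input on the running interpreter.
    print(repr(sample), repr(reason), text.count('\ufffd'))
```

Comparing the printed counts before and after applying a patch shows directly whether the number of replacement characters changed for these inputs.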