[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

Serhiy Storchaka Thu, 17 May 2012 10:31:09 -0700

Serhiy Storchaka <[email protected]> added the comment:

> The only issue left was about the number of U+FFFD generated with invalid 
> sequences in some cases.
> My last patch has extensive tests for this, so you could try to apply it (or 
> copy the tests) and see if they all pass.


The tests fail, but I'm not sure that the tests are correct.

b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
continuation byte'. This is a terminological issue.
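
The reason string under discussion can be read directly from the exception. A minimal sketch, assuming a helper named decode_error_reason (hypothetical, not part of the patch or its tests); which reason is reported depends on the Python build being run:

```python
def decode_error_reason(data):
    """Return the UnicodeDecodeError reason for strict UTF-8 decoding, or None."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.reason holds the short description of the failure,
        # e.g. 'invalid continuation byte' or 'unexpected end of data'.
        return exc.reason
    return None

# Hypothetical check of the case above; the printed text varies by version.
print(decode_error_reason(b'\xe0\x00'))
```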

b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
don't think that is right.
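
The number of replacement characters can be counted the same way. A minimal sketch, assuming a helper named count_replacements (hypothetical, not taken from the patch); the count for b'\xe0\x80' is exactly the point in dispute, so no expected value is given for it. For reference, the Unicode table of well-formed UTF-8 sequences allows only A0..BF as the second byte after E0, so 80 cannot continue that sequence.

```python
def count_replacements(data):
    """Count the U+FFFD characters produced by decoding with errors='replace'."""
    return data.decode('utf-8', 'replace').count('\ufffd')

# Hypothetical check of the case above; the result depends on the decoder version.
print(count_replacements(b'\xe0\x80'))
```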

----------
title: str.decode('utf8',       'replace') -- conformance with Unicode 5.2.0 -> 
str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8271>
_______________________________________