STINNER Victor added the comment:
> The surrogateescape error handler is dangerous with utf-16/32. It can produce
> globally invalid output.
I don't understand. Can you give an example? surrogateescape generates invalid
encoded strings with any encoding. Example with UTF-8:
>>> b"a\xffb".decode("utf-8", "surrogateescape")
'a\udcffb'
>>> 'a\udcffb'.encode("utf-8", "surrogateescape")
b'a\xffb'
>>> b'a\xffb'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
So str.encode("utf-8", "surrogateescape") produces an invalid UTF-8 sequence.
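The same round-trip can be written as a small standalone script. This is a
minimal sketch for Python 3. The helpers roundtrip() and is_valid() are
hypothetical names used only for illustration; they are not part of any API.
It shows that surrogateescape restores the original bytes exactly, and that
those bytes still fail a strict decode:

```python
def roundtrip(data: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode with surrogateescape, then encode back."""
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF (PEP 383).
    text = data.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


def is_valid(data: bytes, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if data decodes with the strict handler."""
    try:
        data.decode(encoding)  # default errors="strict"
    except UnicodeDecodeError:
        return False
    return True


raw = b"a\xffb"
text = raw.decode("utf-8", "surrogateescape")

# Byte 0xFF is smuggled through as U+DCFF (0xDC00 + 0xFF).
print([hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF])

# The round-trip is lossless...
print(roundtrip(raw) == raw)
# ...but the bytes it reproduces are still not valid UTF-8.
print(is_valid(raw))

# The strict handler refuses to encode the lone surrogate at all.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```

Running it should print ['0xdcff'], True, False, and then
"strict encode failed: surrogates not allowed".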
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18713>
_______________________________________