[issue18814] Add codecs.convert_surrogateescape to "clean" surrogate escaped strings

Antoine Pitrou Tue, 23 Sep 2014 04:24:07 -0700

Antoine Pitrou added the comment:

The encoding used impacts the result:


>>> s = 'abc\udcc3\udca9'
>>> s.encode('ascii', 'surrogateescape').decode('ascii', 'replace')
'abc��'
>>> s.encode('utf-8', 'surrogateescape').decode('utf-8', 'replace')
'abcé'

The original string ('abc\udcc3\udca9') was obtained by decoding valid UTF-8 
bytes (b'abc\xc3\xa9') with the 'ascii' codec and the 'surrogateescape' error 
handler.
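
For reference, a minimal sketch of how such a string comes about, assuming the
underlying bytes are the UTF-8 encoding of 'abcé' (the variable names are
illustrative only):

```python
# UTF-8 bytes for 'abcé': the 'é' becomes the two bytes 0xC3 0xA9.
raw = 'abcé'.encode('utf-8')

# Decoding as ASCII with 'surrogateescape' maps each undecodable byte
# 0xNN to the lone surrogate U+DCNN instead of raising an error.
escaped = raw.decode('ascii', 'surrogateescape')

# Encoding with the same error handler restores the original bytes.
roundtrip = escaped.encode('ascii', 'surrogateescape')

print(ascii(escaped))
print(roundtrip == raw)
```

The escaped string carries the raw bytes, not their meaning, so what it looks
like once "cleaned" depends on the codec used to reinterpret those bytes.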

If anything, the default encoding should probably be 
sys.getfilesystemencoding().
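
A rough sketch of what such a helper might look like with that default. This
is an assumption for illustration only: the function below is hypothetical,
is not part of the codecs module, and its name and parameters are simply
borrowed from the issue title.

```python
import sys

def convert_surrogateescape(text, encoding=None, errors='replace'):
    """Hypothetical helper (not a real codecs function).

    Re-encodes surrogate-escaped text back to bytes, then decodes those
    bytes with the given error handler so no lone surrogates remain.
    """
    if encoding is None:
        # Assumed default, as suggested in the comment above.
        encoding = sys.getfilesystemencoding()
    raw = text.encode(encoding, 'surrogateescape')
    return raw.decode(encoding, errors)

s = 'abc\udcc3\udca9'
as_ascii = convert_surrogateescape(s, 'ascii')   # escaped bytes become U+FFFD
as_utf8 = convert_surrogateescape(s, 'utf-8')    # escaped bytes decode as 'é'
```

With an explicit encoding the two calls reproduce the results shown in the
session above; with the default, the result depends on the platform's
filesystem encoding.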

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18814>
_______________________________________