Antoine Pitrou added the comment:

The encoding used impacts the result:

>>> s = 'abc\udcc3\udca9'
>>> s.encode('ascii', 'surrogateescape').decode('ascii', 'replace')
'abc��'
>>> s.encode('utf-8', 'surrogateescape').decode('utf-8', 'replace')
'abcé'

The original string ('abc\udcc3\udca9') was obtained by decoding a valid utf-8 string with the 'ascii' codec and the 'surrogateescape' error handler.

If anything, the default encoding should probably be sys.getfilesystemencoding().

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________
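
For reference, the round trip described in the comment can be sketched as a small script. This is only an illustration: the variable names (raw, as_ascii, as_utf8, fs_encoding) are made up here, and the last step assumes the filesystem encoding on the machine supports the 'surrogateescape' handler, so its output is platform-dependent and not shown.

```python
import sys

# A valid UTF-8 byte string for 'abcé'.
raw = 'abc\u00e9'.encode('utf-8')  # b'abc\xc3\xa9'

# Decoding with 'ascii' + 'surrogateescape' maps each undecodable byte
# 0xNN to the lone surrogate U+DCNN instead of raising an error.
s = raw.decode('ascii', 'surrogateescape')
assert s == 'abc\udcc3\udca9'

# Encoding with the same error handler restores the original bytes,
# for both of the codecs used in the comment.
assert s.encode('ascii', 'surrogateescape') == raw
assert s.encode('utf-8', 'surrogateescape') == raw

# The codec chosen for the final decode decides what the reader sees.
as_ascii = raw.decode('ascii', 'replace')  # two U+FFFD replacement characters
as_utf8 = raw.decode('utf-8', 'replace')   # the intended 'abcé'

# The suggestion in the comment: use the filesystem encoding as the default.
# The result depends on the platform, so it is only printed here.
fs_encoding = sys.getfilesystemencoding()
print(fs_encoding, ascii(s.encode(fs_encoding, 'surrogateescape')
                          .decode(fs_encoding, 'replace')))
```

On a system whose filesystem encoding is UTF-8, the last line should show the same text as as_utf8; with an ASCII filesystem encoding it should match as_ascii.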