R. David Murray added the comment:
To understand why, note that a byte string has no inherent encoding. So
when you call b'utf8string'.decode('unicode_escape'), Python has no way to know
how to interpret the non-ASCII characters in that byte string. If you want the
unicode_escape representation of something, use
'string'.encode('unicode_escape'). If you then want that as a Python string,
you can do:
'mystring'.encode('unicode_escape').decode('ascii')
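To make the difference concrete, here is a minimal sketch for Python 3. The
variable names are illustrative only. It relies on the documented behaviour
that the unicode_escape codec decodes from Latin-1, so each non-ASCII byte
becomes its own character rather than part of a UTF-8 sequence.

```python
text = 'ä'

# UTF-8 encodes 'ä' as two bytes.
utf8_bytes = text.encode('utf-8')

# Decoding those bytes with unicode_escape does not undo the UTF-8 step:
# the codec reads each non-ASCII byte as a Latin-1 character.
mangled = utf8_bytes.decode('unicode_escape')

# Going the other way: escape the str first, then turn the resulting
# pure-ASCII bytes back into a str.
escaped = text.encode('unicode_escape').decode('ascii')

print(repr(utf8_bytes))  # b'\xc3\xa4'
print(repr(mangled))     # 'Ã¤'
print(repr(escaped))     # '\\xe4'
```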
In theory there ought to be a way to use the codecs module to go directly from
unicode string to unicode-escaped string, but I don't know how to do it, since
the proposal for the 'transform' method was rejected :)
Just to bend your brain a bit further, note that this does work:
>>> import codecs
>>> codecs.decode(codecs.encode('ä', 'unicode-escape').decode('ascii'),
...               'unicode-escape')
'ä'
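The same round trip can be wrapped in two small helpers. This is only a
sketch: to_escaped and from_escaped are hypothetical names, not part of the
codecs module, and from_escaped assumes its input is pure ASCII.

```python
def to_escaped(text):
    """Return the unicode_escape form of text as a str (hypothetical helper)."""
    return text.encode('unicode_escape').decode('ascii')


def from_escaped(escaped):
    """Reverse to_escaped (hypothetical helper).

    Assumes `escaped` contains only ASCII characters; anything else
    raises UnicodeEncodeError at the encode step.
    """
    return escaped.encode('ascii').decode('unicode_escape')


original = 'ä €\n'
escaped = to_escaped(original)
print(repr(escaped))
print(from_escaped(escaped) == original)  # True
```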
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue21331>
_______________________________________