[issue21331] Reversing an encoding with unicode-escape returns a different result

R. David Murray Wed, 23 Apr 2014 15:17:32 -0700

R. David Murray added the comment:

To understand why, note that a byte string has no inherent encoding.  So 
when you call b'utf8string'.decode('unicode_escape'), Python has no way to know 
how to interpret the non-ASCII bytes in that byte string.  If you want the 
unicode_escape representation of something, you want to do 
'string'.encode('unicode_escape').  If you then want that as a Python string, 
you can do:


    'mystring'.encode('unicode_escape').decode('ascii')
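
A minimal sketch of the difference, using 'ä' as the sample character.  The 
variable names are only illustrative, and the Latin-1 behaviour noted in the 
comments is how the unicode_escape decoder treats non-escape bytes; it is not 
something stated in the report itself:

```python
text = 'ä'

# Encoding to UTF-8 first gives two bytes.  The unicode_escape decoder reads
# each non-escape byte as a Latin-1 code point, so the round trip is lossy.
utf8_bytes = text.encode('utf-8')               # b'\xc3\xa4'
mangled = utf8_bytes.decode('unicode_escape')   # two characters, not 'ä'

# Encoding with unicode_escape gives pure-ASCII bytes, which can be turned
# into a str with the 'ascii' codec or decoded back to the original text.
escaped_bytes = text.encode('unicode_escape')   # b'\\xe4'
escaped_str = escaped_bytes.decode('ascii')     # backslash, 'x', 'e', '4'
restored = escaped_bytes.decode('unicode_escape')

print(repr(mangled), repr(escaped_str), repr(restored))
```

Only the second path round-trips, because the bytes handed to the decoder 
were produced by the matching encoder.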

In theory there ought to be a way to use the codecs module to go directly from 
a unicode string to a unicode-escaped string, but I don't know how to do it, 
since the proposal for the 'transform' method was rejected :)

Just to bend your brain a bit further, note that this does work:

>>> import codecs
>>> codecs.decode(codecs.encode('ä', 'unicode-escape').decode('ascii'),
...               'unicode-escape')
'ä'

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue21331>
_______________________________________