[issue18271] get_payload method returns bytes which cannot be decoded using the message's charset

Marko Lalic Thu, 20 Jun 2013 12:16:35 -0700

Marko Lalic added the comment:

That will work fine as long as the characters are actually in the Latin-1 range, 
but we cannot forget the rest of Unicode. Consider::

>>> from email import message_from_string
>>> message = message_from_string("""MIME-Version: 1.0
... Content-Type: text/plain; charset=utf-8
... Content-Disposition: inline
... Content-Transfer-Encoding: 8bit
... 
... 한글ᥡ╥ສए""")
>>> message.get_payload(decode=True).decode('latin1')
'\\ud55c\\uae00\\u1961\\u2565\\u0eaa\\u090f'
>>> message.get_payload(decode=True).decode('raw-unicode-escape')
'한글ᥡ╥ສए'
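A minimal sketch of the mechanism behind the output above. It assumes (as the raw-unicode-escape round trip suggests) that get_payload(decode=True) converts a non-ASCII str body into bytes with the 'raw-unicode-escape' codec. That codec writes code points below U+0100 as single bytes, which is why latin1 appears to work for Latin text, and writes everything else as a literal \uXXXX escape. The sample strings are only illustrations.

```python
# Assumption: a non-ASCII str payload is turned into bytes with the
# 'raw-unicode-escape' codec before get_payload(decode=True) returns it.
latin = "café".encode("raw-unicode-escape")
hangul = "한글ᥡ╥ສए".encode("raw-unicode-escape")

# Code points below U+0100 become single bytes, so latin1 round-trips them.
print(latin)                    # b'caf\xe9'
print(latin.decode("latin1"))   # café

# Higher code points become literal backslash-u escape sequences.
print(hangul.decode("latin1"))  # \ud55c\uae00\u1961\u2565\u0eaa\u090f

# Only the same codec recovers the original text.
print(hangul.decode("raw-unicode-escape"))  # 한글ᥡ╥ສए

# The charset the message declares (utf-8) cannot decode the Latin case.
try:
    latin.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)
```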

However, even if latin1 did work, the main point stands: the bytes have to be 
decoded with an encoding other than the one the message declares in order to 
get the original Unicode string back.
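For contrast, a sketch comparing the same message parsed from str and from bytes. It assumes the raw message is available as bytes, so it can be parsed with email.message_from_bytes instead of message_from_string; the HEADERS and TEXT names are illustrative only. In that case the payload comes back as the original UTF-8 bytes, and the declared charset decodes it.

```python
from email import message_from_bytes, message_from_string

HEADERS = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Disposition: inline\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
)
TEXT = "한글ᥡ╥ສए"

# Parsed from str: the payload bytes are not UTF-8, even though the
# message declares charset=utf-8, so decoding with it gives the wrong text.
from_str = message_from_string(HEADERS + TEXT)
raw = from_str.get_payload(decode=True)
print(raw.decode(from_str.get_content_charset()) == TEXT)  # False

# Parsed from the wire bytes: the payload round-trips to the original
# UTF-8 bytes, so the declared charset decodes it correctly.
from_bytes = message_from_bytes(HEADERS.encode("ascii") + TEXT.encode("utf-8"))
payload = from_bytes.get_payload(decode=True)
print(payload.decode(from_bytes.get_content_charset()))  # 한글ᥡ╥ສए
```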

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18271>
_______________________________________