New submission from Marko Lalic:
When a message's Content-Transfer-Encoding is set to 8bit, the
get_payload(decode=True) method returns the payload encoded using
raw-unicode-escape. As a result, the returned bytes cannot be decoded
using the content charset obtained from the get_content_charset method.

It seems this should be fixed so that get_payload returns the bytes as found in
the payload when Content-Transfer-Encoding is 8bit, exactly as Python 2.7
handles it.
>>> from email import message_from_string
>>> message = message_from_string("""MIME-Version: 1.0
... Content-Type: text/plain; charset=utf-8
... Content-Disposition: inline
... Content-Transfer-Encoding: 8bit
...
... ünicöde data..""")
>>> message.get_content_charset()
'utf-8'
>>> message.get_payload(decode=True)
b'\xfcnic\xf6de data..'
>>> message.get_payload(decode=True).decode(message.get_content_charset())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 0: invalid start byte
>>> message.get_payload(decode=True).decode('raw-unicode-escape')
'ünicöde data..'
----------
components: email
messages: 191526
nosy: barry, mlalic, r.david.murray
priority: normal
severity: normal
status: open
title: get_payload method returns bytes which cannot be decoded using the message's charset
type: behavior
versions: Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18271>
_______________________________________