R. David Murray added the comment:
The python3 email package's handling of 8bit definitely has quirks. (So did
the python2 email package's, but they were different quirks. :)
You can't correctly handle 8bit unless you use message_from_bytes and take the
input from a byte string. It is a good question what should be done with a
unicode string whose headers claim the payload is 8bit. Since that situation
can't arise on the wire (or in a disk file), perhaps it should raise an
exception ("message must be parsed as binary data"?). The problem with that
idea is that the email parser promises never to raise errors, but always to
produce *some* sort of model from the input, possibly with defects attached.
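To illustrate the "never raise, attach defects" contract, here is a small sketch. The malformed input (a multipart Content-Type with no boundary parameter) is just an example chosen for illustration; the parser still returns a Message and records what it found wrong in its defects list instead of raising.

```python
import email
from email import errors

# Malformed input chosen for illustration: multipart with no boundary.
raw = b"Content-Type: multipart/mixed\n\nsome body text\n"

# The parser does not raise; it still builds a Message object...
msg = email.message_from_bytes(raw)

# ...and records the problems it noticed on msg.defects.
print([type(d).__name__ for d in msg.defects])
```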
All that aside, here is what you want to be doing:
>>> from email import message_from_bytes
>>> message = message_from_bytes(b"""MIME-Version: 1.0
... Content-Type: text/plain; charset=utf-8
... Content-Disposition: inline
... Content-Transfer-Encoding: 8bit
...
... \xc3\xbcnic\xc3\xb6de data..""")
>>> message.get_content_charset()
'utf-8'
>>> message.get_payload(decode=True)
b'\xc3\xbcnic\xc3\xb6de data..'
>>> message.get_payload(decode=True).decode('utf-8')
'ünicöde data..'
>>> message.get_payload()
'ünicöde data..'
You will note that get_payload without the decode automatically does the
charset decode. I know this is counter-intuitive, but we are dealing with a
legacy API that I had to retrofit. Think of decode=True as "produce binary
from the wire content transfer encoding", and decode=False as "produce the
string representation of the payload". For ASCII content-transfer-encodings
this is more intuitive (with quoted-printable, for example, you get the raw
quoted-printable text), but for 8bit we can only produce a python string if we
do the unicode decode...so that's what we do.
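A quick sketch of the "more intuitive" ASCII case for comparison, using a quoted-printable body made up for this example: decode=False hands back the payload still in its quoted-printable wire form, decode=True undoes the transfer encoding to bytes, and the charset decode is a separate, explicit step.

```python
from email import message_from_bytes

# Example message (made up for illustration) with a quoted-printable body.
msg = message_from_bytes(
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"=C3=BCnic=C3=B6de data"
)

# decode=False: the string form of the payload, still quoted-printable.
raw_text = msg.get_payload()
# decode=True: transfer encoding removed, bytes in the declared charset.
body_bytes = msg.get_payload(decode=True)
# The charset decode is done explicitly by the caller.
text = body_bytes.decode(msg.get_content_charset())

print(raw_text)
print(body_bytes)
print(text)
```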
You will also note that the payload in this case really *is* utf-8 bytes,
whereas in your example it was already a unicode string. What the python3 email
package does with a unicode payload is not well defined and is definitely buggy.
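If the message text has already arrived as a str, one possible workaround (an assumption, not something the email package prescribes) is to encode it back to bytes before parsing. This only makes sense when the body can be represented in the chosen encoding and the Content-Type charset parameter names that same encoding. The helper name `parse_text_as_bytes` is hypothetical.

```python
from email import message_from_bytes


def parse_text_as_bytes(text, encoding="utf-8"):
    # Hypothetical helper. Assumption: `encoding` matches the charset
    # declared in the message's Content-Type header.
    return message_from_bytes(text.encode(encoding))


msg = parse_text_as_bytes(
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "\u00fcnic\u00f6de data.."
)

print(msg.get_payload(decode=True))  # the raw utf-8 bytes
print(msg.get_payload())             # charset-decoded str
```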
I'm going to close this issue, because dealing with the vagaries of 8bit with
string input is on my master list of things to tackle this summer, and will be
dealt with in the context of other changes.
----------
resolution: -> invalid
stage: -> committed/rejected
status: open -> closed
versions: -Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18271>
_______________________________________