R. David Murray added the comment:

The python3 email package's handling of 8bit definitely has quirks.  (So did 
the python2 email package's, but they were different quirks. :)

You can't correctly handle 8bit unless you use message_from_bytes and take the 
input from a byte string.  What should be done with a unicode string that 
claims its payload is 8bit is a good question.  Since that situation can't 
arise on the wire (or in a disk file), perhaps it should produce an exception 
("message must be parsed as binary data"?).  The problem with that idea is 
that the email parser promises never to raise errors, but always to produce 
*some* sort of model from the input, possibly with defects attached.
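As a rough illustration of that promise, here is a minimal sketch assuming Python 3.3 or later (where `email.errors.MissingHeaderBodySeparatorDefect` exists). Malformed input is parsed anyway, and the problem is recorded on the message's `defects` list instead of being raised:

```python
from email import message_from_string
from email.errors import MissingHeaderBodySeparatorDefect

# The second line is neither a header nor the blank header/body separator.
raw = "Subject: hi\nnot a header\n\nbody\n"

# No exception is raised; the parser still builds a message object.
msg = message_from_string(raw)

# The problem is recorded as a defect attached to the message.
print([type(d).__name__ for d in msg.defects])

# The offending line is treated as the first line of the body.
print(repr(msg.get_payload()))
```

The str-with-8bit case in this issue falls under the same rule: the parser has to return something rather than refuse the input.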

All that aside, here is what you want to be doing:

>>> from email import message_from_bytes
>>> message = message_from_bytes(b"""MIME-Version: 1.0
... Content-Type: text/plain; charset=utf-8
... Content-Disposition: inline
... Content-Transfer-Encoding: 8bit
... 
... \xc3\xbcnic\xc3\xb6de data..""")
>>> message.get_content_charset()
'utf-8'
>>> message.get_payload(decode=True)
b'\xc3\xbcnic\xc3\xb6de data..'
>>> message.get_payload(decode=True).decode('utf-8')
'ünicöde data..'
>>> message.get_payload()
'ünicöde data..'

You will note that get_payload without decode automatically does the charset 
decode.  I know this is counter-intuitive, but we are dealing with a legacy API 
that I had to retrofit.  Think of decode=True as "produce binary data by 
undoing the content transfer encoding used on the wire", and decode=False as 
"produce the string representation of the payload".  For ASCII 
content-transfer-encodings this is more intuitive (with quoted-printable, for 
example, decode=False returns the raw quoted-printable text), but for 8bit we 
can only produce a Python string if we do the unicode decode, so that's what 
we do.
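To make the contrast concrete, the sketch below (assuming Python 3.3 or later; the `views` helper is hypothetical, written only for this illustration) parses the same text once with a quoted-printable body and once with an 8bit body, and returns what each get_payload mode gives back:

```python
from email import message_from_bytes

HEADERS = (b"MIME-Version: 1.0\n"
           b"Content-Type: text/plain; charset=utf-8\n")

def views(cte, body):
    """Hypothetical helper: return (decode=False, decode=True) payloads."""
    msg = message_from_bytes(
        HEADERS + b"Content-Transfer-Encoding: " + cte + b"\n\n" + body)
    return msg.get_payload(), msg.get_payload(decode=True)

# quoted-printable: decode=False hands back the raw QP text unchanged,
# while decode=True undoes the transfer encoding and returns bytes.
qp_str, qp_bytes = views(b"quoted-printable", b"=C3=BCnic=C3=B6de data..")
print(qp_str)    # =C3=BCnic=C3=B6de data..
print(qp_bytes)  # b'\xc3\xbcnic\xc3\xb6de data..'

# 8bit: there is no transfer encoding to undo, so decode=True returns the
# raw bytes, and decode=False has to apply the charset to produce a str.
eb_str, eb_bytes = views(b"8bit", "ünicöde data..".encode("utf-8"))
print(eb_str)    # ünicöde data..
print(eb_bytes)  # b'\xc3\xbcnic\xc3\xb6de data..'
```

In both cases decode=True yields the same bytes; only the decode=False view differs, because for 8bit there is no ASCII-safe wire form to return.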

You will also note that the payload in this case really *is* utf-8 encoded 
bytes, whereas in your example it was a unicode string, and what the python3 
email package does with a unicode payload is not well defined and is 
definitely buggy.
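If the message text has already been decoded into a str, one possible workaround (a sketch, not something the email package itself provides) is to encode it back to bytes before parsing. The `parse_8bit` helper below is hypothetical, and it assumes the str was produced by decoding the original bytes with the declared charset (utf-8 here); if that assumption is wrong, the recovered payload will be wrong too.

```python
from email import message_from_bytes

def parse_8bit(raw, encoding="utf-8"):
    # Hypothetical helper. ASSUMPTION: a str input is the result of decoding
    # the original wire bytes with `encoding`, so encoding it again recovers
    # those bytes exactly.
    if isinstance(raw, str):
        raw = raw.encode(encoding)
    # Parse as binary data, the only way 8bit content is handled correctly.
    return message_from_bytes(raw)

text = ("MIME-Version: 1.0\n"
        "Content-Type: text/plain; charset=utf-8\n"
        "Content-Disposition: inline\n"
        "Content-Transfer-Encoding: 8bit\n"
        "\n"
        "ünicöde data..")

msg = parse_8bit(text)
print(msg.get_payload(decode=True))  # b'\xc3\xbcnic\xc3\xb6de data..'
print(msg.get_payload())             # ünicöde data..
```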

I'm going to close this issue, because the vagaries of 8bit with string input 
are on my master list of things to tackle this summer, and will be dealt with 
in the context of other changes.

----------
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed
versions:  -Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18271>
_______________________________________
_______________________________________________
Python-bugs-list mailing list