Peter J. Holzer <hjp-pyt...@hjp.at> wrote:
> The problem is that the message contains a '\ufeff' character (byte
> order mark) where email/generator.py expects only ASCII characters.
>
> I see two possible reasons for this:
>
>  * The mbox writing code assumes that all messages with non-ascii
>    characters are QP or base64 encoded, and some higher layer uses 8bit
>    instead.
>
>  * A mime-part is declared as charset=us-ascii but contains really
>    Unicode characters.
>
> Both reasons are weird.
>
> The first would be an unreasonable assumption (8bit encoding has been
> common since the mid-1990s), but even if the code made that assumption,
> one would expect that other code from the same library honors it.
>
> The second shouldn't be possible: If a message is mis-declared (that
> happens) one would expect that the error happens during parsing, not
> when trying to serialize the already parsed message.
>
> But then you haven't shown where msg comes from. How do you parse the
> message to get "msg"?
>
> Can you construct a minimal test message which triggers the bug?
>
Yes, simply sending myself an E-Mail with (for example) accented
characters triggers the error.
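
For illustration, here is a minimal sketch of one way an error like this can arise under Python 3. It is not the actual delivery script, which is not shown in this thread. It assumes the message is parsed from already-decoded text (for example via email.message_from_string) and later serialized to bytes, which is what mbox writing has to do. The test message RAW and the helper names serialize_from_text and serialize_from_bytes are hypothetical.

```python
import email

# Hypothetical minimal test message: an 8bit UTF-8 body containing a
# byte order mark followed by accented characters.
RAW = (
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "\ufeffàéçł\n"
).encode("utf-8")


def serialize_from_text(raw):
    """Parse from decoded text, then serialize back to bytes.

    The payload is held as a str with real non-ASCII characters, and
    the bytes generator encodes it as ASCII, so this is expected to
    raise UnicodeEncodeError.
    """
    msg = email.message_from_string(raw.decode("utf-8"))
    return msg.as_bytes()


def serialize_from_bytes(raw):
    """Parse from the raw bytes, then serialize back to bytes.

    The parser keeps the non-ASCII bytes as surrogate escapes, which
    the bytes generator can write back out unchanged.
    """
    msg = email.message_from_bytes(raw)
    return msg.as_bytes()
```

If this assumption matches the real script, calling serialize_from_text(RAW) should fail with a UnicodeEncodeError from the 'ascii' codec, while serialize_from_bytes(RAW) should return the message with the UTF-8 body intact. In a delivery script the bytes variant would correspond to reading from sys.stdin.buffer rather than sys.stdin.
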
I'm pretty certain my system (and E-Mail in and out, and Usenet news)
handle these correctly as UTF8. E.g.:- àéçł

It's *only* when I switch the mail delivery to Python 3 that the error
appears.

-- 
Chris Green