> On 27 Aug 2020, at 10:40, Chris Green <c...@isbd.net> wrote:
> 
> Karsten Hilbert <karsten.hilb...@gmx.net> wrote:
>>> Terry Reedy <tjre...@udel.edu> wrote:
>>>>> On 8/26/2020 11:10 AM, Chris Green wrote:
>>>>> 
>>>>>> I have a simple[ish] local mbox mail delivery module as follows:-
>>>>> ...
>>>>>> It has run faultlessly for many years under Python 2. I've now
>>>>>> changed the calling program to Python 3 and while it handles most
>>>>>> E-Mail OK I have just got the following error:-
>>>>>> 
>>>>>> Traceback (most recent call last):
>>>>>>   File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
>>>>>>     mailLib.deliverMboxMsg(dest, msg, log)
>>>>> ...
>>>>>>   File "/usr/lib/python3.8/email/generator.py", line 406, in write
>>>>>>     self._fp.write(s.encode('ascii', 'surrogateescape'))
>>>>>> UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 4: ordinal not in range(128)
I would guess the fix is to do s.encode('utf-8'). You might also need to add a
header to the email/MIME part saying that it uses utf-8.

If you do that, does your code work?

Barry

>>>>> 
>>>>> '\ufeff' is the Unicode byte-order mark. It should not be present in an
>>>>> ascii-only 3.x string and would not normally be present in general
>>>>> unicode except in messages like this that talk about it. Read about it,
>>>>> for instance, at
>>>>> https://en.wikipedia.org/wiki/Byte_order_mark
>>>>> 
>>>>> I would catch the error and print part or all of string s to see what is
>>>>> going on with this particular message. Does it have other non-ascii
>>>>> chars?
>>>>> 
>>> I can provoke the error simply by sending myself an E-Mail with
>>> accented characters in it. I'm pretty sure my Linux system is set up
>>> correctly for UTF8 characters, I certainly seem to be able to send and
>>> receive these to others and I even get to see messages in other
>>> scripts such as arabic, chinese, etc.
>>> 
>>> The code above works perfectly in Python 2 delivering messages with
>>> accented (and other extended) characters with no problems at all.
>>> Sending myself E-Mails with accented characters works OK with the code
>>> running under Python 2.
>>> 
>>> While an E-Mail body possibly *shouldn't* have non-ASCII characters in
>>> it one must be able to handle them without errors. In fact haven't
>>> the RFCs changed such that the message body should be 8-bit clean?
>>> Anyway I think the Python 3 mail handling libraries need to be able to
>>> pass extended characters through without errors.
>> 
>> Well, '\ufeff' is not a *character* at all in much of any
>> sense of that word in unicode.
>> 
>> It's a marker. Whatever puts it into the stream is wrong. I guess the
>> best one can (and should) do is to catch the exception and dump
>> the offending stream somewhere binary-capable and pass on a notice. What
>> you are receiving there very much isn't a (well-formed) e-mail message.
>> 
>> I would then attempt to backwards-crawl the delivery chain to
>> find out where it came from.
>> 
> The error seems to occur with any non-7-bit-ASCII, e.g. my accented
> characters gave:-
> 
>   File "/usr/lib/python3.8/email/generator.py", line 406, in write
>     self._fp.write(s.encode('ascii', 'surrogateescape'))
> UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 34: ordinal not in range(128)
> 
> It just happened that the first example was an escape.
> 
> -- 
> Chris Green
> --
> https://mail.python.org/mailman/listinfo/python-list
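For what it's worth, here is a minimal sketch of what the traceback seems to point at. It rests on an assumption the thread doesn't confirm: that filter.py parses the incoming message from a str (as email.message_from_file(sys.stdin) does under Python 3) and that mailLib.deliverMboxMsg ends up in mailbox.mbox.add(). mailbox serializes Message objects with email.generator.BytesGenerator, whose write() is the s.encode('ascii', 'surrogateescape') call in both tracebacks. The message text, addresses and the helpers parsed_from_text, parsed_from_bytes and built_with_charset are all made up for illustration. The last helper is one way to do what Barry suggests (store the body as UTF-8 and declare the charset) using EmailMessage.set_content.

```python
import email
import email.generator
import email.message
import io

# Hypothetical test message (not Chris's mail): ASCII headers and a UTF-8 body
# with the same kind of characters that failed ('\u00e9' and '\u2019',
# the right single quotation mark from the second traceback).
RAW_TEXT = (
    "From: sender@example.com\n"
    "To: chris@example.com\n"
    "Subject: accent test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Caf\u00e9 \u2019quoted\u2019 text\n"
)


def flatten_to_bytes(msg):
    """Serialize msg with BytesGenerator, the same way mailbox.mbox does."""
    buf = io.BytesIO()
    email.generator.BytesGenerator(buf, mangle_from_=False, maxheaderlen=0).flatten(msg)
    return buf.getvalue()


def parsed_from_text():
    # Hypothetical helper. Parsing from a str leaves real non-ASCII characters
    # in the payload, so BytesGenerator.write() raises UnicodeEncodeError.
    msg = email.message_from_string(RAW_TEXT)
    return flatten_to_bytes(msg)


def parsed_from_bytes():
    # Hypothetical helper. Parsing from bytes stores non-ASCII bytes as
    # surrogate escapes, which encode('ascii', 'surrogateescape') turns back
    # into the original bytes.
    msg = email.message_from_bytes(RAW_TEXT.encode("utf-8"))
    return flatten_to_bytes(msg)


def built_with_charset():
    # Hypothetical helper following Barry's suggestion: the body is stored as
    # UTF-8 and the Content-Type header declares charset="utf-8".
    msg = email.message.EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "chris@example.com"
    msg["Subject"] = "accent test"
    msg.set_content("Caf\u00e9 \u2019quoted\u2019 text", charset="utf-8")
    return msg.as_bytes()


if __name__ == "__main__":
    try:
        parsed_from_text()
    except UnicodeEncodeError as exc:
        print("parsed from str:", exc)
    print("parsed from bytes:", parsed_from_bytes().splitlines()[-1])
    print("built with set_content:", built_with_charset().splitlines()[-1])
```

If filter.py does read the message as text (again, only a guess), the bytes variant would amount to something like email.message_from_binary_file(sys.stdin.buffer) in place of the text parser. Reading sys.stdin.buffer also keeps the locale's encoding out of the picture, since the raw bytes are handed to the email package unchanged.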