I'm fiddling with a program that reads articles in the news spool using email.parser (standard library) & email_reply_parser.EmailReplyParser (installed with pip). Reading is fine, & I don't get any errors writing output extracted from article bodies *until* I try to suppress invalid characters. This works:

    if message.is_multipart():
        body = message.get_payload(0, True)
    else:
        body = message.get_payload()
    main_body = EmailReplyParser.parse_reply(body)
    # fix quoted-printable stuff
    if equals_regex.search(main_body):
        main_body = quopri.decodestring(main_body)
    # suppress attribution before quoted text
    main_body = attrib_regex.sub('>', main_body)
    # suppress sig
    main_body = sig_regex.sub('\n', main_body)
    # strip() returns a new string, so the result has to be assigned back
    main_body = main_body.strip()
    stdout.write(main_body + '\n\n')

but the output on stdout includes invalid characters. I tried adding this at the beginning:

    if stdout.encoding is None:
        writer = codecs.getwriter("utf-8")
        stdout = writer(stdout, errors='replace')

and changing the output line to:

    stdout.write(main_body.encode('utf-8', errors='replace') + '\n\n')

but with either or both of those, I get the dreaded "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 562: ordinal not in range(128)".

How can I force the output to be in UTF-8 & silently suppress invalid characters?

-- 
Unit tests are like the boy who cried wolf.
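
For reference, a minimal sketch of what is probably going on, assuming this is Python 2 and main_body is a byte string (the post states neither). Calling .encode() on a byte string makes Python 2 decode it implicitly as ASCII first, and that implicit step is what would raise UnicodeDecodeError on byte 0xc3. Decoding explicitly with an error handler and then encoding avoids it. The helper name to_clean_utf8 and the sample bytes are hypothetical, not part of the original program.

```python
import sys


def to_clean_utf8(raw, encoding="utf-8", errors="replace"):
    """Return UTF-8 bytes for raw, handling undecodable input via errors.

    raw may be a byte string or already-decoded text. With errors="replace"
    each bad byte becomes U+FFFD; with errors="ignore" it is dropped.
    """
    if isinstance(raw, bytes):
        # Explicit decode: no implicit ASCII step, bad bytes never raise.
        text = raw.decode(encoding, errors)
    else:
        text = raw
    return text.encode("utf-8", errors)


if __name__ == "__main__":
    # Hypothetical article body: valid UTF-8 plus one invalid byte (0xff).
    sample = b"caf\xc3\xa9 ok \xff bad"
    # Python 3 exposes the byte stream as sys.stdout.buffer; Python 2
    # accepts byte strings on sys.stdout directly.
    out = getattr(sys.stdout, "buffer", sys.stdout)
    out.write(to_clean_utf8(sample) + b"\n\n")
```

Passing errors="ignore" instead would drop the bad bytes rather than mark them, which may be closer to "silently suppress".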