Hi, in the teammetrics project I'm trying to parse mailboxes. This worked with Python 2, but after porting the code to Python 3 I am running into encoding problems. One specific problem seems to be a bug in the mailbox module. Please run the attached script test_mbox, which downloads one of the problematic mbox files from alioth-lists.debian.net and calls the attached simple Python 3 script, which ends in:
Traceback (most recent call last):
  File "./test_mbox.py", line 6, in <module>
    if mbox_file.items() != []:
  File "/usr/lib/python3.8/mailbox.py", line 132, in items
    return list(self.iteritems())
  File "/usr/lib/python3.8/mailbox.py", line 125, in iteritems
    value = self[key]
  File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
    msg.set_from(from_line[5:].decode('ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 37: ordinal not in range(128)
Exit code: 1

IMHO it is a bug if those mailboxes can't be read. Am I missing something?

Kind regards
Andreas.

--
http://fam-tille.de
#!/bin/sh
wget https://alioth-lists.debian.net/pipermail/pkg-java-maintainers/2020-May.txt.gz
gunzip 2020-May.txt.gz
python3 test_mbox.py
#!/usr/bin/python3
import mailbox

mbox_file = mailbox.mbox('2020-May.txt')
if mbox_file.items() != []:
    print("OK")
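
A possible workaround, sketched here as an assumption rather than a confirmed fix: the traceback shows the failure happens when get_message() decodes the "From " separator line as ASCII, so reading each message as raw bytes with get_bytes() and parsing it with email.message_from_bytes() may sidestep that decode. The sample mbox content below is made up for illustration (it is not taken from 2020-May.txt), and read_messages is a hypothetical helper name.

```python
import email
import mailbox
import os
import tempfile

# Synthetic one-message mbox (made-up data). The "From " separator line
# contains the UTF-8 bytes 0xe2 0x80 0x99, which are not valid ASCII.
RAW = (
    b"From jane at example.org (Jane O\xe2\x80\x99Hara)  Fri May  1 10:00:00 2020\n"
    b"From: jane at example.org\n"
    b"Subject: test message\n"
    b"\n"
    b"Hello.\n"
)


def read_messages(path):
    """Hypothetical helper: parse every message without decoding the separator line."""
    box = mailbox.mbox(path, create=False)
    try:
        # get_bytes() returns the raw message without the "From " line,
        # so no ASCII decode of that line takes place.
        return [email.message_from_bytes(box.get_bytes(key)) for key in box.keys()]
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "wb") as fh:
        fh.write(RAW)
    messages = read_messages(path)

subjects = [m["Subject"] for m in messages]
print(subjects)
```

Note that this returns plain email.message.Message objects rather than mailbox.mboxMessage, so the envelope information normally available via get_from() is not preserved.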