New submission from Dickson Chan <dxn...@gmail.com>:
parse_message_id in the email module crashes with a bogus Message-ID

Having the Message-ID '<[>' gives me "IndexError: list index out of range". This happens when:

- creating an EmailMessage with that Message-ID:

    msg = EmailMessage()
    msg['Message-ID'] = '<[>'

- accessing the bogus Message-ID through msg.items() or msg.get('Message-ID')

This doesn't happen with Python 3.6 or 3.7, where MessageIDHeader didn't exist yet. In 3.8, 3.8/Lib/email/headerregistry.py line 542 maps the header to it:

    _default_header_map = {
        ....
        'message-id': MessageIDHeader,
    }

-------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2069, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1334, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found '[>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    msg['Message-ID'] = '<[>'
  File "/usr/lib/python3.8/email/message.py", line 409, in __setitem__
    self._headers.append(self.policy.header_store_parse(name, val))
  File "/usr/lib/python3.8/email/policy.py", line 148, in header_store_parse
    return (name, self.header_factory(name, value))
  File "/usr/lib/python3.8/email/headerregistry.py", line 607, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.8/email/headerregistry.py", line 202, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.8/email/headerregistry.py", line 535, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2126, in parse_message_id
    token, value = get_msg_id(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2073, in get_msg_id
    token, value = get_obs_local_part(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1516, in get_obs_local_part
    if (obs_local_part[0].token_type == 'dot' or
IndexError: list index out of range
-------------------------------------------------------------------------------------------

As you can see in the traceback, get_msg_id() calls get_obs_local_part(), which contains this:

    def get_obs_local_part(value):
        obs_local_part = ObsLocalPart()
        while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
            ...
        if (obs_local_part[0].token_type == 'dot' or
            ...

If value does not satisfy the condition of the while loop, the loop body never runs, obs_local_part stays empty, and obs_local_part[0] raises an IndexError. (The value in my example is '[>', left over from the Message-ID '<[>'.)

Shouldn't we raise a proper error, or fall back to not parsing the header, if parsing fails? There's no way to bypass the parser to get the Message-ID, and I can't even handle the error with a try/except.

----------
components: email
messages: 381947
nosy: barry, dxn126, r.david.murray
priority: normal
severity: normal
status: open
title: parse_message_id, get_msg_id, get_obs_local_part is poorly written
type: behavior
versions: Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42484>
_______________________________________
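A minimal sketch of the kind of guard the report asks for, assuming a simplified character-level loop. The function get_local_part_tokens and the PHRASE_ENDS set below are hypothetical stand-ins, not the real email._header_value_parser code. The sketch checks that the loop consumed at least one token before looking at tokens[0]. It then raises HeaderParseError, the error type that get_msg_id already catches from get_dot_atom_text (as the first traceback shows), instead of letting an IndexError escape.

```python
from email.errors import HeaderParseError

# Hypothetical, simplified stand-in for PHRASE_ENDS in
# email._header_value_parser (assumption: the real set may differ).
PHRASE_ENDS = set(')<>@,:;\\[]')


def get_local_part_tokens(value):
    """Hypothetical simplified version of the loop in get_obs_local_part.

    Consumes characters using the same loop condition as the original,
    then verifies that something was consumed before indexing tokens[0].
    Returns (local_part, remaining_value).
    """
    tokens = []
    while value and (value[0] == '\\' or value[0] not in PHRASE_ENDS):
        tokens.append(value[0])
        value = value[1:]
    if not tokens:
        # Nothing matched (e.g. value == '[>'): report a parse error
        # instead of failing with IndexError on tokens[0].
        raise HeaderParseError(
            "expected local-part but found {!r}".format(value))
    if tokens[0] == '.':
        # Mirrors the spirit of the original 'dot' check, now safe to run.
        raise HeaderParseError("local-part cannot start with '.'")
    return ''.join(tokens), value
```

With this guard, get_local_part_tokens('[>') raises HeaderParseError rather than IndexError, which the caller can then turn into a header defect.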