[issue35547] email.parser / email.policy does not correctly handle multiple RFC2047 encoded-word tokens across RFC5322 folded headers

Martijn Pieters Thu, 20 Dec 2018 16:52:38 -0800


Martijn Pieters <m...@python.org> added the comment:


Right, re-educating myself on the MIME RFCs, and found 
https://bugs.python.org/issue1372770 where the same issue is being discussed 
for previous incarnations of the email library.

Removing the FWS after CRLF is the wrong thing to do, **unless** that FWS is 
separating RFC2047 encoded-word tokens. The work-around regex is a bit more 
complicated as a result. Ideally, though, the EW handling should use a 
specialist FWS token to delimit encoded-word sections, one that renders to '', 
as is already done in unstructured headers, but everywhere, because in 
practice there are email clients out there that use EW in structured headers 
regardless.

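Regex to work around this:

import re

# crude CRLF-FWS-between-encoded-word matching; the lookbehind is kept
# fixed-width because Python's re module rejects variable-width ones
value = re.sub(r'(?<=\?=)(?:\r\n|\n|\r)[\t ]+(?==\?)', '', value)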
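A minimal sketch of how such a work-around might be applied to a raw folded 
header value, for illustration only. The helper name 
`unfold_between_encoded_words` and the sample header values are hypothetical. 
The pattern is a variant of the regex above that keeps the lookbehind 
fixed-width, as Python's re module requires, and removes the whole fold (line 
break plus whitespace) rather than the whitespace alone.

```python
import re

# Fold (line break + whitespace) sitting between the end of one apparent
# encoded-word ("?=") and the start of the next ("=?").
# The lookbehind only covers "?=", so it has a fixed width.
_EW_FOLD = re.compile(r'(?<=\?=)(?:\r\n|\r|\n)[\t ]+(?==\?)')


def unfold_between_encoded_words(value: str) -> str:
    """Hypothetical helper: drop folds that separate encoded-word tokens.

    Folds anywhere else in the value are left untouched.
    """
    return _EW_FOLD.sub('', value)


# Two encoded-words split across a folded header line (made-up sample).
folded = '=?utf-8?q?Hello_?=\r\n =?utf-8?q?World?='
joined = unfold_between_encoded_words(folded)

# A fold between ordinary words is not an encoded-word boundary.
plain = 'Hello\r\n World'
untouched = unfold_between_encoded_words(plain)
```

With this pattern the fold between the two encoded-words disappears, while 
the fold in the plain-text value is preserved for the normal unfolding rules 
to handle.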

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35547>
_______________________________________