Eric Lafontaine added the comment:
Hi all,
I believe this is the right behavior, and whatever generated the boundary
"<<>>" is the problem:
RFC 2046 page 22:
_____________________
The only mandatory global parameter for the "multipart" media type is the
boundary parameter, which consists of 1 to 70 characters from a set of
characters known to be very robust through mail gateways, and NOT ending with
white space. (If a boundary delimiter line appears to end with white space, the
white space must be presumed to have been added by a gateway, and must be
deleted.) It is formally specified by the following BNF:
    boundary := 0*69<bchars> bcharsnospace
    bchars := bcharsnospace / " "
    bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                     "+" / "_" / "," / "-" / "." /
                     "/" / ":" / "=" / "?"
_____________________
In other words, the only valid boundary characters are (the space included):
0123456789 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'()+_,-./:=?
Any other character should be removed to get the boundary right. I believe the
issue is that it wasn't removed in the first place. It is a bug in my opinion,
but the other way around :).
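As a rough illustration of the grammar quoted above, the check below tests
whether a string is already a valid boundary. The name is_valid_boundary and
the regular expression are my own sketch of the BNF, not something taken from
the email package.

```python
import re

# bcharsnospace from the RFC 2046 BNF; bchars additionally allows a space.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY_RE = re.compile(
    "[" + _BCHARS_NOSPACE + " ]{0,69}[" + _BCHARS_NOSPACE + "]"
)


def is_valid_boundary(value):
    """Return True if value matches the boundary grammar (hypothetical helper)."""
    return _BOUNDARY_RE.fullmatch(value) is not None


print(is_valid_boundary("<<>>"))     # False: '<' and '>' are not in bchars
print(is_valid_boundary("abc def"))  # True: inner spaces are allowed
print(is_valid_boundary("abc "))     # False: must not end with a space
```

Under this reading, "<<>>" is rejected outright, which is why I think the
generator of that boundary is at fault.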
Funny thing is that the unquote function only removes the first and last
characters it sees, whether that is a '<' and '>' pair or a pair of '"':
    def unquote(str):
        """Remove quotes from a string."""
        if len(str) > 1:
            if str.startswith('"') and str.endswith('"'):
                return str[1:-1].replace('\\\\', '\\').replace('\\"', '"')
            if str.startswith('<') and str.endswith('>'):
                return str[1:-1]
        return str
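To make that concrete, here is what email.utils.unquote returns for the
boundary in question. The snippet only calls the function shown above; it does
not show how many times get_boundary itself ends up applying it.

```python
from email.utils import unquote

# One call strips only the outermost pair of delimiters.
print(unquote('"<<>>"'))           # <<>>
print(unquote('<<>>'))             # <>

# A second call on the already unquoted value strips one more pair.
print(unquote(unquote('"<<>>"')))  # <>
```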
Now, if I modify unquote to keep only the characters listed above, would I
break something? Probably.
I haven't found any other RFC defining boundaries that tells me which format
is supported. Can someone help me with that?
This is what the function should look like:

    import re
    import string

    def get_boundary(str):
        """Return the valid boundary parameter as per RFC 2046 page 22."""
        if len(str) > 1:
            # Delete every character outside the bchars set. Note that the
            # trailing '|=' alternative also deletes '=', even though the
            # BNF above allows it.
            return re.sub('[^' +
                          string.ascii_letters +
                          string.digits +
                          r""" '()+_,-./:=?]|=""",
                          '', str).rstrip(' ')
        return str

    import unittest

    class boundary_tester(unittest.TestCase):

        def test_get_boundary(self):
            boundary1 = """ abc def gh< 123 >!@ %!%' """
            # The valid boundary. The spaces on both sides of the removed
            # ">!@" survive, hence two spaces before the quote.
            ref_boundary1 = """ abc def gh 123  '"""
            ret_value = get_boundary(boundary1)
            self.assertEqual(ret_value, ref_boundary1)

        def test_get_boundary2(self):
            boundary1 = ''.join((' ', string.printable))
            # The valid boundary.
            ref_boundary1 = ' 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\'()+,-./:?_'
            ret_value = get_boundary(boundary1)
            self.assertEqual(ret_value, ref_boundary1)

I believe this should be added to the get_boundary method of
email.message.Message.
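For comparison, here is a variant without a regular expression that follows
the BNF literally and therefore keeps '=' (the regular expression in the
function above ends with '|=', which removes it). The name sanitize_boundary
is hypothetical, and the sketch does not enforce the 70 character limit.

```python
import string

# bchars from the RFC 2046 BNF: bcharsnospace plus the space character.
_BCHARS = frozenset(string.ascii_letters + string.digits + "'()+_,-./:=? ")


def sanitize_boundary(value):
    """Drop characters outside bchars, then trailing spaces (hypothetical helper)."""
    kept = "".join(ch for ch in value if ch in _BCHARS)
    return kept.rstrip(" ")


print(sanitize_boundary('<<a=b>> '))  # a=b
```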
Regards,
Eric Lafontaine
----------
nosy: +Eric Lafontaine
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue28945>
_______________________________________