Stephen Finucane <step...@that.guru> writes: > On Tue, 2019-05-14 at 16:11 +1000, Daniel Axtens wrote: >> This reverts commit 841f966b8d54b2f51ab1c498eed6e5391f2546a9. >> >> In July 2018, we received a report of OzLabs patchwork mangling >> emails that have subjects containing words with internal commas, >> like "Insert DT binding for foo,bar" (#197). >> >> Stephen took a look and came up with the comment this reverts. Quoting >> the commit message: >> >> RFC2822 states that long headers can be wrapped using CRLF followed by >> WSP [1]. For example: >> >> Subject: Foo bar, >> baz >> >> Should be parsed as: >> >> Foo bar,baz >> >> As it turns out, this is not the case. Journey with me to >> section 2.2.3 of RFC 2822: >> >> 2.2.3. Long Header Fields >> >> Each header field is logically a single line of characters comprising >> the field name, the colon, and the field body. For convenience >> however, and to deal with the 998/78 character limitations per line, >> the field body portion of a header field can be split into a multiple >> line representation; this is called "folding". The general rule is >> that wherever this standard allows for folding white space (not >> simply WSP characters), a CRLF may be inserted before any WSP. For >> example, the header field: >> >> Subject: This is a test >> >> can be represented as: >> >> Subject: This >> is a test >> >> So the issue with the example in the reverted commit is that >> there is no folding white space in "bar,baz", so it's not valid >> to split it. >> >> These are valid: >> Subject: Foo bar,baz >> Subject: Foo >> bar,baz >> >> but splitting "bar,baz" into "bar,\n baz" is not valid. >> >> What then is correct unfolding behaviour? Quoting the RFC again: >> >> The process of moving from this folded multiple-line representation >> of a header field to its single line representation is called >> "unfolding". Unfolding is accomplished by simply removing any CRLF >> that is immediately followed by WSP. Each header field should be >> treated in its unfolded form for further syntactic and semantic >> evaluation. >> >> In other words, the unfolding rule requires you to strip the CRLF, >> but it does not permit you to strip the WSP. Indeed, if "bar,\n baz" >> is received, the correct unfolding is "bar, baz". >> >> If you do strip the WSP, you end up mashing words together, such >> as in https://patchwork.ozlabs.org/patch/1097852/ >> >> So revert the commit, restoring original behaviour, but keep a >> corrected version of the test. >> >> This presents a big question though: how did Rob's email up with >> a mangled subject line? >> >> To answer this question, you end up having to learn about OzLabs >> Patchwork and how it differs from Patchwork the project. >> >> OzLabs Patchwork (patchwork.ozlabs.org) is an installation of >> Patchwork. Part of what makes it so useful for so many projects is >> a little intervening layer that can massage some mail to make it >> end up in the right project. Email that lands in the device tree >> project is an example of email that goes through this process. >> I only learned about this today and I haven't looked in any detail >> at precisely what is done to the mail. The script is not part of the >> Patchwork project. >> >> This intervening filter is a Python script that runs - and this >> is an important detail - in Python 2.7. >> >> Ignoring all the details, the filter basically operates in a pipe >> between the mail program and patchwork's parsemail, like >> >> (mail from system) | filter.py | parsemail >> >> At it's very simplest, filter.py acts as follows: >> >> import email >> import sys >> mail = email.parse_from_file(sys.stdin) >> sys.stdout.write(mail.as_string()) >> >> Fascinatingly, if you take Rob's email from #197 and put it through >> this process, you can see that it is getting mangled: >> >> Before: >> Subject: [PATCH v2 3/4] dt-bindings: sound: wm8994: document wlf,csnaddr-pd >> property >> >> After: >> Subject: [PATCH v2 3/4] dt-bindings: sound: wm8994: document wlf, >> csnaddr-pd property >> >> You can see that python27 has incorrectly wrapped the header, breaking >> where there is not a foldable space. Python3 does not have this issue. >> >> To summarise: >> - part of the magic of OzLabs PW is a filter to make sure mail gets >> to the right place. This isn't part of the Patchwork project and >> so is usually invisible to patchwork developers. >> >> - the filter is written in python27. The email module in py27 has >> a bug that incorrectly breaks subjects around commas within words. >> >> - patchwork correctly unfolds those broken subjects with a space >> after the comma. >> >> - thr extra space was interpreted as a bug in patchwork, leading to >> a misinterpretation of the spec to strip out the whitespace that >> was believed to be in error. >> >> - that broke other wrapped subjects. >> >> To solve this, revert the commit and I'll work with jk to get the >> filter script into py3 compatibility. (Given that py27 sunsets in >> ~7mo, trying to fix it is not worth it.) >> >> Closes: #197 >> CC: stable > > Thanks for the detailed overview. I dropped the changes to the release > note and simply added a new one because this revert won't appear in > 2.1.2, for example, so the release note for that shouldn't be changed. > Otherwise all good though so I've applied this to both master and > stable/2.1. >
Thanks - release notes are still something of a mystery to me so thanks for fixing that up. I'll talk to the OzLabs crew about whether they want a 2.1.3 with this patch or if they're happy to just carry it for a while. Regards, Daniel > Stephen > >> --- >> patchwork/parser.py | 1 - >> patchwork/tests/test_parser.py | 2 +- >> releasenotes/notes/issue-197-4f7594db1e4c9887.yaml | 9 +++++---- >> 3 files changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/patchwork/parser.py b/patchwork/parser.py >> index 712780a498c4..91e9920c8782 100644 >> --- a/patchwork/parser.py >> +++ b/patchwork/parser.py >> @@ -47,7 +47,6 @@ class DuplicateMailError(Exception): >> >> >> def normalise_space(value): >> - value = ''.join(re.split(r'\n\s+', value)) >> whitespace_re = re.compile(r'\s+') >> return whitespace_re.sub(' ', value).strip() >> >> diff --git a/patchwork/tests/test_parser.py b/patchwork/tests/test_parser.py >> index ddbcf5b15a19..f18220298078 100644 >> --- a/patchwork/tests/test_parser.py >> +++ b/patchwork/tests/test_parser.py >> @@ -838,7 +838,7 @@ class SubjectTest(TestCase): >> self.assertEqual(clean_subject("[PATCH] meep \n meep"), >> ('meep meep', [])) >> self.assertEqual(clean_subject("[PATCH] meep,\n meep"), >> - ('meep,meep', [])) >> + ('meep, meep', [])) >> self.assertEqual(clean_subject('[PATCH RFC] meep'), >> ('[RFC] meep', ['RFC'])) >> self.assertEqual(clean_subject('[PATCH,RFC] meep'), >> diff --git a/releasenotes/notes/issue-197-4f7594db1e4c9887.yaml >> b/releasenotes/notes/issue-197-4f7594db1e4c9887.yaml >> index 2777fbc2f85b..41b86c064b8a 100644 >> --- a/releasenotes/notes/issue-197-4f7594db1e4c9887.yaml >> +++ b/releasenotes/notes/issue-197-4f7594db1e4c9887.yaml >> @@ -1,7 +1,8 @@ >> --- >> fixes: >> - | >> - Long headers can be wrapped using CRLF followed by WSP (whitespace). >> This >> - whitespace was not being stripped, resulting in errant whitespace being >> - saved for the patch subject. This is resolved though existing patches >> and >> - cover letters will need to be updated manually. >> + Long headers can be wrapped using CRLF followed by WSP (whitespace). >> There >> + was an incorrect fix that would lead to whitespace being stripped where >> it >> + shouldn't be, resulting in words being stuck together (likethis). This >> is >> + resolved, though existing patches and cover letters will need to be >> + updated manually. _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork