On Wed, Jun 23, 2021 at 12:51:46PM -0400, Raxel Gutierrez wrote: > > When contributors using Gnus reply to a patch, the In-Reply-To & > References header fields include comments in the form described in > remove_rfc2822_comments spec. These comments can lead to threading > issues where users' replies don't get associated to the patch they > reply to because the added comment will be part of the message-id when > parsed. Following RFC2822, a comment is not supposed to affect threading > in any way, so we remove it. To use regex, we assume that comments are > not nested; further changes to more thoroughly match the RFC2822 grammar > remains for anyone interested.
Hm, does the nesting level of the comments really matter? Or is the issue that they may be multiline? That is - it's pretty trivial to write a regex to match (foo(bar)baz((quot)meh)), as long as you don't actually care about the semantics of the nesting. > > Signed-off-by: Raxel Gutierrez <[email protected]> > Closes: #399 > --- > patchwork/parser.py | 25 +++++++++++++++++-- > .../notes/issue-399-584c5be5b71dcf63.yaml | 7 ++++++ > 2 files changed, 30 insertions(+), 2 deletions(-) > create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml > > diff --git a/patchwork/parser.py b/patchwork/parser.py > index 61a8124..683ff55 100644 > --- a/patchwork/parser.py > +++ b/patchwork/parser.py > @@ -70,6 +70,27 @@ def normalise_space(value): > return whitespace_re.sub(' ', value).strip() > > > +def remove_rfc2822_comments(header_contents): > + """Removes RFC2822 comments from header fields. > + > + Gnus create reply emails with commments like In-Reply-To/References: > + <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment]. > + Patchwork parses the values of the In-Reply-To & References header fields > + with the comment included as part of their value. A side effect of the > + comment not being removed is that message-ids are mismatched. These > + comments do not provide useful information for processing patches > + because they are ignored for threading and not rendered by mail readers. > + """ > + > + # Captures comments in header fields. Firstly, I'd like to point out for other reviewers that Raxel commented the expression this way because I told him to - if you hate it, blame me, not him ;) > + comment_pattern = re.compile(r""" > + \( # The opening parenthesis of comment > + [^()]* # The contents of the comment I *think* this is the bit that's making it not support nesting. "Match anything besides another open- or close-paren". https://docs.python.org/3/library/re.html tells me that Python treats '*' as greedy by default, so wouldn't "\(.*\)" handle nested comments? Or is there an issue that you can have more that one, e.g. In-Reply-To: (danica's mail) [email protected] (from gnus) in which case greedy-matching would also obliterate the actual message-id? This actually brings to mind that I'd like to see an example of one such problematic line in the commit message, if you've got one handy. > + \) # The closing parenthesis of comment > + """, re.X) > + return re.sub(comment_pattern, '', header_contents) > + > + > def sanitise_header(header_contents, header_name=None): > """Clean and individual mail header. > > @@ -483,13 +504,13 @@ def find_references(mail): > > if 'In-Reply-To' in mail: > for in_reply_to in mail.get_all('In-Reply-To'): > - r = clean_header(in_reply_to) > + r = remove_rfc2822_comments(clean_header(in_reply_to)) > if r: > refs.append(r) > > if 'References' in mail: > for references_header in mail.get_all('References'): > - h = clean_header(references_header) > + h = remove_rfc2822_comments(clean_header(references_header)) > if not h: > continue > references = h.split() > diff --git a/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml > b/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml > new file mode 100644 > index 0000000..a3e2ec1 > --- /dev/null > +++ b/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml > @@ -0,0 +1,7 @@ > +--- > +fixes: > + - | > + Gnus users' replies would not be associated to the patch they reply to > + because their In-Reply-To & References field generated explanatory > + comments that were added to the value of the message-id parsed. > + (`#399 <https://github.com/getpatchwork/patchwork/issues/399>`__) Hum. Is there a test suite we can add a regression test to for this specific kind of line format? Sorry not to have left any lines on the Python style - I'm the last person you should ask. ;) Please don't consider my lack of comment to be an approval, there. - Emily _______________________________________________ Patchwork mailing list [email protected] https://lists.ozlabs.org/listinfo/patchwork
