Daniel Axtens <[email protected]> writes:

> However, given that this is the first time comments have mattered at
> all, I'd be OK to ignore multi-line comments. I would like nested
> comments to work though, unless it makes a real mess of things.

Ah so it turns out I'm totally wrong here: we've been given some sample
data on the GH issue (http://www.delorie.com/tmp/patchwork-399-1.txt)
and it contains a multi-line comment:

In-Reply-To: <[email protected]>
	(message from liqingqing on Thu, 1 Apr 2021 16:51:45 +0800)

I don't know if Python's email module will unfold multi-line headers
automatically - it's very possible it does - or if the regex will work
over multiple lines... I lose track of which regex engine does what!

Kind regards,
Daniel

>
>>>
>>> Signed-off-by: Raxel Gutierrez <[email protected]>
>>> Closes: #399
>>> ---
>>>  patchwork/parser.py                           | 25 +++++++++++++++++--
>>>  .../notes/issue-399-584c5be5b71dcf63.yaml     |  7 ++++++
>>>  2 files changed, 30 insertions(+), 2 deletions(-)
>>>  create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
>>>
>>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>>> index 61a8124..683ff55 100644
>>> --- a/patchwork/parser.py
>>> +++ b/patchwork/parser.py
>>> @@ -70,6 +70,27 @@ def normalise_space(value):
>>>      return whitespace_re.sub(' ', value).strip()
>>>
>>>
>>> +def remove_rfc2822_comments(header_contents):
>>> +    """Removes RFC2822 comments from header fields.
>>> +
>>> +    Gnus create reply emails with commments like In-Reply-To/References:
>>> +    <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
>>> +    Patchwork parses the values of the In-Reply-To & References header
>>> fields
>>> +    with the comment included as part of their value. A side effect of the
>>> +    comment not being removed is that message-ids are mismatched. These
>>> +    comments do not provide useful information for processing patches
>>> +    because they are ignored for threading and not rendered by mail
>>> readers.
>>> +    """
>>> +
>>> +    # Captures comments in header fields.
>>
>> Firstly, I'd like to point out for other reviewers that Raxel commented
>> the expression this way because I told him to - if you hate it, blame
>> me, not him ;)
>
> If `tox -e flake8` is happy, I am happy :)
>
>>> +    comment_pattern = re.compile(r"""
>>> +            \(      # The opening parenthesis of
>>> comment
>>> +            [^()]*  # The contents of the comment
>> I *think* this is the bit that's making it not support nesting.
>> "Match anything besides another open- or close-paren".
>>
>> https://docs.python.org/3/library/re.html tells me that Python treats
>> '*' as greedy by default, so wouldn't "\(.*\)" handle nested comments?
>> Or is there an issue that you can have more that one, e.g.
>>
>> In-Reply-To: (danica's mail) [email protected] (from gnus)
>>
>> in which case greedy-matching would also obliterate the actual
>> message-id?
>>
>> This actually brings to mind that I'd like to see an example of one such
>> problematic line in the commit message, if you've got one handy.
>
> I've asked on the issue
> (https://github.com/getpatchwork/patchwork/issues/399) to see if we can
> get some examples. Ostensibly emacs generates them, but I use
> emacs+notmuch and I don't see them so I think it might be gnus specific.
>
> Kind Regards,
> Daniel
_______________________________________________
Patchwork mailing list
[email protected]
https://lists.ozlabs.org/listinfo/patchwork
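
For anyone who wants to poke at the open questions in this thread (folded
headers, greedy matching, nesting), here is a rough sketch rather than the
code from the patch. The message-ids are made up, strip_comments is a
hypothetical helper, and the header is parsed with the stdlib's
email.message_from_string under its default compat32 policy, which may not
be how Patchwork reads mail. It does not try to handle escaped parentheses
or parentheses inside quoted strings.

```python
import re
from email import message_from_string

# Non-nesting pattern in the spirit of the quoted patch:
# "(" followed by anything except parentheses, then ")".
FLAT_COMMENT = re.compile(r"\([^()]*\)")
# The greedy alternative floated in the review.
GREEDY_COMMENT = re.compile(r"\(.*\)")


def strip_comments(value):
    """Hypothetical helper: remove comments innermost-first until stable."""
    previous = None
    while previous != value:
        previous = value
        value = FLAT_COMMENT.sub("", value)
    # Collapse leftover whitespace, including any folding newlines and tabs.
    return " ".join(value.split())


# A folded header whose comment spans two lines (made-up message-id).
raw = (
    "In-Reply-To: <msg-id@example.com>\n"
    "\t(message from liqingqing on\n"
    "\tThu, 1 Apr 2021 16:51:45 +0800)\n"
    "\n"
    "body\n"
)
# With the compat32 policy the folding newlines stay in the header value.
folded = message_from_string(raw)["In-Reply-To"]

# Two comments around the message-id, as in the review's example.
two_comments = "(danica's mail) <id@example.com> (from gnus)"
# A nested comment.
nested = "<id@example.com> (outer (inner) text)"

if __name__ == "__main__":
    print(repr(folded))
    print(repr(strip_comments(folded)))
    print(repr(GREEDY_COMMENT.sub("", two_comments)))
    print(repr(strip_comments(two_comments)))
    print(repr(strip_comments(nested)))
```

Two things this suggests: a negated character class such as [^()] matches
newlines, so the non-nesting pattern copes with a folded comment, whereas
"." does not match a newline unless re.DOTALL is set. And applying the
non-nesting pattern repeatedly handles nesting without the greedy pattern's
problem of swallowing a message-id that sits between two comments.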
