Re: body-CC-comment regression
Junio C Hamanowrites: > Matthieu Moy writes: > >> Junio C Hamano writes: >> >>> That approach may still constrain what those in the former camp can >>> write in the "cruft" part, like they cannot write comma or semicolon >>> as part of the "cruft", no? >> >> Right. Indeed, this may be a problem since the use of "#" for stable >> seem to include commit message, and they may contain commas. >> >> So, maybe Johan's patch is better indeed. > > OK, so I'll queue that one with your Ack for now so that we won't > forget. I guess we still want a few tests? It seems that there is an expectation in one of the tests that needs to be adjusted.
Re: body-CC-comment regression
Matthieu Moywrites: > Junio C Hamano writes: > >> That approach may still constrain what those in the former camp can >> write in the "cruft" part, like they cannot write comma or semicolon >> as part of the "cruft", no? > > Right. Indeed, this may be a problem since the use of "#" for stable > seem to include commit message, and they may contain commas. > > So, maybe Johan's patch is better indeed. OK, so I'll queue that one with your Ack for now so that we won't forget. I guess we still want a few tests? Thanks.
Re: body-CC-comment regression
Junio C Hamanowrites: > That approach may still constrain what those in the former camp can > write in the "cruft" part, like they cannot write comma or semicolon > as part of the "cruft", no? Right. Indeed, this may be a problem since the use of "#" for stable seem to include commit message, and they may contain commas. So, maybe Johan's patch is better indeed. -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: body-CC-comment regression
Matthieu Moywrites: > Junio C Hamano writes: > >> Johan Hovold writes: >> >>> That's precisely what the patch I posted earlier in the thread did. >> >> That's good. I didn't see any patch yet > > It's here: > > http://public-inbox.org/git/20170217110642.GD2625@localhost/ > > but as I explained, this removes a feature suported since several major > releases and we have no idea how many users may use the "mupliple emails > in one field". The approach I proposed does not suffer from this. OK, so you are aiming higher to please both parties, i.e. those who place a single address but cruft at the end would get the cruft stripped when we grab a usable address from the field, and those who write two or more addresses would get all of these addresses? That approach may still constrain what those in the former camp can write in the "cruft" part, like they cannot write comma or semicolon as part of the "cruft", no? If that does not pose a practical problem, then I can imagine it would work well for people in both camps.
Re: body-CC-comment regression
Junio C Hamanowrites: > Johan Hovold writes: > >> That's precisely what the patch I posted earlier in the thread did. > > That's good. I didn't see any patch yet It's here: http://public-inbox.org/git/20170217110642.GD2625@localhost/ but as I explained, this removes a feature suported since several major releases and we have no idea how many users may use the "mupliple emails in one field". The approach I proposed does not suffer from this. -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: body-CC-comment regression
Johan Hovoldwrites: > That's precisely what the patch I posted earlier in the thread did. That's good. I didn't see any patch yet but the message you are responding to is a response to Matthieu's message asking if you are planning to work on it, so I'd assume you are and and look forward to seeing a patch (or a series?) we can queue. Thanks.
Re: body-CC-comment regression
On Fri, Feb 17, 2017 at 10:18:46AM -0800, Junio C Hamano wrote: > Matthieu Moywrites: > > >> On Fri, Feb 17, 2017 at 02:16:42PM +0100, Matthieu Moy wrote: > > ... > > If I had a time machine, I'd probably go back then and forbid multiple > > addresses there, but ... > > > >> There does not seem to be single commit in the kernel where multiple > >> address are specified in a CC tag since after git-send-email started > >> allowing it, but there are ten commits before (to my surprise), and that > >> should be contrasted with at least 4178 commits with trailing comments > >> including a # sign. > > > > Hey, there's a life outside the kernel ;-). > > ... > >>> 1) Stop calling Mail::Address even if available.[...] > >> > >> Right, that sounds like the right thing to do regardless. > >> > >>> 2) Modify our in-house parser to discard garbage after the >. [...] > >> > >> Sounds perfectly fine to me, and seems to work too after quick test. > > > > OK, sounds like the way to go. > > > > Do you want to work on a patch? If not, I should be able to do that > > myself. The code changes are straightforward, but we probably want a > > proper test for that. > > The true headers and the things at the bottom seem to be handled in > a separate loop in send-email, so treating Cc: found in the former > and in the latter differently should be doable. I think it is OK to > explicitly treat the latter as "these are not e-mail addresses, but > just a single e-mail address possibly with non-address cruft", > without losing the ability to have more than one addresses on a > single CC: e-mail header. That's precisely what the patch I posted earlier in the thread did. Thanks, Johan
Re: body-CC-comment regression
Matthieu Moywrites: >> On Fri, Feb 17, 2017 at 02:16:42PM +0100, Matthieu Moy wrote: > ... > If I had a time machine, I'd probably go back then and forbid multiple > addresses there, but ... > >> There does not seem to be single commit in the kernel where multiple >> address are specified in a CC tag since after git-send-email started >> allowing it, but there are ten commits before (to my surprise), and that >> should be contrasted with at least 4178 commits with trailing comments >> including a # sign. > > Hey, there's a life outside the kernel ;-). > ... >>> 1) Stop calling Mail::Address even if available.[...] >> >> Right, that sounds like the right thing to do regardless. >> >>> 2) Modify our in-house parser to discard garbage after the >. [...] >> >> Sounds perfectly fine to me, and seems to work too after quick test. > > OK, sounds like the way to go. > > Do you want to work on a patch? If not, I should be able to do that > myself. The code changes are straightforward, but we probably want a > proper test for that. The true headers and the things at the bottom seem to be handled in a separate loop in send-email, so treating Cc: found in the former and in the latter differently should be doable. I think it is OK to explicitly treat the latter as "these are not e-mail addresses, but just a single e-mail address possibly with non-address cruft", without losing the ability to have more than one addresses on a single CC: e-mail header.
Re: body-CC-comment regression
On Fri, Feb 17, 2017 at 5:16 AM, Matthieu Moywrote: > > I mostly agree for the SoB, but why should a Cc tag have only one email? Because changing that clearly broke real and useful behavior. The "multiple email addresses" thing is bogus and wrong. Just don't do it. How would you even parse it sanely? Are the Cc: lines now SMTP-compliant with the whole escaping and all the usual "next line" rules? For example, in email, the rule for "next line" is that if you're in a header block, and it starts with whitespace, then it's a continuation of the last line. That's *not* how Cc: lines work in commit messages. They are all individual lines, and we have lots of tools (mainly just scripts with grepping) that simply depend on it. So this notion that the bottom of the commit message is some email header crap is WRONG. Stop it. It caused bugs. It's wrong. Don't do it. Linus
Re: body-CC-comment regression
On Fri, Feb 17, 2017 at 05:58:11PM +0100, Matthieu Moy wrote: > > On Fri, Feb 17, 2017 at 02:16:42PM +0100, Matthieu Moy wrote: > >> Johan Hovoldwrites: > > > >> The "multiple emails per Cc: field" has been there for a while already > >> (b1c8a11c8024 released in 2.6.0, sept 2015), some users may have got > >> used to it. What you are proposing breaks their flow. > > > > Note that that commit never mentions multiple addresses in either > > headers or body-tags -- it's all about being able to specify multiple > > entries on the command line. > > Indeed. I'm not the author of the patch, but I was supervising the > students who wrote it and "multiple addresses in Cc:" was not the goal, > but a (IMHO positive) side effect we discovered after the fact. Yeah, and the broken --suppress-cc=self I mention below is indicative of that too. > If I had a time machine, I'd probably go back then and forbid multiple > addresses there, but ... > > > There does not seem to be single commit in the kernel where multiple > > address are specified in a CC tag since after git-send-email started > > allowing it, but there are ten commits before (to my surprise), and that > > should be contrasted with at least 4178 commits with trailing comments > > including a # sign. > > Hey, there's a life outside the kernel ;-). Sure, but it's the origin of git as well as the tags we're discussing (I believe). My point of bringing it up was that the multiple addresses in a CC-tag was indeed an unintended (and undocumented) side-effect and I doubt many people have started using it given that it's sort of counter-intuitive (again, compare with SoB). If either the trailing comments or multiple addresses in a CC-tag has to go, I think dropping the latter is clearly the best choice. > >> 1) Stop calling Mail::Address even if available.[...] > > > > Right, that sounds like the right thing to do regardless. > > > >> 2) Modify our in-house parser to discard garbage after the >. [...] > > > > Sounds perfectly fine to me, and seems to work too after quick test. > > OK, sounds like the way to go. > > Do you want to work on a patch? If not, I should be able to do that > myself. The code changes are straightforward, but we probably want a > proper test for that. Feel free to implement it this way if that's what people prefer. As long as trailing comments are supported and discarded, I don't really have a preference. > > addresses in a Cc-tag in that it breaks --suppress-cc=self, but I guess > > that can be fixed separately. > > OK. If it's unrelated enough, please start a separate thread to explain > the problem (and/or write a patch ;-) ). Well, it's related to the "offending" patch that added support for multiple addresses in tags. By disallowing that, as my fix does, the problem goes away. # Now parse the message body while(<$fh>) { $message .= $_; if (/^(Signed-off-by|Cc): (.*)$/i) { chomp; my ($what, $c) = ($1, $2); chomp $c; my $sc = sanitize_address($c); if ($sc eq $sender) { next if ($suppress_cc{'self'}); The problem here is that $sc will never match $sender when there are more than one address in a tag. For example: From: Johan Hovold ... Cc: alpha , Johan Hovold results in sc = alpha , Johan Hovold sender = Johan Hovold so that --suppress-cc=self is not honoured. Thanks, Johan
Re: body-CC-comment regression
> On Fri, Feb 17, 2017 at 02:16:42PM +0100, Matthieu Moy wrote: >> Johan Hovoldwrites: > >> The "multiple emails per Cc: field" has been there for a while already >> (b1c8a11c8024 released in 2.6.0, sept 2015), some users may have got >> used to it. What you are proposing breaks their flow. > > Note that that commit never mentions multiple addresses in either > headers or body-tags -- it's all about being able to specify multiple > entries on the command line. Indeed. I'm not the author of the patch, but I was supervising the students who wrote it and "multiple addresses in Cc:" was not the goal, but a (IMHO positive) side effect we discovered after the fact. If I had a time machine, I'd probably go back then and forbid multiple addresses there, but ... > There does not seem to be single commit in the kernel where multiple > address are specified in a CC tag since after git-send-email started > allowing it, but there are ten commits before (to my surprise), and that > should be contrasted with at least 4178 commits with trailing comments > including a # sign. Hey, there's a life outside the kernel ;-). >> 1) Stop calling Mail::Address even if available.[...] > > Right, that sounds like the right thing to do regardless. > >> 2) Modify our in-house parser to discard garbage after the >. [...] > > Sounds perfectly fine to me, and seems to work too after quick test. OK, sounds like the way to go. Do you want to work on a patch? If not, I should be able to do that myself. The code changes are straightforward, but we probably want a proper test for that. > addresses in a Cc-tag in that it breaks --suppress-cc=self, but I guess > that can be fixed separately. OK. If it's unrelated enough, please start a separate thread to explain the problem (and/or write a patch ;-) ). Thanks, -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: body-CC-comment regression
On Fri, Feb 17, 2017 at 02:16:42PM +0100, Matthieu Moy wrote: > Johan Hovoldwrites: > > > There is another option, namely to only accept a single address for tags > > in the body. I understand that being able to copy a CC-header to either > > the header section or to the command line could be useful, but I don't > > really see the point in allowing this in the tags in the body (a SoB > > always has one address, and so should a CC-tag). > > I mostly agree for the SoB, but why should a Cc tag have only one email? For symmetry (with SoB) and readability reasons (one tag per line). These are body tags, not mail headers, after all. > The "multiple emails per Cc: field" has been there for a while already > (b1c8a11c8024 released in 2.6.0, sept 2015), some users may have got > used to it. What you are proposing breaks their flow. Note that that commit never mentions multiple addresses in either headers or body-tags -- it's all about being able to specify multiple entries on the command line. There does not seem to be single commit in the kernel where multiple address are specified in a CC tag since after git-send-email started allowing it, but there are ten commits before (to my surprise), and that should be contrasted with at least 4178 commits with trailing comments including a # sign. > > And since this is a regression for something that has been working for > > years that was introduced by a new feature, I also think it's reasonable > > to (partially) revert the feature. > > I'd find it rather ironic to fix your case by breaking a feature that > has been working for more than a year :-(. What would you answer to a > contributor comming one year from now and proposing to revert your > reversion because it breaks his flow? Such conflicts are not uncommon when dealing with regressions introduced by new features, and need to be dealt with on a case-by-case basis. But the fact that trailing comments have been properly supported for more than four years should carry some weight. > All that said, I think another fix would be both satisfactory for > everyone and rather simple: > > 1) Stop calling Mail::Address even if available. It used to make sense >to do that when our in-house parser was really poor, but we now have >something essentially as good as Mail::Address. We test our parser >against Mail::Address and we do have a few known differences (see >t9000), but they are really corner-cases and shouldn't matter. > >A good consequence of this is that we stop depending on the way Perl >is installed to parse emails. Regardless of the current issue, I >think it is a good thing. Right, that sounds like the right thing to do regardless. > 2) Modify our in-house parser to discard garbage after the >. The patch >should look like (untested): > > --- a/perl/Git.pm > +++ b/perl/Git.pm > @@ -903,11 +903,11 @@ sub parse_mailboxes { > my (@addr_list, @phrase, @address, @comment, @buffer) = (); > foreach my $token (@tokens) { > if ($token =~ /^[,;]$/) { > - # if buffer still contains undeterminated strings > - # append it at the end of @address or @phrase > - if ($end_of_addr_seen) { > - push @phrase, @buffer; > - } else { > + # if buffer still contains undeterminated > + # strings append it at the end of @address, > + # unless we already saw the closing >, in > + # which case we discard it. > + if (!$end_of_addr_seen) { > push @address, @buffer; > } > > What do you think? Sounds perfectly fine to me, and seems to work too after quick test. Note however that there's another minor issue with using multiple addresses in a Cc-tag in that it breaks --suppress-cc=self, but I guess that can be fixed separately. Thanks, Johan
Re: body-CC-comment regression
Johan Hovoldwrites: > There is another option, namely to only accept a single address for tags > in the body. I understand that being able to copy a CC-header to either > the header section or to the command line could be useful, but I don't > really see the point in allowing this in the tags in the body (a SoB > always has one address, and so should a CC-tag). I mostly agree for the SoB, but why should a Cc tag have only one email? The "multiple emails per Cc: field" has been there for a while already (b1c8a11c8024 released in 2.6.0, sept 2015), some users may have got used to it. What you are proposing breaks their flow. > And since this is a regression for something that has been working for > years that was introduced by a new feature, I also think it's reasonable > to (partially) revert the feature. I'd find it rather ironic to fix your case by breaking a feature that has been working for more than a year :-(. What would you answer to a contributor comming one year from now and proposing to revert your reversion because it breaks his flow? All that said, I think another fix would be both satisfactory for everyone and rather simple: 1) Stop calling Mail::Address even if available. It used to make sense to do that when our in-house parser was really poor, but we now have something essentially as good as Mail::Address. We test our parser against Mail::Address and we do have a few known differences (see t9000), but they are really corner-cases and shouldn't matter. A good consequence of this is that we stop depending on the way Perl is installed to parse emails. Regardless of the current issue, I think it is a good thing. 2) Modify our in-house parser to discard garbage after the >. The patch should look like (untested): --- a/perl/Git.pm +++ b/perl/Git.pm @@ -903,11 +903,11 @@ sub parse_mailboxes { my (@addr_list, @phrase, @address, @comment, @buffer) = (); foreach my $token (@tokens) { if ($token =~ /^[,;]$/) { - # if buffer still contains undeterminated strings - # append it at the end of @address or @phrase - if ($end_of_addr_seen) { - push @phrase, @buffer; - } else { + # if buffer still contains undeterminated + # strings append it at the end of @address, + # unless we already saw the closing >, in + # which case we discard it. + if (!$end_of_addr_seen) { push @address, @buffer; } What do you think? -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: body-CC-comment regression
On Thu, Feb 16, 2017 at 07:16:57PM +0100, Matthieu Moy wrote: > Johan Hovoldwrites: > > > Hi, > > > > I recently noticed that after an upgrade, git-send-email (2.10.2) > > started aborting when trying to send patches that had a linux-kernel > > stable-tag in its body. For example, > > > > Cc: # 4.4 > > > > was now parsed as > > > > "sta...@vger.kernel.org#4.4" > > > > which resulted in > > > > Died at /usr/libexec/git-core/git-send-email line 1332, line 1. > > This has changed in e3fdbcc8e1 (parse_mailboxes: accept extra text after > <...> address, 2016-10-13), released v2.11.0 as you noticed: > > > The problem with the resulting fixes that are now in 2.11.1 is that > > git-send-email no longer discards the trailing comment but rather > > shoves it into the name after adding some random white space: > > > > "# 3 . 3 . x : 1b9508f : sched : Rate-limit newidle" > > " > > > > This example is based on the example from > > Documentation/process/stable-kernel-rules.rst: > > > > Cc: # 3.3.x: 1b9508f: sched: Rate-limit newidle > > > > and this format for stable-tags has been documented at least since 2009 > > and 8e9b9362266d ("Doc/stable rules: add new cherry-pick logic"), and > > has been supported by git since 2012 and 831a488b76e0 ("git-send-email: > > remove garbage after email address") I believe. > > > > Can we please revert to the old behaviour of simply discarding such > > comments (from body-CC:s) or at least make it configurable through a > > configuration option? > > The problem is that we now accept list of emails instead of just one > email, so it's hard to define what "comments after the email", for > example > > Cc: # , > > Is not accepted as two emails. > > So, just stripping whatever comes after # before parsing the list of > emails would change the behavior once more, and possibly break other > user's flow. Dropping the garbage after the email while parsing is > possible, but only when we use our in-house parser (and we currently use > Perl's Mail::Address when available). > > So, a proper fix is far from obvious, and unfortunately I won't have > time to work on that, at least not before a while. There is another option, namely to only accept a single address for tags in the body. I understand that being able to copy a CC-header to either the header section or to the command line could be useful, but I don't really see the point in allowing this in the tags in the body (a SoB always has one address, and so should a CC-tag). And since this is a regression for something that has been working for years that was introduced by a new feature, I also think it's reasonable to (partially) revert the feature. > OTOH, the current behavior isn't that bad. It accepts the input, and > extracts a valid email out of it. Just the display name is admitedly > suboptimal ... Yeah, but the display name can end up with so much noise that auto-cc is effectively broken for people submitting kernel patches (with stable tags) as the only way to avoid it is to suppress all bodycc. So what I'm proposing is to revert to the earlier behaviour of only allowing one address per body tag by simply discarding anything after the address. Something like the below seems to do the trick. Thanks, Johan >From f551b4ca9926624dc7af6c286d7cf0f97af39541 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Fri, 17 Feb 2017 11:55:47 +0100 Subject: [PATCH] send-email: only allow one address per body tag Adding comments after a tag in the body is a common practise (e.g. in the Linux kernel) and git-send-email has been supporting this for years by removing any trailing cruft after the address. After some recent changes, any trailing comment is now instead appended to the recipient name (with some random white space inserted) resulting in undesirable noise in the headers, for example: CC: "# 3 . 3 . x : 1b9508f : sched : Rate-limit newidle" Revert to the earlier behaviour of discarding anything after the (first) address in a tag while parsing the body. Note that multiple addresses after are still allowed after a command-line switch (and in a CC-header). Fixes: b1c8a11c8024 ("send-email: allow multiple emails using --cc, --to and --bcc") Fixes: e3fdbcc8e164 ("parse_mailboxes: accept extra text after <...> address") Signed-off-by: Johan Hovold --- git-send-email.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-send-email.perl b/git-send-email.perl index 068d60b3e698..eea0a517f71b 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -1563,7 +1563,7 @@ foreach my $t (@files) { # Now parse the message body while(<$fh>) { $message .= $_; - if (/^(Signed-off-by|Cc): (.*)$/i) { + if (/^(Signed-off-by|Cc): ([^>]*>?)/i)
Re: body-CC-comment regression
Johan Hovoldwrites: > Hi, > > I recently noticed that after an upgrade, git-send-email (2.10.2) > started aborting when trying to send patches that had a linux-kernel > stable-tag in its body. For example, > > Cc: # 4.4 > > was now parsed as > > "sta...@vger.kernel.org#4.4" > > which resulted in > > Died at /usr/libexec/git-core/git-send-email line 1332, line 1. This has changed in e3fdbcc8e1 (parse_mailboxes: accept extra text after <...> address, 2016-10-13), released v2.11.0 as you noticed: > The problem with the resulting fixes that are now in 2.11.1 is that > git-send-email no longer discards the trailing comment but rather > shoves it into the name after adding some random white space: > > "# 3 . 3 . x : 1b9508f : sched : Rate-limit newidle" > " > > This example is based on the example from > Documentation/process/stable-kernel-rules.rst: > > Cc: # 3.3.x: 1b9508f: sched: Rate-limit newidle > > and this format for stable-tags has been documented at least since 2009 > and 8e9b9362266d ("Doc/stable rules: add new cherry-pick logic"), and > has been supported by git since 2012 and 831a488b76e0 ("git-send-email: > remove garbage after email address") I believe. > > Can we please revert to the old behaviour of simply discarding such > comments (from body-CC:s) or at least make it configurable through a > configuration option? The problem is that we now accept list of emails instead of just one email, so it's hard to define what "comments after the email", for example Cc: # , Is not accepted as two emails. So, just stripping whatever comes after # before parsing the list of emails would change the behavior once more, and possibly break other user's flow. Dropping the garbage after the email while parsing is possible, but only when we use our in-house parser (and we currently use Perl's Mail::Address when available). So, a proper fix is far from obvious, and unfortunately I won't have time to work on that, at least not before a while. OTOH, the current behavior isn't that bad. It accepts the input, and extracts a valid email out of it. Just the display name is admitedly suboptimal ... -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: body-CC-comment regression
On Thu, Feb 16, 2017 at 09:59:25AM -0800, Junio C Hamano wrote: > Johan Hovoldwrites: > > > I recently noticed that after an upgrade, git-send-email (2.10.2) > > started aborting when trying to send patches that had a linux-kernel > > stable-tag in its body. For example, > > > > Cc: # 4.4 > > > > was now parsed as > > > > "sta...@vger.kernel.org#4.4" > > ... > > It sounds like a fallout of this: > > > https://public-inbox.org/git/41164484-309b-bfff-ddbb-55153495d...@lwfinger.net/#t > > > and any change to "fix" you may break the other person. Yes, that's the thread I was referring to as well, and the reported breakage is the same even if the reporter used a non-standard stable-tag format (e.g. a "[4.8+]" suffix). What I'm wondering is whether the alternative fix proposed in that thread, to revert to the old behaviour of discarding trailing comments, should be considered instead of what was implemented. > > Can we please revert to the old behaviour of simply discarding such > > comments (from body-CC:s) or at least make it configurable through a > > configuration option? > > If I recall the old thread correctly, it was reported that using > Mail::Address without forcing git-send-email fall back to its own > non-parsing-but-paste-address-looking-things-together code would > solve it, so can the "make it configurable" be just "install > Mail::Address"? I believe git-send-email's parser was changed to mimic Mail::Address, and installing it does not seem to change the behaviour of including any trailing comments in the name. Thanks, Johan
Re: body-CC-comment regression
Johan Hovoldwrites: > I recently noticed that after an upgrade, git-send-email (2.10.2) > started aborting when trying to send patches that had a linux-kernel > stable-tag in its body. For example, > > Cc: # 4.4 > > was now parsed as > > "sta...@vger.kernel.org#4.4" > ... It sounds like a fallout of this: https://public-inbox.org/git/41164484-309b-bfff-ddbb-55153495d...@lwfinger.net/#t and any change to "fix" you may break the other person. > Can we please revert to the old behaviour of simply discarding such > comments (from body-CC:s) or at least make it configurable through a > configuration option? If I recall the old thread correctly, it was reported that using Mail::Address without forcing git-send-email fall back to its own non-parsing-but-paste-address-looking-things-together code would solve it, so can the "make it configurable" be just "install Mail::Address"?