Re: Bug in "git am" when the body starts with spaces
On Mon, Apr 03, 2017 at 10:42:46AM -0700, Jonathan Tan wrote: > On 04/01/2017 09:18 PM, Jeff King wrote: > > On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote: > > > The logic is fairly simple: if we encounter an empty line, and we have > > > pending in-body headers, we flush the pending headers, and mark us as > > > no longer in header mode. > > > > Hmm. I think this may work. At first I thought it was too strict in > > always checking inbody_header_accum.len, because we want this to kick in > > always, whether there's whitespace continuation or not. But that > > accumulator has to collect preemptively, before it knows if there's > > continuation. So it will always be non-empty if we've seen _any_ header, > > and it will remain non-empty as long as we keep parsing (because any > > time we flush, we do so in order to handle another line). > > > > IOW, I think this implements the state-machine thing I wrote in my > > earlier email, because the state "are we inside in-body header parsing" > > is always reflected by having a non-empty accumulator. It is a bit > > non-obvious though. > > About obviousness, I think of a non-empty accumulator merely representing > that the next line could potentially be a continuation line. And it is > coincidence that this implies "are we inside in-body header parsing"; if not > all in-body header lines could be "continued", there would be no such > implication. > > mi->inbody_header_accum.len is already used in check_inbody_header to mean > "could the next line potentially be a continuation line" and to trigger a > check for a negative criterion (in this case, a scissors line). I think it's > fine to do the same thing, the negative criterion here being a blank line. FWIW, I looked into making a single "state" variable, but it actually got ugly pretty quickly. I think not because it's not a cleaner method in the long run, but just because the existing code is so dependent on individual state variables that needed changing. So I'm OK with Linus's fix; it certainly follows the existing code patterns. -Peff
Re: Bug in "git am" when the body starts with spaces
Thanks, everyone, for looking into this. On 04/01/2017 09:18 PM, Jeff King wrote: On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote: The logic is fairly simple: if we encounter an empty line, and we have pending in-body headers, we flush the pending headers, and mark us as no longer in header mode. Hmm. I think this may work. At first I thought it was too strict in always checking inbody_header_accum.len, because we want this to kick in always, whether there's whitespace continuation or not. But that accumulator has to collect preemptively, before it knows if there's continuation. So it will always be non-empty if we've seen _any_ header, and it will remain non-empty as long as we keep parsing (because any time we flush, we do so in order to handle another line). IOW, I think this implements the state-machine thing I wrote in my earlier email, because the state "are we inside in-body header parsing" is always reflected by having a non-empty accumulator. It is a bit non-obvious though. About obviousness, I think of a non-empty accumulator merely representing that the next line could potentially be a continuation line. And it is coincidence that this implies "are we inside in-body header parsing"; if not all in-body header lines could be "continued", there would be no such implication. mi->inbody_header_accum.len is already used in check_inbody_header to mean "could the next line potentially be a continuation line" and to trigger a check for a negative criterion (in this case, a scissors line). I think it's fine to do the same thing, the negative criterion here being a blank line.
Re: Bug in "git am" when the body starts with spaces
Linus Torvalds writes: > On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds > wrote: >> >> The continuation logic is oddly complex, and I can't follow the logic. >> But it is completely broken in how it thinks empty lines are somehow >> "continuations". > > The attached patch seems to work for me. Comments? We start at header_stage set to 1, keep skipping empty lines while in that state, and we stay in that state as long as we see in-body header (or a continuation of the in-body header we saw earlier). We get out of this state when we see a blank line after we are done with the in-body headers. Once header_stage is set to 0 with a blank line, we don't do in-body headers (scissors will roll back the whole thing and irrelevant to the analysis of correctness). But you found that "keep skipping empty" done unconditionally is wrong, because we may have already seen an in-body header and are trying to see if the line is a continuation (in which case check_inbody_header() would append to the previous) or another in-body header (in which case again check_inbody_header() would use it), or something else (in which case check_inbody_header() will return false, we get out of header_stage=1 state and process this line as the first line of the log message. An empty line we see in this state must trigger "we are no longer taking in-body header" logic, too. And that is exactly your patch does. The change "feels" correct to me. > mailinfo.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/mailinfo.c b/mailinfo.c > index a489d9d0f..68037758f 100644 > --- a/mailinfo.c > +++ b/mailinfo.c > @@ -757,8 +757,13 @@ static int handle_commit_msg(struct mailinfo *mi, struct > strbuf *line) > assert(!mi->filter_stage); > > if (mi->header_stage) { > - if (!line->len || (line->len == 1 && line->buf[0] == '\n')) > + if (!line->len || (line->len == 1 && line->buf[0] == '\n')) { > + if (mi->inbody_header_accum.len) { > + flush_inbody_header_accum(mi); > + mi->header_stage = 0; > + } > return 0; > + } > } > > if (mi->use_inbody_headers && mi->header_stage) {
Re: Bug in "git am" when the body starts with spaces
On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote: > On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds > wrote: > > > > The continuation logic is oddly complex, and I can't follow the logic. > > But it is completely broken in how it thinks empty lines are somehow > > "continuations". > > The attached patch seems to work for me. Comments? > > The logic is fairly simple: if we encounter an empty line, and we have > pending in-body headers, we flush the pending headers, and mark us as > no longer in header mode. Hmm. I think this may work. At first I thought it was too strict in always checking inbody_header_accum.len, because we want this to kick in always, whether there's whitespace continuation or not. But that accumulator has to collect preemptively, before it knows if there's continuation. So it will always be non-empty if we've seen _any_ header, and it will remain non-empty as long as we keep parsing (because any time we flush, we do so in order to handle another line). IOW, I think this implements the state-machine thing I wrote in my earlier email, because the state "are we inside in-body header parsing" is always reflected by having a non-empty accumulator. It is a bit non-obvious though. -Peff
Re: Bug in "git am" when the body starts with spaces
On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds wrote: > > The continuation logic is oddly complex, and I can't follow the logic. > But it is completely broken in how it thinks empty lines are somehow > "continuations". The attached patch seems to work for me. Comments? The logic is fairly simple: if we encounter an empty line, and we have pending in-body headers, we flush the pending headers, and mark us as no longer in header mode. The code used to just ignore empty lines when in header mode, which is garbage, exactly because it would happily continue accumulating more headers. There may be other cases this misses, but this at least seems to fix the case I encountered. Linus mailinfo.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mailinfo.c b/mailinfo.c index a489d9d0f..68037758f 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -757,8 +757,13 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line) assert(!mi->filter_stage); if (mi->header_stage) { - if (!line->len || (line->len == 1 && line->buf[0] == '\n')) + if (!line->len || (line->len == 1 && line->buf[0] == '\n')) { + if (mi->inbody_header_accum.len) { + flush_inbody_header_accum(mi); + mi->header_stage = 0; + } return 0; + } } if (mi->use_inbody_headers && mi->header_stage) {
Re: Bug in "git am" when the body starts with spaces
On Fri, Mar 31, 2017 at 05:52:00PM -0700, Linus Torvalds wrote: > Ok, did a bisect, and this bisects to commit 6b4b013f1884 ("mailinfo: > handle in-body header continuations"). > > The continuation logic is oddly complex, and I can't follow the logic. > But it is completely broken in how it thinks empty lines are somehow > "continuations". I think the continuation logic is OK. The problem is that the newline is never fed to check_inbody_header() at all. So the next line its state machine sees is the indented line, which looks like a continuation. It seems to me that ignoring that newline is a bug, and causes other problems. For instance, if you have a blank line and then another header-looking thing (not a continuation) after, it is respected. For instance, running mailinfo on: From ...mbox header... From: Email Author Subject: email subject Date: whatever From: in-body author Subject: in-body subject And the rest of the message. will use "in-body subject" as the subject, though I'd have thought we should stop parsing in-body headers as soon as we see the first in-body blank line (the one after the in-body "From"). The blank line gets eaten by the check at the beginning of handle_commit_msg(), right _before_ we pass it off to the check_inbody_header() function. It seems like that should be more like: diff --git a/mailinfo.c b/mailinfo.c index a489d9d0f..6d89781eb 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -757,8 +757,10 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line) assert(!mi->filter_stage); if (mi->header_stage) { - if (!line->len || (line->len == 1 && line->buf[0] == '\n')) + if (!line->len || (line->len == 1 && line->buf[0] == '\n')) { + mi->header_stage = 0; return 0; + } } if (mi->use_inbody_headers && mi->header_stage) { But that breaks a bunch of tests. It looks like the loop in handle_body always starts by feeding the initial header-separator (the real headers, not the in-body ones) to the various parsers. So that first newline makes us trigger "no more in-body headers" before we even start parsing any lines. Oops. So doing this: @@ -960,7 +962,7 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line) goto handle_body_out; } - do { + while (!strbuf_getwholeline(line, mi->input, '\n')) { /* process any boundary lines */ if (*(mi->content_top) && is_multipart_boundary(mi, line)) { /* flush any leftover */ @@ -1013,7 +1015,7 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line) if (mi->input_error) break; - } while (!strbuf_getwholeline(line, mi->input, '\n')); + } flush_inbody_header_accum(mi); on top fixes that. But that still breaks a test that has blank lines at the beginning of the message, before the in-body header. So probably the state-machine needs an extra state: sucking up pre-inbody-header newlines. -Peff
Re: Bug in "git am" when the body starts with spaces
Ok, did a bisect, and this bisects to commit 6b4b013f1884 ("mailinfo: handle in-body header continuations"). The continuation logic is oddly complex, and I can't follow the logic. But it is completely broken in how it thinks empty lines are somehow "continuations". Jonathan? Linus On Fri, Mar 31, 2017 at 5:24 PM, Linus Torvalds wrote: > > I think the reason is that the "header continuation line" logic kicks > in because the lines in the body start with spaces, but that's > entirely incorrect, since > > (a) we're not in an email header > > (b) there's an empty line in between anyway, so no way are those body > lines continuation lines. > > I didn't check how far back this goes, I guess I'll do that next. But > I thought I'd report it here first in case somebody else goes "ahhh".
Bug in "git am" when the body starts with spaces
Try applying the attached patch with git am 0001-Test-patch.patch in the git repository. At least for me, it results in a very odd commit that has one single line in the commit message: Test patch This should go in the body not in the subject line which is obviously bogus. I think the reason is that the "header continuation line" logic kicks in because the lines in the body start with spaces, but that's entirely incorrect, since (a) we're not in an email header (b) there's an empty line in between anyway, so no way are those body lines continuation lines. I didn't check how far back this goes, I guess I'll do that next. But I thought I'd report it here first in case somebody else goes "ahhh". Linus From ad65cf7ba97ac071da1f845ec854165e7bf1efdf Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 31 Mar 2017 17:18:16 -0700 Subject: [PATCH] Test patch example Subject: [PATCH] Test patch This should go in the body not in the subject line --- Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/Makefile b/Makefile index 9b36068ac..9f36c149b 100644 --- a/Makefile +++ b/Makefile @@ -1,3 +1,4 @@ + # The default target of this Makefile is... all:: -- 2.12.2.401.g5d4234a49