Re: [PATCH 10/26] Check for EOF while parsing mails
On Fri, Apr 28, 2017 at 03:33:41PM +0200, Johannes Schindelin wrote: > As usual, EOF is defined as -1 in Git for Windows' context, meaning that > we look at the last entry of the sane_ctype array, which returns 0 for any > sane_istest(x,mask) test for x >= 0x80: > > /* Nothing in the 128.. range */ > > So it would appear that it happens to work, but I doubt that it was > intentional. Yeah, that was the same analysis I came to. Even though it works, it is a potential portability problem if a platform defines EOF in a weird way. The "right" thing in such a case is probably checking explicitly for EOF, but I'd hate to add an extra conditional to sane_istest(). It's a fairly hot code path. So I'm fine with punting on that until we see evidence of such a system. > Having said that, it is really curious why Coverity should get confused by > the code and not realize that casting a negative number to (unsigned char) > will make it valid as an index for the sane_ctype array. Yeah, that is the part that confuses me, too. It seems like an easy case to get right. Oh well. If you are tweaking it to handle EOF separately for other reasons, then the Coverity question goes away entirely. -Peff
Re: [PATCH 10/26] Check for EOF while parsing mails
Hi Peff, On Fri, 28 Apr 2017, Jeff King wrote: > On Fri, Apr 28, 2017 at 12:44:52PM +0200, Johannes Schindelin wrote: > > > > Also, what is the behavior of ungetc when we pass it EOF? > > > > According to the documentation, it would cast EOF to an unsigned char and > > push that back. Definitely incorrect. > > > > > It looks like POSIX does what we want (pushing EOF is a noop, and the > > > stream retains its feof() status), but I don't know if there are other > > > implementations to worry about. > > > > That's not what my man page here says: > > > > ungetc() pushes c back to stream, cast to unsigned char, where > > it is available for subsequent read operations. Pushed-back > > characters will be returned in reverse order; only one pushback is > > guaranteed. > > POSIX covers this this case explicitly: > > If the value of c equals that of the macro EOF, the operation shall > fail and the input stream shall be left unchanged. > > That comes straight from C99, which says: > > If the value of c equals that of the macro EOF, the operation fails > and the input stream is unchanged. > > I don't have a copy of C89 handy, but I didn't see any mention of the > behavior in the "changes from the previous edition" section of C99. Yeah. I'd still be more comfortable with being explicit in this case, also because our documentation states explicitly that we do not necessarily live by the POSIX standard. Ciao, Dscho
Re: [PATCH 10/26] Check for EOF while parsing mails
Hi Peff, On Fri, 28 Apr 2017, Jeff King wrote: > On Fri, Apr 28, 2017 at 12:41:02PM +0200, Johannes Schindelin wrote: > > > But then, I guess I misunderstood what Coverity complained about: > > maybe the problem was not so much the isspace() call but that EOF is > > not being handled correctly. We pass it, unchecked, to ungetc(). > > > > It appears that I (or Coverity, if you will), missed another instance > > where we simply passed EOF unchecked to ungetc(). > > I think that is also fine according to the standard. > > Do you happen to have the exact error from Coverity? Wow, that was unnecessarily hard. It is a major hassle to get to any scan other than the latest one. But I did it. Call me tenatious. The report says this: 233do { 2. negative_return_fn: Function mingw_fgetc(f) returns a negative number. 3. var_assign: Assigning: signed variable peek = mingw_fgetc. 234peek = fgetc(f); CID 1049734: Negative array index read (NEGATIVE_RETURNS) 4. negative_returns: Using variable peek as an index to array sane_ctype. 235} while (isspace(peek)); 236ungetc(peek, f); So part of the thing is that we use mingw_fgetc() instead of fgetc(). However, the return value is *still* the one from the "real" fgetc(), even if we intercept what appears to be a Ctrl+C from an interactive console. > I'm wondering if it is complaining about some aspect of our custom > isspace() when used with EOF. That would appear to be the real issue, yes, and I should have double-checked the claim that POSIX isspace() handles EOF properly: we override isspace() with our own version, after all: #define isspace(x) sane_istest(x,GIT_SPACE) where #define sane_istest(x,mask) \ ((sane_ctype[(unsigned char)(x)] & (mask)) != 0) (rewrapped for readability) As usual, EOF is defined as -1 in Git for Windows' context, meaning that we look at the last entry of the sane_ctype array, which returns 0 for any sane_istest(x,mask) test for x >= 0x80: /* Nothing in the 128.. range */ So it would appear that it happens to work, but I doubt that it was intentional. Having said that, it is really curious why Coverity should get confused by the code and not realize that casting a negative number to (unsigned char) will make it valid as an index for the sane_ctype array. I double-checked, and there is no override for the isspace() function in what Coverity calls a "model file" (i.e. pseudo code intended to helping Coverity realize where it can stop reporting false positives). > > The next iteration will have it completely reworked: I no longer guard > > the isspace() behind an `!= EOF` check, but rather handle an early EOF > > as I think it should be handled. Extra eyes very welcome (this is the > > fixup! patch): > > I do think handling EOF explicitly is probably a better strategy anyway, > as it lets us tell when we have an empty patch. I agree, I came to the same conclusion independently. Ciao, Dscho
Re: [PATCH 10/26] Check for EOF while parsing mails
On Fri, Apr 28, 2017 at 12:41:02PM +0200, Johannes Schindelin wrote: > > Why? isspace(EOF) is well-defined. > > So let's look at the man page on Linux: > > These functions check whether c, which must have the value of an > unsigned char or EOF, [...] > > That is the only mention of it. I find it highly unobvious whether EOF > should be treated as a space or not. So let's look at the MSDN page > (https://msdn.microsoft.com/en-us/library/y13z34da.aspx) whether they talk > more about EOF: > > The behavior of isspace and _isspace_l is undefined if c is not > EOF or in the range 0 through 0xFF, inclusive. > > That's it. So I kind of *guess* that EOF is treated as not being a > whitespace character (why does this make me think of politics now? Focus, > Johannes, focus...). But the mathematician in me protests: why would we > be able to decide the character class of a character that does not exist? This behavior actually comes from the C standard. From C99, 7.4: 1 The header declares several functions useful for classifying and mapping characters. 166) In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined. It doesn't explicitly say that EOF returns false for these functions, but it does seem implied. For instance: The isspace function tests for any character that is a standard white-space character or is one of a locale-specific set of characters for which isalnum is false. The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In the "C" locale, isspace returns true only for the standard white-space characters. So I think it is a reasonable thing to depend on. But as I said elsewhere in the thread, we don't actually use the standard isspace() anyway. > But then, I guess I misunderstood what Coverity complained about: maybe > the problem was not so much the isspace() call but that EOF is not being > handled correctly. We pass it, unchecked, to ungetc(). > > It appears that I (or Coverity, if you will), missed another instance > where we simply passed EOF unchecked to ungetc(). I think that is also fine according to the standard. Do you happen to have the exact error from Coverity? I'm wondering if it is complaining about some aspect of our custom isspace() when used with EOF. > The next iteration will have it completely reworked: I no longer guard the > isspace() behind an `!= EOF` check, but rather handle an early EOF as I > think it should be handled. Extra eyes very welcome (this is the fixup! > patch): I do think handling EOF explicitly is probably a better strategy anyway, as it lets us tell when we have an empty patch. -Peff
Re: [PATCH 10/26] Check for EOF while parsing mails
On Fri, Apr 28, 2017 at 12:44:52PM +0200, Johannes Schindelin wrote: > > Also, what is the behavior of ungetc when we pass it EOF? > > According to the documentation, it would cast EOF to an unsigned char and > push that back. Definitely incorrect. > > > It looks like POSIX does what we want (pushing EOF is a noop, and the > > stream retains its feof() status), but I don't know if there are other > > implementations to worry about. > > That's not what my man page here says: > > ungetc() pushes c back to stream, cast to unsigned char, where > it is available for subsequent read operations. Pushed-back > characters will be returned in reverse order; only one pushback is > guaranteed. POSIX covers this this case explicitly: If the value of c equals that of the macro EOF, the operation shall fail and the input stream shall be left unchanged. That comes straight from C99, which says: If the value of c equals that of the macro EOF, the operation fails and the input stream is unchanged. I don't have a copy of C89 handy, but I didn't see any mention of the behavior in the "changes from the previous edition" section of C99. So it's possible that there's an implementation that is unhappy with ungetc(EOF), but unless we know of one specifically, it seems pretty safe. Given that and the similar explicit rule for EOF via isspace(), I think the original code actually behaves fine. Of course, we do not use the standard isspace() anyway. Our implementation will cast the EOF to an unsigned char. If it's "-1", that ends up as 255, which matches no classes. But if the platform has an oddball EOF like 288, that would confuse our isspace(). -Peff
Re: [PATCH 10/26] Check for EOF while parsing mails
Hi Peff, On Thu, 27 Apr 2017, Jeff King wrote: > On Wed, Apr 26, 2017 at 10:20:16PM +0200, Johannes Schindelin wrote: > > > diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c > > index 30681681c13..c0d88f97512 100644 > > --- a/builtin/mailsplit.c > > +++ b/builtin/mailsplit.c > > @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char > > *dir, int allow_bare, > > > > do { > > peek = fgetc(f); > > - } while (isspace(peek)); > > + } while (peek >= 0 && isspace(peek)); > > ungetc(peek, f); > > Are we guaranteed that EOF is a negative number? No, you're right. > Also, what is the behavior of ungetc when we pass it EOF? According to the documentation, it would cast EOF to an unsigned char and push that back. Definitely incorrect. > It looks like POSIX does what we want (pushing EOF is a noop, and the > stream retains its feof() status), but I don't know if there are other > implementations to worry about. That's not what my man page here says: ungetc() pushes c back to stream, cast to unsigned char, where it is available for subsequent read operations. Pushed-back characters will be returned in reverse order; only one pushback is guaranteed. > Perhaps: > > /* soak up whitespace */ > while ((peek = fgetc(f)) != EOF) { > if (!isspace(peek)) { > ungetc(peek, f); > break; > } > } > > would be more portable. True. I changed it slightly differently, please see my reply to Hannes. Thanks, Dscho
Re: [PATCH 10/26] Check for EOF while parsing mails
Hi Hannes, On Thu, 27 Apr 2017, Johannes Sixt wrote: > Am 26.04.2017 um 22:20 schrieb Johannes Schindelin: > > > diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c > > index 30681681c13..c0d88f97512 100644 > > --- a/builtin/mailsplit.c > > +++ b/builtin/mailsplit.c > > @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, > > int allow_bare, > > > > do { > > peek = fgetc(f); > > - } while (isspace(peek)); > > + } while (peek >= 0 && isspace(peek)); > > ungetc(peek, f); > > > > if (strbuf_getwholeline(, f, '\n')) { > > diff --git a/mailinfo.c b/mailinfo.c > > index 68037758f2f..60dcad7b714 100644 > > --- a/mailinfo.c > > +++ b/mailinfo.c > > @@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg, > > const char *patch) > > > > do { > > peek = fgetc(mi->input); > > - } while (isspace(peek)); > > + } while (peek >= 0 && isspace(peek)); > > ungetc(peek, mi->input); > > > > /* process the email header */ > > > > Why? isspace(EOF) is well-defined. So let's look at the man page on Linux: These functions check whether c, which must have the value of an unsigned char or EOF, [...] That is the only mention of it. I find it highly unobvious whether EOF should be treated as a space or not. So let's look at the MSDN page (https://msdn.microsoft.com/en-us/library/y13z34da.aspx) whether they talk more about EOF: The behavior of isspace and _isspace_l is undefined if c is not EOF or in the range 0 through 0xFF, inclusive. That's it. So I kind of *guess* that EOF is treated as not being a whitespace character (why does this make me think of politics now? Focus, Johannes, focus...). But the mathematician in me protests: why would we be able to decide the character class of a character that does not exist? Technically, you are correct, of course. The specs of fgetc() specify quite clearly that either an unsigned char cast to an int is returned, or EOF on end-of-file *or error*. And a quick test verifies that isspace(EOF) returns 0. But then, I guess I misunderstood what Coverity complained about: maybe the problem was not so much the isspace() call but that EOF is not being handled correctly. We pass it, unchecked, to ungetc(). It appears that I (or Coverity, if you will), missed another instance where we simply passed EOF unchecked to ungetc(). The next iteration will have it completely reworked: I no longer guard the isspace() behind an `!= EOF` check, but rather handle an early EOF as I think it should be handled. Extra eyes very welcome (this is the fixup! patch): -- snip -- diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index c0d88f97512..9b3efc8e987 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -232,8 +232,9 @@ static int split_mbox(const char *file, const char *dir, int allow_bare, do { peek = fgetc(f); - } while (peek >= 0 && isspace(peek)); - ungetc(peek, f); + } while (isspace(peek)); + if (peek != EOF) + ungetc(peek, f); if (strbuf_getwholeline(, f, '\n')) { /* empty stdin is OK */ diff --git a/mailinfo.c b/mailinfo.c index 60dcad7b714..a319911b510 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -882,7 +882,10 @@ static int read_one_header_line(struct strbuf *line, FILE *in) for (;;) { int peek; - peek = fgetc(in); ungetc(peek, in); + peek = fgetc(in); + if (peek == EOF) + break; + ungetc(peek, in); if (peek != ' ' && peek != '\t') break; if (strbuf_getline_lf(, in)) @@ -1094,14 +1097,18 @@ int mailinfo(struct mailinfo *mi, const char *msg, const char *patch) return -1; } - mi->p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->p_hdr_data))); - mi->s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->s_hdr_data))); - do { peek = fgetc(mi->input); - } while (peek >= 0 && isspace(peek)); + if (peek == EOF) { + fclose(cmitmsg); + return error("empty patch: '%s'", patch); + } + } while (isspace(peek)); ungetc(peek, mi->input); + mi->p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->p_hdr_data))); + mi->s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->s_hdr_data))); + /* process the email header */ while (read_one_header_line(, mi->input)) check_header(mi, , mi->p_hdr_data, 1); -- snap -- In the first hunk, I simply rely on the code after ungetc() to figure out that there are no headers and to handle that case as before. The second hunk handles the case where looking for a continuation line in the header section hits EOF; it is still a valid header, but we should avoid ungetc(EOF) to allow the next read to report
Re: [PATCH 10/26] Check for EOF while parsing mails
Hi Junio, On Wed, 26 Apr 2017, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > Reported via Coverity. > > > > Signed-off-by: Johannes Schindelin > > --- > > builtin/mailsplit.c | 2 +- > > mailinfo.c | 2 +- > > 2 files changed, 2 insertions(+), 2 deletions(-) > > Good find. I'd retitle with a prefix > > mailinfo & mailsplit: check for EOF while parsing > > so that it is clear that this patch is about lower level machinery > (as oppose to something like "git am"). True. Will be fixed in the next iteration, Dscho
Re: [PATCH 10/26] Check for EOF while parsing mails
On Wed, Apr 26, 2017 at 10:20:16PM +0200, Johannes Schindelin wrote: > diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c > index 30681681c13..c0d88f97512 100644 > --- a/builtin/mailsplit.c > +++ b/builtin/mailsplit.c > @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, > int allow_bare, > > do { > peek = fgetc(f); > - } while (isspace(peek)); > + } while (peek >= 0 && isspace(peek)); > ungetc(peek, f); Are we guaranteed that EOF is a negative number? Also, what is the behavior of ungetc when we pass it EOF? It looks like POSIX does what we want (pushing EOF is a noop, and the stream retains its feof() status), but I don't know if there are other implementations to worry about. Perhaps: /* soak up whitespace */ while ((peek = fgetc(f)) != EOF) { if (!isspace(peek)) { ungetc(peek, f); break; } } would be more portable. -Peff
Re: [PATCH 10/26] Check for EOF while parsing mails
Am 26.04.2017 um 22:20 schrieb Johannes Schindelin: Reported via Coverity. Signed-off-by: Johannes Schindelin--- builtin/mailsplit.c | 2 +- mailinfo.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index 30681681c13..c0d88f97512 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, int allow_bare, do { peek = fgetc(f); - } while (isspace(peek)); + } while (peek >= 0 && isspace(peek)); ungetc(peek, f); if (strbuf_getwholeline(, f, '\n')) { diff --git a/mailinfo.c b/mailinfo.c index 68037758f2f..60dcad7b714 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg, const char *patch) do { peek = fgetc(mi->input); - } while (isspace(peek)); + } while (peek >= 0 && isspace(peek)); ungetc(peek, mi->input); /* process the email header */ Why? isspace(EOF) is well-defined. -- Hannes
Re: [PATCH 10/26] Check for EOF while parsing mails
Johannes Schindelinwrites: > Reported via Coverity. > > Signed-off-by: Johannes Schindelin > --- > builtin/mailsplit.c | 2 +- > mailinfo.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) Good find. I'd retitle with a prefix mailinfo & mailsplit: check for EOF while parsing so that it is clear that this patch is about lower level machinery (as oppose to something like "git am"). Thanks. > > diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c > index 30681681c13..c0d88f97512 100644 > --- a/builtin/mailsplit.c > +++ b/builtin/mailsplit.c > @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, > int allow_bare, > > do { > peek = fgetc(f); > - } while (isspace(peek)); > + } while (peek >= 0 && isspace(peek)); > ungetc(peek, f); > > if (strbuf_getwholeline(, f, '\n')) { > diff --git a/mailinfo.c b/mailinfo.c > index 68037758f2f..60dcad7b714 100644 > --- a/mailinfo.c > +++ b/mailinfo.c > @@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg, > const char *patch) > > do { > peek = fgetc(mi->input); > - } while (isspace(peek)); > + } while (peek >= 0 && isspace(peek)); > ungetc(peek, mi->input); > > /* process the email header */
[PATCH 10/26] Check for EOF while parsing mails
Reported via Coverity. Signed-off-by: Johannes Schindelin--- builtin/mailsplit.c | 2 +- mailinfo.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index 30681681c13..c0d88f97512 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, int allow_bare, do { peek = fgetc(f); - } while (isspace(peek)); + } while (peek >= 0 && isspace(peek)); ungetc(peek, f); if (strbuf_getwholeline(, f, '\n')) { diff --git a/mailinfo.c b/mailinfo.c index 68037758f2f..60dcad7b714 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg, const char *patch) do { peek = fgetc(mi->input); - } while (isspace(peek)); + } while (peek >= 0 && isspace(peek)); ungetc(peek, mi->input); /* process the email header */ -- 2.12.2.windows.2.800.gede8f145e06