Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Jeff King
On Fri, Apr 28, 2017 at 03:33:41PM +0200, Johannes Schindelin wrote:

> As usual, EOF is defined as -1 in Git for Windows' context, meaning that
> we look at the last entry of the sane_ctype array, which returns 0 for any
> sane_istest(x,mask) test for x >= 0x80:
> 
> /* Nothing in the 128.. range */
> 
> So it would appear that it happens to work, but I doubt that it was
> intentional.

Yeah, that was the same analysis I came to. Even though it works, it is
a potential portability problem if a platform defines EOF in a weird
way. The "right" thing in such a case is probably checking explicitly
for EOF, but I'd hate to add an extra conditional to sane_istest(). It's
a fairly hot code path. So I'm fine with punting on that until we see
evidence of such a system.

> Having said that, it is really curious why Coverity should get confused by
> the code and not realize that casting a negative number to (unsigned char)
> will make it valid as an index for the sane_ctype array.

Yeah, that is the part that confuses me, too. It seems like an easy
case to get right.

Oh well. If you are tweaking it to handle EOF separately for other
reasons, then the Coverity question goes away entirely.

-Peff


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Johannes Schindelin
Hi Peff,

On Fri, 28 Apr 2017, Jeff King wrote:

> On Fri, Apr 28, 2017 at 12:44:52PM +0200, Johannes Schindelin wrote:
> 
> > > Also, what is the behavior of ungetc when we pass it EOF?
> > 
> > According to the documentation, it would cast EOF to an unsigned char and
> > push that back. Definitely incorrect.
> > 
> > > It looks like POSIX does what we want (pushing EOF is a noop, and the
> > > stream retains its feof() status), but I don't know if there are other
> > > implementations to worry about.
> > 
> > That's not what my man page here says:
> > 
> > ungetc()  pushes  c  back to stream, cast to unsigned char, where
> > it is available for subsequent read operations.  Pushed-back
> > characters will be returned in reverse order; only one pushback is
> > guaranteed.
> 
> POSIX covers this this case explicitly:
> 
>   If the value of c equals that of the macro EOF, the operation shall
>   fail and the input stream shall be left unchanged.
> 
> That comes straight from C99, which says:
> 
>   If the value of c equals that of the macro EOF, the operation fails
>   and the input stream is unchanged.
> 
> I don't have a copy of C89 handy, but I didn't see any mention of the
> behavior in the "changes from the previous edition" section of C99.

Yeah. I'd still be more comfortable with being explicit in this case, also
because our documentation states explicitly that we do not necessarily
live by the POSIX standard.

Ciao,
Dscho


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Johannes Schindelin
Hi Peff,

On Fri, 28 Apr 2017, Jeff King wrote:

> On Fri, Apr 28, 2017 at 12:41:02PM +0200, Johannes Schindelin wrote:
> 
> > But then, I guess I misunderstood what Coverity complained about:
> > maybe the problem was not so much the isspace() call but that EOF is
> > not being handled correctly. We pass it, unchecked, to ungetc().
> > 
> > It appears that I (or Coverity, if you will), missed another instance
> > where we simply passed EOF unchecked to ungetc().
> 
> I think that is also fine according to the standard.
> 
> Do you happen to have the exact error from Coverity?

Wow, that was unnecessarily hard. It is a major hassle to get to any
scan other than the latest one.

But I did it. Call me tenatious.

The report says this:

233do {
   2. negative_return_fn: Function mingw_fgetc(f) returns a negative number.
   3. var_assign: Assigning: signed variable peek = mingw_fgetc.
234peek = fgetc(f);
   CID 1049734: Negative array index read (NEGATIVE_RETURNS)
   4.  negative_returns: Using variable peek as an index to array sane_ctype.
235} while (isspace(peek));
236ungetc(peek, f);

So part of the thing is that we use mingw_fgetc() instead of fgetc().
However, the return value is *still* the one from the "real" fgetc(), even
if we intercept what appears to be a Ctrl+C from an interactive console.

> I'm wondering if it is complaining about some aspect of our custom
> isspace() when used with EOF.

That would appear to be the real issue, yes, and I should have
double-checked the claim that POSIX isspace() handles EOF properly: we
override isspace() with our own version, after all:

#define isspace(x) sane_istest(x,GIT_SPACE)

where

#define sane_istest(x,mask) \
((sane_ctype[(unsigned char)(x)] & (mask)) != 0)

(rewrapped for readability)

As usual, EOF is defined as -1 in Git for Windows' context, meaning that
we look at the last entry of the sane_ctype array, which returns 0 for any
sane_istest(x,mask) test for x >= 0x80:

/* Nothing in the 128.. range */

So it would appear that it happens to work, but I doubt that it was
intentional.

Having said that, it is really curious why Coverity should get confused by
the code and not realize that casting a negative number to (unsigned char)
will make it valid as an index for the sane_ctype array.

I double-checked, and there is no override for the isspace() function in
what Coverity calls a "model file" (i.e. pseudo code intended to helping
Coverity realize where it can stop reporting false positives).

> > The next iteration will have it completely reworked: I no longer guard
> > the isspace() behind an `!= EOF` check, but rather handle an early EOF
> > as I think it should be handled. Extra eyes very welcome (this is the
> > fixup!  patch):
> 
> I do think handling EOF explicitly is probably a better strategy anyway,
> as it lets us tell when we have an empty patch.

I agree, I came to the same conclusion independently.

Ciao,
Dscho


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Jeff King
On Fri, Apr 28, 2017 at 12:41:02PM +0200, Johannes Schindelin wrote:

> > Why? isspace(EOF) is well-defined.
> 
> So let's look at the man page on Linux:
> 
>   These functions check whether c,  which  must  have  the  value  of  an
>   unsigned char or EOF, [...]
> 
> That is the only mention of it. I find it highly unobvious whether EOF
> should be treated as a space or not. So let's look at the MSDN page
> (https://msdn.microsoft.com/en-us/library/y13z34da.aspx) whether they talk
> more about EOF:
> 
>   The behavior of isspace and _isspace_l is undefined if c is not
>   EOF or in the range 0 through 0xFF, inclusive.
> 
> That's it. So I kind of *guess* that EOF is treated as not being a
> whitespace character (why does this make me think of politics now? Focus,
> Johannes, focus...). But the mathematician in me protests: why would we
> be able to decide the character class of a character that does not exist?

This behavior actually comes from the C standard. From C99, 7.4:

  1 The header  declares several functions useful for
classifying and mapping characters. 166) In all cases the argument
is an int, the value of which shall be representable as an unsigned
char or shall equal the value of the macro EOF. If the argument has
any other value, the behavior is undefined.

It doesn't explicitly say that EOF returns false for these functions,
but it does seem implied. For instance:

  The isspace function tests for any character that is a standard
  white-space character or is one of a locale-specific set of characters
  for which isalnum is false. The standard white-space characters are
  the following: space (' '), form feed ('\f'), new-line ('\n'),
  carriage return ('\r'), horizontal tab ('\t'), and vertical tab
  ('\v'). In the "C" locale, isspace returns true only for the standard
  white-space characters.

So I think it is a reasonable thing to depend on.  But as I said
elsewhere in the thread, we don't actually use the standard isspace()
anyway.

> But then, I guess I misunderstood what Coverity complained about: maybe
> the problem was not so much the isspace() call but that EOF is not being
> handled correctly. We pass it, unchecked, to ungetc().
> 
> It appears that I (or Coverity, if you will), missed another instance
> where we simply passed EOF unchecked to ungetc().

I think that is also fine according to the standard.

Do you happen to have the exact error from Coverity? I'm wondering if it
is complaining about some aspect of our custom isspace() when used with
EOF.

> The next iteration will have it completely reworked: I no longer guard the
> isspace() behind an `!= EOF` check, but rather handle an early EOF as I
> think it should be handled. Extra eyes very welcome (this is the fixup!
> patch):

I do think handling EOF explicitly is probably a better strategy anyway,
as it lets us tell when we have an empty patch.

-Peff


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Jeff King
On Fri, Apr 28, 2017 at 12:44:52PM +0200, Johannes Schindelin wrote:

> > Also, what is the behavior of ungetc when we pass it EOF?
> 
> According to the documentation, it would cast EOF to an unsigned char and
> push that back. Definitely incorrect.
> 
> > It looks like POSIX does what we want (pushing EOF is a noop, and the
> > stream retains its feof() status), but I don't know if there are other
> > implementations to worry about.
> 
> That's not what my man page here says:
> 
>   ungetc()  pushes  c  back to stream, cast to unsigned char, where
>   it is available for subsequent read operations.  Pushed-back
>   characters will be returned in reverse order; only one pushback is
>   guaranteed.

POSIX covers this this case explicitly:

  If the value of c equals that of the macro EOF, the operation shall
  fail and the input stream shall be left unchanged.

That comes straight from C99, which says:

  If the value of c equals that of the macro EOF, the operation fails
  and the input stream is unchanged.

I don't have a copy of C89 handy, but I didn't see any mention of the
behavior in the "changes from the previous edition" section of C99.

So it's possible that there's an implementation that is unhappy with
ungetc(EOF), but unless we know of one specifically, it seems pretty
safe. Given that and the similar explicit rule for EOF via isspace(), I
think the original code actually behaves fine.

Of course, we do not use the standard isspace() anyway. Our
implementation will cast the EOF to an unsigned char. If it's "-1", that
ends up as 255, which matches no classes. But if the platform has an
oddball EOF like 288, that would confuse our isspace().

-Peff


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Johannes Schindelin
Hi Peff,

On Thu, 27 Apr 2017, Jeff King wrote:

> On Wed, Apr 26, 2017 at 10:20:16PM +0200, Johannes Schindelin wrote:
> 
> > diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> > index 30681681c13..c0d88f97512 100644
> > --- a/builtin/mailsplit.c
> > +++ b/builtin/mailsplit.c
> > @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char 
> > *dir, int allow_bare,
> >  
> > do {
> > peek = fgetc(f);
> > -   } while (isspace(peek));
> > +   } while (peek >= 0 && isspace(peek));
> > ungetc(peek, f);
> 
> Are we guaranteed that EOF is a negative number?

No, you're right.

> Also, what is the behavior of ungetc when we pass it EOF?

According to the documentation, it would cast EOF to an unsigned char and
push that back. Definitely incorrect.

> It looks like POSIX does what we want (pushing EOF is a noop, and the
> stream retains its feof() status), but I don't know if there are other
> implementations to worry about.

That's not what my man page here says:

ungetc()  pushes  c  back to stream, cast to unsigned char, where
it is available for subsequent read operations.  Pushed-back
characters will be returned in reverse order; only one pushback is
guaranteed.

> Perhaps:
> 
>   /* soak up whitespace */
>   while ((peek = fgetc(f)) != EOF) {
>   if (!isspace(peek)) {
>   ungetc(peek, f);
>   break;
>   }
>   }
> 
> would be more portable.

True. I changed it slightly differently, please see my reply to Hannes.

Thanks,
Dscho


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Johannes Schindelin
Hi Hannes,

On Thu, 27 Apr 2017, Johannes Sixt wrote:

> Am 26.04.2017 um 22:20 schrieb Johannes Schindelin:
>
> > diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> > index 30681681c13..c0d88f97512 100644
> > --- a/builtin/mailsplit.c
> > +++ b/builtin/mailsplit.c
> > @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir,
> > int allow_bare,
> >
> >   do {
> > peek = fgetc(f);
> > -   } while (isspace(peek));
> > +   } while (peek >= 0 && isspace(peek));
> >   ungetc(peek, f);
> >
> > if (strbuf_getwholeline(, f, '\n')) {
> > diff --git a/mailinfo.c b/mailinfo.c
> > index 68037758f2f..60dcad7b714 100644
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg,
> > const char *patch)
> >
> >   do {
> > peek = fgetc(mi->input);
> > -   } while (isspace(peek));
> > +   } while (peek >= 0 && isspace(peek));
> >   ungetc(peek, mi->input);
> >
> >   /* process the email header */
> >
> 
> Why? isspace(EOF) is well-defined.

So let's look at the man page on Linux:

These functions check whether c,  which  must  have  the  value  of  an
unsigned char or EOF, [...]

That is the only mention of it. I find it highly unobvious whether EOF
should be treated as a space or not. So let's look at the MSDN page
(https://msdn.microsoft.com/en-us/library/y13z34da.aspx) whether they talk
more about EOF:

The behavior of isspace and _isspace_l is undefined if c is not
EOF or in the range 0 through 0xFF, inclusive.

That's it. So I kind of *guess* that EOF is treated as not being a
whitespace character (why does this make me think of politics now? Focus,
Johannes, focus...). But the mathematician in me protests: why would we
be able to decide the character class of a character that does not exist?

Technically, you are correct, of course. The specs of fgetc() specify
quite clearly that either an unsigned char cast to an int is returned, or
EOF on end-of-file *or error*. And a quick test verifies that isspace(EOF)
returns 0.

But then, I guess I misunderstood what Coverity complained about: maybe
the problem was not so much the isspace() call but that EOF is not being
handled correctly. We pass it, unchecked, to ungetc().

It appears that I (or Coverity, if you will), missed another instance
where we simply passed EOF unchecked to ungetc().

The next iteration will have it completely reworked: I no longer guard the
isspace() behind an `!= EOF` check, but rather handle an early EOF as I
think it should be handled. Extra eyes very welcome (this is the fixup!
patch):

-- snip --
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index c0d88f97512..9b3efc8e987 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -232,8 +232,9 @@ static int split_mbox(const char *file, const char *dir, 
int allow_bare,
 
do {
peek = fgetc(f);
-   } while (peek >= 0 && isspace(peek));
-   ungetc(peek, f);
+   } while (isspace(peek));
+   if (peek != EOF)
+   ungetc(peek, f);
 
if (strbuf_getwholeline(, f, '\n')) {
/* empty stdin is OK */
diff --git a/mailinfo.c b/mailinfo.c
index 60dcad7b714..a319911b510 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -882,7 +882,10 @@ static int read_one_header_line(struct strbuf *line, FILE 
*in)
for (;;) {
int peek;
 
-   peek = fgetc(in); ungetc(peek, in);
+   peek = fgetc(in);
+   if (peek == EOF)
+   break;
+   ungetc(peek, in);
if (peek != ' ' && peek != '\t')
break;
if (strbuf_getline_lf(, in))
@@ -1094,14 +1097,18 @@ int mailinfo(struct mailinfo *mi, const char *msg, 
const char *patch)
return -1;
}
 
-   mi->p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->p_hdr_data)));
-   mi->s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->s_hdr_data)));
-
do {
peek = fgetc(mi->input);
-   } while (peek >= 0 && isspace(peek));
+   if (peek == EOF) {
+   fclose(cmitmsg);
+   return error("empty patch: '%s'", patch);
+   }
+   } while (isspace(peek));
ungetc(peek, mi->input);
 
+   mi->p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->p_hdr_data)));
+   mi->s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->s_hdr_data)));
+
/* process the email header */
while (read_one_header_line(, mi->input))
check_header(mi, , mi->p_hdr_data, 1);
-- snap --

In the first hunk, I simply rely on the code after ungetc() to figure out
that there are no headers and to handle that case as before.

The second hunk handles the case where looking for a continuation line in
the header section hits EOF; it is still a valid header, but we should
avoid ungetc(EOF) to allow the next read to report 

Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-28 Thread Johannes Schindelin
Hi Junio,

On Wed, 26 Apr 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > Reported via Coverity.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  builtin/mailsplit.c | 2 +-
> >  mailinfo.c  | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> Good find.  I'd retitle with a prefix
> 
>   mailinfo & mailsplit: check for EOF while parsing
> 
> so that it is clear that this patch is about lower level machinery
> (as oppose to something like "git am").

True. Will be fixed in the next iteration,
Dscho


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-27 Thread Jeff King
On Wed, Apr 26, 2017 at 10:20:16PM +0200, Johannes Schindelin wrote:

> diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> index 30681681c13..c0d88f97512 100644
> --- a/builtin/mailsplit.c
> +++ b/builtin/mailsplit.c
> @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, 
> int allow_bare,
>  
>   do {
>   peek = fgetc(f);
> - } while (isspace(peek));
> + } while (peek >= 0 && isspace(peek));
>   ungetc(peek, f);

Are we guaranteed that EOF is a negative number?  Also, what is the
behavior of ungetc when we pass it EOF?

It looks like POSIX does what we want (pushing EOF is a noop, and the
stream retains its feof() status), but I don't know if there are other
implementations to worry about.

Perhaps:

  /* soak up whitespace */
  while ((peek = fgetc(f)) != EOF) {
if (!isspace(peek)) {
ungetc(peek, f);
break;
}
  }

would be more portable.

-Peff


Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-27 Thread Johannes Sixt

Am 26.04.2017 um 22:20 schrieb Johannes Schindelin:

Reported via Coverity.

Signed-off-by: Johannes Schindelin 
---
 builtin/mailsplit.c | 2 +-
 mailinfo.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 30681681c13..c0d88f97512 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, 
int allow_bare,

do {
peek = fgetc(f);
-   } while (isspace(peek));
+   } while (peek >= 0 && isspace(peek));
ungetc(peek, f);

if (strbuf_getwholeline(, f, '\n')) {
diff --git a/mailinfo.c b/mailinfo.c
index 68037758f2f..60dcad7b714 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg, const 
char *patch)

do {
peek = fgetc(mi->input);
-   } while (isspace(peek));
+   } while (peek >= 0 && isspace(peek));
ungetc(peek, mi->input);

/* process the email header */



Why? isspace(EOF) is well-defined.

-- Hannes



Re: [PATCH 10/26] Check for EOF while parsing mails

2017-04-27 Thread Junio C Hamano
Johannes Schindelin  writes:

> Reported via Coverity.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/mailsplit.c | 2 +-
>  mailinfo.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Good find.  I'd retitle with a prefix

mailinfo & mailsplit: check for EOF while parsing

so that it is clear that this patch is about lower level machinery
(as oppose to something like "git am").

Thanks.

>
> diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> index 30681681c13..c0d88f97512 100644
> --- a/builtin/mailsplit.c
> +++ b/builtin/mailsplit.c
> @@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, 
> int allow_bare,
>  
>   do {
>   peek = fgetc(f);
> - } while (isspace(peek));
> + } while (peek >= 0 && isspace(peek));
>   ungetc(peek, f);
>  
>   if (strbuf_getwholeline(, f, '\n')) {
> diff --git a/mailinfo.c b/mailinfo.c
> index 68037758f2f..60dcad7b714 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg, 
> const char *patch)
>  
>   do {
>   peek = fgetc(mi->input);
> - } while (isspace(peek));
> + } while (peek >= 0 && isspace(peek));
>   ungetc(peek, mi->input);
>  
>   /* process the email header */


[PATCH 10/26] Check for EOF while parsing mails

2017-04-26 Thread Johannes Schindelin
Reported via Coverity.

Signed-off-by: Johannes Schindelin 
---
 builtin/mailsplit.c | 2 +-
 mailinfo.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 30681681c13..c0d88f97512 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -232,7 +232,7 @@ static int split_mbox(const char *file, const char *dir, 
int allow_bare,
 
do {
peek = fgetc(f);
-   } while (isspace(peek));
+   } while (peek >= 0 && isspace(peek));
ungetc(peek, f);
 
if (strbuf_getwholeline(, f, '\n')) {
diff --git a/mailinfo.c b/mailinfo.c
index 68037758f2f..60dcad7b714 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -1099,7 +1099,7 @@ int mailinfo(struct mailinfo *mi, const char *msg, const 
char *patch)
 
do {
peek = fgetc(mi->input);
-   } while (isspace(peek));
+   } while (peek >= 0 && isspace(peek));
ungetc(peek, mi->input);
 
/* process the email header */
-- 
2.12.2.windows.2.800.gede8f145e06