On Fri, Apr 28, 2017 at 12:41:02PM +0200, Johannes Schindelin wrote:

> > Why? isspace(EOF) is well-defined.
> 
> So let's look at the man page on Linux:
> 
>       These functions check whether c,  which  must  have  the  value  of  an
>       unsigned char or EOF, [...]
> 
> That is the only mention of it. I find it highly unobvious whether EOF
> should be treated as a space or not. So let's look at the MSDN page
> (https://msdn.microsoft.com/en-us/library/y13z34da.aspx) whether they talk
> more about EOF:
> 
>       The behavior of isspace and _isspace_l is undefined if c is not
>       EOF or in the range 0 through 0xFF, inclusive.
> 
> That's it. So I kind of *guess* that EOF is treated as not being a
> whitespace character (why does this make me think of politics now? Focus,
> Johannes, focus...). But the mathematician in me protests: why would we
> be able to decide the character class of a character that does not exist?

This behavior actually comes from the C standard. From C99, 7.4:

  1 The header <ctype.h> declares several functions useful for
    classifying and mapping characters. 166) In all cases the argument
    is an int, the value of which shall be representable as an unsigned
    char or shall equal the value of the macro EOF. If the argument has
    any other value, the behavior is undefined.

It doesn't explicitly say that EOF returns false for these functions,
but it does seem implied. For instance:

  The isspace function tests for any character that is a standard
  white-space character or is one of a locale-specific set of characters
  for which isalnum is false. The standard white-space characters are
  the following: space (' '), form feed ('\f'), new-line ('\n'),
  carriage return ('\r'), horizontal tab ('\t'), and vertical tab
  ('\v'). In the "C" locale, isspace returns true only for the standard
  white-space characters.

So I think it is a reasonable thing to depend on.  But as I said
elsewhere in the thread, we don't actually use the standard isspace()
anyway.

> But then, I guess I misunderstood what Coverity complained about: maybe
> the problem was not so much the isspace() call but that EOF is not being
> handled correctly. We pass it, unchecked, to ungetc().
> 
> It appears that I (or Coverity, if you will), missed another instance
> where we simply passed EOF unchecked to ungetc().

I think that is also fine according to the standard.

Do you happen to have the exact error from Coverity? I'm wondering if it
is complaining about some aspect of our custom isspace() when used with
EOF.

> The next iteration will have it completely reworked: I no longer guard the
> isspace() behind an `!= EOF` check, but rather handle an early EOF as I
> think it should be handled. Extra eyes very welcome (this is the fixup!
> patch):

I do think handling EOF explicitly is probably a better strategy anyway,
as it lets us tell when we have an empty patch.

-Peff

Reply via email to