[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp

2022-03-18 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854

Martin Sebor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #10 from Martin Sebor  ---
I forgot this strndup pitfall that POSIX cautions about in its APPLICATION
USAGE and that the warning helps avoid:

  Implementations are free to malloc() a buffer containing either (size + 1)
bytes or (strnlen(s, size) + 1) bytes.  Applications should not assume that
strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size.

Most implementations, including Glibc, only allocate strnlen (s, size) (i.e.,
less than size if s is shorter).

Since the only motivating test case here is strndup and since it turned out
that the patch submitted for this report was based on a misunderstanding of the
warning (https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591926.html) and
didn't do anything for strndup I'm going to resolve this as invalid.

If you want to raise problems about the warning for strnlen or strncmp please
open separate bugs and attach test cases, preferably from real code.  None of
those provided by Steve Grubb appears to have anything to do with strnlen or
strncmp.

[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp

2022-03-17 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854

--- Comment #9 from David Malcolm  ---
(In reply to Siddhesh Poyarekar from comment #8)
> (In reply to Martin Sebor from comment #7)
> > Moving warnings into the analyzer and scaling it up to be able to run by
> > default, during development, sounds like a good long-term plan.  Until that
> 
> That's not quite what I'm suggesting here.  I'm not a 100% convinced that
> those are the right heuristics at all; the size argument for strnlen,
> strndup and strncmp does not intend to describe the size of the passed
> strings.  It is only recommended security practice that the *n* variant
> functions be used instead of their unconstrained relatives to mitigate
> overflows.  In fact in more common cases the size argument (especially in
> case of strnlen and strncmp) may describe a completely different buffer or
> some other application-specific property.
> 
> This is different from the -Wformat-overflow, where there is a clear
> relationship between buffer, the format string and the string representation
> of input numbers and we're only tweaking is the optimism level of the
> warnings.  So it is not just a question of levels of verosity/paranoia.
> 
> In that context, using size to describe the underlying buffer of the source
> only makes sense only for a subset of uses, making this heuristic quite
> noisy.  So what I'm actually saying is: the heuristic is too noisy but if we
> insist on keeping it, it makes sense as an analyzer warning where the user
> *chooses* to look for pessimistic scenarios and is more tolerant of noisy
> heuristics.

Right now -fanalyzer enables all of the various -Wanalyzer-* warnings by
default [1], and in theory all of them only emit a diagnostic for the case when
the analyzer "thinks" there's a definite problem.  There may be bugs in the
analyzer, of course.  I'm a bit wary of the above sentence, as it suggests that
the analyzer should be the place to put noisy diagnostics.

Looking at the GCC UX guidelines:
  https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html
note "The user’s attention is important": if a diagnostic is too noisy, the
user will either turn it off, or stop paying attention.

The distinction I've been making for -fanalyzer is that -fanalyzer enables a
more expensive (path-based) analysis of the user's code, and will slow down the
user's compile-time, with various warnings tied to that, i.e. I've been
messaging it primarily as a compile-time tradeoff for extra warnings that
otherwise would be too slow to implement, rather than a signal:noise ratio
tradeoff.  -fanalyzer can generate false positives, but I've been trying to
drive that down via bugfixes (it's also relatively new code)

[1] apart from -Wanalyzer-too-complex, but that's more of an implementation
detail.

> 
> > happens, rather than gratuitously removing warnings that we've added over
> > the years, just because they fall short of the ideal 100% efficacy (as has
> > been known and documented), making them easier to control seems like a
> > better approach.
> 
> It's not just a matter of efficacy here IMO.  The heuristic for strnlen,
> strncmp and strndup overreads is too loose for it to be taken seriously.

[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp

2022-03-14 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854

--- Comment #8 from Siddhesh Poyarekar  ---
(In reply to Martin Sebor from comment #7)
> Moving warnings into the analyzer and scaling it up to be able to run by
> default, during development, sounds like a good long-term plan.  Until that

That's not quite what I'm suggesting here.  I'm not a 100% convinced that those
are the right heuristics at all; the size argument for strnlen, strndup and
strncmp does not intend to describe the size of the passed strings.  It is only
recommended security practice that the *n* variant functions be used instead of
their unconstrained relatives to mitigate overflows.  In fact in more common
cases the size argument (especially in case of strnlen and strncmp) may
describe a completely different buffer or some other application-specific
property.

This is different from the -Wformat-overflow, where there is a clear
relationship between buffer, the format string and the string representation of
input numbers and we're only tweaking is the optimism level of the warnings. 
So it is not just a question of levels of verosity/paranoia.

In that context, using size to describe the underlying buffer of the source
only makes sense only for a subset of uses, making this heuristic quite noisy. 
So what I'm actually saying is: the heuristic is too noisy but if we insist on
keeping it, it makes sense as an analyzer warning where the user *chooses* to
look for pessimistic scenarios and is more tolerant of noisy heuristics.

> happens, rather than gratuitously removing warnings that we've added over
> the years, just because they fall short of the ideal 100% efficacy (as has
> been known and documented), making them easier to control seems like a
> better approach.

It's not just a matter of efficacy here IMO.  The heuristic for strnlen,
strncmp and strndup overreads is too loose for it to be taken seriously.

[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp

2022-03-14 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854

--- Comment #7 from Martin Sebor  ---
Moving warnings into the analyzer and scaling it up to be able to run by
default, during development, sounds like a good long-term plan.  Until that
happens, rather than gratuitously removing warnings that we've added over the
years, just because they fall short of the ideal 100% efficacy (as has been
known and documented), making them easier to control seems like a better
approach.

[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp

2022-03-14 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854

--- Comment #6 from Siddhesh Poyarekar  ---
(In reply to Martin Sebor from comment #5)
> It would be useful to separate these warnings into multiple levels: level 1
> for invalid code, and higher levels for suspicious (or pointless) code,
> similarly to -Wformat-overflow.

I think the analyzer is a great level for the higher level heuristics, with ME
warnings only sticking to level 1. Adding levels within ME warnings seems
unnecessary.  ISTM that users tend to *expect* false positives (to some sane
extent) when doing static analysis but are much less tolerant of those during
usual builds.

[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp

2022-03-14 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854

--- Comment #5 from Martin Sebor  ---
It would be useful to separate these warnings into multiple levels: level 1 for
invalid code, and higher levels for suspicious (or pointless) code, similarly
to -Wformat-overflow.

[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp

2022-03-09 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854

Siddhesh Poyarekar  changed:

   What|Removed |Added

Summary|-Wstringop-overread should  |-Wstringop-overread should
   |not warn for strnlen and|not warn for strnlen,
   |strndup |strndup and strncmp
   Assignee|unassigned at gcc dot gnu.org  |siddhesh at gcc dot 
gnu.org

--- Comment #4 from Siddhesh Poyarekar  ---
(In reply to Martin Sebor from comment #3)
> The same warning is also issued for some calls to memchr(), strncmp() and
> probably other built-in functions as well.  For example:
> 
>   const char a[] = "123";
> 
>   char* f (int i)
>   {
> return __builtin_memchr (a + i, '3', 7);
>   }
> 
>   warning: ‘__builtin_memchr’ specified bound 7 exceeds source size 4
> [-Wstringop-overread]
>   5 |   return __builtin_memchr (a + i, '3', 7);
> |  ^~~~
>   y.c:1:12: note: source object allocated here
>   1 | const char a[] = "123";
> |^

The warning is arguably legitimate (not this one of course, since it's evident
at compile time that the operation is safe but for non-const `a` it may not be)
for memchr because it operates on an object, not string and will traverse all
of the object for a matching char to the extent of the object size.  The
*string* functions are not the same in that context.

> In all these cases the warnings are intentional (i.e., it's a feature, and
> so not a regression).  Their purpose is to point out what could be a coding
> mistake.  With strndup(), since the use case for it rather than strdup() is
> to copy an initial substring, specifying a bound that's larger than the
> length of the string is pointless.
> 
> In general, the GCC manual documents warnings as
> 
>   diagnostic messages that report constructions that are not inherently
> erroneous but that are risky or suggest there may have been an error. 

I don't think these strnlen or strndup warnings point to *risky* or potentially
erroneous code; at best they point to instances where a specific protection is
absent, i.e. the behaviour reduces to strlen and strdup respectively, which is
much more benign than what it currently suggests.  That kind of warning seems
more suited to a static analyzer and not a compiler IMO.

Besides, the core cause of a potential overread here is not because the size
argument is larger but because the string may not be NULL terminated.  It may
make more sense to invest in that aspect of risky behaviour than the length for
these functions.

> So not all instances of any warning should be expected to indicate errors. 
> In fact, many are known to be benign by design (for example, all instances
> of -Wchar-subscripts are benign when -funsigned-char is used; many instance
> of -Waddress are also benign, as are many -Wparentheses, etc.).  And

They're not clubbed in with potentially important warnings though, which is a
key distinction.  For example, one could mark -Wstirngop-overread as important
warnings but not -Wparentheses, but the high noise could make it difficult to
actually do so.

> although most middle end warnings tend to be designed to trigger only for
> invalid statements (i.e., statements that have undefined behavior if reached
> during execution), some do also trigger for code that's strictly valid but
> suspect.  Besides the -Wstringop-overread examples here, other such warnings
> include dynamic allocation calls with sizes in excess of PTRDIFF_MAX
> (-Walloc-size-larger-than), returning the address of a local variable
> (-Wreturn-local-addr), or in GCC 12, storing the address of a local variable
> in an extern pointer (-Wdangling-pointer).
> 
> Deciding what code is suspect enough to warn and under what option (-Wall or
> -Wextra) is a judgment call; different people have very different ideas.

I'm testing a patch to resolve this.  I noticed strncmp late but it seems to
match this category as well, where a too-long length only reduces the max
protection provided by the n-versions of the string functions.