[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854 Martin Sebor changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #10 from Martin Sebor --- I forgot this strndup pitfall that POSIX cautions about in its APPLICATION USAGE and that the warning helps avoid: Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size. Most implementations, including Glibc, only allocate strnlen (s, size) (i.e., less than size if s is shorter). Since the only motivating test case here is strndup and since it turned out that the patch submitted for this report was based on a misunderstanding of the warning (https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591926.html) and didn't do anything for strndup I'm going to resolve this as invalid. If you want to raise problems about the warning for strnlen or strncmp please open separate bugs and attach test cases, preferably from real code. None of those provided by Steve Grubb appears to have anything to do with strnlen or strncmp.
[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854 --- Comment #9 from David Malcolm --- (In reply to Siddhesh Poyarekar from comment #8) > (In reply to Martin Sebor from comment #7) > > Moving warnings into the analyzer and scaling it up to be able to run by > > default, during development, sounds like a good long-term plan. Until that > > That's not quite what I'm suggesting here. I'm not a 100% convinced that > those are the right heuristics at all; the size argument for strnlen, > strndup and strncmp does not intend to describe the size of the passed > strings. It is only recommended security practice that the *n* variant > functions be used instead of their unconstrained relatives to mitigate > overflows. In fact in more common cases the size argument (especially in > case of strnlen and strncmp) may describe a completely different buffer or > some other application-specific property. > > This is different from the -Wformat-overflow, where there is a clear > relationship between buffer, the format string and the string representation > of input numbers and we're only tweaking is the optimism level of the > warnings. So it is not just a question of levels of verosity/paranoia. > > In that context, using size to describe the underlying buffer of the source > only makes sense only for a subset of uses, making this heuristic quite > noisy. So what I'm actually saying is: the heuristic is too noisy but if we > insist on keeping it, it makes sense as an analyzer warning where the user > *chooses* to look for pessimistic scenarios and is more tolerant of noisy > heuristics. Right now -fanalyzer enables all of the various -Wanalyzer-* warnings by default [1], and in theory all of them only emit a diagnostic for the case when the analyzer "thinks" there's a definite problem. There may be bugs in the analyzer, of course. I'm a bit wary of the above sentence, as it suggests that the analyzer should be the place to put noisy diagnostics. Looking at the GCC UX guidelines: https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html note "The user’s attention is important": if a diagnostic is too noisy, the user will either turn it off, or stop paying attention. The distinction I've been making for -fanalyzer is that -fanalyzer enables a more expensive (path-based) analysis of the user's code, and will slow down the user's compile-time, with various warnings tied to that, i.e. I've been messaging it primarily as a compile-time tradeoff for extra warnings that otherwise would be too slow to implement, rather than a signal:noise ratio tradeoff. -fanalyzer can generate false positives, but I've been trying to drive that down via bugfixes (it's also relatively new code) [1] apart from -Wanalyzer-too-complex, but that's more of an implementation detail. > > > happens, rather than gratuitously removing warnings that we've added over > > the years, just because they fall short of the ideal 100% efficacy (as has > > been known and documented), making them easier to control seems like a > > better approach. > > It's not just a matter of efficacy here IMO. The heuristic for strnlen, > strncmp and strndup overreads is too loose for it to be taken seriously.
[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854 --- Comment #8 from Siddhesh Poyarekar --- (In reply to Martin Sebor from comment #7) > Moving warnings into the analyzer and scaling it up to be able to run by > default, during development, sounds like a good long-term plan. Until that That's not quite what I'm suggesting here. I'm not a 100% convinced that those are the right heuristics at all; the size argument for strnlen, strndup and strncmp does not intend to describe the size of the passed strings. It is only recommended security practice that the *n* variant functions be used instead of their unconstrained relatives to mitigate overflows. In fact in more common cases the size argument (especially in case of strnlen and strncmp) may describe a completely different buffer or some other application-specific property. This is different from the -Wformat-overflow, where there is a clear relationship between buffer, the format string and the string representation of input numbers and we're only tweaking is the optimism level of the warnings. So it is not just a question of levels of verosity/paranoia. In that context, using size to describe the underlying buffer of the source only makes sense only for a subset of uses, making this heuristic quite noisy. So what I'm actually saying is: the heuristic is too noisy but if we insist on keeping it, it makes sense as an analyzer warning where the user *chooses* to look for pessimistic scenarios and is more tolerant of noisy heuristics. > happens, rather than gratuitously removing warnings that we've added over > the years, just because they fall short of the ideal 100% efficacy (as has > been known and documented), making them easier to control seems like a > better approach. It's not just a matter of efficacy here IMO. The heuristic for strnlen, strncmp and strndup overreads is too loose for it to be taken seriously.
[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854 --- Comment #7 from Martin Sebor --- Moving warnings into the analyzer and scaling it up to be able to run by default, during development, sounds like a good long-term plan. Until that happens, rather than gratuitously removing warnings that we've added over the years, just because they fall short of the ideal 100% efficacy (as has been known and documented), making them easier to control seems like a better approach.
[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854 --- Comment #6 from Siddhesh Poyarekar --- (In reply to Martin Sebor from comment #5) > It would be useful to separate these warnings into multiple levels: level 1 > for invalid code, and higher levels for suspicious (or pointless) code, > similarly to -Wformat-overflow. I think the analyzer is a great level for the higher level heuristics, with ME warnings only sticking to level 1. Adding levels within ME warnings seems unnecessary. ISTM that users tend to *expect* false positives (to some sane extent) when doing static analysis but are much less tolerant of those during usual builds.
[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854 --- Comment #5 from Martin Sebor --- It would be useful to separate these warnings into multiple levels: level 1 for invalid code, and higher levels for suspicious (or pointless) code, similarly to -Wformat-overflow.
[Bug middle-end/104854] -Wstringop-overread should not warn for strnlen, strndup and strncmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104854 Siddhesh Poyarekar changed: What|Removed |Added Summary|-Wstringop-overread should |-Wstringop-overread should |not warn for strnlen and|not warn for strnlen, |strndup |strndup and strncmp Assignee|unassigned at gcc dot gnu.org |siddhesh at gcc dot gnu.org --- Comment #4 from Siddhesh Poyarekar --- (In reply to Martin Sebor from comment #3) > The same warning is also issued for some calls to memchr(), strncmp() and > probably other built-in functions as well. For example: > > const char a[] = "123"; > > char* f (int i) > { > return __builtin_memchr (a + i, '3', 7); > } > > warning: ‘__builtin_memchr’ specified bound 7 exceeds source size 4 > [-Wstringop-overread] > 5 | return __builtin_memchr (a + i, '3', 7); > | ^~~~ > y.c:1:12: note: source object allocated here > 1 | const char a[] = "123"; > |^ The warning is arguably legitimate (not this one of course, since it's evident at compile time that the operation is safe but for non-const `a` it may not be) for memchr because it operates on an object, not string and will traverse all of the object for a matching char to the extent of the object size. The *string* functions are not the same in that context. > In all these cases the warnings are intentional (i.e., it's a feature, and > so not a regression). Their purpose is to point out what could be a coding > mistake. With strndup(), since the use case for it rather than strdup() is > to copy an initial substring, specifying a bound that's larger than the > length of the string is pointless. > > In general, the GCC manual documents warnings as > > diagnostic messages that report constructions that are not inherently > erroneous but that are risky or suggest there may have been an error. I don't think these strnlen or strndup warnings point to *risky* or potentially erroneous code; at best they point to instances where a specific protection is absent, i.e. the behaviour reduces to strlen and strdup respectively, which is much more benign than what it currently suggests. That kind of warning seems more suited to a static analyzer and not a compiler IMO. Besides, the core cause of a potential overread here is not because the size argument is larger but because the string may not be NULL terminated. It may make more sense to invest in that aspect of risky behaviour than the length for these functions. > So not all instances of any warning should be expected to indicate errors. > In fact, many are known to be benign by design (for example, all instances > of -Wchar-subscripts are benign when -funsigned-char is used; many instance > of -Waddress are also benign, as are many -Wparentheses, etc.). And They're not clubbed in with potentially important warnings though, which is a key distinction. For example, one could mark -Wstirngop-overread as important warnings but not -Wparentheses, but the high noise could make it difficult to actually do so. > although most middle end warnings tend to be designed to trigger only for > invalid statements (i.e., statements that have undefined behavior if reached > during execution), some do also trigger for code that's strictly valid but > suspect. Besides the -Wstringop-overread examples here, other such warnings > include dynamic allocation calls with sizes in excess of PTRDIFF_MAX > (-Walloc-size-larger-than), returning the address of a local variable > (-Wreturn-local-addr), or in GCC 12, storing the address of a local variable > in an extern pointer (-Wdangling-pointer). > > Deciding what code is suspect enough to warn and under what option (-Wall or > -Wextra) is a judgment call; different people have very different ideas. I'm testing a patch to resolve this. I noticed strncmp late but it seems to match this category as well, where a too-long length only reduces the max protection provided by the n-versions of the string functions.