On 10/8/21 11:56 AM, Andrew MacLeod wrote:
On 10/8/21 12:51 PM, Martin Sebor via Gcc-patches wrote:


I.e., in the test:

void g (char *s1, char *s2)
{
  char b[1025];
  size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
  if (n + d + 1 >= 1025)
    return;

  sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }
}

the range of n and d is [0, INF] and so the sprintf call doesn't
trigger a warning.  With your change, because the range of each
is [0, 1023] (and there's no way to express that their sum is
less than 1025), the warning triggers because it considers
the worst-case scenario (the upper bounds of both).

So the warning operates on the assumption that no info is OK, but improved information causes it to break because it can't figure out what to do with it?

The idea is that input that appears unconstrained might have been
constrained somewhere else that we can't see, but constrained input
suggests it may not be constrained enough.  In the above, with
s1 and s2 pointing at arrays the same size as b, there's a decent
chance that the strings stored in them could be as long as fits
(otherwise why use such big arrays?), which would overflow the
destination.


Does this ever work when there is more than one string in the sprintf?  It seems that it's the inherent inability to associate an expression with a predicate that is the problem here.  If there is a single string, then an accurate range should be able to come up with an accurate answer.  But as soon as there is a second string, this is bound to fail unless the strings are known to be 1/2 their size, and likewise with three strings, 1/3 their size...

Right.  The logic is of course not bulletproof which is why we
integrated the sprintf pass with strlen: to get at the actual
string lengths when they're available instead of relying solely
on the worst case array size approximation.  (The array size
heuristic still applies when we don't have any strlen info.)
Even with the strlen info we don't get full accuracy because
the string lengths may be just lower bounds (e.g., as a result
of memcpy(a, "123", 3), strlen(a) is no less than 3 but the
string may be as long as sizeof a - 1, and the warning uses
the upper bound).

This, by the way, isn't just about strings.  It's the same for
numbers:

  sprintf (a, "%i %i", i, j);

will warn if i and j are in some constrained range whose upper
bound would result in overflowing a.


Should we even be attempting to warn for multiple strings if we aren't going to be able to calculate them accurately? It seems like a recipe for a lot of false positives.   And then once we figure out how to combine the range info with the appropriate predicates, turn it back on?

It's been this way since the warning was introduced in GCC 7
and the false positives haven't been too bad (we have just
12 in Bugzilla).  Even with perfect ranges a zero false positive
rate isn't achievable with the current design (or any design),
just like we can never come close to zero false negatives.

Every now and then it seems that a three-level warning might
have been better than two, with level 1 using an even more
conservative approach.  But the most conservative approach is
next to useless: it would have to assume strings of length
zero (or one), all integers between 0 and 9, and floats with
few fractional digits.  That rarely happens.  It's all based
on judgment calls.

Martin
