On 10/8/21 11:56 AM, Andrew MacLeod wrote:
On 10/8/21 12:51 PM, Martin Sebor via Gcc-patches wrote:


I.e., in the test:

void g (char *s1, char *s2)
{
  char b[1025];
  size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
  if (n + d + 1 >= 1025)
    return;

  sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }
}

the range of n and d is [0, INF] and so the sprintf call doesn't
trigger a warning.  With your change, because the range of each
is [0, 1023] (and there's no way to express that their sum is
less than 1025), the warning triggers because it considers
the worst-case scenario (the upper bounds of both).

So the warning operates on the assumption that no info is OK, but improved information causes it to break because it can't figure out what to do with it?

The idea is that input that appears unconstrained might have been
constrained somewhere else that we can't see, but constrained input
suggests it may not be constrained enough.  In the above, with
s1 and s2 pointing at arrays the same size as b, there's a decent
chance that the strings stored in them could be as long as fits
(otherwise why use such big arrays?), which would overflow the
destination.


Does this ever work when there is more than one string in the sprintf?  It seems that it's the inherent inability to associate an expression with a predicate that is the problem here.  If there is a single string, then an accurate range should be able to come up with an accurate answer.  But as soon as there is a second string, this is bound to fail unless the strings are known to be 1/2 their size, and likewise with three strings, 1/3 their size...

Right.  The logic is of course not bulletproof which is why we
integrated the sprintf pass with strlen: to get at the actual
string lengths when they're available instead of relying solely
on the worst case array size approximation.  (The array size
heuristic still applies when we don't have any strlen info.)
Even with the strlen info we don't get full accuracy because
the string lengths may be just lower bounds (e.g., as a result
of memcpy(a, "123", 3), strlen(a) is no less than 3 but the
string may be as long as sizeof a - 1, and the warning uses
the upper bound).

This, by the way, isn't just about strings.  It's the same for
numbers:

  sprintf (a, "%i %i", i, j);

will warn if i and j are in some constrained range whose upper
bound would result in overflowing a.


Should we even be attempting to warn for multiple strings if we aren't going to be able to calculate them accurately? It seems like a recipe for a lot of false positives.   And then once we figure out how to combine the range info with the appropriate predicates, turn it back on?

It's been this way since the warning was introduced in GCC 7
and the false positives haven't been too bad (we have just
12 in Bugzilla).  Even with perfect ranges a zero false positive
rate isn't achievable with the current design (or any design),
just like we can never come close to zero false negatives.

Every now and then it seems that a three-level warning might
have been better than two, with level 1 using an even more
conservative approach.  But the most conservative approach is
next to useless: it would have to assume strings of length
zero (or one), all integers between 0 and 9, and floats with
few fractional digits.  That rarely happens.  It's all based
on judgment calls.

Martin
