On 08/03/2018 01:00 AM, Jeff Law wrote:
On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
On 07/24/18 23:46, Jeff Law wrote:
On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
Hi!

This patch makes strlen range computations more conservative.

Firstly if there is a visible type cast from type A to B before passing
then value to strlen, don't expect the type layout of B to restrict the
possible return value range of strlen.
Why do you think this is the right thing to do?  ie, is there language
in the standards that makes you think the code as it stands today is
incorrect from a conformance standpoint?  Is there a significant body of
code that is affected in an adverse way by the current code?  If so,
what code?



I think if you have an object, of an effective type A say char[100], then
you can cast the address of A to B, say typedef char (*B)[2] for instance
and then to const char *, say for use in strlen.  I may be wrong, but I think
that we should at least try not to pick up char[2] from B, but instead
use A for strlen ranges, or leave this range open.  Currently the range
info for strlen is [0..1] in this case, even if we see the type cast
in the generic tree.
ISTM that you're essentially saying that the cast to const char *
destroys any type information we can exploit here.  But if that's the
case, then I don't think we can even derive a range of [0,99].  What's
to say that "A" didn't result from a similar cast of some object that
was char[200] that happened out of the scope of what we could see during
the strlen range computation?

If that is what you're arguing, then I think there's a re-evaluation
that needs to happen WRT strlen range computation/

And just to be clear, I do see this as a significant correctness question.

Martin, thoughts?

The argument is that given:

  struct S { char a[4], b; };

  char a[8] = "1234567";

this is valid and should pass:

  __attribute__ ((noipa))
  void f (struct S *p)
  {
    assert (7 == strlen (p->a));
  }

  int main (void)
  {
    f ((struct S*)a);
  }

(This is the basic premise behind pr86259.)

This argument is wrong and the code is invalid.  For the access
to p->a to be valid p must point to an object of struct S (it
doesn't) and the p->a array must hold a nul-terminated string
(it also doesn't).

This should not be surprising because the following equivalent
code behaves the same way:

  __attribute__ ((noipa))
  void f (struct S *p)
  {
    int n = 0;
    for (int i = 0; p->a[i]; ++i)
      ++n;
    if (3 != n)
      __builtin_abort ();
  }

and also because for write accesses, GCC (helpfully) enforces
the restriction with _FORTIFY_SOURCE=2:

  __attribute__ ((noipa))
  void f (struct S *p)
  {
    strcpy (p->a, "1234");   // aborts
  }

I care less about the optimization than I do about the basic
premise that it's essential to respect subobject boundaries(*).
It would make little sense to undo the strlen optimization
without also undoing the optimization for the direct array
access case.  Undoing either would raise the question about
the validity of the _FORRTIFY_SOURCE=2 behavior.  That would
be a huge step backwards in terms of code security.  If we
did some of these but not others, it would make the behavior
inconsistent and surprising, all to accommodate one instance
of invalid code.

If we had a valid test case where the strlen optimization
leads to invalid code, or even if there were a significant
number of bug reports showing that it breaks an invalid
but common idiom, I would certainly feel compelled to
make it right somehow.  But there has been just one bug
report with clearly invalid code that should be easily
corrected.

Martin

[*] I also care deeply about all the warnings that depend
on it to avoid false positives: that's pretty much all those
I have implemented in the middle-end: -Wformat-{overflow,
truncation}, -Wstringop-{overflow,truncation}, and likely
even -Wrestrict.

Reply via email to