Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-07-10 Thread Martin Sebor

On 7/2/19 2:59 PM, Jeff Law wrote:

On 6/30/19 3:50 PM, Martin Sebor wrote:

On 6/26/19 6:11 PM, Jeff Law wrote:

[ Another big snip ]



Is there a strong need for an overloaded operator?  Our guidelines
generally discourage operator overloads.  This one seems on the border
to me.  Others may have different ideas where the line is.  If we really
don't need the overloaded operator, then we can avoid the controversy
completely.


The copy assignment operator is necessary for the type to be stored
in a hash_map.  Otherwise, the implicitly generated assignment will
be used instead which is unsafe in vec and results in memory
corription (I opened bug 90904 for this).  After working around this
bug by providing the assignment operator I then ran into the hash_map
bug 90959 that also results in memory corruption.  I spent a few fun
hours tracking down the GCC miscompilations that they led to.

OK.  THanks for explaining why we need the copy assignment operator.




The golden rule in C++ is that every type should either be safe to
copy and assign with the expected semantics or made not assignable
and not copyable by declaring the copy ctor and copy assignment
operator private (or deleted in more modern C++).  It would be very
helpful to detect this kind of problem (classes that aren't safely
assignable and copyable because they acquire/release resources in
ctors/dtors).  Not just to us in GCC but I'm sure to others as well.
It's something I've been hoping to implement for a while now.

You've got my blessing to do that :-)



So how do you deal with the case where one operand of a
{MIN,MAX,COND}_EXPR is the address of a local, but the other is not?  It
seems like we'll end up returning true from this function in that case
and potentially trigger changing the return value to NULL.  Or am I
missing something?


I only briefly considered MIN/MAX but because in C and C++ the less
than and greater than expressions are only defined for pointers into
the same array/object or subobject and I didn't ponder it too hard.
(In C they're undefined otherwise, in C++ the result is unspecified.)

But you bring it up because you think the address should be returned
regardless, even if the expression is invalid (or if it's COND_EXPR
with mixed local/nonlocal addresses) so I've changed it to do that
for all these explicitly handled codes and added tests to verify it.

THanks.

So in the thread for strlen/sprintf the issue around unbounded walks up
through PHI argument's SSA_NAME_DEF_STMT was raised.

I think we've got two options here.

One would be to hold this patch until we sort out what the bounds should
look like, then adjust this patch to do something similar.

Or go forward with this patch, then come back and adjust the walker once
we have settled on solution in the other thread.

I'm OK with either.  Your choice.


I committed the patch as is until the question of which algorithms
to put the limit on is answered (as discussed in the thread Re:
sprintf/strlen integration).

Martin


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-07-02 Thread Jeff Law
On 6/30/19 3:50 PM, Martin Sebor wrote:
> On 6/26/19 6:11 PM, Jeff Law wrote:
[ Another big snip ]
> 
>> Is there a strong need for an overloaded operator?  Our guidelines
>> generally discourage operator overloads.  This one seems on the border
>> to me.  Others may have different ideas where the line is.  If we really
>> don't need the overloaded operator, then we can avoid the controversy
>> completely.
> 
> The copy assignment operator is necessary for the type to be stored
> in a hash_map.  Otherwise, the implicitly generated assignment will
> be used instead which is unsafe in vec and results in memory
> corription (I opened bug 90904 for this).  After working around this
> bug by providing the assignment operator I then ran into the hash_map
> bug 90959 that also results in memory corruption.  I spent a few fun
> hours tracking down the GCC miscompilations that they led to.
OK.  THanks for explaining why we need the copy assignment operator.


> 
> The golden rule in C++ is that every type should either be safe to
> copy and assign with the expected semantics or made not assignable
> and not copyable by declaring the copy ctor and copy assignment
> operator private (or deleted in more modern C++).  It would be very
> helpful to detect this kind of problem (classes that aren't safely
> assignable and copyable because they acquire/release resources in
> ctors/dtors).  Not just to us in GCC but I'm sure to others as well.
> It's something I've been hoping to implement for a while now.
You've got my blessing to do that :-)

>>
>> So how do you deal with the case where one operand of a
>> {MIN,MAX,COND}_EXPR is the address of a local, but the other is not?  It
>> seems like we'll end up returning true from this function in that case
>> and potentially trigger changing the return value to NULL.  Or am I
>> missing something?
> 
> I only briefly considered MIN/MAX but because in C and C++ the less
> than and greater than expressions are only defined for pointers into
> the same array/object or subobject and I didn't ponder it too hard.
> (In C they're undefined otherwise, in C++ the result is unspecified.)
> 
> But you bring it up because you think the address should be returned
> regardless, even if the expression is invalid (or if it's COND_EXPR
> with mixed local/nonlocal addresses) so I've changed it to do that
> for all these explicitly handled codes and added tests to verify it.
THanks.

So in the thread for strlen/sprintf the issue around unbounded walks up
through PHI argument's SSA_NAME_DEF_STMT was raised.

I think we've got two options here.

One would be to hold this patch until we sort out what the bounds should
look like, then adjust this patch to do something similar.

Or go forward with this patch, then come back and adjust the walker once
we have settled on solution in the other thread.

I'm OK with either.  Your choice.

jeff


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-30 Thread Martin Sebor

On 6/26/19 6:11 PM, Jeff Law wrote:

On 6/18/19 9:19 PM, Martin Sebor wrote:

On 6/14/19 2:59 PM, Jeff Law wrote:

[ big snip ]

A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
need to consider here -- what is and what is not valid gimple.  And its
more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
standard practice for if-conversion.

Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
well, I just shoved a quick assert into optimize_stmt to abort if we
encounter a gimple assignment where the RHS is a COND_EXPR.  It blew up
instantly building libgcc :-)

COmpile the attached code with -O2 -m32.


I wasn't saying it's not valid Gimple, just that I hadn't seen it
come up here despite compiling Glibc and the kernel with the patched
GCC.  The only codes I saw are these:

   addr_expr, array_ref, bit_and_expr, component_ref, max_expr,
   mem_ref, nop_expr, parm_decl, pointer_plus_expr, ssa_name,
   and var_decl

The only one here that's really surprising is the MAX_EXPR.  But it is
what it is.



What I was asking for is a test case that makes COND_EXPR come up
in this context.  But I managed to find one by triggering the ICE
with GDB.  The file reduced to the following test case:

Sorry.  email can be a tough medium to nail down specific details.



   extern struct S s;   // S has to be an incomplete type

   void* f (int n)
   {
 void *p;
 int x = 0;

 for (int i = n; i >= 0; i--)
   {
 p = 
 if (p == (void*)-1)
    x = 1;
 else if (p)
    return p;
   }

 return x ? (void*)-1 : 0;
   }

and the dump:

    [local count: 59055800]:
   # x_10 = PHI <1(5), 0(2)>
   _5 = x_10 != 0 ? -1B : 0B;

    [local count: 114863532]:
   # _3 = PHI <(4), _5(6), (3)>
   return _3;

It seems a little odd that the COND_EXPR disappears when either
of the constant addresses becomes the address of an object (or
the result of alloca), and also when the number of iterations
of the loop is constant.  That's probably why it so rarely comes
up in this context.

Going into phiopt2 we have:

;;   basic block 6, loop depth 0
;;pred:   5
   if (x_1 != 0)
 goto ; [71.00%]
   else
 goto ; [29.00%]
;;succ:   8
;;7

;;   basic block 7, loop depth 0
;;pred:   6
;;succ:   8

;;   basic block 8, loop depth 0
;;pred:   3
;;7
;;6
   # _3 = PHI <(3), 0B(7), -1B(6)>
   return _3;

The subgraph starting at block #6 is a classic case for turning branchy
code into straightline code using a COND_EXPR on the RHS of an
assignment.  So you end up with something like this:

;;   basic block 6, loop depth 0
;;pred:   5
   _5 = x_1 != 0 ? -1B : 0B;
;;succ:   7

;;   basic block 7, loop depth 0
;;pred:   3
;;6
   # _3 = PHI <(3), _5(6)>
   return _3;


Now for this specific case within phiopt we are limited to cases there
the result is 0/1 or 0/-1.  That's why you get something different when
you exchange one of the constants for the address of an object, or
anything else for that matter.

This is all a bit academic -- the key point is that we can have a
COND_EXPR on the RHS of an assignment.  That's allowed by gimple.

Sadly this is also likely one of the places where target characteristics
come into play -- targets define a BRANCH_COST which can significantly
change the decisions for the initial generation of conditionals.  It's
one of the things that makes writing  tests for jump threading, if
conversion and other optimizations so damn painful -- on one target
we'll have a series of conditional jumps, on anothers we may have a
series of logicals, potentially with COND_EXPRs.




That said, even though I've seen a few references to COND_EXPR
in the midle-end I have been assuming that they, in general, do
get transformed into PHIs.  But checking the internals manual
I found in Section 12.6.3 this:

   A C ?: expression is converted into an if statement with each
   branch assigning to the same temporary. ... The GIMPLE level
   if-conversion pass re-introduces ?: expression, if appropriate.
   It is used to vectorize loops with conditions using vector
   conditional operations.

This GDB test case is likely the result of this reintroduction.

Nope.  It happens much earlier in the pipeline :-)




And in a more general sense, this kind of permissiveness is not future
proof.  What happens if someone needs to add another EXPR node that is
valid on the RHS where such recursion is undesirable?  How are they
supposed to know that we've got this permissive recursive call and that
it's going to do the wrong thing?  And if it's an EXPR node with no
arguments, then we're going to do a read out of the bounds of the object
and all bets are off at that point (yes we have zero operand EXPR nodes,
but thankfully I don't think they should up in the contexts you care about).



Sure.  When things are hardcoded 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-26 Thread Jeff Law
On 6/18/19 9:19 PM, Martin Sebor wrote:
> On 6/14/19 2:59 PM, Jeff Law wrote:
[ big snip ]
>> A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
>> need to consider here -- what is and what is not valid gimple.  And its
>> more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
>> standard practice for if-conversion.
>>
>> Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
>> well, I just shoved a quick assert into optimize_stmt to abort if we
>> encounter a gimple assignment where the RHS is a COND_EXPR.  It blew up
>> instantly building libgcc :-)
>>
>> COmpile the attached code with -O2 -m32.
> 
> I wasn't saying it's not valid Gimple, just that I hadn't seen it
> come up here despite compiling Glibc and the kernel with the patched
> GCC.  The only codes I saw are these:
> 
>   addr_expr, array_ref, bit_and_expr, component_ref, max_expr,
>   mem_ref, nop_expr, parm_decl, pointer_plus_expr, ssa_name,
>   and var_decl
The only one here that's really surprising is the MAX_EXPR.  But it is
what it is.

> 
> What I was asking for is a test case that makes COND_EXPR come up
> in this context.  But I managed to find one by triggering the ICE
> with GDB.  The file reduced to the following test case:
Sorry.  email can be a tough medium to nail down specific details.

> 
>   extern struct S s;   // S has to be an incomplete type
> 
>   void* f (int n)
>   {
> void *p;
> int x = 0;
> 
> for (int i = n; i >= 0; i--)
>   {
> p = 
> if (p == (void*)-1)
>    x = 1;
> else if (p)
>    return p;
>   }
> 
> return x ? (void*)-1 : 0;
>   }
> 
> and the dump:
> 
>    [local count: 59055800]:
>   # x_10 = PHI <1(5), 0(2)>
>   _5 = x_10 != 0 ? -1B : 0B;
> 
>    [local count: 114863532]:
>   # _3 = PHI <(4), _5(6), (3)>
>   return _3;
> 
> It seems a little odd that the COND_EXPR disappears when either
> of the constant addresses becomes the address of an object (or
> the result of alloca), and also when the number of iterations
> of the loop is constant.  That's probably why it so rarely comes
> up in this context.
Going into phiopt2 we have:

;;   basic block 6, loop depth 0
;;pred:   5
  if (x_1 != 0)
goto ; [71.00%]
  else
goto ; [29.00%]
;;succ:   8
;;7

;;   basic block 7, loop depth 0
;;pred:   6
;;succ:   8

;;   basic block 8, loop depth 0
;;pred:   3
;;7
;;6
  # _3 = PHI <(3), 0B(7), -1B(6)>
  return _3;

The subgraph starting at block #6 is a classic case for turning branchy
code into straightline code using a COND_EXPR on the RHS of an
assignment.  So you end up with something like this:

;;   basic block 6, loop depth 0
;;pred:   5
  _5 = x_1 != 0 ? -1B : 0B;
;;succ:   7

;;   basic block 7, loop depth 0
;;pred:   3
;;6
  # _3 = PHI <(3), _5(6)>
  return _3;


Now for this specific case within phiopt we are limited to cases there
the result is 0/1 or 0/-1.  That's why you get something different when
you exchange one of the constants for the address of an object, or
anything else for that matter.

This is all a bit academic -- the key point is that we can have a
COND_EXPR on the RHS of an assignment.  That's allowed by gimple.

Sadly this is also likely one of the places where target characteristics
come into play -- targets define a BRANCH_COST which can significantly
change the decisions for the initial generation of conditionals.  It's
one of the things that makes writing  tests for jump threading, if
conversion and other optimizations so damn painful -- on one target
we'll have a series of conditional jumps, on anothers we may have a
series of logicals, potentially with COND_EXPRs.


> 
> That said, even though I've seen a few references to COND_EXPR
> in the midle-end I have been assuming that they, in general, do
> get transformed into PHIs.  But checking the internals manual
> I found in Section 12.6.3 this:
> 
>   A C ?: expression is converted into an if statement with each
>   branch assigning to the same temporary. ... The GIMPLE level
>   if-conversion pass re-introduces ?: expression, if appropriate.
>   It is used to vectorize loops with conditions using vector
>   conditional operations.
> 
> This GDB test case is likely the result of this reintroduction.
Nope.  It happens much earlier in the pipeline :-)


>>
>> And in a more general sense, this kind of permissiveness is not future
>> proof.  What happens if someone needs to add another EXPR node that is
>> valid on the RHS where such recursion is undesirable?  How are they
>> supposed to know that we've got this permissive recursive call and that
>> it's going to do the wrong thing?  And if it's an EXPR node with no
>> arguments, then we're going to do a read out of the bounds of the object
>> and all bets are off at that point (yes we have zero operand EXPR nodes,
>> but thankfully I don't 

[PING] [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-26 Thread Martin Sebor

Ping: did my reply and updated patch resolve your concerns?
  https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01106.html

On 6/18/19 9:19 PM, Martin Sebor wrote:

On 6/14/19 2:59 PM, Jeff Law wrote:

On 6/4/19 1:40 PM, Martin Sebor wrote:

On 6/3/19 5:24 PM, Martin Sebor wrote:

On 5/31/19 2:46 PM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

void* f (int i)
{
  struct S { int a[2]; } s[2];
  return >a[i];
}

[2] The following is not diagnosed even with the patch:

void sink (void*);

void* f (int i)
{
  char a[6];
  char *p = __builtin_strcpy (a, "123");
  sink (p);
  return p;
}

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
address of a local array plus offset

gcc/ChangeLog:

 PR c/71924
 * gimple-ssa-isolate-paths.c (is_addr_local): New function.
 (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
 (find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
 (find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

 PR c/71924
 * gcc.dg/Wreturn-local-addr-2.c: New test.
 * gcc.dg/Walloca-4.c: Prune expected warnings.
 * gcc.dg/pr41551.c: Same.
 * gcc.dg/pr59523.c: Same.
 * gcc.dg/tree-ssa/pr88775-2.c: Same.
 * gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c
b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
*stmt)
 return false;
   }
+/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+   hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+    {
+  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+  if (POINTER_TYPE_P (type))
+    {
+  tree ptr = gimple_assign_rhs1 (def_stmt);
+  return is_addr_local (ptr, ploc, pmaybe, visited);
+    }
+  return false;
+    }

So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it 
non-selective

about the codes where we recurse?

Consider

    ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to 
recurse
on both and logically and the result), BIT_AND (maybe we masked off 
some

bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly rather
than just recurse down through every rhs1 we encounter.


I don't have a list of codes to test for.  I initially contemplated
enumerating them but in the end decided the pointer type check would
be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get
transformed into PHIs?  In all my tests they do and and running
the whole test suite with an assert that it doesn't come up doesn't
expose any either.  (I left the assert for COND_EXPR there.)  If
a COND_EXPR really can come up in a GIMPLE assignment here can you
please show me how so I can add a test for it?

A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
need to consider here -- what is and what is not valid gimple.  And its
more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
standard practice for if-conversion.

Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
well, I just 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-18 Thread Martin Sebor

On 6/14/19 2:59 PM, Jeff Law wrote:

On 6/4/19 1:40 PM, Martin Sebor wrote:

On 6/3/19 5:24 PM, Martin Sebor wrote:

On 5/31/19 2:46 PM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

    void* f (int i)
    {
  struct S { int a[2]; } s[2];
  return >a[i];
    }

[2] The following is not diagnosed even with the patch:

    void sink (void*);

    void* f (int i)
    {
  char a[6];
  char *p = __builtin_strcpy (a, "123");
  sink (p);
  return p;
    }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
address of a local array plus offset

gcc/ChangeLog:

 PR c/71924
 * gimple-ssa-isolate-paths.c (is_addr_local): New function.
 (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
 (find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
 (find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

 PR c/71924
 * gcc.dg/Wreturn-local-addr-2.c: New test.
 * gcc.dg/Walloca-4.c: Prune expected warnings.
 * gcc.dg/pr41551.c: Same.
 * gcc.dg/pr59523.c: Same.
 * gcc.dg/tree-ssa/pr88775-2.c: Same.
 * gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c
b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
*stmt)
     return false;
   }
+/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+   hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+    {
+  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+  if (POINTER_TYPE_P (type))
+    {
+  tree ptr = gimple_assign_rhs1 (def_stmt);
+  return is_addr_local (ptr, ploc, pmaybe, visited);
+    }
+  return false;
+    }

So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective
about the codes where we recurse?

Consider

    ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse
on both and logically and the result), BIT_AND (maybe we masked off some
bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly rather
than just recurse down through every rhs1 we encounter.


I don't have a list of codes to test for.  I initially contemplated
enumerating them but in the end decided the pointer type check would
be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get
transformed into PHIs?  In all my tests they do and and running
the whole test suite with an assert that it doesn't come up doesn't
expose any either.  (I left the assert for COND_EXPR there.)  If
a COND_EXPR really can come up in a GIMPLE assignment here can you
please show me how so I can add a test for it?

A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
need to consider here -- what is and what is not valid gimple.  And its
more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
standard practice for if-conversion.

Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
well, I just shoved a quick assert into optimize_stmt to abort if we
encounter a gimple assignment where the RHS is a COND_EXPR.  It blew up
instantly building libgcc :-)

COmpile 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-04 Thread Martin Sebor

On 6/3/19 5:24 PM, Martin Sebor wrote:

On 5/31/19 2:46 PM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

   void* f (int i)
   {
 struct S { int a[2]; } s[2];
 return >a[i];
   }

[2] The following is not diagnosed even with the patch:

   void sink (void*);

   void* f (int i)
   {
 char a[6];
 char *p = __builtin_strcpy (a, "123");
 sink (p);
 return p;
   }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca 
result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an 
address of a local array plus offset


gcc/ChangeLog:

PR c/71924
* gimple-ssa-isolate-paths.c (is_addr_local): New function.
(warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
(find_implicit_erroneous_behavior): Call 
warn_return_addr_local_phi_arg.

(find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

PR c/71924
* gcc.dg/Wreturn-local-addr-2.c: New test.
* gcc.dg/Walloca-4.c: Prune expected warnings.
* gcc.dg/pr41551.c: Same.
* gcc.dg/pr59523.c: Same.
* gcc.dg/tree-ssa/pr88775-2.c: Same.
* gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c 
b/gcc/gimple-ssa-isolate-paths.c

index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple 
*stmt)

    return false;
  }
+/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+   hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+    {
+  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+  if (POINTER_TYPE_P (type))
+    {
+  tree ptr = gimple_assign_rhs1 (def_stmt);
+  return is_addr_local (ptr, ploc, pmaybe, visited);
+    }
+  return false;
+    }

So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective
about the codes where we recurse?

Consider

   ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse
on both and logically and the result), BIT_AND (maybe we masked off some
bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly rather
than just recurse down through every rhs1 we encounter.


I don't have a list of codes to test for.  I initially contemplated
enumerating them but in the end decided the pointer type check would
be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get
transformed into PHIs?  In all my tests they do and and running
the whole test suite with an assert that it doesn't come up doesn't
expose any either.  (I left the assert for COND_EXPR there.)  If
a COND_EXPR really can come up in a GIMPLE assignment here can you
please show me how so I can add a test for it?

I've added tests to exercise all C expressions that evaluate to
pointers.  I don't know of any others where what you bring up
should be a concern and I don't want to try to hardwire tests for
any that I can't to exercise in the testsuite or don't know how.
If you know of some I'm happy to add them and adjust the code.


+
+  if (code == GIMPLE_PHI && pmaybe)
+    {
+  unsigned count = 0;
+  gphi *phi_stmt = as_a  (def_stmt);
+
+  unsigned nargs = gimple_phi_num_args (phi_stmt);
+  for (unsigned i = 0; i < nargs; ++i)
+    {
+  if (!visited->add (phi_stmt))
+   

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-04 Thread Richard Biener
On Mon, Jun 3, 2019 at 1:27 PM Richard Biener
 wrote:
>
> On Mon, Jun 3, 2019 at 11:37 AM Richard Biener
>  wrote:
> >
> > On Fri, May 31, 2019 at 5:35 PM Jeff Law  wrote:>
> > > On 5/30/19 4:56 PM, Martin Sebor wrote:
> > > > On 5/30/19 10:15 AM, Jeff Law wrote:
> > > >> On 5/30/19 9:34 AM, Martin Sebor wrote:
> > > >>
> > > > If the alias oracle can be used to give the same results without
> > > > excessive false positives then I think it would be fine to make
> > > > use of it.  Is that something you consider a prerequisite for
> > > > this change or should I look into it as a followup?
> > >  I think we should explore it a bit before making a final decision.  
> > >  It
> > >  may guide us for other work in this space (like detecting escaping
> > >  locals).   I think a dirty prototype to see if it's even in the right
> > >  ballpark would make sense.
> > > >>>
> > > >>> Okay, let me look into it.
> > > >> Sounds good.  Again, go with a quick prototype to see if it's likely
> > > >> feasible.  The tests you've added should dramatically help evaluating 
> > > >> if
> > > >> the oracle is up to the task.
> > > >
> > > > So to expand on what I said on the phone when we spoke: the problem
> > > > I quickly ran into with the prototype is that I wasn't able to find
> > > > a way to identify pointers to alloca/VLA storage.
> > > Your analysis matches my very quick read of the aliasing code.  It may
> > > be the case that the Steensgaard patent got in the way here.
> > >
> > > >
> > > > In the the points-to solution for the pointer being returned they
> > > > both have the vars_contains_escaped_heap flag set.  That seems like
> > > > an omission that shouldn't be hard to fix, but on its own, I don't
> > > > think it would be sufficient.
> > > RIght.  In theory the result of an alloca call shouldn't alias anything
> > > in the heap -- but there were some implementations of alloca that were
> > > built on top of malloc (ugh).  That flag may be catering to that case.
> > >
> > > But in the case of a __builtin_alloca that shouldn't apply.  Hmm.  That
> > > ultimately might be a bug.
> > >
> > > >
> > > > In the IL a VLA is represented as a pointer to an array, but when
> > > > returning a pointer into a VLA (at some offset so it's an SSA_NAME),
> > > > the pointer's point-to solution doesn't include the VLA pointer or
> > > > (AFAICS) make it possible to tell even that it is a VLA.  For example
> > > > here:
> > > >
> > > >   f (int n)
> > > >   {
> > > > int * p;
> > > > int[0:D.1912] * a.1;
> > > > sizetype _1;
> > > > void * saved_stack.3_3;
> > > > sizetype _6;
> > > >
> > > >  [local count: 1073741824]:
> > > > saved_stack.3_3 = __builtin_stack_save ();
> > > > _1 = (sizetype) n_2(D);
> > > > _6 = _1 * 4;
> > > > a.1_8 = __builtin_alloca_with_align (_6, 32);
> > > > p_9 = a.1_8 + _6;
> > > > __builtin_stack_restore (saved_stack.3_3);
> > > > return p_9;
> > > >   }
> > > >
> > > > p_9's solution's is:
> > > >
> > > >   p_9, points-to vars: { D.1925 } (escaped, escaped heap)
> > > >
> > > > I couldn't find out how to determine that D.1925 is a VLA (or even
> > > > what it is).  It's not among the function's local variables that
> > > > FOR_EACH_LOCAL_DECL iterates over.
> > > It's possible that decl was created internally as part of the alias
> > > oracle's analysis.
> >
> > Yes.  Note that only the UID was reserved the fake decl doesn't
> > live on.
> >
> > Note that for the testcase above the "local" alloca storage escapes
> > which means you run into a catch-22 here given points-to computes
> > a conservative correct solution and  you want to detect escaping
> > locals.  Usually detecting a pointer to local storage can be done
> > by using ptr_deref_may_alias_global_p but of course in this
> > case the storage was marked global by PTA itself (and our PTA
> > is not flow-sensitive and it doesn't distinguish an escape through
> > a return stmt from an escape through a call which is relevant
> > even for local storage).
> >
> > Feature-wise the PTA solver is missing sth like a "filter"
> > we could put in front of return stmts that doesn't let
> > addresses of locals leak.  The simplest way of implementing
> > this might be to not include 'returns' in the constraints at all
> > (in non-IPA mode) and handle them by post-processing the
> > solver result.  That gets us some additional flow-sensitivity
> > and a way to filter locals.  Let me see if I can cook up this.
> >
> > That may ultimatively also help the warning code where you
> > then can use ptr_deref_may_alias_global_p.
> >
> > Sth like the attached - completely untested (the
> > is_global_var check is likely too simplistic...).  It does
> > the job on alloca for me.
> >
> > p_5, points-to NULL, points-to vars: { D.1913 }
> > _6, points-to NULL, points-to vars: { D.1913 }
> >
> > foo (int n)
> > {
> >   void * p;
> >   long unsigned int _1;
> >   void * _6;
> >
> >   

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-03 Thread Martin Sebor

On 5/31/19 2:46 PM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

   void* f (int i)
   {
 struct S { int a[2]; } s[2];
 return >a[i];
   }

[2] The following is not diagnosed even with the patch:

   void sink (void*);

   void* f (int i)
   {
 char a[6];
 char *p = __builtin_strcpy (a, "123");
 sink (p);
 return p;
   }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of 
a local array plus offset

gcc/ChangeLog:

PR c/71924
* gimple-ssa-isolate-paths.c (is_addr_local): New function.
(warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
(find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg.
(find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

PR c/71924
* gcc.dg/Wreturn-local-addr-2.c: New test.
* gcc.dg/Walloca-4.c: Prune expected warnings.
* gcc.dg/pr41551.c: Same.
* gcc.dg/pr59523.c: Same.
* gcc.dg/tree-ssa/pr88775-2.c: Same.
* gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt)
return false;
  }
  
+/* Return true if EXPR is a expression of pointer type that refers

+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+  hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+{
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+   {
+ tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+ if (POINTER_TYPE_P (type))
+   {
+ tree ptr = gimple_assign_rhs1 (def_stmt);
+ return is_addr_local (ptr, ploc, pmaybe, visited);
+   }
+ return false;
+   }

So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective
about the codes where we recurse?

Consider

   ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse
on both and logically and the result), BIT_AND (maybe we masked off some
bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly rather
than just recurse down through every rhs1 we encounter.


I don't have a list of codes to test for.  I initially contemplated
enumerating them but in the end decided the pointer type check would
be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get
transformed into PHIs?  In all my tests they do and and running
the whole test suite with an assert that it doesn't come up doesn't
expose any either.  (I left the assert for COND_EXPR there.)  If
a COND_EXPR really can come up in a GIMPLE assignment here can you
please show me how so I can add a test for it?

I've added tests to exercise all C expressions that evaluate to
pointers.  I don't know of any others where what you bring up
should be a concern and I don't want to try to hardwire tests for
any that I can't to exercise in the testsuite or don't know how.
If you know of some I'm happy to add them and adjust the code.


+
+  if (code == GIMPLE_PHI && pmaybe)
+   {
+ unsigned count = 0;
+ gphi *phi_stmt = as_a  (def_stmt);
+
+ unsigned nargs = gimple_phi_num_args (phi_stmt);
+ for (unsigned i = 0; i < nargs; ++i)
+  

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-03 Thread Jeff Law
On 6/3/19 3:37 AM, Richard Biener wrote:
> On Fri, May 31, 2019 at 5:35 PM Jeff Law  wrote:>
>> On 5/30/19 4:56 PM, Martin Sebor wrote:
>>> On 5/30/19 10:15 AM, Jeff Law wrote:
 On 5/30/19 9:34 AM, Martin Sebor wrote:

>>> If the alias oracle can be used to give the same results without
>>> excessive false positives then I think it would be fine to make
>>> use of it.  Is that something you consider a prerequisite for
>>> this change or should I look into it as a followup?
>> I think we should explore it a bit before making a final decision.  It
>> may guide us for other work in this space (like detecting escaping
>> locals).   I think a dirty prototype to see if it's even in the right
>> ballpark would make sense.
>
> Okay, let me look into it.
 Sounds good.  Again, go with a quick prototype to see if it's likely
 feasible.  The tests you've added should dramatically help evaluating if
 the oracle is up to the task.
>>>
>>> So to expand on what I said on the phone when we spoke: the problem
>>> I quickly ran into with the prototype is that I wasn't able to find
>>> a way to identify pointers to alloca/VLA storage.
>> Your analysis matches my very quick read of the aliasing code.  It may
>> be the case that the Steensgaard patent got in the way here.
>>
>>>
>>> In the the points-to solution for the pointer being returned they
>>> both have the vars_contains_escaped_heap flag set.  That seems like
>>> an omission that shouldn't be hard to fix, but on its own, I don't
>>> think it would be sufficient.
>> RIght.  In theory the result of an alloca call shouldn't alias anything
>> in the heap -- but there were some implementations of alloca that were
>> built on top of malloc (ugh).  That flag may be catering to that case.
>>
>> But in the case of a __builtin_alloca that shouldn't apply.  Hmm.  That
>> ultimately might be a bug.
>>
>>>
>>> In the IL a VLA is represented as a pointer to an array, but when
>>> returning a pointer into a VLA (at some offset so it's an SSA_NAME),
>>> the pointer's point-to solution doesn't include the VLA pointer or
>>> (AFAICS) make it possible to tell even that it is a VLA.  For example
>>> here:
>>>
>>>   f (int n)
>>>   {
>>> int * p;
>>> int[0:D.1912] * a.1;
>>> sizetype _1;
>>> void * saved_stack.3_3;
>>> sizetype _6;
>>>
>>>  [local count: 1073741824]:
>>> saved_stack.3_3 = __builtin_stack_save ();
>>> _1 = (sizetype) n_2(D);
>>> _6 = _1 * 4;
>>> a.1_8 = __builtin_alloca_with_align (_6, 32);
>>> p_9 = a.1_8 + _6;
>>> __builtin_stack_restore (saved_stack.3_3);
>>> return p_9;
>>>   }
>>>
>>> p_9's solution's is:
>>>
>>>   p_9, points-to vars: { D.1925 } (escaped, escaped heap)
>>>
>>> I couldn't find out how to determine that D.1925 is a VLA (or even
>>> what it is).  It's not among the function's local variables that
>>> FOR_EACH_LOCAL_DECL iterates over.
>> It's possible that decl was created internally as part of the alias
>> oracle's analysis.
> 
> Yes.  Note that only the UID was reserved the fake decl doesn't
> live on.
> 
> Note that for the testcase above the "local" alloca storage escapes
> which means you run into a catch-22 here given points-to computes
> a conservative correct solution and  you want to detect escaping
> locals.  Usually detecting a pointer to local storage can be done
> by using ptr_deref_may_alias_global_p but of course in this
> case the storage was marked global by PTA itself (and our PTA
> is not flow-sensitive and it doesn't distinguish an escape through
> a return stmt from an escape through a call which is relevant
> even for local storage).
Good point.  The inability to tell why something escaped is another
significant hurdle.

> 
> Feature-wise the PTA solver is missing sth like a "filter"
> we could put in front of return stmts that doesn't let
> addresses of locals leak.  The simplest way of implementing
> this might be to not include 'returns' in the constraints at all
> (in non-IPA mode) and handle them by post-processing the
> solver result.  That gets us some additional flow-sensitivity
> and a way to filter locals.  Let me see if I can cook up this.
Another thought would be to somehow flag how the pointer escaped.  ie,
was it as an argument, return value or stored to memory?  Though in the
end that level of detail may not be useful, so collecting it may not be
worth the effort.

> 
> That may ultimatively also help the warning code where you
> then can use ptr_deref_may_alias_global_p.
Another thought would be to use the alias oracle to prune what we look
at.  If we ask the oracle and the value doesn't escape, then it's not
worth walking up the use-def chain to see where it came from.


Jeff


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-03 Thread Richard Biener
On Mon, Jun 3, 2019 at 11:37 AM Richard Biener
 wrote:
>
> On Fri, May 31, 2019 at 5:35 PM Jeff Law  wrote:>
> > On 5/30/19 4:56 PM, Martin Sebor wrote:
> > > On 5/30/19 10:15 AM, Jeff Law wrote:
> > >> On 5/30/19 9:34 AM, Martin Sebor wrote:
> > >>
> > > If the alias oracle can be used to give the same results without
> > > excessive false positives then I think it would be fine to make
> > > use of it.  Is that something you consider a prerequisite for
> > > this change or should I look into it as a followup?
> >  I think we should explore it a bit before making a final decision.  It
> >  may guide us for other work in this space (like detecting escaping
> >  locals).   I think a dirty prototype to see if it's even in the right
> >  ballpark would make sense.
> > >>>
> > >>> Okay, let me look into it.
> > >> Sounds good.  Again, go with a quick prototype to see if it's likely
> > >> feasible.  The tests you've added should dramatically help evaluating if
> > >> the oracle is up to the task.
> > >
> > > So to expand on what I said on the phone when we spoke: the problem
> > > I quickly ran into with the prototype is that I wasn't able to find
> > > a way to identify pointers to alloca/VLA storage.
> > Your analysis matches my very quick read of the aliasing code.  It may
> > be the case that the Steensgaard patent got in the way here.
> >
> > >
> > > In the the points-to solution for the pointer being returned they
> > > both have the vars_contains_escaped_heap flag set.  That seems like
> > > an omission that shouldn't be hard to fix, but on its own, I don't
> > > think it would be sufficient.
> > RIght.  In theory the result of an alloca call shouldn't alias anything
> > in the heap -- but there were some implementations of alloca that were
> > built on top of malloc (ugh).  That flag may be catering to that case.
> >
> > But in the case of a __builtin_alloca that shouldn't apply.  Hmm.  That
> > ultimately might be a bug.
> >
> > >
> > > In the IL a VLA is represented as a pointer to an array, but when
> > > returning a pointer into a VLA (at some offset so it's an SSA_NAME),
> > > the pointer's point-to solution doesn't include the VLA pointer or
> > > (AFAICS) make it possible to tell even that it is a VLA.  For example
> > > here:
> > >
> > >   f (int n)
> > >   {
> > > int * p;
> > > int[0:D.1912] * a.1;
> > > sizetype _1;
> > > void * saved_stack.3_3;
> > > sizetype _6;
> > >
> > >  [local count: 1073741824]:
> > > saved_stack.3_3 = __builtin_stack_save ();
> > > _1 = (sizetype) n_2(D);
> > > _6 = _1 * 4;
> > > a.1_8 = __builtin_alloca_with_align (_6, 32);
> > > p_9 = a.1_8 + _6;
> > > __builtin_stack_restore (saved_stack.3_3);
> > > return p_9;
> > >   }
> > >
> > > p_9's solution's is:
> > >
> > >   p_9, points-to vars: { D.1925 } (escaped, escaped heap)
> > >
> > > I couldn't find out how to determine that D.1925 is a VLA (or even
> > > what it is).  It's not among the function's local variables that
> > > FOR_EACH_LOCAL_DECL iterates over.
> > It's possible that decl was created internally as part of the alias
> > oracle's analysis.
>
> Yes.  Note that only the UID was reserved the fake decl doesn't
> live on.
>
> Note that for the testcase above the "local" alloca storage escapes
> which means you run into a catch-22 here given points-to computes
> a conservative correct solution and  you want to detect escaping
> locals.  Usually detecting a pointer to local storage can be done
> by using ptr_deref_may_alias_global_p but of course in this
> case the storage was marked global by PTA itself (and our PTA
> is not flow-sensitive and it doesn't distinguish an escape through
> a return stmt from an escape through a call which is relevant
> even for local storage).
>
> Feature-wise the PTA solver is missing sth like a "filter"
> we could put in front of return stmts that doesn't let
> addresses of locals leak.  The simplest way of implementing
> this might be to not include 'returns' in the constraints at all
> (in non-IPA mode) and handle them by post-processing the
> solver result.  That gets us some additional flow-sensitivity
> and a way to filter locals.  Let me see if I can cook up this.
>
> That may ultimatively also help the warning code where you
> then can use ptr_deref_may_alias_global_p.
>
> Sth like the attached - completely untested (the
> is_global_var check is likely too simplistic...).  It does
> the job on alloca for me.
>
> p_5, points-to NULL, points-to vars: { D.1913 }
> _6, points-to NULL, points-to vars: { D.1913 }
>
> foo (int n)
> {
>   void * p;
>   long unsigned int _1;
>   void * _6;
>
>:
>   _1 = (long unsigned int) n_2(D);
>   p_5 = __builtin_alloca (_1);
>   _6 = p_5;
>   return p_5;

I am testing the following fixed and more complete patch (still
the actual return values we process optimally is restricted).
I'm not sure whether it will really help real-world testcases
since the 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-03 Thread Richard Biener
On Fri, May 31, 2019 at 5:35 PM Jeff Law  wrote:>
> On 5/30/19 4:56 PM, Martin Sebor wrote:
> > On 5/30/19 10:15 AM, Jeff Law wrote:
> >> On 5/30/19 9:34 AM, Martin Sebor wrote:
> >>
> > If the alias oracle can be used to give the same results without
> > excessive false positives then I think it would be fine to make
> > use of it.  Is that something you consider a prerequisite for
> > this change or should I look into it as a followup?
>  I think we should explore it a bit before making a final decision.  It
>  may guide us for other work in this space (like detecting escaping
>  locals).   I think a dirty prototype to see if it's even in the right
>  ballpark would make sense.
> >>>
> >>> Okay, let me look into it.
> >> Sounds good.  Again, go with a quick prototype to see if it's likely
> >> feasible.  The tests you've added should dramatically help evaluating if
> >> the oracle is up to the task.
> >
> > So to expand on what I said on the phone when we spoke: the problem
> > I quickly ran into with the prototype is that I wasn't able to find
> > a way to identify pointers to alloca/VLA storage.
> Your analysis matches my very quick read of the aliasing code.  It may
> be the case that the Steensgaard patent got in the way here.
>
> >
> > In the the points-to solution for the pointer being returned they
> > both have the vars_contains_escaped_heap flag set.  That seems like
> > an omission that shouldn't be hard to fix, but on its own, I don't
> > think it would be sufficient.
> RIght.  In theory the result of an alloca call shouldn't alias anything
> in the heap -- but there were some implementations of alloca that were
> built on top of malloc (ugh).  That flag may be catering to that case.
>
> But in the case of a __builtin_alloca that shouldn't apply.  Hmm.  That
> ultimately might be a bug.
>
> >
> > In the IL a VLA is represented as a pointer to an array, but when
> > returning a pointer into a VLA (at some offset so it's an SSA_NAME),
> > the pointer's point-to solution doesn't include the VLA pointer or
> > (AFAICS) make it possible to tell even that it is a VLA.  For example
> > here:
> >
> >   f (int n)
> >   {
> > int * p;
> > int[0:D.1912] * a.1;
> > sizetype _1;
> > void * saved_stack.3_3;
> > sizetype _6;
> >
> >  [local count: 1073741824]:
> > saved_stack.3_3 = __builtin_stack_save ();
> > _1 = (sizetype) n_2(D);
> > _6 = _1 * 4;
> > a.1_8 = __builtin_alloca_with_align (_6, 32);
> > p_9 = a.1_8 + _6;
> > __builtin_stack_restore (saved_stack.3_3);
> > return p_9;
> >   }
> >
> > p_9's solution's is:
> >
> >   p_9, points-to vars: { D.1925 } (escaped, escaped heap)
> >
> > I couldn't find out how to determine that D.1925 is a VLA (or even
> > what it is).  It's not among the function's local variables that
> > FOR_EACH_LOCAL_DECL iterates over.
> It's possible that decl was created internally as part of the alias
> oracle's analysis.

Yes.  Note that only the UID was reserved the fake decl doesn't
live on.

Note that for the testcase above the "local" alloca storage escapes
which means you run into a catch-22 here given points-to computes
a conservative correct solution and  you want to detect escaping
locals.  Usually detecting a pointer to local storage can be done
by using ptr_deref_may_alias_global_p but of course in this
case the storage was marked global by PTA itself (and our PTA
is not flow-sensitive and it doesn't distinguish an escape through
a return stmt from an escape through a call which is relevant
even for local storage).

Feature-wise the PTA solver is missing sth like a "filter"
we could put in front of return stmts that doesn't let
addresses of locals leak.  The simplest way of implementing
this might be to not include 'returns' in the constraints at all
(in non-IPA mode) and handle them by post-processing the
solver result.  That gets us some additional flow-sensitivity
and a way to filter locals.  Let me see if I can cook up this.

That may ultimatively also help the warning code where you
then can use ptr_deref_may_alias_global_p.

Sth like the attached - completely untested (the
is_global_var check is likely too simplistic...).  It does
the job on alloca for me.

p_5, points-to NULL, points-to vars: { D.1913 }
_6, points-to NULL, points-to vars: { D.1913 }

foo (int n)
{
  void * p;
  long unsigned int _1;
  void * _6;

   :
  _1 = (long unsigned int) n_2(D);
  p_5 = __builtin_alloca (_1);
  _6 = p_5;
  return p_5;


Richard.

> See make_heapvar in tree-ssa-structalias.c
> >
> > By replacing the VLA in the test case with a call to malloc I get
> > this for the returned p_7:
> >
> >   p_7, points-to NULL, points-to vars: { D.1916 } (escaped, escaped heap)
> >
> > Again, I see no way to quickly tell that this pointer points into
> > the block returned from malloc.
> If there's a way to make that determination it'd have to be on the
> variable since the pt_solution flag bits don't carry 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-31 Thread Jeff Law
On 5/22/19 3:34 PM, Martin Sebor wrote:
> -Wreturn-local-addr detects a subset of instances of returning
> the address of a local object from a function but the warning
> doesn't try to handle alloca or VLAs, or some non-trivial cases
> of ordinary automatic variables[1].
> 
> The attached patch extends the implementation of the warning to
> detect those.  It still doesn't detect instances where the address
> is the result of a built-in such strcpy[2].
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> [1] For example, this is only diagnosed with the patch:
> 
>   void* f (int i)
>   {
> struct S { int a[2]; } s[2];
> return >a[i];
>   }
> 
> [2] The following is not diagnosed even with the patch:
> 
>   void sink (void*);
> 
>   void* f (int i)
>   {
> char a[6];
> char *p = __builtin_strcpy (a, "123");
> sink (p);
> return p;
>   }
> 
> I would expect detecting to be possible and useful.  Maybe as
> a follow-up.
> 
> gcc-71924.diff
> 
> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address 
> of a local array plus offset
> 
> gcc/ChangeLog:
> 
>   PR c/71924
>   * gimple-ssa-isolate-paths.c (is_addr_local): New function.
>   (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
>   (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg.
>   (find_explicit_erroneous_behavior): Call warn_return_addr_local.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/71924
>   * gcc.dg/Wreturn-local-addr-2.c: New test.
>   * gcc.dg/Walloca-4.c: Prune expected warnings.
>   * gcc.dg/pr41551.c: Same.
>   * gcc.dg/pr59523.c: Same.
>   * gcc.dg/tree-ssa/pr88775-2.c: Same.
>   * gcc.dg/winline-7.c: Same.
> 
> diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
> index 33fe352bb23..2933ecf502e 100644
> --- a/gcc/gimple-ssa-isolate-paths.c
> +++ b/gcc/gimple-ssa-isolate-paths.c
> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt)
>return false;
>  }
>  
> +/* Return true if EXPR is a expression of pointer type that refers
> +   to the address of a variable with automatic storage duration.
> +   If so, set *PLOC to the location of the object or the call that
> +   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
> +   also consider PHI statements and set *PMAYBE when some but not
> +   all arguments of such statements refer to local variables, and
> +   to clear it otherwise.  */
> +
> +static bool
> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
> +hash_set *visited = NULL)
> +{
> +  if (TREE_CODE (exp) == SSA_NAME)
> +{
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
> +  enum gimple_code code = gimple_code (def_stmt);
> +
> +  if (is_gimple_assign (def_stmt))
> + {
> +   tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
> +   if (POINTER_TYPE_P (type))
> + {
> +   tree ptr = gimple_assign_rhs1 (def_stmt);
> +   return is_addr_local (ptr, ploc, pmaybe, visited);
> + }
> +   return false;
> + }
So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective
about the codes where we recurse?

Consider

  ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse
on both and logically and the result), BIT_AND (maybe we masked off some
bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly rather
than just recurse down through every rhs1 we encounter.



> +
> +  if (code == GIMPLE_PHI && pmaybe)
> + {
> +   unsigned count = 0;
> +   gphi *phi_stmt = as_a  (def_stmt);
> +
> +   unsigned nargs = gimple_phi_num_args (phi_stmt);
> +   for (unsigned i = 0; i < nargs; ++i)
> + {
> +   if (!visited->add (phi_stmt))
> + {
> +   tree arg = gimple_phi_arg_def (phi_stmt, i);
> +   if (is_addr_local (arg, ploc, pmaybe, visited))
> + ++count;
> + }
> + }
So imagine

p = phi (x1, x1, x2)

Where both x1 and x2 would pass is_addr_local.  I think this code would
incorrectly set pmaybe.

We process the first instance of x1, find it is local and bump count.
When we encounter the second instance, it's in our map and we do
nothing.  THen we process x2 and bump count again.  So count would be 2.
 But count is going to be less than nargs so we'll set *pmaybe to true.

ISTM you'd have to record the result for each phi argument and query
that to 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-31 Thread Jeff Law
On 5/30/19 4:56 PM, Martin Sebor wrote:
> On 5/30/19 10:15 AM, Jeff Law wrote:
>> On 5/30/19 9:34 AM, Martin Sebor wrote:
>>
> If the alias oracle can be used to give the same results without
> excessive false positives then I think it would be fine to make
> use of it.  Is that something you consider a prerequisite for
> this change or should I look into it as a followup?
 I think we should explore it a bit before making a final decision.  It
 may guide us for other work in this space (like detecting escaping
 locals).   I think a dirty prototype to see if it's even in the right
 ballpark would make sense.
>>>
>>> Okay, let me look into it.
>> Sounds good.  Again, go with a quick prototype to see if it's likely
>> feasible.  The tests you've added should dramatically help evaluating if
>> the oracle is up to the task.
> 
> So to expand on what I said on the phone when we spoke: the problem
> I quickly ran into with the prototype is that I wasn't able to find
> a way to identify pointers to alloca/VLA storage.
Your analysis matches my very quick read of the aliasing code.  It may
be the case that the Steensgaard patent got in the way here.


> 
> In the the points-to solution for the pointer being returned they
> both have the vars_contains_escaped_heap flag set.  That seems like
> an omission that shouldn't be hard to fix, but on its own, I don't
> think it would be sufficient.
RIght.  In theory the result of an alloca call shouldn't alias anything
in the heap -- but there were some implementations of alloca that were
built on top of malloc (ugh).  That flag may be catering to that case.

But in the case of a __builtin_alloca that shouldn't apply.  Hmm.  That
ultimately might be a bug.



> 
> In the IL a VLA is represented as a pointer to an array, but when
> returning a pointer into a VLA (at some offset so it's an SSA_NAME),
> the pointer's point-to solution doesn't include the VLA pointer or
> (AFAICS) make it possible to tell even that it is a VLA.  For example
> here:
> 
>   f (int n)
>   {
>     int * p;
>     int[0:D.1912] * a.1;
>     sizetype _1;
>     void * saved_stack.3_3;
>     sizetype _6;
> 
>      [local count: 1073741824]:
>     saved_stack.3_3 = __builtin_stack_save ();
>     _1 = (sizetype) n_2(D);
>     _6 = _1 * 4;
>     a.1_8 = __builtin_alloca_with_align (_6, 32);
>     p_9 = a.1_8 + _6;
>     __builtin_stack_restore (saved_stack.3_3);
>     return p_9;
>   }
> 
> p_9's solution's is:
> 
>   p_9, points-to vars: { D.1925 } (escaped, escaped heap)
> 
> I couldn't find out how to determine that D.1925 is a VLA (or even
> what it is).  It's not among the function's local variables that
> FOR_EACH_LOCAL_DECL iterates over.
It's possible that decl was created internally as part of the alias
oracle's analysis.

See make_heapvar in tree-ssa-structalias.c
> 
> By replacing the VLA in the test case with a call to malloc I get
> this for the returned p_7:
> 
>   p_7, points-to NULL, points-to vars: { D.1916 } (escaped, escaped heap)
> 
> Again, I see no way to quickly tell that this pointer points into
> the block returned from malloc.
If there's a way to make that determination it'd have to be on the
variable since the pt_solution flag bits don't carry a storage class
directly.

You might try to get a handle on those decls and dump them to see if
there's anything easily identifiable.  But it may be easier to dig into
the code that creates them.  A real quick scan of the aliasing code also
shows the "variable_info" structure.  It's private to the aliasing code,
but might guide you at things to look at.

Regardless, I don't see an immediate path forward  using the oracle to
identify objects in the stack for your patch.  WHich is unfortunate.




Jeff


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-30 Thread Jeff Law
On 5/30/19 11:23 AM, Jason Merrill wrote:
> On Thu, May 30, 2019 at 12:16 PM Jeff Law  wrote:
>>
>> On 5/30/19 9:34 AM, Martin Sebor wrote:
>>
> If the alias oracle can be used to give the same results without
> excessive false positives then I think it would be fine to make
> use of it.  Is that something you consider a prerequisite for
> this change or should I look into it as a followup?
 I think we should explore it a bit before making a final decision.  It
 may guide us for other work in this space (like detecting escaping
 locals).   I think a dirty prototype to see if it's even in the right
 ballpark would make sense.
>>>
>>> Okay, let me look into it.
>> Sounds good.  Again, go with a quick prototype to see if it's likely
>> feasible.  The tests you've added should dramatically help evaluating if
>> the oracle is up to the task.
>>
>>>
> FWIW, I'm working on enhancing this to detect returning freed
> pointers (under a different option).  That seems like a good
> opportunity to also look into making use of the alias oracle.
 Yes.  I think there's two interesting cases here to ponder.  If we free
 a pointer that must point to a local, then we can warn & trap.  If we
 free a pointer that may point to a local, then we can only warn (unless
 we can isolate the path).
>>>
>>> I wasn't actually thinking of freeing locals but it sounds like
>>> a useful enhancement as well.  Thanks for the suggestion! :)
>>>
>>> To be clear, what I'm working on is detecting:
>>>
>>>   void* f (void *p)
>>>   {
>>> free (p);   // note: pointer was freed here
>>> // ...
>>> return p;   // warning: returning a freed pointer
>>>   }
>> Ah, we were talking about different things. Though what you're doing
>> might be better modeled in a true global static analyzer as a
>> use-after-free problem.  My sense is that translation-unit local version
>> of that problem really isn't that useful in practice.  THough I guess
>> there isn't anything bad with having a TU local version.
>>
>>
>>>
> Besides these enhancements, there's also a request to diagnose
> dereferencing pointers to compound literals whose lifetime has
> ended (PR 89990), or more generally, those to any such local
> object.  It's about preventing essentially the same problem
> (accessing bad data or corrupting others) but it seems that
> both the analysis and the handling will be sufficiently
> different to consider implementing it somewhere else.  What
> are your thoughts on that?
 I think the tough problem here is we lose binding scopes as we lower
 into gimple, so solving it in the general case may be tough.  We've
 started adding some clobbers into the IL to denote object death points,
 but I'm not sure if they're sufficient to implement this kind of warning.
>>>
>>> I was afraid that might be a problem.
>> Way back in the early days of tree-ssa we kept the binding scopes.  But
>> that proved problematical in various ways.  Mostly they just got in the
>> way of analysis an optimization and we spent far too much time in the
>> optimizers working around them or removing those which were empty.
>>
>> They'd be helpful in this kind of analysis, stack slot sharing vs the
>> inliner and a couple other problems.  I don't know if the pendulum has
>> moved far enough to revisit :-)
> 
> Why wouldn't clobbers be sufficient?
I haven't looked into the clobbers in any detail.  If we're aggressively
emitting them at the end of object life, then they may be sufficient to
start tackling these issues.

jeff


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-30 Thread Martin Sebor

On 5/30/19 10:15 AM, Jeff Law wrote:

On 5/30/19 9:34 AM, Martin Sebor wrote:


If the alias oracle can be used to give the same results without
excessive false positives then I think it would be fine to make
use of it.  Is that something you consider a prerequisite for
this change or should I look into it as a followup?

I think we should explore it a bit before making a final decision.  It
may guide us for other work in this space (like detecting escaping
locals).   I think a dirty prototype to see if it's even in the right
ballpark would make sense.


Okay, let me look into it.

Sounds good.  Again, go with a quick prototype to see if it's likely
feasible.  The tests you've added should dramatically help evaluating if
the oracle is up to the task.


So to expand on what I said on the phone when we spoke: the problem
I quickly ran into with the prototype is that I wasn't able to find
a way to identify pointers to alloca/VLA storage.

In the the points-to solution for the pointer being returned they
both have the vars_contains_escaped_heap flag set.  That seems like
an omission that shouldn't be hard to fix, but on its own, I don't
think it would be sufficient.

In the IL a VLA is represented as a pointer to an array, but when
returning a pointer into a VLA (at some offset so it's an SSA_NAME),
the pointer's point-to solution doesn't include the VLA pointer or
(AFAICS) make it possible to tell even that it is a VLA.  For example
here:

  f (int n)
  {
int * p;
int[0:D.1912] * a.1;
sizetype _1;
void * saved_stack.3_3;
sizetype _6;

 [local count: 1073741824]:
saved_stack.3_3 = __builtin_stack_save ();
_1 = (sizetype) n_2(D);
_6 = _1 * 4;
a.1_8 = __builtin_alloca_with_align (_6, 32);
p_9 = a.1_8 + _6;
__builtin_stack_restore (saved_stack.3_3);
return p_9;
  }

p_9's solution's is:

  p_9, points-to vars: { D.1925 } (escaped, escaped heap)

I couldn't find out how to determine that D.1925 is a VLA (or even
what it is).  It's not among the function's local variables that
FOR_EACH_LOCAL_DECL iterates over.

By replacing the VLA in the test case with a call to malloc I get
this for the returned p_7:

  p_7, points-to NULL, points-to vars: { D.1916 } (escaped, escaped heap)

Again, I see no way to quickly tell that this pointer points into
the block returned from malloc.

I'm sure Richard will correct me if there is a simple way to do it
but I couldn't find one.

If there is a way to identify that the returned pointer is for
a VLA (or alloca), for parity with the current patch we also
need to find the location of the VLA declaration or the alloca
call so that the warning could point to it.  Unless there's
a straight path from the mysterious D.1925 above to that
decl/call statement, finding it would likely require some
sor of traversal.  At that point I wouldn't be surprised if
the complexity of the solution didn't approach or even exceed
the current approach.

Martin


FWIW, I'm working on enhancing this to detect returning freed
pointers (under a different option).  That seems like a good
opportunity to also look into making use of the alias oracle.

Yes.  I think there's two interesting cases here to ponder.  If we free
a pointer that must point to a local, then we can warn & trap.  If we
free a pointer that may point to a local, then we can only warn (unless
we can isolate the path).


I wasn't actually thinking of freeing locals but it sounds like
a useful enhancement as well.  Thanks for the suggestion! :)

To be clear, what I'm working on is detecting:

   void* f (void *p)
   {
     free (p);   // note: pointer was freed here
     // ...
     return p;   // warning: returning a freed pointer
   }

Ah, we were talking about different things. Though what you're doing
might be better modeled in a true global static analyzer as a
use-after-free problem.  My sense is that translation-unit local version
of that problem really isn't that useful in practice.  THough I guess
there isn't anything bad with having a TU local version.





Besides these enhancements, there's also a request to diagnose
dereferencing pointers to compound literals whose lifetime has
ended (PR 89990), or more generally, those to any such local
object.  It's about preventing essentially the same problem
(accessing bad data or corrupting others) but it seems that
both the analysis and the handling will be sufficiently
different to consider implementing it somewhere else.  What
are your thoughts on that?

I think the tough problem here is we lose binding scopes as we lower
into gimple, so solving it in the general case may be tough.  We've
started adding some clobbers into the IL to denote object death points,
but I'm not sure if they're sufficient to implement this kind of warning.


I was afraid that might be a problem.

Way back in the early days of tree-ssa we kept the binding scopes.  But
that proved problematical in various ways.  Mostly they just got in the
way of analysis an 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-30 Thread Jason Merrill
On Thu, May 30, 2019 at 12:16 PM Jeff Law  wrote:
>
> On 5/30/19 9:34 AM, Martin Sebor wrote:
>
> >>> If the alias oracle can be used to give the same results without
> >>> excessive false positives then I think it would be fine to make
> >>> use of it.  Is that something you consider a prerequisite for
> >>> this change or should I look into it as a followup?
> >> I think we should explore it a bit before making a final decision.  It
> >> may guide us for other work in this space (like detecting escaping
> >> locals).   I think a dirty prototype to see if it's even in the right
> >> ballpark would make sense.
> >
> > Okay, let me look into it.
> Sounds good.  Again, go with a quick prototype to see if it's likely
> feasible.  The tests you've added should dramatically help evaluating if
> the oracle is up to the task.
>
> >
> >>> FWIW, I'm working on enhancing this to detect returning freed
> >>> pointers (under a different option).  That seems like a good
> >>> opportunity to also look into making use of the alias oracle.
> >> Yes.  I think there's two interesting cases here to ponder.  If we free
> >> a pointer that must point to a local, then we can warn & trap.  If we
> >> free a pointer that may point to a local, then we can only warn (unless
> >> we can isolate the path).
> >
> > I wasn't actually thinking of freeing locals but it sounds like
> > a useful enhancement as well.  Thanks for the suggestion! :)
> >
> > To be clear, what I'm working on is detecting:
> >
> >   void* f (void *p)
> >   {
> > free (p);   // note: pointer was freed here
> > // ...
> > return p;   // warning: returning a freed pointer
> >   }
> Ah, we were talking about different things. Though what you're doing
> might be better modeled in a true global static analyzer as a
> use-after-free problem.  My sense is that translation-unit local version
> of that problem really isn't that useful in practice.  THough I guess
> there isn't anything bad with having a TU local version.
>
>
> >
> >>> Besides these enhancements, there's also a request to diagnose
> >>> dereferencing pointers to compound literals whose lifetime has
> >>> ended (PR 89990), or more generally, those to any such local
> >>> object.  It's about preventing essentially the same problem
> >>> (accessing bad data or corrupting others) but it seems that
> >>> both the analysis and the handling will be sufficiently
> >>> different to consider implementing it somewhere else.  What
> >>> are your thoughts on that?
> >> I think the tough problem here is we lose binding scopes as we lower
> >> into gimple, so solving it in the general case may be tough.  We've
> >> started adding some clobbers into the IL to denote object death points,
> >> but I'm not sure if they're sufficient to implement this kind of warning.
> >
> > I was afraid that might be a problem.
> Way back in the early days of tree-ssa we kept the binding scopes.  But
> that proved problematical in various ways.  Mostly they just got in the
> way of analysis an optimization and we spent far too much time in the
> optimizers working around them or removing those which were empty.
>
> They'd be helpful in this kind of analysis, stack slot sharing vs the
> inliner and a couple other problems.  I don't know if the pendulum has
> moved far enough to revisit :-)

Why wouldn't clobbers be sufficient?

Jaason


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-30 Thread Jeff Law
On 5/30/19 9:34 AM, Martin Sebor wrote:

>>> If the alias oracle can be used to give the same results without
>>> excessive false positives then I think it would be fine to make
>>> use of it.  Is that something you consider a prerequisite for
>>> this change or should I look into it as a followup?
>> I think we should explore it a bit before making a final decision.  It
>> may guide us for other work in this space (like detecting escaping
>> locals).   I think a dirty prototype to see if it's even in the right
>> ballpark would make sense.
> 
> Okay, let me look into it.
Sounds good.  Again, go with a quick prototype to see if it's likely
feasible.  The tests you've added should dramatically help evaluating if
the oracle is up to the task.

> 
>>> FWIW, I'm working on enhancing this to detect returning freed
>>> pointers (under a different option).  That seems like a good
>>> opportunity to also look into making use of the alias oracle.
>> Yes.  I think there's two interesting cases here to ponder.  If we free
>> a pointer that must point to a local, then we can warn & trap.  If we
>> free a pointer that may point to a local, then we can only warn (unless
>> we can isolate the path).
> 
> I wasn't actually thinking of freeing locals but it sounds like
> a useful enhancement as well.  Thanks for the suggestion! :)
> 
> To be clear, what I'm working on is detecting:
> 
>   void* f (void *p)
>   {
>     free (p);   // note: pointer was freed here
>     // ...
>     return p;   // warning: returning a freed pointer
>   }
Ah, we were talking about different things. Though what you're doing
might be better modeled in a true global static analyzer as a
use-after-free problem.  My sense is that translation-unit local version
of that problem really isn't that useful in practice.  THough I guess
there isn't anything bad with having a TU local version.


> 
>>> Besides these enhancements, there's also a request to diagnose
>>> dereferencing pointers to compound literals whose lifetime has
>>> ended (PR 89990), or more generally, those to any such local
>>> object.  It's about preventing essentially the same problem
>>> (accessing bad data or corrupting others) but it seems that
>>> both the analysis and the handling will be sufficiently
>>> different to consider implementing it somewhere else.  What
>>> are your thoughts on that?
>> I think the tough problem here is we lose binding scopes as we lower
>> into gimple, so solving it in the general case may be tough.  We've
>> started adding some clobbers into the IL to denote object death points,
>> but I'm not sure if they're sufficient to implement this kind of warning.
> 
> I was afraid that might be a problem.
Way back in the early days of tree-ssa we kept the binding scopes.  But
that proved problematical in various ways.  Mostly they just got in the
way of analysis an optimization and we spent far too much time in the
optimizers working around them or removing those which were empty.

They'd be helpful in this kind of analysis, stack slot sharing vs the
inliner and a couple other problems.  I don't know if the pendulum has
moved far enough to revisit :-)

jeff


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-30 Thread Martin Sebor

On 5/30/19 9:13 AM, Jeff Law wrote:

On 5/30/19 8:52 AM, Martin Sebor wrote:

On 5/29/19 11:45 AM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

    void* f (int i)
    {
  struct S { int a[2]; } s[2];
  return >a[i];
    }

[2] The following is not diagnosed even with the patch:

    void sink (void*);

    void* f (int i)
    {
  char a[6];
  char *p = __builtin_strcpy (a, "123");
  sink (p);
  return p;
    }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
address of a local array plus offset

gcc/ChangeLog:

 PR c/71924
 * gimple-ssa-isolate-paths.c (is_addr_local): New function.
 (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
 (find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
 (find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

 PR c/71924
 * gcc.dg/Wreturn-local-addr-2.c: New test.
 * gcc.dg/Walloca-4.c: Prune expected warnings.
 * gcc.dg/pr41551.c: Same.
 * gcc.dg/pr59523.c: Same.
 * gcc.dg/tree-ssa/pr88775-2.c: Same.
 * gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c
b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
*stmt)
     return false;
   }
   +/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+   hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == ADDR_EXPR)
+    {
+  tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
+  if (TREE_CODE (baseaddr) == MEM_REF)
+    return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe,
visited);
+
+  if ((!VAR_P (baseaddr)
+   || is_global_var (baseaddr))
+  && TREE_CODE (baseaddr) != PARM_DECL)
+    return false;
+
+  *ploc = DECL_SOURCE_LOCATION (baseaddr);
+  return true;
+    }
+
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+    {
+  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+  if (POINTER_TYPE_P (type))
+    {
+  tree ptr = gimple_assign_rhs1 (def_stmt);
+  return is_addr_local (ptr, ploc, pmaybe, visited);
+    }
+  return false;
+    }
+
+  if (code == GIMPLE_CALL
+  && gimple_call_builtin_p (def_stmt))
+    {
+  tree fn = gimple_call_fndecl (def_stmt);
+  int code = DECL_FUNCTION_CODE (fn);
+  if (code != BUILT_IN_ALLOCA
+  && code != BUILT_IN_ALLOCA_WITH_ALIGN)
+    return false;
+
+  *ploc = gimple_location (def_stmt);
+  return true;
+    }
+
+  if (code == GIMPLE_PHI && pmaybe)
+    {
+  unsigned count = 0;
+  gphi *phi_stmt = as_a  (def_stmt);
+
+  unsigned nargs = gimple_phi_num_args (phi_stmt);
+  for (unsigned i = 0; i < nargs; ++i)
+    {
+  if (!visited->add (phi_stmt))
+    {
+  tree arg = gimple_phi_arg_def (phi_stmt, i);
+  if (is_addr_local (arg, ploc, pmaybe, visited))
+    ++count;
+    }
+    }
+
+  *pmaybe = count && count < nargs;
+  return count != 0;
+    }
+    }
+
+  return false;
+}

Is there some reason you didn't query the alias oracle here?  It would
seem a fairly natural fit.  Ultimately given a pointer (which will be an
SSA_NAME) you want to ask whether or not it conclusively points into the
stack.

That would seem to dramatically simplify is_addr_local.


I did think about it but decided against changing the existing
design (iterating over PHI arguments), for a couple of reasons:

1) It feels like a bigger change than my simple "bug fix" calls
    for.

I suspect the net result will be simpler though 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-30 Thread Jeff Law
On 5/30/19 8:52 AM, Martin Sebor wrote:
> On 5/29/19 11:45 AM, Jeff Law wrote:
>> On 5/22/19 3:34 PM, Martin Sebor wrote:
>>> -Wreturn-local-addr detects a subset of instances of returning
>>> the address of a local object from a function but the warning
>>> doesn't try to handle alloca or VLAs, or some non-trivial cases
>>> of ordinary automatic variables[1].
>>>
>>> The attached patch extends the implementation of the warning to
>>> detect those.  It still doesn't detect instances where the address
>>> is the result of a built-in such strcpy[2].
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> [1] For example, this is only diagnosed with the patch:
>>>
>>>    void* f (int i)
>>>    {
>>>  struct S { int a[2]; } s[2];
>>>  return >a[i];
>>>    }
>>>
>>> [2] The following is not diagnosed even with the patch:
>>>
>>>    void sink (void*);
>>>
>>>    void* f (int i)
>>>    {
>>>  char a[6];
>>>  char *p = __builtin_strcpy (a, "123");
>>>  sink (p);
>>>  return p;
>>>    }
>>>
>>> I would expect detecting to be possible and useful.  Maybe as
>>> a follow-up.
>>>
>>> gcc-71924.diff
>>>
>>> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
>>> result
>>> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
>>> address of a local array plus offset
>>>
>>> gcc/ChangeLog:
>>>
>>> PR c/71924
>>> * gimple-ssa-isolate-paths.c (is_addr_local): New function.
>>> (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
>>> (find_implicit_erroneous_behavior): Call
>>> warn_return_addr_local_phi_arg.
>>> (find_explicit_erroneous_behavior): Call warn_return_addr_local.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR c/71924
>>> * gcc.dg/Wreturn-local-addr-2.c: New test.
>>> * gcc.dg/Walloca-4.c: Prune expected warnings.
>>> * gcc.dg/pr41551.c: Same.
>>> * gcc.dg/pr59523.c: Same.
>>> * gcc.dg/tree-ssa/pr88775-2.c: Same.
>>> * gcc.dg/winline-7.c: Same.
>>>
>>> diff --git a/gcc/gimple-ssa-isolate-paths.c
>>> b/gcc/gimple-ssa-isolate-paths.c
>>> index 33fe352bb23..2933ecf502e 100644
>>> --- a/gcc/gimple-ssa-isolate-paths.c
>>> +++ b/gcc/gimple-ssa-isolate-paths.c
>>> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
>>> *stmt)
>>>     return false;
>>>   }
>>>   +/* Return true if EXPR is a expression of pointer type that refers
>>> +   to the address of a variable with automatic storage duration.
>>> +   If so, set *PLOC to the location of the object or the call that
>>> +   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
>>> +   also consider PHI statements and set *PMAYBE when some but not
>>> +   all arguments of such statements refer to local variables, and
>>> +   to clear it otherwise.  */
>>> +
>>> +static bool
>>> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
>>> +   hash_set *visited = NULL)
>>> +{
>>> +  if (TREE_CODE (exp) == ADDR_EXPR)
>>> +    {
>>> +  tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
>>> +  if (TREE_CODE (baseaddr) == MEM_REF)
>>> +    return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe,
>>> visited);
>>> +
>>> +  if ((!VAR_P (baseaddr)
>>> +   || is_global_var (baseaddr))
>>> +  && TREE_CODE (baseaddr) != PARM_DECL)
>>> +    return false;
>>> +
>>> +  *ploc = DECL_SOURCE_LOCATION (baseaddr);
>>> +  return true;
>>> +    }
>>> +
>>> +  if (TREE_CODE (exp) == SSA_NAME)
>>> +    {
>>> +  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
>>> +  enum gimple_code code = gimple_code (def_stmt);
>>> +
>>> +  if (is_gimple_assign (def_stmt))
>>> +    {
>>> +  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
>>> +  if (POINTER_TYPE_P (type))
>>> +    {
>>> +  tree ptr = gimple_assign_rhs1 (def_stmt);
>>> +  return is_addr_local (ptr, ploc, pmaybe, visited);
>>> +    }
>>> +  return false;
>>> +    }
>>> +
>>> +  if (code == GIMPLE_CALL
>>> +  && gimple_call_builtin_p (def_stmt))
>>> +    {
>>> +  tree fn = gimple_call_fndecl (def_stmt);
>>> +  int code = DECL_FUNCTION_CODE (fn);
>>> +  if (code != BUILT_IN_ALLOCA
>>> +  && code != BUILT_IN_ALLOCA_WITH_ALIGN)
>>> +    return false;
>>> +
>>> +  *ploc = gimple_location (def_stmt);
>>> +  return true;
>>> +    }
>>> +
>>> +  if (code == GIMPLE_PHI && pmaybe)
>>> +    {
>>> +  unsigned count = 0;
>>> +  gphi *phi_stmt = as_a  (def_stmt);
>>> +
>>> +  unsigned nargs = gimple_phi_num_args (phi_stmt);
>>> +  for (unsigned i = 0; i < nargs; ++i)
>>> +    {
>>> +  if (!visited->add (phi_stmt))
>>> +    {
>>> +  tree arg = gimple_phi_arg_def (phi_stmt, i);
>>> +  if (is_addr_local (arg, ploc, pmaybe, visited))
>>> +    ++count;
>>> +    }
>>> +    }
>>> +
>>> +  *pmaybe = count && count < nargs;
>>> +  return count != 0;
>>> +    }
>>> +    }
>>> +
>>> +  return false;
>>> +}
>> Is there 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-30 Thread Martin Sebor

On 5/29/19 11:45 AM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

   void* f (int i)
   {
 struct S { int a[2]; } s[2];
 return >a[i];
   }

[2] The following is not diagnosed even with the patch:

   void sink (void*);

   void* f (int i)
   {
 char a[6];
 char *p = __builtin_strcpy (a, "123");
 sink (p);
 return p;
   }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of 
a local array plus offset

gcc/ChangeLog:

PR c/71924
* gimple-ssa-isolate-paths.c (is_addr_local): New function.
(warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
(find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg.
(find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

PR c/71924
* gcc.dg/Wreturn-local-addr-2.c: New test.
* gcc.dg/Walloca-4.c: Prune expected warnings.
* gcc.dg/pr41551.c: Same.
* gcc.dg/pr59523.c: Same.
* gcc.dg/tree-ssa/pr88775-2.c: Same.
* gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt)
return false;
  }
  
+/* Return true if EXPR is a expression of pointer type that refers

+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+  hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == ADDR_EXPR)
+{
+  tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
+  if (TREE_CODE (baseaddr) == MEM_REF)
+   return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, 
visited);
+
+  if ((!VAR_P (baseaddr)
+  || is_global_var (baseaddr))
+ && TREE_CODE (baseaddr) != PARM_DECL)
+   return false;
+
+  *ploc = DECL_SOURCE_LOCATION (baseaddr);
+  return true;
+}
+
+  if (TREE_CODE (exp) == SSA_NAME)
+{
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+   {
+ tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+ if (POINTER_TYPE_P (type))
+   {
+ tree ptr = gimple_assign_rhs1 (def_stmt);
+ return is_addr_local (ptr, ploc, pmaybe, visited);
+   }
+ return false;
+   }
+
+  if (code == GIMPLE_CALL
+ && gimple_call_builtin_p (def_stmt))
+   {
+ tree fn = gimple_call_fndecl (def_stmt);
+ int code = DECL_FUNCTION_CODE (fn);
+ if (code != BUILT_IN_ALLOCA
+ && code != BUILT_IN_ALLOCA_WITH_ALIGN)
+   return false;
+
+ *ploc = gimple_location (def_stmt);
+ return true;
+   }
+
+  if (code == GIMPLE_PHI && pmaybe)
+   {
+ unsigned count = 0;
+ gphi *phi_stmt = as_a  (def_stmt);
+
+ unsigned nargs = gimple_phi_num_args (phi_stmt);
+ for (unsigned i = 0; i < nargs; ++i)
+   {
+ if (!visited->add (phi_stmt))
+   {
+ tree arg = gimple_phi_arg_def (phi_stmt, i);
+ if (is_addr_local (arg, ploc, pmaybe, visited))
+   ++count;
+   }
+   }
+
+ *pmaybe = count && count < nargs;
+ return count != 0;
+   }
+}
+
+  return false;
+}

Is there some reason you didn't query the alias oracle here?  It would
seem a fairly natural fit.  Ultimately given a pointer (which will be an
SSA_NAME) you want to ask whether or not it conclusively points into the
stack.

That would seem to dramatically simplify is_addr_local.


I did think about it but decided against changing the existing
design (iterating over PHI arguments), for a couple of reasons:

1) It feels like a bigger change than 

Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-29 Thread Jeff Law
On 5/22/19 3:34 PM, Martin Sebor wrote:
> -Wreturn-local-addr detects a subset of instances of returning
> the address of a local object from a function but the warning
> doesn't try to handle alloca or VLAs, or some non-trivial cases
> of ordinary automatic variables[1].
> 
> The attached patch extends the implementation of the warning to
> detect those.  It still doesn't detect instances where the address
> is the result of a built-in such strcpy[2].
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> [1] For example, this is only diagnosed with the patch:
> 
>   void* f (int i)
>   {
> struct S { int a[2]; } s[2];
> return >a[i];
>   }
> 
> [2] The following is not diagnosed even with the patch:
> 
>   void sink (void*);
> 
>   void* f (int i)
>   {
> char a[6];
> char *p = __builtin_strcpy (a, "123");
> sink (p);
> return p;
>   }
> 
> I would expect detecting to be possible and useful.  Maybe as
> a follow-up.
> 
> gcc-71924.diff
> 
> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address 
> of a local array plus offset
> 
> gcc/ChangeLog:
> 
>   PR c/71924
>   * gimple-ssa-isolate-paths.c (is_addr_local): New function.
>   (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
>   (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg.
>   (find_explicit_erroneous_behavior): Call warn_return_addr_local.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/71924
>   * gcc.dg/Wreturn-local-addr-2.c: New test.
>   * gcc.dg/Walloca-4.c: Prune expected warnings.
>   * gcc.dg/pr41551.c: Same.
>   * gcc.dg/pr59523.c: Same.
>   * gcc.dg/tree-ssa/pr88775-2.c: Same.
>   * gcc.dg/winline-7.c: Same.
> 
> diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
> index 33fe352bb23..2933ecf502e 100644
> --- a/gcc/gimple-ssa-isolate-paths.c
> +++ b/gcc/gimple-ssa-isolate-paths.c
> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt)
>return false;
>  }
>  
> +/* Return true if EXPR is a expression of pointer type that refers
> +   to the address of a variable with automatic storage duration.
> +   If so, set *PLOC to the location of the object or the call that
> +   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
> +   also consider PHI statements and set *PMAYBE when some but not
> +   all arguments of such statements refer to local variables, and
> +   to clear it otherwise.  */
> +
> +static bool
> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
> +hash_set *visited = NULL)
> +{
> +  if (TREE_CODE (exp) == ADDR_EXPR)
> +{
> +  tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
> +  if (TREE_CODE (baseaddr) == MEM_REF)
> + return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, 
> visited);
> +
> +  if ((!VAR_P (baseaddr)
> +|| is_global_var (baseaddr))
> +   && TREE_CODE (baseaddr) != PARM_DECL)
> + return false;
> +
> +  *ploc = DECL_SOURCE_LOCATION (baseaddr);
> +  return true;
> +}
> +
> +  if (TREE_CODE (exp) == SSA_NAME)
> +{
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
> +  enum gimple_code code = gimple_code (def_stmt);
> +
> +  if (is_gimple_assign (def_stmt))
> + {
> +   tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
> +   if (POINTER_TYPE_P (type))
> + {
> +   tree ptr = gimple_assign_rhs1 (def_stmt);
> +   return is_addr_local (ptr, ploc, pmaybe, visited);
> + }
> +   return false;
> + }
> +
> +  if (code == GIMPLE_CALL
> +   && gimple_call_builtin_p (def_stmt))
> + {
> +   tree fn = gimple_call_fndecl (def_stmt);
> +   int code = DECL_FUNCTION_CODE (fn);
> +   if (code != BUILT_IN_ALLOCA
> +   && code != BUILT_IN_ALLOCA_WITH_ALIGN)
> + return false;
> +
> +   *ploc = gimple_location (def_stmt);
> +   return true;
> + }
> +
> +  if (code == GIMPLE_PHI && pmaybe)
> + {
> +   unsigned count = 0;
> +   gphi *phi_stmt = as_a  (def_stmt);
> +
> +   unsigned nargs = gimple_phi_num_args (phi_stmt);
> +   for (unsigned i = 0; i < nargs; ++i)
> + {
> +   if (!visited->add (phi_stmt))
> + {
> +   tree arg = gimple_phi_arg_def (phi_stmt, i);
> +   if (is_addr_local (arg, ploc, pmaybe, visited))
> + ++count;
> + }
> + }
> +
> +   *pmaybe = count && count < nargs;
> +   return count != 0;
> + }
> +}
> +
> +  return false;
> +}
Is there some reason you didn't query the alias oracle here?  It would
seem a fairly natural fit.  Ultimately given a pointer (which will be an
SSA_NAME) you want to ask whether or not it conclusively points into the
stack.

That would seem to dramatically simplify is_addr_local.

The rest looks pretty 

[PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-05-22 Thread Martin Sebor

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

  void* f (int i)
  {
struct S { int a[2]; } s[2];
return >a[i];
  }

[2] The following is not diagnosed even with the patch:

  void sink (void*);

  void* f (int i)
  {
char a[6];
char *p = __builtin_strcpy (a, "123");
sink (p);
return p;
  }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.
PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset

gcc/ChangeLog:

	PR c/71924
	* gimple-ssa-isolate-paths.c (is_addr_local): New function.
	(warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
	(find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg.
	(find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

	PR c/71924
	* gcc.dg/Wreturn-local-addr-2.c: New test.
	* gcc.dg/Walloca-4.c: Prune expected warnings.
	* gcc.dg/pr41551.c: Same.
	* gcc.dg/pr59523.c: Same.
	* gcc.dg/tree-ssa/pr88775-2.c: Same.
	* gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt)
   return false;
 }
 
+/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+	   hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == ADDR_EXPR)
+{
+  tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
+  if (TREE_CODE (baseaddr) == MEM_REF)
+	return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, visited);
+
+  if ((!VAR_P (baseaddr)
+	   || is_global_var (baseaddr))
+	  && TREE_CODE (baseaddr) != PARM_DECL)
+	return false;
+
+  *ploc = DECL_SOURCE_LOCATION (baseaddr);
+  return true;
+}
+
+  if (TREE_CODE (exp) == SSA_NAME)
+{
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+	{
+	  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+	  if (POINTER_TYPE_P (type))
+	{
+	  tree ptr = gimple_assign_rhs1 (def_stmt);
+	  return is_addr_local (ptr, ploc, pmaybe, visited);
+	}
+	  return false;
+	}
+
+  if (code == GIMPLE_CALL
+	  && gimple_call_builtin_p (def_stmt))
+	{
+	  tree fn = gimple_call_fndecl (def_stmt);
+	  int code = DECL_FUNCTION_CODE (fn);
+	  if (code != BUILT_IN_ALLOCA
+	  && code != BUILT_IN_ALLOCA_WITH_ALIGN)
+	return false;
+
+	  *ploc = gimple_location (def_stmt);
+	  return true;
+	}
+
+  if (code == GIMPLE_PHI && pmaybe)
+	{
+	  unsigned count = 0;
+	  gphi *phi_stmt = as_a  (def_stmt);
+
+	  unsigned nargs = gimple_phi_num_args (phi_stmt);
+	  for (unsigned i = 0; i < nargs; ++i)
+	{
+	  if (!visited->add (phi_stmt))
+		{
+		  tree arg = gimple_phi_arg_def (phi_stmt, i);
+		  if (is_addr_local (arg, ploc, pmaybe, visited))
+		++count;
+		}
+	}
+
+	  *pmaybe = count && count < nargs;
+	  return count != 0;
+	}
+}
+
+  return false;
+}
+
+/* Detect and diagnose returning the address of a local variable in
+   a PHI result LHS and argument OP and PHI edge E in basic block BB.  */
+
+static basic_block
+warn_return_addr_local_phi_arg (basic_block bb, basic_block duplicate,
+tree lhs, tree op, edge e)
+{
+  location_t origin;
+  if (!is_addr_local (op, ))
+return NULL;
+
+  gimple *use_stmt;
+  imm_use_iterator iter;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+{
+  greturn *return_stmt = dyn_cast  (use_stmt);
+  if (!return_stmt)
+	continue;
+
+  if (gimple_return_retval (return_stmt) != lhs)
+	continue;
+
+  {
+	auto_diagnostic_group d;
+	if (warning_at (gimple_location (use_stmt),
+			OPT_Wreturn_local_addr,
+			"function may return address "
+			"of local variable")
+	&& origin != UNKNOWN_LOCATION)
+	  inform (origin, "declared here");
+  }
+
+  if ((flag_isolate_erroneous_paths_dereference
+