Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 7/2/19 2:59 PM, Jeff Law wrote: On 6/30/19 3:50 PM, Martin Sebor wrote: On 6/26/19 6:11 PM, Jeff Law wrote: [ Another big snip ] Is there a strong need for an overloaded operator? Our guidelines generally discourage operator overloads. This one seems on the border to me. Others may have different ideas where the line is. If we really don't need the overloaded operator, then we can avoid the controversy completely. The copy assignment operator is necessary for the type to be stored in a hash_map. Otherwise, the implicitly generated assignment will be used instead which is unsafe in vec and results in memory corription (I opened bug 90904 for this). After working around this bug by providing the assignment operator I then ran into the hash_map bug 90959 that also results in memory corruption. I spent a few fun hours tracking down the GCC miscompilations that they led to. OK. THanks for explaining why we need the copy assignment operator. The golden rule in C++ is that every type should either be safe to copy and assign with the expected semantics or made not assignable and not copyable by declaring the copy ctor and copy assignment operator private (or deleted in more modern C++). It would be very helpful to detect this kind of problem (classes that aren't safely assignable and copyable because they acquire/release resources in ctors/dtors). Not just to us in GCC but I'm sure to others as well. It's something I've been hoping to implement for a while now. You've got my blessing to do that :-) So how do you deal with the case where one operand of a {MIN,MAX,COND}_EXPR is the address of a local, but the other is not? It seems like we'll end up returning true from this function in that case and potentially trigger changing the return value to NULL. Or am I missing something? I only briefly considered MIN/MAX but because in C and C++ the less than and greater than expressions are only defined for pointers into the same array/object or subobject and I didn't ponder it too hard. (In C they're undefined otherwise, in C++ the result is unspecified.) But you bring it up because you think the address should be returned regardless, even if the expression is invalid (or if it's COND_EXPR with mixed local/nonlocal addresses) so I've changed it to do that for all these explicitly handled codes and added tests to verify it. THanks. So in the thread for strlen/sprintf the issue around unbounded walks up through PHI argument's SSA_NAME_DEF_STMT was raised. I think we've got two options here. One would be to hold this patch until we sort out what the bounds should look like, then adjust this patch to do something similar. Or go forward with this patch, then come back and adjust the walker once we have settled on solution in the other thread. I'm OK with either. Your choice. I committed the patch as is until the question of which algorithms to put the limit on is answered (as discussed in the thread Re: sprintf/strlen integration). Martin
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 6/30/19 3:50 PM, Martin Sebor wrote: > On 6/26/19 6:11 PM, Jeff Law wrote: [ Another big snip ] > >> Is there a strong need for an overloaded operator? Our guidelines >> generally discourage operator overloads. This one seems on the border >> to me. Others may have different ideas where the line is. If we really >> don't need the overloaded operator, then we can avoid the controversy >> completely. > > The copy assignment operator is necessary for the type to be stored > in a hash_map. Otherwise, the implicitly generated assignment will > be used instead which is unsafe in vec and results in memory > corription (I opened bug 90904 for this). After working around this > bug by providing the assignment operator I then ran into the hash_map > bug 90959 that also results in memory corruption. I spent a few fun > hours tracking down the GCC miscompilations that they led to. OK. THanks for explaining why we need the copy assignment operator. > > The golden rule in C++ is that every type should either be safe to > copy and assign with the expected semantics or made not assignable > and not copyable by declaring the copy ctor and copy assignment > operator private (or deleted in more modern C++). It would be very > helpful to detect this kind of problem (classes that aren't safely > assignable and copyable because they acquire/release resources in > ctors/dtors). Not just to us in GCC but I'm sure to others as well. > It's something I've been hoping to implement for a while now. You've got my blessing to do that :-) >> >> So how do you deal with the case where one operand of a >> {MIN,MAX,COND}_EXPR is the address of a local, but the other is not? It >> seems like we'll end up returning true from this function in that case >> and potentially trigger changing the return value to NULL. Or am I >> missing something? > > I only briefly considered MIN/MAX but because in C and C++ the less > than and greater than expressions are only defined for pointers into > the same array/object or subobject and I didn't ponder it too hard. > (In C they're undefined otherwise, in C++ the result is unspecified.) > > But you bring it up because you think the address should be returned > regardless, even if the expression is invalid (or if it's COND_EXPR > with mixed local/nonlocal addresses) so I've changed it to do that > for all these explicitly handled codes and added tests to verify it. THanks. So in the thread for strlen/sprintf the issue around unbounded walks up through PHI argument's SSA_NAME_DEF_STMT was raised. I think we've got two options here. One would be to hold this patch until we sort out what the bounds should look like, then adjust this patch to do something similar. Or go forward with this patch, then come back and adjust the walker once we have settled on solution in the other thread. I'm OK with either. Your choice. jeff
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 6/26/19 6:11 PM, Jeff Law wrote: On 6/18/19 9:19 PM, Martin Sebor wrote: On 6/14/19 2:59 PM, Jeff Law wrote: [ big snip ] A COND_EXPR on the RHS of an assignment is valid gimple. That's what we need to consider here -- what is and what is not valid gimple. And its more likely that PHIs will be transformed into RHS COND_EXPRs -- that's standard practice for if-conversion. Gosh, how to get one? It happens all the time :-) Since I know DOM so well, I just shoved a quick assert into optimize_stmt to abort if we encounter a gimple assignment where the RHS is a COND_EXPR. It blew up instantly building libgcc :-) COmpile the attached code with -O2 -m32. I wasn't saying it's not valid Gimple, just that I hadn't seen it come up here despite compiling Glibc and the kernel with the patched GCC. The only codes I saw are these: addr_expr, array_ref, bit_and_expr, component_ref, max_expr, mem_ref, nop_expr, parm_decl, pointer_plus_expr, ssa_name, and var_decl The only one here that's really surprising is the MAX_EXPR. But it is what it is. What I was asking for is a test case that makes COND_EXPR come up in this context. But I managed to find one by triggering the ICE with GDB. The file reduced to the following test case: Sorry. email can be a tough medium to nail down specific details. extern struct S s; // S has to be an incomplete type void* f (int n) { void *p; int x = 0; for (int i = n; i >= 0; i--) { p = if (p == (void*)-1) x = 1; else if (p) return p; } return x ? (void*)-1 : 0; } and the dump: [local count: 59055800]: # x_10 = PHI <1(5), 0(2)> _5 = x_10 != 0 ? -1B : 0B; [local count: 114863532]: # _3 = PHI <(4), _5(6), (3)> return _3; It seems a little odd that the COND_EXPR disappears when either of the constant addresses becomes the address of an object (or the result of alloca), and also when the number of iterations of the loop is constant. That's probably why it so rarely comes up in this context. Going into phiopt2 we have: ;; basic block 6, loop depth 0 ;;pred: 5 if (x_1 != 0) goto ; [71.00%] else goto ; [29.00%] ;;succ: 8 ;;7 ;; basic block 7, loop depth 0 ;;pred: 6 ;;succ: 8 ;; basic block 8, loop depth 0 ;;pred: 3 ;;7 ;;6 # _3 = PHI <(3), 0B(7), -1B(6)> return _3; The subgraph starting at block #6 is a classic case for turning branchy code into straightline code using a COND_EXPR on the RHS of an assignment. So you end up with something like this: ;; basic block 6, loop depth 0 ;;pred: 5 _5 = x_1 != 0 ? -1B : 0B; ;;succ: 7 ;; basic block 7, loop depth 0 ;;pred: 3 ;;6 # _3 = PHI <(3), _5(6)> return _3; Now for this specific case within phiopt we are limited to cases there the result is 0/1 or 0/-1. That's why you get something different when you exchange one of the constants for the address of an object, or anything else for that matter. This is all a bit academic -- the key point is that we can have a COND_EXPR on the RHS of an assignment. That's allowed by gimple. Sadly this is also likely one of the places where target characteristics come into play -- targets define a BRANCH_COST which can significantly change the decisions for the initial generation of conditionals. It's one of the things that makes writing tests for jump threading, if conversion and other optimizations so damn painful -- on one target we'll have a series of conditional jumps, on anothers we may have a series of logicals, potentially with COND_EXPRs. That said, even though I've seen a few references to COND_EXPR in the midle-end I have been assuming that they, in general, do get transformed into PHIs. But checking the internals manual I found in Section 12.6.3 this: A C ?: expression is converted into an if statement with each branch assigning to the same temporary. ... The GIMPLE level if-conversion pass re-introduces ?: expression, if appropriate. It is used to vectorize loops with conditions using vector conditional operations. This GDB test case is likely the result of this reintroduction. Nope. It happens much earlier in the pipeline :-) And in a more general sense, this kind of permissiveness is not future proof. What happens if someone needs to add another EXPR node that is valid on the RHS where such recursion is undesirable? How are they supposed to know that we've got this permissive recursive call and that it's going to do the wrong thing? And if it's an EXPR node with no arguments, then we're going to do a read out of the bounds of the object and all bets are off at that point (yes we have zero operand EXPR nodes, but thankfully I don't think they should up in the contexts you care about). Sure. When things are hardcoded
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 6/18/19 9:19 PM, Martin Sebor wrote: > On 6/14/19 2:59 PM, Jeff Law wrote: [ big snip ] >> A COND_EXPR on the RHS of an assignment is valid gimple. That's what we >> need to consider here -- what is and what is not valid gimple. And its >> more likely that PHIs will be transformed into RHS COND_EXPRs -- that's >> standard practice for if-conversion. >> >> Gosh, how to get one? It happens all the time :-) Since I know DOM so >> well, I just shoved a quick assert into optimize_stmt to abort if we >> encounter a gimple assignment where the RHS is a COND_EXPR. It blew up >> instantly building libgcc :-) >> >> COmpile the attached code with -O2 -m32. > > I wasn't saying it's not valid Gimple, just that I hadn't seen it > come up here despite compiling Glibc and the kernel with the patched > GCC. The only codes I saw are these: > > addr_expr, array_ref, bit_and_expr, component_ref, max_expr, > mem_ref, nop_expr, parm_decl, pointer_plus_expr, ssa_name, > and var_decl The only one here that's really surprising is the MAX_EXPR. But it is what it is. > > What I was asking for is a test case that makes COND_EXPR come up > in this context. But I managed to find one by triggering the ICE > with GDB. The file reduced to the following test case: Sorry. email can be a tough medium to nail down specific details. > > extern struct S s; // S has to be an incomplete type > > void* f (int n) > { > void *p; > int x = 0; > > for (int i = n; i >= 0; i--) > { > p = > if (p == (void*)-1) > x = 1; > else if (p) > return p; > } > > return x ? (void*)-1 : 0; > } > > and the dump: > > [local count: 59055800]: > # x_10 = PHI <1(5), 0(2)> > _5 = x_10 != 0 ? -1B : 0B; > > [local count: 114863532]: > # _3 = PHI <(4), _5(6), (3)> > return _3; > > It seems a little odd that the COND_EXPR disappears when either > of the constant addresses becomes the address of an object (or > the result of alloca), and also when the number of iterations > of the loop is constant. That's probably why it so rarely comes > up in this context. Going into phiopt2 we have: ;; basic block 6, loop depth 0 ;;pred: 5 if (x_1 != 0) goto ; [71.00%] else goto ; [29.00%] ;;succ: 8 ;;7 ;; basic block 7, loop depth 0 ;;pred: 6 ;;succ: 8 ;; basic block 8, loop depth 0 ;;pred: 3 ;;7 ;;6 # _3 = PHI <(3), 0B(7), -1B(6)> return _3; The subgraph starting at block #6 is a classic case for turning branchy code into straightline code using a COND_EXPR on the RHS of an assignment. So you end up with something like this: ;; basic block 6, loop depth 0 ;;pred: 5 _5 = x_1 != 0 ? -1B : 0B; ;;succ: 7 ;; basic block 7, loop depth 0 ;;pred: 3 ;;6 # _3 = PHI <(3), _5(6)> return _3; Now for this specific case within phiopt we are limited to cases there the result is 0/1 or 0/-1. That's why you get something different when you exchange one of the constants for the address of an object, or anything else for that matter. This is all a bit academic -- the key point is that we can have a COND_EXPR on the RHS of an assignment. That's allowed by gimple. Sadly this is also likely one of the places where target characteristics come into play -- targets define a BRANCH_COST which can significantly change the decisions for the initial generation of conditionals. It's one of the things that makes writing tests for jump threading, if conversion and other optimizations so damn painful -- on one target we'll have a series of conditional jumps, on anothers we may have a series of logicals, potentially with COND_EXPRs. > > That said, even though I've seen a few references to COND_EXPR > in the midle-end I have been assuming that they, in general, do > get transformed into PHIs. But checking the internals manual > I found in Section 12.6.3 this: > > A C ?: expression is converted into an if statement with each > branch assigning to the same temporary. ... The GIMPLE level > if-conversion pass re-introduces ?: expression, if appropriate. > It is used to vectorize loops with conditions using vector > conditional operations. > > This GDB test case is likely the result of this reintroduction. Nope. It happens much earlier in the pipeline :-) >> >> And in a more general sense, this kind of permissiveness is not future >> proof. What happens if someone needs to add another EXPR node that is >> valid on the RHS where such recursion is undesirable? How are they >> supposed to know that we've got this permissive recursive call and that >> it's going to do the wrong thing? And if it's an EXPR node with no >> arguments, then we're going to do a read out of the bounds of the object >> and all bets are off at that point (yes we have zero operand EXPR nodes, >> but thankfully I don't
[PING] [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
Ping: did my reply and updated patch resolve your concerns? https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01106.html On 6/18/19 9:19 PM, Martin Sebor wrote: On 6/14/19 2:59 PM, Jeff Law wrote: On 6/4/19 1:40 PM, Martin Sebor wrote: On 6/3/19 5:24 PM, Martin Sebor wrote: On 5/31/19 2:46 PM, Jeff Law wrote: On 5/22/19 3:34 PM, Martin Sebor wrote: -Wreturn-local-addr detects a subset of instances of returning the address of a local object from a function but the warning doesn't try to handle alloca or VLAs, or some non-trivial cases of ordinary automatic variables[1]. The attached patch extends the implementation of the warning to detect those. It still doesn't detect instances where the address is the result of a built-in such strcpy[2]. Tested on x86_64-linux. Martin [1] For example, this is only diagnosed with the patch: void* f (int i) { struct S { int a[2]; } s[2]; return >a[i]; } [2] The following is not diagnosed even with the patch: void sink (void*); void* f (int i) { char a[6]; char *p = __builtin_strcpy (a, "123"); sink (p); return p; } I would expect detecting to be possible and useful. Maybe as a follow-up. gcc-71924.diff PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset gcc/ChangeLog: PR c/71924 * gimple-ssa-isolate-paths.c (is_addr_local): New function. (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. (find_explicit_erroneous_behavior): Call warn_return_addr_local. gcc/testsuite/ChangeLog: PR c/71924 * gcc.dg/Wreturn-local-addr-2.c: New test. * gcc.dg/Walloca-4.c: Prune expected warnings. * gcc.dg/pr41551.c: Same. * gcc.dg/pr59523.c: Same. * gcc.dg/tree-ssa/pr88775-2.c: Same. * gcc.dg/winline-7.c: Same. diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index 33fe352bb23..2933ecf502e 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) return false; } +/* Return true if EXPR is a expression of pointer type that refers + to the address of a variable with automatic storage duration. + If so, set *PLOC to the location of the object or the call that + allocated it (for alloca and VLAs). When PMAYBE is non-null, + also consider PHI statements and set *PMAYBE when some but not + all arguments of such statements refer to local variables, and + to clear it otherwise. */ + +static bool +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, + hash_set *visited = NULL) +{ + if (TREE_CODE (exp) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); + enum gimple_code code = gimple_code (def_stmt); + + if (is_gimple_assign (def_stmt)) + { + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); + if (POINTER_TYPE_P (type)) + { + tree ptr = gimple_assign_rhs1 (def_stmt); + return is_addr_local (ptr, ploc, pmaybe, visited); + } + return false; + } So this is going to recurse on the rhs1 of something like POINTER_PLUS_EXPR, that's a good thing :-) But isn't it non-selective about the codes where we recurse? Consider ptr = (cond) ? res1 : res2 I think we'll end up recursing on the condition rather than looking at res1 and res2. I suspect there are a very limited number of expression codes that appear on the RHS where we'd want to recurse on one or both operands. POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse on both and logically and the result), BIT_AND (maybe we masked off some bits in an address). That's probably about it :-) Are there any other codes you've seen or think would be useful in practice to recurse through? I'd rather list them explicitly rather than just recurse down through every rhs1 we encounter. I don't have a list of codes to test for. I initially contemplated enumerating them but in the end decided the pointer type check would be sufficient. I wouldn't expect a COND_EXPR here. Don't they get transformed into PHIs? In all my tests they do and and running the whole test suite with an assert that it doesn't come up doesn't expose any either. (I left the assert for COND_EXPR there.) If a COND_EXPR really can come up in a GIMPLE assignment here can you please show me how so I can add a test for it? A COND_EXPR on the RHS of an assignment is valid gimple. That's what we need to consider here -- what is and what is not valid gimple. And its more likely that PHIs will be transformed into RHS COND_EXPRs -- that's standard practice for if-conversion. Gosh, how to get one? It happens all the time :-) Since I know DOM so well, I just
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 6/14/19 2:59 PM, Jeff Law wrote: On 6/4/19 1:40 PM, Martin Sebor wrote: On 6/3/19 5:24 PM, Martin Sebor wrote: On 5/31/19 2:46 PM, Jeff Law wrote: On 5/22/19 3:34 PM, Martin Sebor wrote: -Wreturn-local-addr detects a subset of instances of returning the address of a local object from a function but the warning doesn't try to handle alloca or VLAs, or some non-trivial cases of ordinary automatic variables[1]. The attached patch extends the implementation of the warning to detect those. It still doesn't detect instances where the address is the result of a built-in such strcpy[2]. Tested on x86_64-linux. Martin [1] For example, this is only diagnosed with the patch: void* f (int i) { struct S { int a[2]; } s[2]; return >a[i]; } [2] The following is not diagnosed even with the patch: void sink (void*); void* f (int i) { char a[6]; char *p = __builtin_strcpy (a, "123"); sink (p); return p; } I would expect detecting to be possible and useful. Maybe as a follow-up. gcc-71924.diff PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset gcc/ChangeLog: PR c/71924 * gimple-ssa-isolate-paths.c (is_addr_local): New function. (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. (find_explicit_erroneous_behavior): Call warn_return_addr_local. gcc/testsuite/ChangeLog: PR c/71924 * gcc.dg/Wreturn-local-addr-2.c: New test. * gcc.dg/Walloca-4.c: Prune expected warnings. * gcc.dg/pr41551.c: Same. * gcc.dg/pr59523.c: Same. * gcc.dg/tree-ssa/pr88775-2.c: Same. * gcc.dg/winline-7.c: Same. diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index 33fe352bb23..2933ecf502e 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) return false; } +/* Return true if EXPR is a expression of pointer type that refers + to the address of a variable with automatic storage duration. + If so, set *PLOC to the location of the object or the call that + allocated it (for alloca and VLAs). When PMAYBE is non-null, + also consider PHI statements and set *PMAYBE when some but not + all arguments of such statements refer to local variables, and + to clear it otherwise. */ + +static bool +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, + hash_set *visited = NULL) +{ + if (TREE_CODE (exp) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); + enum gimple_code code = gimple_code (def_stmt); + + if (is_gimple_assign (def_stmt)) + { + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); + if (POINTER_TYPE_P (type)) + { + tree ptr = gimple_assign_rhs1 (def_stmt); + return is_addr_local (ptr, ploc, pmaybe, visited); + } + return false; + } So this is going to recurse on the rhs1 of something like POINTER_PLUS_EXPR, that's a good thing :-) But isn't it non-selective about the codes where we recurse? Consider ptr = (cond) ? res1 : res2 I think we'll end up recursing on the condition rather than looking at res1 and res2. I suspect there are a very limited number of expression codes that appear on the RHS where we'd want to recurse on one or both operands. POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse on both and logically and the result), BIT_AND (maybe we masked off some bits in an address). That's probably about it :-) Are there any other codes you've seen or think would be useful in practice to recurse through? I'd rather list them explicitly rather than just recurse down through every rhs1 we encounter. I don't have a list of codes to test for. I initially contemplated enumerating them but in the end decided the pointer type check would be sufficient. I wouldn't expect a COND_EXPR here. Don't they get transformed into PHIs? In all my tests they do and and running the whole test suite with an assert that it doesn't come up doesn't expose any either. (I left the assert for COND_EXPR there.) If a COND_EXPR really can come up in a GIMPLE assignment here can you please show me how so I can add a test for it? A COND_EXPR on the RHS of an assignment is valid gimple. That's what we need to consider here -- what is and what is not valid gimple. And its more likely that PHIs will be transformed into RHS COND_EXPRs -- that's standard practice for if-conversion. Gosh, how to get one? It happens all the time :-) Since I know DOM so well, I just shoved a quick assert into optimize_stmt to abort if we encounter a gimple assignment where the RHS is a COND_EXPR. It blew up instantly building libgcc :-) COmpile
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 6/3/19 5:24 PM, Martin Sebor wrote: On 5/31/19 2:46 PM, Jeff Law wrote: On 5/22/19 3:34 PM, Martin Sebor wrote: -Wreturn-local-addr detects a subset of instances of returning the address of a local object from a function but the warning doesn't try to handle alloca or VLAs, or some non-trivial cases of ordinary automatic variables[1]. The attached patch extends the implementation of the warning to detect those. It still doesn't detect instances where the address is the result of a built-in such strcpy[2]. Tested on x86_64-linux. Martin [1] For example, this is only diagnosed with the patch: void* f (int i) { struct S { int a[2]; } s[2]; return >a[i]; } [2] The following is not diagnosed even with the patch: void sink (void*); void* f (int i) { char a[6]; char *p = __builtin_strcpy (a, "123"); sink (p); return p; } I would expect detecting to be possible and useful. Maybe as a follow-up. gcc-71924.diff PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset gcc/ChangeLog: PR c/71924 * gimple-ssa-isolate-paths.c (is_addr_local): New function. (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. (find_explicit_erroneous_behavior): Call warn_return_addr_local. gcc/testsuite/ChangeLog: PR c/71924 * gcc.dg/Wreturn-local-addr-2.c: New test. * gcc.dg/Walloca-4.c: Prune expected warnings. * gcc.dg/pr41551.c: Same. * gcc.dg/pr59523.c: Same. * gcc.dg/tree-ssa/pr88775-2.c: Same. * gcc.dg/winline-7.c: Same. diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index 33fe352bb23..2933ecf502e 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) return false; } +/* Return true if EXPR is a expression of pointer type that refers + to the address of a variable with automatic storage duration. + If so, set *PLOC to the location of the object or the call that + allocated it (for alloca and VLAs). When PMAYBE is non-null, + also consider PHI statements and set *PMAYBE when some but not + all arguments of such statements refer to local variables, and + to clear it otherwise. */ + +static bool +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, + hash_set *visited = NULL) +{ + if (TREE_CODE (exp) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); + enum gimple_code code = gimple_code (def_stmt); + + if (is_gimple_assign (def_stmt)) + { + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); + if (POINTER_TYPE_P (type)) + { + tree ptr = gimple_assign_rhs1 (def_stmt); + return is_addr_local (ptr, ploc, pmaybe, visited); + } + return false; + } So this is going to recurse on the rhs1 of something like POINTER_PLUS_EXPR, that's a good thing :-) But isn't it non-selective about the codes where we recurse? Consider ptr = (cond) ? res1 : res2 I think we'll end up recursing on the condition rather than looking at res1 and res2. I suspect there are a very limited number of expression codes that appear on the RHS where we'd want to recurse on one or both operands. POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse on both and logically and the result), BIT_AND (maybe we masked off some bits in an address). That's probably about it :-) Are there any other codes you've seen or think would be useful in practice to recurse through? I'd rather list them explicitly rather than just recurse down through every rhs1 we encounter. I don't have a list of codes to test for. I initially contemplated enumerating them but in the end decided the pointer type check would be sufficient. I wouldn't expect a COND_EXPR here. Don't they get transformed into PHIs? In all my tests they do and and running the whole test suite with an assert that it doesn't come up doesn't expose any either. (I left the assert for COND_EXPR there.) If a COND_EXPR really can come up in a GIMPLE assignment here can you please show me how so I can add a test for it? I've added tests to exercise all C expressions that evaluate to pointers. I don't know of any others where what you bring up should be a concern and I don't want to try to hardwire tests for any that I can't to exercise in the testsuite or don't know how. If you know of some I'm happy to add them and adjust the code. + + if (code == GIMPLE_PHI && pmaybe) + { + unsigned count = 0; + gphi *phi_stmt = as_a (def_stmt); + + unsigned nargs = gimple_phi_num_args (phi_stmt); + for (unsigned i = 0; i < nargs; ++i) + { + if (!visited->add (phi_stmt)) +
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On Mon, Jun 3, 2019 at 1:27 PM Richard Biener wrote: > > On Mon, Jun 3, 2019 at 11:37 AM Richard Biener > wrote: > > > > On Fri, May 31, 2019 at 5:35 PM Jeff Law wrote:> > > > On 5/30/19 4:56 PM, Martin Sebor wrote: > > > > On 5/30/19 10:15 AM, Jeff Law wrote: > > > >> On 5/30/19 9:34 AM, Martin Sebor wrote: > > > >> > > > > If the alias oracle can be used to give the same results without > > > > excessive false positives then I think it would be fine to make > > > > use of it. Is that something you consider a prerequisite for > > > > this change or should I look into it as a followup? > > > I think we should explore it a bit before making a final decision. > > > It > > > may guide us for other work in this space (like detecting escaping > > > locals). I think a dirty prototype to see if it's even in the right > > > ballpark would make sense. > > > >>> > > > >>> Okay, let me look into it. > > > >> Sounds good. Again, go with a quick prototype to see if it's likely > > > >> feasible. The tests you've added should dramatically help evaluating > > > >> if > > > >> the oracle is up to the task. > > > > > > > > So to expand on what I said on the phone when we spoke: the problem > > > > I quickly ran into with the prototype is that I wasn't able to find > > > > a way to identify pointers to alloca/VLA storage. > > > Your analysis matches my very quick read of the aliasing code. It may > > > be the case that the Steensgaard patent got in the way here. > > > > > > > > > > > In the the points-to solution for the pointer being returned they > > > > both have the vars_contains_escaped_heap flag set. That seems like > > > > an omission that shouldn't be hard to fix, but on its own, I don't > > > > think it would be sufficient. > > > RIght. In theory the result of an alloca call shouldn't alias anything > > > in the heap -- but there were some implementations of alloca that were > > > built on top of malloc (ugh). That flag may be catering to that case. > > > > > > But in the case of a __builtin_alloca that shouldn't apply. Hmm. That > > > ultimately might be a bug. > > > > > > > > > > > In the IL a VLA is represented as a pointer to an array, but when > > > > returning a pointer into a VLA (at some offset so it's an SSA_NAME), > > > > the pointer's point-to solution doesn't include the VLA pointer or > > > > (AFAICS) make it possible to tell even that it is a VLA. For example > > > > here: > > > > > > > > f (int n) > > > > { > > > > int * p; > > > > int[0:D.1912] * a.1; > > > > sizetype _1; > > > > void * saved_stack.3_3; > > > > sizetype _6; > > > > > > > > [local count: 1073741824]: > > > > saved_stack.3_3 = __builtin_stack_save (); > > > > _1 = (sizetype) n_2(D); > > > > _6 = _1 * 4; > > > > a.1_8 = __builtin_alloca_with_align (_6, 32); > > > > p_9 = a.1_8 + _6; > > > > __builtin_stack_restore (saved_stack.3_3); > > > > return p_9; > > > > } > > > > > > > > p_9's solution's is: > > > > > > > > p_9, points-to vars: { D.1925 } (escaped, escaped heap) > > > > > > > > I couldn't find out how to determine that D.1925 is a VLA (or even > > > > what it is). It's not among the function's local variables that > > > > FOR_EACH_LOCAL_DECL iterates over. > > > It's possible that decl was created internally as part of the alias > > > oracle's analysis. > > > > Yes. Note that only the UID was reserved the fake decl doesn't > > live on. > > > > Note that for the testcase above the "local" alloca storage escapes > > which means you run into a catch-22 here given points-to computes > > a conservative correct solution and you want to detect escaping > > locals. Usually detecting a pointer to local storage can be done > > by using ptr_deref_may_alias_global_p but of course in this > > case the storage was marked global by PTA itself (and our PTA > > is not flow-sensitive and it doesn't distinguish an escape through > > a return stmt from an escape through a call which is relevant > > even for local storage). > > > > Feature-wise the PTA solver is missing sth like a "filter" > > we could put in front of return stmts that doesn't let > > addresses of locals leak. The simplest way of implementing > > this might be to not include 'returns' in the constraints at all > > (in non-IPA mode) and handle them by post-processing the > > solver result. That gets us some additional flow-sensitivity > > and a way to filter locals. Let me see if I can cook up this. > > > > That may ultimatively also help the warning code where you > > then can use ptr_deref_may_alias_global_p. > > > > Sth like the attached - completely untested (the > > is_global_var check is likely too simplistic...). It does > > the job on alloca for me. > > > > p_5, points-to NULL, points-to vars: { D.1913 } > > _6, points-to NULL, points-to vars: { D.1913 } > > > > foo (int n) > > { > > void * p; > > long unsigned int _1; > > void * _6; > > > >
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/31/19 2:46 PM, Jeff Law wrote: On 5/22/19 3:34 PM, Martin Sebor wrote: -Wreturn-local-addr detects a subset of instances of returning the address of a local object from a function but the warning doesn't try to handle alloca or VLAs, or some non-trivial cases of ordinary automatic variables[1]. The attached patch extends the implementation of the warning to detect those. It still doesn't detect instances where the address is the result of a built-in such strcpy[2]. Tested on x86_64-linux. Martin [1] For example, this is only diagnosed with the patch: void* f (int i) { struct S { int a[2]; } s[2]; return >a[i]; } [2] The following is not diagnosed even with the patch: void sink (void*); void* f (int i) { char a[6]; char *p = __builtin_strcpy (a, "123"); sink (p); return p; } I would expect detecting to be possible and useful. Maybe as a follow-up. gcc-71924.diff PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset gcc/ChangeLog: PR c/71924 * gimple-ssa-isolate-paths.c (is_addr_local): New function. (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. (find_explicit_erroneous_behavior): Call warn_return_addr_local. gcc/testsuite/ChangeLog: PR c/71924 * gcc.dg/Wreturn-local-addr-2.c: New test. * gcc.dg/Walloca-4.c: Prune expected warnings. * gcc.dg/pr41551.c: Same. * gcc.dg/pr59523.c: Same. * gcc.dg/tree-ssa/pr88775-2.c: Same. * gcc.dg/winline-7.c: Same. diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index 33fe352bb23..2933ecf502e 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) return false; } +/* Return true if EXPR is a expression of pointer type that refers + to the address of a variable with automatic storage duration. + If so, set *PLOC to the location of the object or the call that + allocated it (for alloca and VLAs). When PMAYBE is non-null, + also consider PHI statements and set *PMAYBE when some but not + all arguments of such statements refer to local variables, and + to clear it otherwise. */ + +static bool +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, + hash_set *visited = NULL) +{ + if (TREE_CODE (exp) == SSA_NAME) +{ + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); + enum gimple_code code = gimple_code (def_stmt); + + if (is_gimple_assign (def_stmt)) + { + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); + if (POINTER_TYPE_P (type)) + { + tree ptr = gimple_assign_rhs1 (def_stmt); + return is_addr_local (ptr, ploc, pmaybe, visited); + } + return false; + } So this is going to recurse on the rhs1 of something like POINTER_PLUS_EXPR, that's a good thing :-) But isn't it non-selective about the codes where we recurse? Consider ptr = (cond) ? res1 : res2 I think we'll end up recursing on the condition rather than looking at res1 and res2. I suspect there are a very limited number of expression codes that appear on the RHS where we'd want to recurse on one or both operands. POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse on both and logically and the result), BIT_AND (maybe we masked off some bits in an address). That's probably about it :-) Are there any other codes you've seen or think would be useful in practice to recurse through? I'd rather list them explicitly rather than just recurse down through every rhs1 we encounter. I don't have a list of codes to test for. I initially contemplated enumerating them but in the end decided the pointer type check would be sufficient. I wouldn't expect a COND_EXPR here. Don't they get transformed into PHIs? In all my tests they do and and running the whole test suite with an assert that it doesn't come up doesn't expose any either. (I left the assert for COND_EXPR there.) If a COND_EXPR really can come up in a GIMPLE assignment here can you please show me how so I can add a test for it? I've added tests to exercise all C expressions that evaluate to pointers. I don't know of any others where what you bring up should be a concern and I don't want to try to hardwire tests for any that I can't to exercise in the testsuite or don't know how. If you know of some I'm happy to add them and adjust the code. + + if (code == GIMPLE_PHI && pmaybe) + { + unsigned count = 0; + gphi *phi_stmt = as_a (def_stmt); + + unsigned nargs = gimple_phi_num_args (phi_stmt); + for (unsigned i = 0; i < nargs; ++i) +
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 6/3/19 3:37 AM, Richard Biener wrote: > On Fri, May 31, 2019 at 5:35 PM Jeff Law wrote:> >> On 5/30/19 4:56 PM, Martin Sebor wrote: >>> On 5/30/19 10:15 AM, Jeff Law wrote: On 5/30/19 9:34 AM, Martin Sebor wrote: >>> If the alias oracle can be used to give the same results without >>> excessive false positives then I think it would be fine to make >>> use of it. Is that something you consider a prerequisite for >>> this change or should I look into it as a followup? >> I think we should explore it a bit before making a final decision. It >> may guide us for other work in this space (like detecting escaping >> locals). I think a dirty prototype to see if it's even in the right >> ballpark would make sense. > > Okay, let me look into it. Sounds good. Again, go with a quick prototype to see if it's likely feasible. The tests you've added should dramatically help evaluating if the oracle is up to the task. >>> >>> So to expand on what I said on the phone when we spoke: the problem >>> I quickly ran into with the prototype is that I wasn't able to find >>> a way to identify pointers to alloca/VLA storage. >> Your analysis matches my very quick read of the aliasing code. It may >> be the case that the Steensgaard patent got in the way here. >> >>> >>> In the the points-to solution for the pointer being returned they >>> both have the vars_contains_escaped_heap flag set. That seems like >>> an omission that shouldn't be hard to fix, but on its own, I don't >>> think it would be sufficient. >> RIght. In theory the result of an alloca call shouldn't alias anything >> in the heap -- but there were some implementations of alloca that were >> built on top of malloc (ugh). That flag may be catering to that case. >> >> But in the case of a __builtin_alloca that shouldn't apply. Hmm. That >> ultimately might be a bug. >> >>> >>> In the IL a VLA is represented as a pointer to an array, but when >>> returning a pointer into a VLA (at some offset so it's an SSA_NAME), >>> the pointer's point-to solution doesn't include the VLA pointer or >>> (AFAICS) make it possible to tell even that it is a VLA. For example >>> here: >>> >>> f (int n) >>> { >>> int * p; >>> int[0:D.1912] * a.1; >>> sizetype _1; >>> void * saved_stack.3_3; >>> sizetype _6; >>> >>> [local count: 1073741824]: >>> saved_stack.3_3 = __builtin_stack_save (); >>> _1 = (sizetype) n_2(D); >>> _6 = _1 * 4; >>> a.1_8 = __builtin_alloca_with_align (_6, 32); >>> p_9 = a.1_8 + _6; >>> __builtin_stack_restore (saved_stack.3_3); >>> return p_9; >>> } >>> >>> p_9's solution's is: >>> >>> p_9, points-to vars: { D.1925 } (escaped, escaped heap) >>> >>> I couldn't find out how to determine that D.1925 is a VLA (or even >>> what it is). It's not among the function's local variables that >>> FOR_EACH_LOCAL_DECL iterates over. >> It's possible that decl was created internally as part of the alias >> oracle's analysis. > > Yes. Note that only the UID was reserved the fake decl doesn't > live on. > > Note that for the testcase above the "local" alloca storage escapes > which means you run into a catch-22 here given points-to computes > a conservative correct solution and you want to detect escaping > locals. Usually detecting a pointer to local storage can be done > by using ptr_deref_may_alias_global_p but of course in this > case the storage was marked global by PTA itself (and our PTA > is not flow-sensitive and it doesn't distinguish an escape through > a return stmt from an escape through a call which is relevant > even for local storage). Good point. The inability to tell why something escaped is another significant hurdle. > > Feature-wise the PTA solver is missing sth like a "filter" > we could put in front of return stmts that doesn't let > addresses of locals leak. The simplest way of implementing > this might be to not include 'returns' in the constraints at all > (in non-IPA mode) and handle them by post-processing the > solver result. That gets us some additional flow-sensitivity > and a way to filter locals. Let me see if I can cook up this. Another thought would be to somehow flag how the pointer escaped. ie, was it as an argument, return value or stored to memory? Though in the end that level of detail may not be useful, so collecting it may not be worth the effort. > > That may ultimatively also help the warning code where you > then can use ptr_deref_may_alias_global_p. Another thought would be to use the alias oracle to prune what we look at. If we ask the oracle and the value doesn't escape, then it's not worth walking up the use-def chain to see where it came from. Jeff
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On Mon, Jun 3, 2019 at 11:37 AM Richard Biener wrote: > > On Fri, May 31, 2019 at 5:35 PM Jeff Law wrote:> > > On 5/30/19 4:56 PM, Martin Sebor wrote: > > > On 5/30/19 10:15 AM, Jeff Law wrote: > > >> On 5/30/19 9:34 AM, Martin Sebor wrote: > > >> > > > If the alias oracle can be used to give the same results without > > > excessive false positives then I think it would be fine to make > > > use of it. Is that something you consider a prerequisite for > > > this change or should I look into it as a followup? > > I think we should explore it a bit before making a final decision. It > > may guide us for other work in this space (like detecting escaping > > locals). I think a dirty prototype to see if it's even in the right > > ballpark would make sense. > > >>> > > >>> Okay, let me look into it. > > >> Sounds good. Again, go with a quick prototype to see if it's likely > > >> feasible. The tests you've added should dramatically help evaluating if > > >> the oracle is up to the task. > > > > > > So to expand on what I said on the phone when we spoke: the problem > > > I quickly ran into with the prototype is that I wasn't able to find > > > a way to identify pointers to alloca/VLA storage. > > Your analysis matches my very quick read of the aliasing code. It may > > be the case that the Steensgaard patent got in the way here. > > > > > > > > In the the points-to solution for the pointer being returned they > > > both have the vars_contains_escaped_heap flag set. That seems like > > > an omission that shouldn't be hard to fix, but on its own, I don't > > > think it would be sufficient. > > RIght. In theory the result of an alloca call shouldn't alias anything > > in the heap -- but there were some implementations of alloca that were > > built on top of malloc (ugh). That flag may be catering to that case. > > > > But in the case of a __builtin_alloca that shouldn't apply. Hmm. That > > ultimately might be a bug. > > > > > > > > In the IL a VLA is represented as a pointer to an array, but when > > > returning a pointer into a VLA (at some offset so it's an SSA_NAME), > > > the pointer's point-to solution doesn't include the VLA pointer or > > > (AFAICS) make it possible to tell even that it is a VLA. For example > > > here: > > > > > > f (int n) > > > { > > > int * p; > > > int[0:D.1912] * a.1; > > > sizetype _1; > > > void * saved_stack.3_3; > > > sizetype _6; > > > > > > [local count: 1073741824]: > > > saved_stack.3_3 = __builtin_stack_save (); > > > _1 = (sizetype) n_2(D); > > > _6 = _1 * 4; > > > a.1_8 = __builtin_alloca_with_align (_6, 32); > > > p_9 = a.1_8 + _6; > > > __builtin_stack_restore (saved_stack.3_3); > > > return p_9; > > > } > > > > > > p_9's solution's is: > > > > > > p_9, points-to vars: { D.1925 } (escaped, escaped heap) > > > > > > I couldn't find out how to determine that D.1925 is a VLA (or even > > > what it is). It's not among the function's local variables that > > > FOR_EACH_LOCAL_DECL iterates over. > > It's possible that decl was created internally as part of the alias > > oracle's analysis. > > Yes. Note that only the UID was reserved the fake decl doesn't > live on. > > Note that for the testcase above the "local" alloca storage escapes > which means you run into a catch-22 here given points-to computes > a conservative correct solution and you want to detect escaping > locals. Usually detecting a pointer to local storage can be done > by using ptr_deref_may_alias_global_p but of course in this > case the storage was marked global by PTA itself (and our PTA > is not flow-sensitive and it doesn't distinguish an escape through > a return stmt from an escape through a call which is relevant > even for local storage). > > Feature-wise the PTA solver is missing sth like a "filter" > we could put in front of return stmts that doesn't let > addresses of locals leak. The simplest way of implementing > this might be to not include 'returns' in the constraints at all > (in non-IPA mode) and handle them by post-processing the > solver result. That gets us some additional flow-sensitivity > and a way to filter locals. Let me see if I can cook up this. > > That may ultimatively also help the warning code where you > then can use ptr_deref_may_alias_global_p. > > Sth like the attached - completely untested (the > is_global_var check is likely too simplistic...). It does > the job on alloca for me. > > p_5, points-to NULL, points-to vars: { D.1913 } > _6, points-to NULL, points-to vars: { D.1913 } > > foo (int n) > { > void * p; > long unsigned int _1; > void * _6; > >: > _1 = (long unsigned int) n_2(D); > p_5 = __builtin_alloca (_1); > _6 = p_5; > return p_5; I am testing the following fixed and more complete patch (still the actual return values we process optimally is restricted). I'm not sure whether it will really help real-world testcases since the
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On Fri, May 31, 2019 at 5:35 PM Jeff Law wrote:> > On 5/30/19 4:56 PM, Martin Sebor wrote: > > On 5/30/19 10:15 AM, Jeff Law wrote: > >> On 5/30/19 9:34 AM, Martin Sebor wrote: > >> > > If the alias oracle can be used to give the same results without > > excessive false positives then I think it would be fine to make > > use of it. Is that something you consider a prerequisite for > > this change or should I look into it as a followup? > I think we should explore it a bit before making a final decision. It > may guide us for other work in this space (like detecting escaping > locals). I think a dirty prototype to see if it's even in the right > ballpark would make sense. > >>> > >>> Okay, let me look into it. > >> Sounds good. Again, go with a quick prototype to see if it's likely > >> feasible. The tests you've added should dramatically help evaluating if > >> the oracle is up to the task. > > > > So to expand on what I said on the phone when we spoke: the problem > > I quickly ran into with the prototype is that I wasn't able to find > > a way to identify pointers to alloca/VLA storage. > Your analysis matches my very quick read of the aliasing code. It may > be the case that the Steensgaard patent got in the way here. > > > > > In the the points-to solution for the pointer being returned they > > both have the vars_contains_escaped_heap flag set. That seems like > > an omission that shouldn't be hard to fix, but on its own, I don't > > think it would be sufficient. > RIght. In theory the result of an alloca call shouldn't alias anything > in the heap -- but there were some implementations of alloca that were > built on top of malloc (ugh). That flag may be catering to that case. > > But in the case of a __builtin_alloca that shouldn't apply. Hmm. That > ultimately might be a bug. > > > > > In the IL a VLA is represented as a pointer to an array, but when > > returning a pointer into a VLA (at some offset so it's an SSA_NAME), > > the pointer's point-to solution doesn't include the VLA pointer or > > (AFAICS) make it possible to tell even that it is a VLA. For example > > here: > > > > f (int n) > > { > > int * p; > > int[0:D.1912] * a.1; > > sizetype _1; > > void * saved_stack.3_3; > > sizetype _6; > > > > [local count: 1073741824]: > > saved_stack.3_3 = __builtin_stack_save (); > > _1 = (sizetype) n_2(D); > > _6 = _1 * 4; > > a.1_8 = __builtin_alloca_with_align (_6, 32); > > p_9 = a.1_8 + _6; > > __builtin_stack_restore (saved_stack.3_3); > > return p_9; > > } > > > > p_9's solution's is: > > > > p_9, points-to vars: { D.1925 } (escaped, escaped heap) > > > > I couldn't find out how to determine that D.1925 is a VLA (or even > > what it is). It's not among the function's local variables that > > FOR_EACH_LOCAL_DECL iterates over. > It's possible that decl was created internally as part of the alias > oracle's analysis. Yes. Note that only the UID was reserved the fake decl doesn't live on. Note that for the testcase above the "local" alloca storage escapes which means you run into a catch-22 here given points-to computes a conservative correct solution and you want to detect escaping locals. Usually detecting a pointer to local storage can be done by using ptr_deref_may_alias_global_p but of course in this case the storage was marked global by PTA itself (and our PTA is not flow-sensitive and it doesn't distinguish an escape through a return stmt from an escape through a call which is relevant even for local storage). Feature-wise the PTA solver is missing sth like a "filter" we could put in front of return stmts that doesn't let addresses of locals leak. The simplest way of implementing this might be to not include 'returns' in the constraints at all (in non-IPA mode) and handle them by post-processing the solver result. That gets us some additional flow-sensitivity and a way to filter locals. Let me see if I can cook up this. That may ultimatively also help the warning code where you then can use ptr_deref_may_alias_global_p. Sth like the attached - completely untested (the is_global_var check is likely too simplistic...). It does the job on alloca for me. p_5, points-to NULL, points-to vars: { D.1913 } _6, points-to NULL, points-to vars: { D.1913 } foo (int n) { void * p; long unsigned int _1; void * _6; : _1 = (long unsigned int) n_2(D); p_5 = __builtin_alloca (_1); _6 = p_5; return p_5; Richard. > See make_heapvar in tree-ssa-structalias.c > > > > By replacing the VLA in the test case with a call to malloc I get > > this for the returned p_7: > > > > p_7, points-to NULL, points-to vars: { D.1916 } (escaped, escaped heap) > > > > Again, I see no way to quickly tell that this pointer points into > > the block returned from malloc. > If there's a way to make that determination it'd have to be on the > variable since the pt_solution flag bits don't carry
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/22/19 3:34 PM, Martin Sebor wrote: > -Wreturn-local-addr detects a subset of instances of returning > the address of a local object from a function but the warning > doesn't try to handle alloca or VLAs, or some non-trivial cases > of ordinary automatic variables[1]. > > The attached patch extends the implementation of the warning to > detect those. It still doesn't detect instances where the address > is the result of a built-in such strcpy[2]. > > Tested on x86_64-linux. > > Martin > > [1] For example, this is only diagnosed with the patch: > > void* f (int i) > { > struct S { int a[2]; } s[2]; > return >a[i]; > } > > [2] The following is not diagnosed even with the patch: > > void sink (void*); > > void* f (int i) > { > char a[6]; > char *p = __builtin_strcpy (a, "123"); > sink (p); > return p; > } > > I would expect detecting to be possible and useful. Maybe as > a follow-up. > > gcc-71924.diff > > PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result > PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address > of a local array plus offset > > gcc/ChangeLog: > > PR c/71924 > * gimple-ssa-isolate-paths.c (is_addr_local): New function. > (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. > (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. > (find_explicit_erroneous_behavior): Call warn_return_addr_local. > > gcc/testsuite/ChangeLog: > > PR c/71924 > * gcc.dg/Wreturn-local-addr-2.c: New test. > * gcc.dg/Walloca-4.c: Prune expected warnings. > * gcc.dg/pr41551.c: Same. > * gcc.dg/pr59523.c: Same. > * gcc.dg/tree-ssa/pr88775-2.c: Same. > * gcc.dg/winline-7.c: Same. > > diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c > index 33fe352bb23..2933ecf502e 100644 > --- a/gcc/gimple-ssa-isolate-paths.c > +++ b/gcc/gimple-ssa-isolate-paths.c > @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) >return false; > } > > +/* Return true if EXPR is a expression of pointer type that refers > + to the address of a variable with automatic storage duration. > + If so, set *PLOC to the location of the object or the call that > + allocated it (for alloca and VLAs). When PMAYBE is non-null, > + also consider PHI statements and set *PMAYBE when some but not > + all arguments of such statements refer to local variables, and > + to clear it otherwise. */ > + > +static bool > +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, > +hash_set *visited = NULL) > +{ > + if (TREE_CODE (exp) == SSA_NAME) > +{ > + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); > + enum gimple_code code = gimple_code (def_stmt); > + > + if (is_gimple_assign (def_stmt)) > + { > + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); > + if (POINTER_TYPE_P (type)) > + { > + tree ptr = gimple_assign_rhs1 (def_stmt); > + return is_addr_local (ptr, ploc, pmaybe, visited); > + } > + return false; > + } So this is going to recurse on the rhs1 of something like POINTER_PLUS_EXPR, that's a good thing :-) But isn't it non-selective about the codes where we recurse? Consider ptr = (cond) ? res1 : res2 I think we'll end up recursing on the condition rather than looking at res1 and res2. I suspect there are a very limited number of expression codes that appear on the RHS where we'd want to recurse on one or both operands. POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse on both and logically and the result), BIT_AND (maybe we masked off some bits in an address). That's probably about it :-) Are there any other codes you've seen or think would be useful in practice to recurse through? I'd rather list them explicitly rather than just recurse down through every rhs1 we encounter. > + > + if (code == GIMPLE_PHI && pmaybe) > + { > + unsigned count = 0; > + gphi *phi_stmt = as_a (def_stmt); > + > + unsigned nargs = gimple_phi_num_args (phi_stmt); > + for (unsigned i = 0; i < nargs; ++i) > + { > + if (!visited->add (phi_stmt)) > + { > + tree arg = gimple_phi_arg_def (phi_stmt, i); > + if (is_addr_local (arg, ploc, pmaybe, visited)) > + ++count; > + } > + } So imagine p = phi (x1, x1, x2) Where both x1 and x2 would pass is_addr_local. I think this code would incorrectly set pmaybe. We process the first instance of x1, find it is local and bump count. When we encounter the second instance, it's in our map and we do nothing. THen we process x2 and bump count again. So count would be 2. But count is going to be less than nargs so we'll set *pmaybe to true. ISTM you'd have to record the result for each phi argument and query that to
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/30/19 4:56 PM, Martin Sebor wrote: > On 5/30/19 10:15 AM, Jeff Law wrote: >> On 5/30/19 9:34 AM, Martin Sebor wrote: >> > If the alias oracle can be used to give the same results without > excessive false positives then I think it would be fine to make > use of it. Is that something you consider a prerequisite for > this change or should I look into it as a followup? I think we should explore it a bit before making a final decision. It may guide us for other work in this space (like detecting escaping locals). I think a dirty prototype to see if it's even in the right ballpark would make sense. >>> >>> Okay, let me look into it. >> Sounds good. Again, go with a quick prototype to see if it's likely >> feasible. The tests you've added should dramatically help evaluating if >> the oracle is up to the task. > > So to expand on what I said on the phone when we spoke: the problem > I quickly ran into with the prototype is that I wasn't able to find > a way to identify pointers to alloca/VLA storage. Your analysis matches my very quick read of the aliasing code. It may be the case that the Steensgaard patent got in the way here. > > In the the points-to solution for the pointer being returned they > both have the vars_contains_escaped_heap flag set. That seems like > an omission that shouldn't be hard to fix, but on its own, I don't > think it would be sufficient. RIght. In theory the result of an alloca call shouldn't alias anything in the heap -- but there were some implementations of alloca that were built on top of malloc (ugh). That flag may be catering to that case. But in the case of a __builtin_alloca that shouldn't apply. Hmm. That ultimately might be a bug. > > In the IL a VLA is represented as a pointer to an array, but when > returning a pointer into a VLA (at some offset so it's an SSA_NAME), > the pointer's point-to solution doesn't include the VLA pointer or > (AFAICS) make it possible to tell even that it is a VLA. For example > here: > > f (int n) > { > int * p; > int[0:D.1912] * a.1; > sizetype _1; > void * saved_stack.3_3; > sizetype _6; > > [local count: 1073741824]: > saved_stack.3_3 = __builtin_stack_save (); > _1 = (sizetype) n_2(D); > _6 = _1 * 4; > a.1_8 = __builtin_alloca_with_align (_6, 32); > p_9 = a.1_8 + _6; > __builtin_stack_restore (saved_stack.3_3); > return p_9; > } > > p_9's solution's is: > > p_9, points-to vars: { D.1925 } (escaped, escaped heap) > > I couldn't find out how to determine that D.1925 is a VLA (or even > what it is). It's not among the function's local variables that > FOR_EACH_LOCAL_DECL iterates over. It's possible that decl was created internally as part of the alias oracle's analysis. See make_heapvar in tree-ssa-structalias.c > > By replacing the VLA in the test case with a call to malloc I get > this for the returned p_7: > > p_7, points-to NULL, points-to vars: { D.1916 } (escaped, escaped heap) > > Again, I see no way to quickly tell that this pointer points into > the block returned from malloc. If there's a way to make that determination it'd have to be on the variable since the pt_solution flag bits don't carry a storage class directly. You might try to get a handle on those decls and dump them to see if there's anything easily identifiable. But it may be easier to dig into the code that creates them. A real quick scan of the aliasing code also shows the "variable_info" structure. It's private to the aliasing code, but might guide you at things to look at. Regardless, I don't see an immediate path forward using the oracle to identify objects in the stack for your patch. WHich is unfortunate. Jeff
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/30/19 11:23 AM, Jason Merrill wrote: > On Thu, May 30, 2019 at 12:16 PM Jeff Law wrote: >> >> On 5/30/19 9:34 AM, Martin Sebor wrote: >> > If the alias oracle can be used to give the same results without > excessive false positives then I think it would be fine to make > use of it. Is that something you consider a prerequisite for > this change or should I look into it as a followup? I think we should explore it a bit before making a final decision. It may guide us for other work in this space (like detecting escaping locals). I think a dirty prototype to see if it's even in the right ballpark would make sense. >>> >>> Okay, let me look into it. >> Sounds good. Again, go with a quick prototype to see if it's likely >> feasible. The tests you've added should dramatically help evaluating if >> the oracle is up to the task. >> >>> > FWIW, I'm working on enhancing this to detect returning freed > pointers (under a different option). That seems like a good > opportunity to also look into making use of the alias oracle. Yes. I think there's two interesting cases here to ponder. If we free a pointer that must point to a local, then we can warn & trap. If we free a pointer that may point to a local, then we can only warn (unless we can isolate the path). >>> >>> I wasn't actually thinking of freeing locals but it sounds like >>> a useful enhancement as well. Thanks for the suggestion! :) >>> >>> To be clear, what I'm working on is detecting: >>> >>> void* f (void *p) >>> { >>> free (p); // note: pointer was freed here >>> // ... >>> return p; // warning: returning a freed pointer >>> } >> Ah, we were talking about different things. Though what you're doing >> might be better modeled in a true global static analyzer as a >> use-after-free problem. My sense is that translation-unit local version >> of that problem really isn't that useful in practice. THough I guess >> there isn't anything bad with having a TU local version. >> >> >>> > Besides these enhancements, there's also a request to diagnose > dereferencing pointers to compound literals whose lifetime has > ended (PR 89990), or more generally, those to any such local > object. It's about preventing essentially the same problem > (accessing bad data or corrupting others) but it seems that > both the analysis and the handling will be sufficiently > different to consider implementing it somewhere else. What > are your thoughts on that? I think the tough problem here is we lose binding scopes as we lower into gimple, so solving it in the general case may be tough. We've started adding some clobbers into the IL to denote object death points, but I'm not sure if they're sufficient to implement this kind of warning. >>> >>> I was afraid that might be a problem. >> Way back in the early days of tree-ssa we kept the binding scopes. But >> that proved problematical in various ways. Mostly they just got in the >> way of analysis an optimization and we spent far too much time in the >> optimizers working around them or removing those which were empty. >> >> They'd be helpful in this kind of analysis, stack slot sharing vs the >> inliner and a couple other problems. I don't know if the pendulum has >> moved far enough to revisit :-) > > Why wouldn't clobbers be sufficient? I haven't looked into the clobbers in any detail. If we're aggressively emitting them at the end of object life, then they may be sufficient to start tackling these issues. jeff
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/30/19 10:15 AM, Jeff Law wrote: On 5/30/19 9:34 AM, Martin Sebor wrote: If the alias oracle can be used to give the same results without excessive false positives then I think it would be fine to make use of it. Is that something you consider a prerequisite for this change or should I look into it as a followup? I think we should explore it a bit before making a final decision. It may guide us for other work in this space (like detecting escaping locals). I think a dirty prototype to see if it's even in the right ballpark would make sense. Okay, let me look into it. Sounds good. Again, go with a quick prototype to see if it's likely feasible. The tests you've added should dramatically help evaluating if the oracle is up to the task. So to expand on what I said on the phone when we spoke: the problem I quickly ran into with the prototype is that I wasn't able to find a way to identify pointers to alloca/VLA storage. In the the points-to solution for the pointer being returned they both have the vars_contains_escaped_heap flag set. That seems like an omission that shouldn't be hard to fix, but on its own, I don't think it would be sufficient. In the IL a VLA is represented as a pointer to an array, but when returning a pointer into a VLA (at some offset so it's an SSA_NAME), the pointer's point-to solution doesn't include the VLA pointer or (AFAICS) make it possible to tell even that it is a VLA. For example here: f (int n) { int * p; int[0:D.1912] * a.1; sizetype _1; void * saved_stack.3_3; sizetype _6; [local count: 1073741824]: saved_stack.3_3 = __builtin_stack_save (); _1 = (sizetype) n_2(D); _6 = _1 * 4; a.1_8 = __builtin_alloca_with_align (_6, 32); p_9 = a.1_8 + _6; __builtin_stack_restore (saved_stack.3_3); return p_9; } p_9's solution's is: p_9, points-to vars: { D.1925 } (escaped, escaped heap) I couldn't find out how to determine that D.1925 is a VLA (or even what it is). It's not among the function's local variables that FOR_EACH_LOCAL_DECL iterates over. By replacing the VLA in the test case with a call to malloc I get this for the returned p_7: p_7, points-to NULL, points-to vars: { D.1916 } (escaped, escaped heap) Again, I see no way to quickly tell that this pointer points into the block returned from malloc. I'm sure Richard will correct me if there is a simple way to do it but I couldn't find one. If there is a way to identify that the returned pointer is for a VLA (or alloca), for parity with the current patch we also need to find the location of the VLA declaration or the alloca call so that the warning could point to it. Unless there's a straight path from the mysterious D.1925 above to that decl/call statement, finding it would likely require some sor of traversal. At that point I wouldn't be surprised if the complexity of the solution didn't approach or even exceed the current approach. Martin FWIW, I'm working on enhancing this to detect returning freed pointers (under a different option). That seems like a good opportunity to also look into making use of the alias oracle. Yes. I think there's two interesting cases here to ponder. If we free a pointer that must point to a local, then we can warn & trap. If we free a pointer that may point to a local, then we can only warn (unless we can isolate the path). I wasn't actually thinking of freeing locals but it sounds like a useful enhancement as well. Thanks for the suggestion! :) To be clear, what I'm working on is detecting: void* f (void *p) { free (p); // note: pointer was freed here // ... return p; // warning: returning a freed pointer } Ah, we were talking about different things. Though what you're doing might be better modeled in a true global static analyzer as a use-after-free problem. My sense is that translation-unit local version of that problem really isn't that useful in practice. THough I guess there isn't anything bad with having a TU local version. Besides these enhancements, there's also a request to diagnose dereferencing pointers to compound literals whose lifetime has ended (PR 89990), or more generally, those to any such local object. It's about preventing essentially the same problem (accessing bad data or corrupting others) but it seems that both the analysis and the handling will be sufficiently different to consider implementing it somewhere else. What are your thoughts on that? I think the tough problem here is we lose binding scopes as we lower into gimple, so solving it in the general case may be tough. We've started adding some clobbers into the IL to denote object death points, but I'm not sure if they're sufficient to implement this kind of warning. I was afraid that might be a problem. Way back in the early days of tree-ssa we kept the binding scopes. But that proved problematical in various ways. Mostly they just got in the way of analysis an
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On Thu, May 30, 2019 at 12:16 PM Jeff Law wrote: > > On 5/30/19 9:34 AM, Martin Sebor wrote: > > >>> If the alias oracle can be used to give the same results without > >>> excessive false positives then I think it would be fine to make > >>> use of it. Is that something you consider a prerequisite for > >>> this change or should I look into it as a followup? > >> I think we should explore it a bit before making a final decision. It > >> may guide us for other work in this space (like detecting escaping > >> locals). I think a dirty prototype to see if it's even in the right > >> ballpark would make sense. > > > > Okay, let me look into it. > Sounds good. Again, go with a quick prototype to see if it's likely > feasible. The tests you've added should dramatically help evaluating if > the oracle is up to the task. > > > > >>> FWIW, I'm working on enhancing this to detect returning freed > >>> pointers (under a different option). That seems like a good > >>> opportunity to also look into making use of the alias oracle. > >> Yes. I think there's two interesting cases here to ponder. If we free > >> a pointer that must point to a local, then we can warn & trap. If we > >> free a pointer that may point to a local, then we can only warn (unless > >> we can isolate the path). > > > > I wasn't actually thinking of freeing locals but it sounds like > > a useful enhancement as well. Thanks for the suggestion! :) > > > > To be clear, what I'm working on is detecting: > > > > void* f (void *p) > > { > > free (p); // note: pointer was freed here > > // ... > > return p; // warning: returning a freed pointer > > } > Ah, we were talking about different things. Though what you're doing > might be better modeled in a true global static analyzer as a > use-after-free problem. My sense is that translation-unit local version > of that problem really isn't that useful in practice. THough I guess > there isn't anything bad with having a TU local version. > > > > > >>> Besides these enhancements, there's also a request to diagnose > >>> dereferencing pointers to compound literals whose lifetime has > >>> ended (PR 89990), or more generally, those to any such local > >>> object. It's about preventing essentially the same problem > >>> (accessing bad data or corrupting others) but it seems that > >>> both the analysis and the handling will be sufficiently > >>> different to consider implementing it somewhere else. What > >>> are your thoughts on that? > >> I think the tough problem here is we lose binding scopes as we lower > >> into gimple, so solving it in the general case may be tough. We've > >> started adding some clobbers into the IL to denote object death points, > >> but I'm not sure if they're sufficient to implement this kind of warning. > > > > I was afraid that might be a problem. > Way back in the early days of tree-ssa we kept the binding scopes. But > that proved problematical in various ways. Mostly they just got in the > way of analysis an optimization and we spent far too much time in the > optimizers working around them or removing those which were empty. > > They'd be helpful in this kind of analysis, stack slot sharing vs the > inliner and a couple other problems. I don't know if the pendulum has > moved far enough to revisit :-) Why wouldn't clobbers be sufficient? Jaason
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/30/19 9:34 AM, Martin Sebor wrote: >>> If the alias oracle can be used to give the same results without >>> excessive false positives then I think it would be fine to make >>> use of it. Is that something you consider a prerequisite for >>> this change or should I look into it as a followup? >> I think we should explore it a bit before making a final decision. It >> may guide us for other work in this space (like detecting escaping >> locals). I think a dirty prototype to see if it's even in the right >> ballpark would make sense. > > Okay, let me look into it. Sounds good. Again, go with a quick prototype to see if it's likely feasible. The tests you've added should dramatically help evaluating if the oracle is up to the task. > >>> FWIW, I'm working on enhancing this to detect returning freed >>> pointers (under a different option). That seems like a good >>> opportunity to also look into making use of the alias oracle. >> Yes. I think there's two interesting cases here to ponder. If we free >> a pointer that must point to a local, then we can warn & trap. If we >> free a pointer that may point to a local, then we can only warn (unless >> we can isolate the path). > > I wasn't actually thinking of freeing locals but it sounds like > a useful enhancement as well. Thanks for the suggestion! :) > > To be clear, what I'm working on is detecting: > > void* f (void *p) > { > free (p); // note: pointer was freed here > // ... > return p; // warning: returning a freed pointer > } Ah, we were talking about different things. Though what you're doing might be better modeled in a true global static analyzer as a use-after-free problem. My sense is that translation-unit local version of that problem really isn't that useful in practice. THough I guess there isn't anything bad with having a TU local version. > >>> Besides these enhancements, there's also a request to diagnose >>> dereferencing pointers to compound literals whose lifetime has >>> ended (PR 89990), or more generally, those to any such local >>> object. It's about preventing essentially the same problem >>> (accessing bad data or corrupting others) but it seems that >>> both the analysis and the handling will be sufficiently >>> different to consider implementing it somewhere else. What >>> are your thoughts on that? >> I think the tough problem here is we lose binding scopes as we lower >> into gimple, so solving it in the general case may be tough. We've >> started adding some clobbers into the IL to denote object death points, >> but I'm not sure if they're sufficient to implement this kind of warning. > > I was afraid that might be a problem. Way back in the early days of tree-ssa we kept the binding scopes. But that proved problematical in various ways. Mostly they just got in the way of analysis an optimization and we spent far too much time in the optimizers working around them or removing those which were empty. They'd be helpful in this kind of analysis, stack slot sharing vs the inliner and a couple other problems. I don't know if the pendulum has moved far enough to revisit :-) jeff
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/30/19 9:13 AM, Jeff Law wrote: On 5/30/19 8:52 AM, Martin Sebor wrote: On 5/29/19 11:45 AM, Jeff Law wrote: On 5/22/19 3:34 PM, Martin Sebor wrote: -Wreturn-local-addr detects a subset of instances of returning the address of a local object from a function but the warning doesn't try to handle alloca or VLAs, or some non-trivial cases of ordinary automatic variables[1]. The attached patch extends the implementation of the warning to detect those. It still doesn't detect instances where the address is the result of a built-in such strcpy[2]. Tested on x86_64-linux. Martin [1] For example, this is only diagnosed with the patch: void* f (int i) { struct S { int a[2]; } s[2]; return >a[i]; } [2] The following is not diagnosed even with the patch: void sink (void*); void* f (int i) { char a[6]; char *p = __builtin_strcpy (a, "123"); sink (p); return p; } I would expect detecting to be possible and useful. Maybe as a follow-up. gcc-71924.diff PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset gcc/ChangeLog: PR c/71924 * gimple-ssa-isolate-paths.c (is_addr_local): New function. (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. (find_explicit_erroneous_behavior): Call warn_return_addr_local. gcc/testsuite/ChangeLog: PR c/71924 * gcc.dg/Wreturn-local-addr-2.c: New test. * gcc.dg/Walloca-4.c: Prune expected warnings. * gcc.dg/pr41551.c: Same. * gcc.dg/pr59523.c: Same. * gcc.dg/tree-ssa/pr88775-2.c: Same. * gcc.dg/winline-7.c: Same. diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index 33fe352bb23..2933ecf502e 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) return false; } +/* Return true if EXPR is a expression of pointer type that refers + to the address of a variable with automatic storage duration. + If so, set *PLOC to the location of the object or the call that + allocated it (for alloca and VLAs). When PMAYBE is non-null, + also consider PHI statements and set *PMAYBE when some but not + all arguments of such statements refer to local variables, and + to clear it otherwise. */ + +static bool +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, + hash_set *visited = NULL) +{ + if (TREE_CODE (exp) == ADDR_EXPR) + { + tree baseaddr = get_base_address (TREE_OPERAND (exp, 0)); + if (TREE_CODE (baseaddr) == MEM_REF) + return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, visited); + + if ((!VAR_P (baseaddr) + || is_global_var (baseaddr)) + && TREE_CODE (baseaddr) != PARM_DECL) + return false; + + *ploc = DECL_SOURCE_LOCATION (baseaddr); + return true; + } + + if (TREE_CODE (exp) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); + enum gimple_code code = gimple_code (def_stmt); + + if (is_gimple_assign (def_stmt)) + { + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); + if (POINTER_TYPE_P (type)) + { + tree ptr = gimple_assign_rhs1 (def_stmt); + return is_addr_local (ptr, ploc, pmaybe, visited); + } + return false; + } + + if (code == GIMPLE_CALL + && gimple_call_builtin_p (def_stmt)) + { + tree fn = gimple_call_fndecl (def_stmt); + int code = DECL_FUNCTION_CODE (fn); + if (code != BUILT_IN_ALLOCA + && code != BUILT_IN_ALLOCA_WITH_ALIGN) + return false; + + *ploc = gimple_location (def_stmt); + return true; + } + + if (code == GIMPLE_PHI && pmaybe) + { + unsigned count = 0; + gphi *phi_stmt = as_a (def_stmt); + + unsigned nargs = gimple_phi_num_args (phi_stmt); + for (unsigned i = 0; i < nargs; ++i) + { + if (!visited->add (phi_stmt)) + { + tree arg = gimple_phi_arg_def (phi_stmt, i); + if (is_addr_local (arg, ploc, pmaybe, visited)) + ++count; + } + } + + *pmaybe = count && count < nargs; + return count != 0; + } + } + + return false; +} Is there some reason you didn't query the alias oracle here? It would seem a fairly natural fit. Ultimately given a pointer (which will be an SSA_NAME) you want to ask whether or not it conclusively points into the stack. That would seem to dramatically simplify is_addr_local. I did think about it but decided against changing the existing design (iterating over PHI arguments), for a couple of reasons: 1) It feels like a bigger change than my simple "bug fix" calls for. I suspect the net result will be simpler though
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/30/19 8:52 AM, Martin Sebor wrote: > On 5/29/19 11:45 AM, Jeff Law wrote: >> On 5/22/19 3:34 PM, Martin Sebor wrote: >>> -Wreturn-local-addr detects a subset of instances of returning >>> the address of a local object from a function but the warning >>> doesn't try to handle alloca or VLAs, or some non-trivial cases >>> of ordinary automatic variables[1]. >>> >>> The attached patch extends the implementation of the warning to >>> detect those. It still doesn't detect instances where the address >>> is the result of a built-in such strcpy[2]. >>> >>> Tested on x86_64-linux. >>> >>> Martin >>> >>> [1] For example, this is only diagnosed with the patch: >>> >>> void* f (int i) >>> { >>> struct S { int a[2]; } s[2]; >>> return >a[i]; >>> } >>> >>> [2] The following is not diagnosed even with the patch: >>> >>> void sink (void*); >>> >>> void* f (int i) >>> { >>> char a[6]; >>> char *p = __builtin_strcpy (a, "123"); >>> sink (p); >>> return p; >>> } >>> >>> I would expect detecting to be possible and useful. Maybe as >>> a follow-up. >>> >>> gcc-71924.diff >>> >>> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca >>> result >>> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an >>> address of a local array plus offset >>> >>> gcc/ChangeLog: >>> >>> PR c/71924 >>> * gimple-ssa-isolate-paths.c (is_addr_local): New function. >>> (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. >>> (find_implicit_erroneous_behavior): Call >>> warn_return_addr_local_phi_arg. >>> (find_explicit_erroneous_behavior): Call warn_return_addr_local. >>> >>> gcc/testsuite/ChangeLog: >>> >>> PR c/71924 >>> * gcc.dg/Wreturn-local-addr-2.c: New test. >>> * gcc.dg/Walloca-4.c: Prune expected warnings. >>> * gcc.dg/pr41551.c: Same. >>> * gcc.dg/pr59523.c: Same. >>> * gcc.dg/tree-ssa/pr88775-2.c: Same. >>> * gcc.dg/winline-7.c: Same. >>> >>> diff --git a/gcc/gimple-ssa-isolate-paths.c >>> b/gcc/gimple-ssa-isolate-paths.c >>> index 33fe352bb23..2933ecf502e 100644 >>> --- a/gcc/gimple-ssa-isolate-paths.c >>> +++ b/gcc/gimple-ssa-isolate-paths.c >>> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple >>> *stmt) >>> return false; >>> } >>> +/* Return true if EXPR is a expression of pointer type that refers >>> + to the address of a variable with automatic storage duration. >>> + If so, set *PLOC to the location of the object or the call that >>> + allocated it (for alloca and VLAs). When PMAYBE is non-null, >>> + also consider PHI statements and set *PMAYBE when some but not >>> + all arguments of such statements refer to local variables, and >>> + to clear it otherwise. */ >>> + >>> +static bool >>> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, >>> + hash_set *visited = NULL) >>> +{ >>> + if (TREE_CODE (exp) == ADDR_EXPR) >>> + { >>> + tree baseaddr = get_base_address (TREE_OPERAND (exp, 0)); >>> + if (TREE_CODE (baseaddr) == MEM_REF) >>> + return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, >>> visited); >>> + >>> + if ((!VAR_P (baseaddr) >>> + || is_global_var (baseaddr)) >>> + && TREE_CODE (baseaddr) != PARM_DECL) >>> + return false; >>> + >>> + *ploc = DECL_SOURCE_LOCATION (baseaddr); >>> + return true; >>> + } >>> + >>> + if (TREE_CODE (exp) == SSA_NAME) >>> + { >>> + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); >>> + enum gimple_code code = gimple_code (def_stmt); >>> + >>> + if (is_gimple_assign (def_stmt)) >>> + { >>> + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); >>> + if (POINTER_TYPE_P (type)) >>> + { >>> + tree ptr = gimple_assign_rhs1 (def_stmt); >>> + return is_addr_local (ptr, ploc, pmaybe, visited); >>> + } >>> + return false; >>> + } >>> + >>> + if (code == GIMPLE_CALL >>> + && gimple_call_builtin_p (def_stmt)) >>> + { >>> + tree fn = gimple_call_fndecl (def_stmt); >>> + int code = DECL_FUNCTION_CODE (fn); >>> + if (code != BUILT_IN_ALLOCA >>> + && code != BUILT_IN_ALLOCA_WITH_ALIGN) >>> + return false; >>> + >>> + *ploc = gimple_location (def_stmt); >>> + return true; >>> + } >>> + >>> + if (code == GIMPLE_PHI && pmaybe) >>> + { >>> + unsigned count = 0; >>> + gphi *phi_stmt = as_a (def_stmt); >>> + >>> + unsigned nargs = gimple_phi_num_args (phi_stmt); >>> + for (unsigned i = 0; i < nargs; ++i) >>> + { >>> + if (!visited->add (phi_stmt)) >>> + { >>> + tree arg = gimple_phi_arg_def (phi_stmt, i); >>> + if (is_addr_local (arg, ploc, pmaybe, visited)) >>> + ++count; >>> + } >>> + } >>> + >>> + *pmaybe = count && count < nargs; >>> + return count != 0; >>> + } >>> + } >>> + >>> + return false; >>> +} >> Is there
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/29/19 11:45 AM, Jeff Law wrote: On 5/22/19 3:34 PM, Martin Sebor wrote: -Wreturn-local-addr detects a subset of instances of returning the address of a local object from a function but the warning doesn't try to handle alloca or VLAs, or some non-trivial cases of ordinary automatic variables[1]. The attached patch extends the implementation of the warning to detect those. It still doesn't detect instances where the address is the result of a built-in such strcpy[2]. Tested on x86_64-linux. Martin [1] For example, this is only diagnosed with the patch: void* f (int i) { struct S { int a[2]; } s[2]; return >a[i]; } [2] The following is not diagnosed even with the patch: void sink (void*); void* f (int i) { char a[6]; char *p = __builtin_strcpy (a, "123"); sink (p); return p; } I would expect detecting to be possible and useful. Maybe as a follow-up. gcc-71924.diff PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset gcc/ChangeLog: PR c/71924 * gimple-ssa-isolate-paths.c (is_addr_local): New function. (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. (find_explicit_erroneous_behavior): Call warn_return_addr_local. gcc/testsuite/ChangeLog: PR c/71924 * gcc.dg/Wreturn-local-addr-2.c: New test. * gcc.dg/Walloca-4.c: Prune expected warnings. * gcc.dg/pr41551.c: Same. * gcc.dg/pr59523.c: Same. * gcc.dg/tree-ssa/pr88775-2.c: Same. * gcc.dg/winline-7.c: Same. diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index 33fe352bb23..2933ecf502e 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) return false; } +/* Return true if EXPR is a expression of pointer type that refers + to the address of a variable with automatic storage duration. + If so, set *PLOC to the location of the object or the call that + allocated it (for alloca and VLAs). When PMAYBE is non-null, + also consider PHI statements and set *PMAYBE when some but not + all arguments of such statements refer to local variables, and + to clear it otherwise. */ + +static bool +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, + hash_set *visited = NULL) +{ + if (TREE_CODE (exp) == ADDR_EXPR) +{ + tree baseaddr = get_base_address (TREE_OPERAND (exp, 0)); + if (TREE_CODE (baseaddr) == MEM_REF) + return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, visited); + + if ((!VAR_P (baseaddr) + || is_global_var (baseaddr)) + && TREE_CODE (baseaddr) != PARM_DECL) + return false; + + *ploc = DECL_SOURCE_LOCATION (baseaddr); + return true; +} + + if (TREE_CODE (exp) == SSA_NAME) +{ + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); + enum gimple_code code = gimple_code (def_stmt); + + if (is_gimple_assign (def_stmt)) + { + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); + if (POINTER_TYPE_P (type)) + { + tree ptr = gimple_assign_rhs1 (def_stmt); + return is_addr_local (ptr, ploc, pmaybe, visited); + } + return false; + } + + if (code == GIMPLE_CALL + && gimple_call_builtin_p (def_stmt)) + { + tree fn = gimple_call_fndecl (def_stmt); + int code = DECL_FUNCTION_CODE (fn); + if (code != BUILT_IN_ALLOCA + && code != BUILT_IN_ALLOCA_WITH_ALIGN) + return false; + + *ploc = gimple_location (def_stmt); + return true; + } + + if (code == GIMPLE_PHI && pmaybe) + { + unsigned count = 0; + gphi *phi_stmt = as_a (def_stmt); + + unsigned nargs = gimple_phi_num_args (phi_stmt); + for (unsigned i = 0; i < nargs; ++i) + { + if (!visited->add (phi_stmt)) + { + tree arg = gimple_phi_arg_def (phi_stmt, i); + if (is_addr_local (arg, ploc, pmaybe, visited)) + ++count; + } + } + + *pmaybe = count && count < nargs; + return count != 0; + } +} + + return false; +} Is there some reason you didn't query the alias oracle here? It would seem a fairly natural fit. Ultimately given a pointer (which will be an SSA_NAME) you want to ask whether or not it conclusively points into the stack. That would seem to dramatically simplify is_addr_local. I did think about it but decided against changing the existing design (iterating over PHI arguments), for a couple of reasons: 1) It feels like a bigger change than
Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
On 5/22/19 3:34 PM, Martin Sebor wrote: > -Wreturn-local-addr detects a subset of instances of returning > the address of a local object from a function but the warning > doesn't try to handle alloca or VLAs, or some non-trivial cases > of ordinary automatic variables[1]. > > The attached patch extends the implementation of the warning to > detect those. It still doesn't detect instances where the address > is the result of a built-in such strcpy[2]. > > Tested on x86_64-linux. > > Martin > > [1] For example, this is only diagnosed with the patch: > > void* f (int i) > { > struct S { int a[2]; } s[2]; > return >a[i]; > } > > [2] The following is not diagnosed even with the patch: > > void sink (void*); > > void* f (int i) > { > char a[6]; > char *p = __builtin_strcpy (a, "123"); > sink (p); > return p; > } > > I would expect detecting to be possible and useful. Maybe as > a follow-up. > > gcc-71924.diff > > PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result > PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address > of a local array plus offset > > gcc/ChangeLog: > > PR c/71924 > * gimple-ssa-isolate-paths.c (is_addr_local): New function. > (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. > (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. > (find_explicit_erroneous_behavior): Call warn_return_addr_local. > > gcc/testsuite/ChangeLog: > > PR c/71924 > * gcc.dg/Wreturn-local-addr-2.c: New test. > * gcc.dg/Walloca-4.c: Prune expected warnings. > * gcc.dg/pr41551.c: Same. > * gcc.dg/pr59523.c: Same. > * gcc.dg/tree-ssa/pr88775-2.c: Same. > * gcc.dg/winline-7.c: Same. > > diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c > index 33fe352bb23..2933ecf502e 100644 > --- a/gcc/gimple-ssa-isolate-paths.c > +++ b/gcc/gimple-ssa-isolate-paths.c > @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) >return false; > } > > +/* Return true if EXPR is a expression of pointer type that refers > + to the address of a variable with automatic storage duration. > + If so, set *PLOC to the location of the object or the call that > + allocated it (for alloca and VLAs). When PMAYBE is non-null, > + also consider PHI statements and set *PMAYBE when some but not > + all arguments of such statements refer to local variables, and > + to clear it otherwise. */ > + > +static bool > +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, > +hash_set *visited = NULL) > +{ > + if (TREE_CODE (exp) == ADDR_EXPR) > +{ > + tree baseaddr = get_base_address (TREE_OPERAND (exp, 0)); > + if (TREE_CODE (baseaddr) == MEM_REF) > + return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, > visited); > + > + if ((!VAR_P (baseaddr) > +|| is_global_var (baseaddr)) > + && TREE_CODE (baseaddr) != PARM_DECL) > + return false; > + > + *ploc = DECL_SOURCE_LOCATION (baseaddr); > + return true; > +} > + > + if (TREE_CODE (exp) == SSA_NAME) > +{ > + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); > + enum gimple_code code = gimple_code (def_stmt); > + > + if (is_gimple_assign (def_stmt)) > + { > + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); > + if (POINTER_TYPE_P (type)) > + { > + tree ptr = gimple_assign_rhs1 (def_stmt); > + return is_addr_local (ptr, ploc, pmaybe, visited); > + } > + return false; > + } > + > + if (code == GIMPLE_CALL > + && gimple_call_builtin_p (def_stmt)) > + { > + tree fn = gimple_call_fndecl (def_stmt); > + int code = DECL_FUNCTION_CODE (fn); > + if (code != BUILT_IN_ALLOCA > + && code != BUILT_IN_ALLOCA_WITH_ALIGN) > + return false; > + > + *ploc = gimple_location (def_stmt); > + return true; > + } > + > + if (code == GIMPLE_PHI && pmaybe) > + { > + unsigned count = 0; > + gphi *phi_stmt = as_a (def_stmt); > + > + unsigned nargs = gimple_phi_num_args (phi_stmt); > + for (unsigned i = 0; i < nargs; ++i) > + { > + if (!visited->add (phi_stmt)) > + { > + tree arg = gimple_phi_arg_def (phi_stmt, i); > + if (is_addr_local (arg, ploc, pmaybe, visited)) > + ++count; > + } > + } > + > + *pmaybe = count && count < nargs; > + return count != 0; > + } > +} > + > + return false; > +} Is there some reason you didn't query the alias oracle here? It would seem a fairly natural fit. Ultimately given a pointer (which will be an SSA_NAME) you want to ask whether or not it conclusively points into the stack. That would seem to dramatically simplify is_addr_local. The rest looks pretty
[PATCH] warn on returning alloca and VLA (PR 71924, 90549)
-Wreturn-local-addr detects a subset of instances of returning the address of a local object from a function but the warning doesn't try to handle alloca or VLAs, or some non-trivial cases of ordinary automatic variables[1]. The attached patch extends the implementation of the warning to detect those. It still doesn't detect instances where the address is the result of a built-in such strcpy[2]. Tested on x86_64-linux. Martin [1] For example, this is only diagnosed with the patch: void* f (int i) { struct S { int a[2]; } s[2]; return >a[i]; } [2] The following is not diagnosed even with the patch: void sink (void*); void* f (int i) { char a[6]; char *p = __builtin_strcpy (a, "123"); sink (p); return p; } I would expect detecting to be possible and useful. Maybe as a follow-up. PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset gcc/ChangeLog: PR c/71924 * gimple-ssa-isolate-paths.c (is_addr_local): New function. (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. (find_explicit_erroneous_behavior): Call warn_return_addr_local. gcc/testsuite/ChangeLog: PR c/71924 * gcc.dg/Wreturn-local-addr-2.c: New test. * gcc.dg/Walloca-4.c: Prune expected warnings. * gcc.dg/pr41551.c: Same. * gcc.dg/pr59523.c: Same. * gcc.dg/tree-ssa/pr88775-2.c: Same. * gcc.dg/winline-7.c: Same. diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index 33fe352bb23..2933ecf502e 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) return false; } +/* Return true if EXPR is a expression of pointer type that refers + to the address of a variable with automatic storage duration. + If so, set *PLOC to the location of the object or the call that + allocated it (for alloca and VLAs). When PMAYBE is non-null, + also consider PHI statements and set *PMAYBE when some but not + all arguments of such statements refer to local variables, and + to clear it otherwise. */ + +static bool +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, + hash_set *visited = NULL) +{ + if (TREE_CODE (exp) == ADDR_EXPR) +{ + tree baseaddr = get_base_address (TREE_OPERAND (exp, 0)); + if (TREE_CODE (baseaddr) == MEM_REF) + return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, visited); + + if ((!VAR_P (baseaddr) + || is_global_var (baseaddr)) + && TREE_CODE (baseaddr) != PARM_DECL) + return false; + + *ploc = DECL_SOURCE_LOCATION (baseaddr); + return true; +} + + if (TREE_CODE (exp) == SSA_NAME) +{ + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); + enum gimple_code code = gimple_code (def_stmt); + + if (is_gimple_assign (def_stmt)) + { + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); + if (POINTER_TYPE_P (type)) + { + tree ptr = gimple_assign_rhs1 (def_stmt); + return is_addr_local (ptr, ploc, pmaybe, visited); + } + return false; + } + + if (code == GIMPLE_CALL + && gimple_call_builtin_p (def_stmt)) + { + tree fn = gimple_call_fndecl (def_stmt); + int code = DECL_FUNCTION_CODE (fn); + if (code != BUILT_IN_ALLOCA + && code != BUILT_IN_ALLOCA_WITH_ALIGN) + return false; + + *ploc = gimple_location (def_stmt); + return true; + } + + if (code == GIMPLE_PHI && pmaybe) + { + unsigned count = 0; + gphi *phi_stmt = as_a (def_stmt); + + unsigned nargs = gimple_phi_num_args (phi_stmt); + for (unsigned i = 0; i < nargs; ++i) + { + if (!visited->add (phi_stmt)) + { + tree arg = gimple_phi_arg_def (phi_stmt, i); + if (is_addr_local (arg, ploc, pmaybe, visited)) + ++count; + } + } + + *pmaybe = count && count < nargs; + return count != 0; + } +} + + return false; +} + +/* Detect and diagnose returning the address of a local variable in + a PHI result LHS and argument OP and PHI edge E in basic block BB. */ + +static basic_block +warn_return_addr_local_phi_arg (basic_block bb, basic_block duplicate, +tree lhs, tree op, edge e) +{ + location_t origin; + if (!is_addr_local (op, )) +return NULL; + + gimple *use_stmt; + imm_use_iterator iter; + + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) +{ + greturn *return_stmt = dyn_cast (use_stmt); + if (!return_stmt) + continue; + + if (gimple_return_retval (return_stmt) != lhs) + continue; + + { + auto_diagnostic_group d; + if (warning_at (gimple_location (use_stmt), + OPT_Wreturn_local_addr, + "function may return address " + "of local variable") + && origin != UNKNOWN_LOCATION) + inform (origin, "declared here"); + } + + if ((flag_isolate_erroneous_paths_dereference +