On Tue, May 20, 2014 at 6:29 AM, Jan Hubicka <hubi...@ucw.cz> wrote: >> On 05/18/2014 08:45 PM, Sandra Loosemore wrote: >> >On 05/18/2014 02:59 PM, Jan Hubicka wrote: >> >>For cases like local-statics-7 your approach can be "saved" by adding >> >>simple IPA analysis >> >>to look for static vars that are used only by one function and keeping >> >>your DSE code active >> >>for them, so we can still get rid of this special case assignments >> >>during late compilation. >> >>I am however not quite convinced it is worth the effort - do you have >> >>some real world >> >>cases where it helps? >> > >> >Um, the well-known benchmark. ;-) >> >> I looked at the generated code for this benchmark and see that your >> patch is indeed not getting rid of all the pointless static >> variables, while ours is, so this is quite disappointing. I'm >> thinking now that we will still need to retain our patch at least >> locally, although we'd really like to get it on trunk if possible. > > Yours patch can really be improved by adding simple IPA analysis to work > out what variables are refernced by one function only instead of using > not-longer-that-relevant local static info. > As last IPA pass for each static variable with no address taken, look at all > references and see if they all come from one function or functions inlined to > it. >> >> Here is another testcase vaguely based on the same kind of >> signal-processing algorithm as the benchmark, but coded without >> reference to it. I have arm-none-eabi builds around for both 4.9.0 >> with our remove_local_statics patch applied, and trunk with your >> patch. With -O2, our optimization produces 23 instructions and gets >> rid of all 3 statics, yours produces 33 and only gets rid of 1 of >> them. >> >> Of course it's lame to use static variables in this way, but OTOH >> it's lame if GCC can't recognize them and optimize them away, too. >> >> -Sandra >> > >> void >> fir (int *coeffs, int coeff_length, int *input, int input_length, int >> *output) >> { >> static int *coeffp; >> static int *inputp; >> static int *outputp; > > Here inputp read is rmeoved only at 101.dceloop1 pass. That is really late. > coeffp is removed late, too. > >> int i, c, acc; >> >> for (i = 0; i < input_length; i++) >> { >> coeffp = coeffs; >> inputp = input + i; >> outputp = output + i; >> acc = 0; >> for (c = 0; c < coeff_length; c++) >> { >> acc += *coeffp * *inputp; >> coeffp++; >> inputp--; >> } > > It seems like AA can not work out the fact that inputp is unchanged until that > late. Richi, any ideas?
Well, AA is perfectly fine - it's just that this boils down to a partial redundancy problem. The stores can be removed by DCE even with Index: gcc/tree-ssa-dce.c =================================================================== --- gcc/tree-ssa-dce.c (revision 210635) +++ gcc/tree-ssa-dce.c (working copy) @@ -278,10 +278,20 @@ mark_stmt_if_obviously_necessary (gimple break; case GIMPLE_ASSIGN: - if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME - && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt))) - return; - break; + { + tree lhs = gimple_assign_lhs (stmt); + if (TREE_CODE (lhs) == SSA_NAME + && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt))) + return; + if (TREE_CODE (lhs) == VAR_DECL + && !TREE_ADDRESSABLE (lhs) + && TREE_STATIC (lhs) + && !TREE_PUBLIC (lhs) && !DECL_EXTERNAL (lhs) + /* ??? Make sure lhs is only refered to from cfun->decl. */ + && decl_function_context (lhs) == cfun->decl) + return; + break; + } default: break; but I don't have a good way of checking the ??? prerequesite (even without taking its address the statics may be refered to by a) inline copies, b) ipa-split function parts, c) nested functions). I'm sure IPA references are not kept up-to-date. The last reads go away with PRE at the expense of two additional induction variables. If we know that a static variable does not have its address taken and is only refered to from a single function we can in theory simply rewrite it into SSA form during early opts (before inlining its body), for example by SRA. That isn't even restricted to function-local statics (for statics used in multiple functions but not having address-taken we can do the same with doing function entry / exit load / store). I think that would be a much better IPA enabled optimization than playing games with looking at individual stmts. Richard. > Honza >> *outputp = acc; >> } >> } >> >> >