On Tue, 10 Nov 2020, Jan Hubicka wrote: > > > + tree callee = gimple_call_fndecl (stmt); > > > + if (callee) > > > + { > > > + cgraph_node *node = cgraph_node::get (callee); > > > + modref_summary *summary = node ? get_modref_function_summary (node) > > > + : NULL; > > > + > > > + if (summary && summary->arg_flags.length () > arg) > > > > So could we make modref "transform" push this as fnspec attribute or > > would that not really be an optimization? > > It was my original plan to synthetize fnspecs, but I think it is not > very good idea: we have the summary readily available and we can > represent information that fnspecs can't > (do not have artificial limits on number of parameters or counts) > > I would preffer fnspecs to be used only for in-compiler declarations.
Fine, I was just curious... > > > + > > > +/* Analyze EAF flags for SSA name NAME. > > > + KNOWN_FLAGS is a cache for flags we already determined. > > > + DEPTH is a recursion depth used to make debug output prettier. */ > > > + > > > +static int > > > +analyze_ssa_name_flags (tree name, vec<int> *known_flags, int depth) > > > > C++ has references which makes the access to known_flags nicer ;) > > Yay, will chang that :) > > > > > +{ > > > + imm_use_iterator ui; > > > + gimple *use_stmt; > > > + int flags = EAF_DIRECT | EAF_NOCLOBBER | EAF_NOESCAPE | EAF_UNUSED; > > > + > > > + /* See if value is already computed. */ > > > + if ((*known_flags)[SSA_NAME_VERSION (name)]) > > > + { > > > + /* Punt on cycles for now, so we do not need dataflow. */ > > > + if ((*known_flags)[SSA_NAME_VERSION (name)] == 1) > > > + { > > > + if (dump_file) > > > + fprintf (dump_file, > > > + "%*sGiving up on a cycle in SSA graph\n", depth * 4, ""); > > > + return 0; > > > + } > > > + return (*known_flags)[SSA_NAME_VERSION (name)] - 2; > > > + } > > > + /* Recursion guard. */ > > > + (*known_flags)[SSA_NAME_VERSION (name)] = 1; > > > > This also guards against multiple evaluations of the same stmts > > but only in some cases? Consider > > > > _1 = ..; > > _2 = _1 + _3; > > _4 = _1 + _5; > > _6 = _2 + _4; > > > > where we visit _2 = and _4 = from _1 but from both are going > > to visit _6. > > Here we first push _6, then we go for _2 then for _1 evaluate _1, > evalueate _2, go for _4 and evaluate _4, and evaluate _6. > It is DFS and you need backward edge in DFS (comming from a PHI). Hmm, but then we eventually evaluate _6 twice? > > Cycles seems to somewhat matter for GCC: we do have a lot of functions > that walk linked lists that we could track otherwise. > > > > Maybe I'm blind but you're not limiting depth? Guess that asks > > for problems, esp. as you are recursing rather than using a > > worklist or so? > > > > I see you try to "optimize" the walk by only visiting def->use > > links from parameters but then a RPO walk over all stmts would > > be simpler iteration-wise ... > We usually evaluate just small part of bigger functions (since we lose > track quite easily after hitting first memory store). My plan was to > change this to actual dataflow once we have it well defined > (this means after discussing EAF flags with you and adding the logic to > track callsites for true IPA pass that midly complicated things - for > every ssa name I track callsite/arg pair where it is passed to > either directly or indirectly. Then this is translaed into call summary > and used by IPA pass to compute final flags) > > I guess I can add --param ipa-modref-walk-depth for now and handle > dataflow incremntally? Works for me. > In particular I am not sure if I should just write iterated RPO myself > or use tree-ssa-propagate.h (the second may be overkill). tree-ssa-propagate.h is not to be used, it should DIE ;) I guess you do want to iterate SSA cycles rather than BB cycles so I suggest to re-surrect the SSA SCC discovery from the SCC value-numbering (see tree-ssa-sccvn.c:DFS () on the gcc-8-branch) which is non-recursive and micro-optimized. Could put it somewhere useful (tree-ssa.c?). > > > > > + if (dump_file) > > > + { > > > + fprintf (dump_file, > > > + "%*sAnalyzing flags of ssa name: ", depth * 4, ""); > > > + print_generic_expr (dump_file, name); > > > + fprintf (dump_file, "\n"); > > > + } > > > + > > > + FOR_EACH_IMM_USE_STMT (use_stmt, ui, name) > > > + { > > > + if (flags == 0) > > > + { > > > + BREAK_FROM_IMM_USE_STMT (ui); > > > + } > > > + if (is_gimple_debug (use_stmt)) > > > + continue; > > > + if (dump_file) > > > + { > > > + fprintf (dump_file, "%*s Analyzing stmt:", depth * 4, ""); > > > + print_gimple_stmt (dump_file, use_stmt, 0); > > > + } > > > + > > > + /* Gimple return may load the return value. */ > > > + if (gimple_code (use_stmt) == GIMPLE_RETURN) > > > > if (greturn *ret = dyn_cast <greturn *> (use_stmt)) > > > > makes the as_a below not needed, similar for the other cases (it > > also makes accesses cheaper, avoiding some checking code). > > Looks cleaner indeed. > > > > > + { > > > + if (memory_access_to (gimple_return_retval > > > + (as_a <greturn *> (use_stmt)), name)) > > > + flags &= ~EAF_UNUSED; > > > + } > > > + /* Account for LHS store, arg loads and flags from callee > > > function. */ > > > + else if (is_gimple_call (use_stmt)) > > > + { > > > + tree callee = gimple_call_fndecl (use_stmt); > > > + > > > + /* Recursion would require bit of propagation; give up for now. */ > > > + if (callee && recursive_call_p (current_function_decl, callee)) > > > + flags = 0; > > > + else > > > + { > > > + int ecf_flags = gimple_call_flags (use_stmt); > > > + bool ignore_stores = ignore_stores_p (current_function_decl, > > > + ecf_flags); > > > + > > > + /* Handle *name = func (...). */ > > > + if (gimple_call_lhs (use_stmt) > > > + && memory_access_to (gimple_call_lhs (use_stmt), name)) > > > + flags &= ~(EAF_UNUSED | EAF_NOCLOBBER); > > > + > > > + /* Handle all function parameters. */ > > > + for (unsigned i = 0; i < gimple_call_num_args (use_stmt); i++) > > > + /* Name is directly passed to the callee. */ > > > + if (gimple_call_arg (use_stmt, i) == name) > > > + { > > > + if (ecf_flags & (ECF_CONST | ECF_NOVOPS)) > > > + flags &= ignore_stores > > > + ? 0 > > > + : call_lhs_flags (use_stmt, i, > > > + known_flags, depth); > > > + else > > > + { > > > + int call_flags = gimple_call_arg_flags (as_a <gcall *> > > > + (use_stmt), i); > > > + if (ignore_stores) > > > + call_flags |= EAF_NOCLOBBER | EAF_NOESCAPE; > > > + else > > > + call_flags &= call_lhs_flags (use_stmt, i, > > > + known_flags, depth); > > > + > > > + flags &= call_flags; > > > + } > > > + } > > > + /* Name is dereferenced and passed to a callee. */ > > > + else if (memory_access_to (gimple_call_arg (use_stmt, i), name)) > > > + { > > > + if (ecf_flags & (ECF_CONST | ECF_NOVOPS)) > > > + flags &= ~EAF_UNUSED; > > > + else > > > + flags &= deref_flags (gimple_call_arg_flags > > > + (as_a <gcall *> (use_stmt), i), > > > + ignore_stores); > > > + if (!ignore_stores) > > > + flags &= call_lhs_flags (use_stmt, i, known_flags, > > > + depth); > > > + } > > > > Hmm, you forget gimple_call_chain at least which is at least > > argument passing? Possibly the chain argument is never a function > > parameter but how do you know... (partial inlining?) > > Ah, right. I forgot about chain. > I wil ladd it. > > > > Then there's gimple_call_fn for indirect function calls to a > > function parameter? > > Well, it is never load or store, so I do not really care about it. > for > > void test (int (*foo)()) > { > foo(); > } > > I should be able to derive EAF_UNUSED. > > void test (int (**foo)()) > { > (*foo)(); > } > > I will see sparate load. > > > > I guess it would be nice to amend the gimple_walk_ stuff to have > > a SSA name callback where the tree and the SSA use is passed. > > Well, for another time. > > > > > + } > > > + /* Only NOCLOBBER or DIRECT flags alone are not useful (see comments > > > + in tree-ssa-alias.c). Give up earlier. */ > > > + if ((flags & ~(EAF_DIRECT | EAF_NOCLOBBER)) == 0) > > > + flags = 0; > > > + } > > > + else if (gimple_assign_load_p (use_stmt)) > > > + { > > > + /* Memory to memory copy. */ > > > + if (gimple_store_p (use_stmt)) > > > + { > > > + /* Handle *name = *exp. */ > > > + if (memory_access_to (gimple_assign_lhs (use_stmt), name)) > > > + flags &= ~(EAF_UNUSED | EAF_NOCLOBBER); > > > + > > > + /* Handle *lhs = *name. > > > + > > > + We do not track memory locations, so assume that value > > > + is used arbitrarily. */ > > > + if (memory_access_to (gimple_assign_rhs1 (use_stmt), name)) > > > + flags = 0; > > > + } > > > + /* Handle lhs = *name. */ > > > + else if (memory_access_to (gimple_assign_rhs1 (use_stmt), name)) > > > + flags &= deref_flags (analyze_ssa_name_flags > > > + (gimple_assign_lhs (use_stmt), > > > + known_flags, depth + 1), false); > > > + } > > > + else if (gimple_store_p (use_stmt)) > > > + { > > > + /* Handle *lhs = name. */ > > > + if (is_gimple_assign (use_stmt) > > > + && gimple_assign_rhs1 (use_stmt) == name) > > > + { > > > + if (dump_file) > > > + fprintf (dump_file, "%*s ssa name saved to memory\n", > > > + depth * 4, ""); > > > + flags = 0; > > > + } > > > + /* Handle *name = exp. */ > > > + else if (is_gimple_assign (use_stmt) > > > + && memory_access_to (gimple_assign_lhs (use_stmt), name)) > > > + flags &= ~(EAF_UNUSED | EAF_NOCLOBBER); > > > + /* ASM statements etc. */ > > > + else if (!is_gimple_assign (use_stmt)) > > > + { > > > + if (dump_file) > > > + fprintf (dump_file, "%*s Unhandled store\n", > > > + depth * 4, ""); > > > + flags = 0; > > > + } > > > + } > > > + else if (is_gimple_assign (use_stmt)) > > > + { > > > + enum tree_code code = gimple_assign_rhs_code (use_stmt); > > > + > > > + /* See if operation is a merge as considered by > > > + tree-ssa-structalias.c:find_func_aliases. */ > > > + if (!truth_value_p (code) > > > + && code != POINTER_DIFF_EXPR > > > + && (code != POINTER_PLUS_EXPR > > > + || gimple_assign_rhs1 (use_stmt) == name)) > > > + flags &= analyze_ssa_name_flags > > > + (gimple_assign_lhs (use_stmt), known_flags, > > > + depth + 1); > > > + } > > > + else if (gimple_code (use_stmt) == GIMPLE_PHI) > > > + { > > > + flags &= analyze_ssa_name_flags > > > + (gimple_phi_result (use_stmt), known_flags, > > > + depth + 1); > > > + } > > > + /* Conditions are not considered escape points > > > + by tree-ssa-structalias. */ > > > + else if (gimple_code (use_stmt) == GIMPLE_COND) > > > + ; > > > + else > > > + { > > > + if (dump_file) > > > + fprintf (dump_file, "%*s Unhandled stmt\n", depth * 4, ""); > > > + flags = 0; > > > + } > > > + > > > + if (dump_file) > > > + { > > > + fprintf (dump_file, "%*s current flags of ", depth * 4, ""); > > > + print_generic_expr (dump_file, name); > > > + dump_eaf_flags (dump_file, flags); > > > + } > > > + } > > > + if (dump_file) > > > + { > > > + fprintf (dump_file, "%*sflags of ssa name ", depth * 4, ""); > > > + print_generic_expr (dump_file, name); > > > + dump_eaf_flags (dump_file, flags); > > > + } > > > + (*known_flags)[SSA_NAME_VERSION (name)] = flags + 2; > > > + return flags; > > > +} > > > + > > > +/* Determine EAF flags for function parameters. */ > > > + > > > +static void > > > +analyze_parms (modref_summary *summary) > > > +{ > > > + unsigned int parm_index = 0; > > > + unsigned int count = 0; > > > + > > > + for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; > > > + parm = TREE_CHAIN (parm)) > > > + count++; > > > + > > > + if (!count) > > > + return; > > > + > > > + auto_vec<int> known_flags; > > > + known_flags.safe_grow_cleared (num_ssa_names); > > > > , true for exact reserve. Could be space optimized by not using > > auto_vec<int> but auto_vec<usigned char>? > > I think there is not that much memory wasted, but indeed chars will be > more effective. > > > > > + > > > + for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; > > > parm_index++, > > > + parm = TREE_CHAIN (parm)) > > > + { > > > + tree name = ssa_default_def (cfun, parm); > > > + if (!name) > > > + continue; > > > > looks like the vec might be quite sparse ... > > > > > + int flags = analyze_ssa_name_flags (name, &known_flags, 0); > > > + > > > + if (flags) > > > + { > > > + if (parm_index >= summary->arg_flags.length ()) > > > + summary->arg_flags.safe_grow_cleared (count); > > > > you want , true for exact reserve. > > > > > + summary->arg_flags[parm_index] = flags; > > > > maybe this as well - does it make sense to skip !is_gimple_reg_type > > params in the counting? Guess it makes lookup too complicated. > > But we do have testcases with >30000 parameters ... > > Well, the index needs to be actual index all call argument. > As said, i did not see any noticeable modref memory use increase at WPA > (I watched) since it is at most 1 byte for parameter in case something > got tracked. > > Thanks for review! > > Honza > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend