Hi, On Fri, Nov 15 2019, Feng Xue OS wrote: > Honza, > > I made some changes: do not penalize self-recursive function, and > add --param ipa-cp-min-recursive-probability, similar to recursive > inline. Please review this new one.
The patch and its effect on exchange is intriguing, I only have a few comments, see below, otherwise I am happy about it. > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index fe8daf40888..c84f4d73ed6 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,23 @@ > +2019-11-15 Feng Xue <f...@os.amperecomputing.com> > + > + PR ipa/92133 > + * doc/invoke.texi (ipa-cp-max-recursive-depth): Document new option. > + (ipa-cp-min-recursive-probability): Likewise. > + * params.opt (ipa-cp-max-recursive-depth): New. > + (ipa-cp-min-recursive-probability): Likewise. > + * ipa-cp.c (ipcp_lattice<valtype>::add_value): Add two new parameters > + val_pos_p and unlimited. > + (self_recursively_generated_p): New function. > + (get_val_across_arith_op): Likewise. > + (propagate_vals_across_arith_jfunc): Add constant propagation for > + self-recursive function. > + (incorporate_penalties): Do not penalize pure self-recursive function. > + (good_cloning_opportunity_p): Dump node_is_self_scc flag. > + (propagate_constants_topo): Set node_is_self_scc flag for cgraph node. > + (get_info_about_necessary_edges): Relax hotness check for edge to > + self-recursive function. > + * ipa-prop.h (ipa_node_params): Add new field node_is_self_scc. > + The least important comment: Thanks for providing the ChangeLog but sending ChangeLog in the patch itself makes it difficult for people to apply the patch because the ChangeLog hunks will never apply cleanly. That's why people send them in the email body when they email gcc-patches. Hopefully we'll be putting ChangeLogs only into the commit message soon and all of this will be a non-issue. > 2019-11-15 Feng Xue <f...@os.amperecomputing.com> > > PR ipa/92528 > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index fe79ca2247a..c30adfd31c2 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -12045,6 +12045,13 @@ IPA-CP calculates its own score of cloning > profitability heuristics > and performs those cloning opportunities with scores that exceed > @option{ipa-cp-eval-threshold}. > > +@item ipa-cp-max-recursive-depth > +Maximum depth of recursive cloning for self-recursive function. > + > +@item ipa-cp-min-recursive-probability > +Recursive cloning only when the probability of call being executed exceeds > +the parameter. > + > @item ipa-cp-recursion-penalty > Percentage penalty the recursive functions will receive when they > are evaluated for cloning. > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c > index 8372dfaa771..bbf508bca6c 100644 > --- a/gcc/ipa-cp.c > +++ b/gcc/ipa-cp.c > @@ -228,7 +228,9 @@ public: > inline bool set_contains_variable (); > bool add_value (valtype newval, cgraph_edge *cs, > ipcp_value<valtype> *src_val = NULL, > - int src_idx = 0, HOST_WIDE_INT offset = -1); > + int src_idx = 0, HOST_WIDE_INT offset = -1, > + ipcp_value<valtype> **val_pos_p = NULL, > + bool unlimited = false); > void print (FILE * f, bool dump_sources, bool dump_benefits); > }; > > @@ -1817,22 +1819,37 @@ allocate_and_init_ipcp_value > (ipa_polymorphic_call_context source) > /* Try to add NEWVAL to LAT, potentially creating a new ipcp_value for it. > CS, > SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same > meaning. OFFSET -1 means the source is scalar and not a part of an > - aggregate. */ > + aggregate. If non-NULL, VAL_POS_P specifies position in value list, > + after which newly created ipcp_value will be inserted, and it is also > + used to record address of the added ipcp_value before function returns. > + UNLIMITED means whether value count should not exceed the limit given > + by PARAM_IPA_CP_VALUE_LIST_SIZE. */ > > template <typename valtype> > bool > ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs, > ipcp_value<valtype> *src_val, > - int src_idx, HOST_WIDE_INT offset) > + int src_idx, HOST_WIDE_INT offset, > + ipcp_value<valtype> **val_pos_p, > + bool unlimited) > { > ipcp_value<valtype> *val; > > + if (val_pos_p) > + { > + for (val = values; val && val != *val_pos_p; val = val->next); Please put empty statements on a separate line (I think there is one more in the patch IIRC). > + gcc_checking_assert (val); > + } > + > if (bottom) > return false; > > for (val = values; val; val = val->next) > if (values_equal_for_ipcp_p (val->value, newval)) > { > + if (val_pos_p) > + *val_pos_p = val; > + > if (ipa_edge_within_scc (cs)) > { > ipcp_value_source<valtype> *s; > @@ -1847,7 +1864,7 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > return false; > } > > - if (values_count == param_ipa_cp_value_list_size) > + if (!unlimited && values_count == param_ipa_cp_value_list_size) > { > /* We can only free sources, not the values themselves, because sources > of other values in this SCC might point to them. */ > @@ -1861,6 +1878,9 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > } > } > > + if (val_pos_p) > + *val_pos_p = NULL; > + > values = NULL; > return set_to_bottom (); > } > @@ -1868,11 +1888,74 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > values_count++; > val = allocate_and_init_ipcp_value (newval); > val->add_source (cs, src_val, src_idx, offset); > - val->next = values; > - values = val; > + if (val_pos_p) > + { > + val->next = (*val_pos_p)->next; > + (*val_pos_p)->next = val; > + *val_pos_p = val; > + } I must say I am not a big fan of the val_pos_p parameter. Would simply putting new values always at the end of the list work to reduce the iterations? At this point the function has traversed all the values already when searching if it is present, so it can remember the last one and just add stuff after it. But if I'm missing something clever I'm not going to oppose it. > + else > + { > + val->next = values; > + values = val; > + } > + > + return true; > +} > + > +/* Return true, if a ipcp_value VAL is orginated from parameter value of > + self-feeding recursive function by applying non-passthrough arithmetic > + transformation. */ > + > +static bool > +self_recursively_generated_p (ipcp_value<tree> *val) > +{ > + class ipa_node_params *info = NULL; > + > + for (ipcp_value_source<tree> *src = val->sources; src; src = src->next) > + { > + cgraph_edge *cs = src->cs; > + > + if (!src->val || cs->caller != cs->callee->function_symbol () > + || src->val == val) > + return false; I'd suggest putting the condition calling function_symbol last. > + > + if (!info) > + info = IPA_NODE_REF (cs->caller); > + > + class ipcp_param_lattices *plats = ipa_get_parm_lattices (info, > + src->index); > + ipcp_lattice<tree> *src_lat = src->offset == -1 ? &plats->itself > + : plats->aggs; > + ipcp_value<tree> *src_val; > + > + for (src_val = src_lat->values; src_val && src_val != val; > + src_val = src_val->next); I think that GNU code style dictates that when a for does not fit on a single line, the three bits have to be on a separate line each. Plus please put the empty statement on its own line too. > + > + if (!src_val) > + return false; > + } > + > return true; > } > > +static tree > +get_val_across_arith_op (enum tree_code opcode, > + tree opnd1_type, > + tree opnd2, > + ipcp_value<tree> *src_val, > + tree res_type) The function should get at least a brief comment. > +{ > + tree opnd1 = src_val->value; > + > + /* Skip source values that is incompatible with specified type. */ > + if (opnd1_type > + && !useless_type_conversion_p (opnd1_type, TREE_TYPE (opnd1))) > + return NULL_TREE; > + > + return ipa_get_jf_arith_result (opcode, opnd1, opnd2, res_type); > +} > + > /* Propagate values through an arithmetic transformation described by a jump > function associated with edge CS, taking values from SRC_LAT and putting > them into DEST_LAT. OPND1_TYPE is expected type for the values in > SRC_LAT. > @@ -1896,24 +1979,74 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs, > ipcp_value<tree> *src_val; > bool ret = false; > > - /* Do not create new values when propagating within an SCC because if there > - are arithmetic functions with circular dependencies, there is infinite > - number of them and we would just make lattices bottom. If this > condition > - is ever relaxed we have to detect self-feeding recursive calls in > - cgraph_edge_brings_value_p in a smarter way. */ > + /* Due to circular dependencies, propagating within an SCC through > arithmetic > + transformation would create infinite number of values. But for > + self-feeding recursive function, we could allow propagation in a limited > + count, and this can enable a simple kind of recursive function > versioning. > + For other scenario, we would just make lattices bottom. */ > if (opcode != NOP_EXPR && ipa_edge_within_scc (cs)) > - ret = dest_lat->set_contains_variable (); > + { > + int i; > + > + if (src_lat != dest_lat || param_ipa_cp_max_recursive_depth < 1) > + return dest_lat->set_contains_variable (); > + > + /* No benefit if recursive execution is in low probability. */ > + if (cs->sreal_frequency () * 100 > + <= ((sreal) 1) * param_ipa_cp_min_recursive_probability) > + return dest_lat->set_contains_variable (); > + > + auto_vec<ipcp_value<tree> *, 8> val_seeds; > + > + for (src_val = src_lat->values; src_val; src_val = src_val->next) > + { > + /* Now we do not use self-recursively generated value as propagation > + source, this is absolutely conservative, but could avoid explosion > + of lattice's value space, especially when one recursive function > + calls another recursive. */ > + if (self_recursively_generated_p (src_val)) > + { > + /* If the lattice has already been propagated for the call site, > + no need to do that again. */ > + for (ipcp_value_source<tree> *s = src_val->sources; s; > + s = s->next) I'm afraid you'll have to reformat this for loop too. > + if (s->cs == cs) > + return dest_lat->set_contains_variable (); > + } > + else > + val_seeds.safe_push (src_val); > + } > + > + /* Recursively generate lattice values with a limited count. */ > + FOR_EACH_VEC_ELT (val_seeds, i, src_val) > + { > + for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++) > + { > + tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2, > + src_val, res_type); > + if (!cstval) > + break; > + > + /* Try to place the new lattice value after its source, which > + can decrease iterations of propagate stage. */ > + ret |= dest_lat->add_value (cstval, cs, src_val, src_idx, > + src_offset, &src_val, true); > + gcc_checking_assert (src_val); > + } > + } > + ret |= dest_lat->set_contains_variable (); > + } > else > for (src_val = src_lat->values; src_val; src_val = src_val->next) > { > - tree opnd1 = src_val->value; > - tree cstval = NULL_TREE; > - > - /* Skip source values that is incompatible with specified type. */ > - if (!opnd1_type > - || useless_type_conversion_p (opnd1_type, TREE_TYPE (opnd1))) > - cstval = ipa_get_jf_arith_result (opcode, opnd1, opnd2, res_type); > + /* Now we do not use self-recursively generated value as propagation > + source, otherwise it is easy to make value space of normal lattice > + overflow. */ > + if (self_recursively_generated_p (src_val)) > + continue; I think you are missing a necessary call to dest_lat->set_contains_variable() here. Either call it or initialize cstval to zero and call get_val_across_arith_op only when self_recursively_generated_p returns false; > > + tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2, > + src_val, res_type); > if (cstval) > ret |= dest_lat->add_value (cstval, cs, src_val, src_idx, > src_offset); > @@ -3032,7 +3165,7 @@ hint_time_bonus (ipa_hints hints) > static inline int64_t > incorporate_penalties (ipa_node_params *info, int64_t evaluation) > { > - if (info->node_within_scc) > + if (info->node_within_scc && !info->node_is_self_scc) > evaluation = (evaluation > * (100 - param_ipa_cp_recursion_penalty)) / 100; > The rest looks good to me, thanks for the patch, I hope to see it in the trunk soon. I'll be looking into IPA-CP tuning shortly too. If you already have any ideas, I'd be interested in hearing them. Thanks, Martin