[PATCH 2/2] ipa-cp: One more use of ipa_vr_supported_type_p
Hi,

Since we have the predicate, this patch converts one more open-coded
check for essentially the same thing into a call to it.

It has passed bootstrap and the testsuite on x86_64.  I believe it is
obvious enough that I can commit it myself and so will do so later today.

Thanks,

Martin


2024-09-11  Martin Jambor

        * gcc/ipa-cp.cc (propagate_vr_across_jump_function): Use
        ipa_vr_supported_type_p instead of explicit check for integral and
        pointer types.
---
 gcc/ipa-cp.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index a1033b81aef..fa7bd6a15da 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -2519,8 +2519,7 @@ propagate_vr_across_jump_function (cgraph_edge *cs, ipa_jump_func *jfunc,
     return false;
 
   if (!param_type
-      || (!INTEGRAL_TYPE_P (param_type)
-          && !POINTER_TYPE_P (param_type)))
+      || !ipa_vr_supported_type_p (param_type))
     return dest_lat->set_to_bottom ();
 
   if (jfunc->type == IPA_JF_PASS_THROUGH)
-- 
2.46.0
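As a rough illustration of why the two forms of the check in the patch above are interchangeable, here is a toy model in plain C.  Everything in it is hypothetical: the enum toy_kind and the toy_* helpers merely stand in for GCC's tree types, INTEGRAL_TYPE_P, POINTER_TYPE_P and ipa_vr_supported_type_p, and the model assumes that the predicate accepts exactly integral and pointer types.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kinds of types a compiler distinguishes.  */
enum toy_kind { TOY_INTEGER, TOY_POINTER, TOY_REAL, TOY_RECORD, TOY_KIND_COUNT };

static bool toy_integral_p (enum toy_kind k) { return k == TOY_INTEGER; }
static bool toy_pointer_p (enum toy_kind k) { return k == TOY_POINTER; }

/* Assumed model of the predicate: integral or pointer types only.  */
static bool toy_vr_supported_p (enum toy_kind k)
{
  return toy_integral_p (k) || toy_pointer_p (k);
}

/* The open-coded rejection condition that the patch removes.  */
bool toy_old_reject_p (enum toy_kind k)
{
  return !toy_integral_p (k) && !toy_pointer_p (k);
}

/* The replacement condition; equal to the old one by De Morgan's law.  */
bool toy_new_reject_p (enum toy_kind k)
{
  return !toy_vr_supported_p (k);
}

/* Return true if both conditions agree for every kind in the model.  */
bool toy_checks_agree_p (void)
{
  for (int k = 0; k < TOY_KIND_COUNT; k++)
    if (toy_old_reject_p ((enum toy_kind) k)
        != toy_new_reject_p ((enum toy_kind) k))
      return false;
  return true;
}
```

In this model toy_checks_agree_p () holds for every kind, which is the sense in which the old condition was "essentially the same thing" as the predicate.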
[PATCH 1/2] ipa: Rename ipa_supports_p to ipa_vr_supported_type_p
Hi,

ipa_supports_p is not a name that captures well what the predicate
determines.  Therefore, this patch renames it to
ipa_vr_supported_type_p.

This change has been pre-approved by Honza and has passed bootstrap and
test-suite on x86_64 and so I will push it to master later today.

Thanks,

Martin


gcc/ChangeLog:

2024-09-06  Martin Jambor

        * ipa-cp.h (ipa_supports_p): Rename to ipa_vr_supported_type_p.
        * ipa-cp.cc (ipa_vr_operation_and_type_effects): Adjust called
        function name.
        (propagate_vr_across_jump_function): Likewise.
        * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Likewise.
        (ipcp_get_parm_bits): Likewise.
---
 gcc/ipa-cp.cc   | 5 +++--
 gcc/ipa-cp.h    | 2 +-
 gcc/ipa-prop.cc | 6 +++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 56468dc40ee..a1033b81aef 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1649,7 +1649,8 @@ ipa_vr_operation_and_type_effects (vrange &dst_vr,
                                    enum tree_code operation,
                                    tree dst_type, tree src_type)
 {
-  if (!ipa_supports_p (dst_type) || !ipa_supports_p (src_type))
+  if (!ipa_vr_supported_type_p (dst_type)
+      || !ipa_vr_supported_type_p (src_type))
     return false;
 
   range_op_handler handler (operation);
@@ -2553,7 +2554,7 @@ propagate_vr_across_jump_function (cgraph_edge *cs, ipa_jump_func *jfunc,
           ipa_range_set_and_normalize (op_vr, op);
 
           if (!handler
-              || !ipa_supports_p (operand_type)
+              || !ipa_vr_supported_type_p (operand_type)
               /* Sometimes we try to fold comparison operators using a
                  pointer type to hold the result instead of a boolean
                  type.  Avoid trapping in the sanity check in
diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h
index 4616c61625a..ba2ebfede63 100644
--- a/gcc/ipa-cp.h
+++ b/gcc/ipa-cp.h
@@ -294,7 +294,7 @@ bool values_equal_for_ipcp_p (tree x, tree y);
 /* Return TRUE if IPA supports ranges of TYPE.  */
 
 static inline bool
-ipa_supports_p (tree type)
+ipa_vr_supported_type_p (tree type)
 {
   return irange::supports_p (type) || prange::supports_p (type);
 }
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 99ebd6229ec..78d1fb7086d 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2392,8 +2392,8 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
       else
         {
           if (param_type
-              && ipa_supports_p (TREE_TYPE (arg))
-              && ipa_supports_p (param_type)
+              && ipa_vr_supported_type_p (TREE_TYPE (arg))
+              && ipa_vr_supported_type_p (param_type)
               && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
               && !vr.undefined_p ())
             {
@@ -5761,7 +5761,7 @@ ipcp_get_parm_bits (tree parm, tree *value, widest_int *mask)
   ipcp_transformation *ts = ipcp_get_transformation_summary (cnode);
   if (!ts
       || vec_safe_length (ts->m_vr) == 0
-      || !ipa_supports_p (TREE_TYPE (parm)))
+      || !ipa_vr_supported_type_p (TREE_TYPE (parm)))
     return false;
 
   int i = ts->get_param_index (current_function_decl, parm);
-- 
2.46.0
Re: PING^5 [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]
Hi, On Fri, Aug 09 2024, Kewen.Lin wrote: > Hi, > > Gentle ping this patch: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html I'd like to second this ping, please. Thank you, Martin > > BR, > Kewen > >>> on 2024/7/12 00:15, Martin Jambor wrote: >>>> Hi, >>>> >>>> can I add myself to the bunch of people who are pinging this? >>>> Having >>>> this in will make our life easier. >>>> >>>> Thanks a lot, >>>> >>>> Martin >>>> >>>> >>>> On Wed, May 08 2024, Kewen.Lin wrote: >>>>> Hi, >>>>> >>>>> As the discussion in PR112980, although the current >>>>> implementation for -fpatchable-function-entry* conforms >>>>> with the documentation (making N NOPs be consecutive), >>>>> it's inefficient for both kernel and userspace livepatching >>>>> (see comments in PR for the details). >>>>> >>>>> So this patch is to change the current implementation by >>>>> emitting the "before" NOPs before global entry point and >>>>> the "after" NOPs after local entry point. The new behavior >>>>> would not keep NOPs to be consecutive, so the documentation >>>>> is updated to emphasize this. >>>>> >>>>> Bootstrapped and regress-tested on powerpc64-linux-gnu >>>>> P8/P9 and powerpc64le-linux-gnu P9 and P10. >>>>> >>>>> Is it ok for trunk? And backporting to active branches >>>>> after burn-in time? I guess we should also mention this >>>>> change in changes.html? >>>>> >>>>> BR, >>>>> Kewen >>>>> - >>>>> PR target/112980 >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * config/rs6000/rs6000-logue.cc >>>>> (rs6000_output_function_prologue): >>>>> Adjust the handling on patch area emitting with dual >>>>> entry, remove >>>>> the restriction on "before" NOPs count, not emit >>>>> "before" NOPs any >>>>> more but only emit "after" NOPs. >>>>> * config/rs6000/rs6000.cc >>>>> (rs6000_print_patchable_function_entry): >>>>> Adjust by respecting cfun->machine- >>>>>> stop_patch_area_print. 
>>>>> (rs6000_elf_declare_function_name): For ELFv2 with dual >>>>> entry, set >>>>> cfun->machine->stop_patch_area_print as true. >>>>> * config/rs6000/rs6000.h (struct machine_function): >>>>> Remove member >>>>> global_entry_emitted, add new member >>>>> stop_patch_area_print. >>>>> * doc/invoke.texi (option -fpatchable-function-entry): >>>>> Adjust the >>>>> documentation for PowerPC ELFv2 dual entry. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * c-c++-common/patchable_function_entry-default.c: >>>>> Adjust. >>>>> * gcc.target/powerpc/pr99888-4.c: Likewise. >>>>> * gcc.target/powerpc/pr99888-5.c: Likewise. >>>>> * gcc.target/powerpc/pr99888-6.c: Likewise. >>>>> --- >>>>> gcc/config/rs6000/rs6000-logue.cc | 40 + >>>>> -- >>>>> gcc/config/rs6000/rs6000.cc | 15 +-- >>>>> gcc/config/rs6000/rs6000.h | 10 +++-- >>>>> gcc/doc/invoke.texi | 8 ++-- >>>>> .../patchable_function_entry-default.c | 3 -- >>>>> gcc/testsuite/gcc.target/powerpc/pr99888-4.c | 4 +- >>>>> gcc/testsuite/gcc.target/powerpc/pr99888-5.c | 4 +- >>>>> gcc/testsuite/gcc.target/powerpc/pr99888-6.c | 4 +- >>>>> 8 files changed, 33 insertions(+), 55 deletions(-) >>>>> >>>>> diff --git a/gcc/config/rs6000/rs6000-logue.cc >>>>> b/gcc/config/rs6000/rs6000-logue.cc >>>>> index 60ba15a8bc3..0eb019b44b3 100644 >>>>> --- a/gcc/config/rs6000/rs6000-logue.cc >>>>> +++ b/gcc/config/rs6000/rs6000-logue.cc >>>>> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE >>>>> *file) >>>>> fprintf (file, "\tadd 2,2,12\n
Re: [PATCH 2/2] ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra
Hello, and ping please. Martin On Fri, Aug 09 2024, Martin Jambor wrote: > Hello, > > and ping please. > > Martin > > On Fri, Jul 26 2024, Martin Jambor wrote: >> Hi, >> >> when looking at PR 115815 we realized that it would make sense to make >> calls to functions originally declared static constructors and >> destructors created by pass_ipa_cdtor_merge visible to IPA-SRA. This >> patch does that. >> >> Bootstrapped and tested on x86_64-linux. OK for master? >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2024-07-25 Martin Jambor >> >> * passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and >> pass_ipa_sra. >> --- >> gcc/passes.def | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/gcc/passes.def b/gcc/passes.def >> index b06d6d45f63..33b2c10c9c9 100644 >> --- a/gcc/passes.def >> +++ b/gcc/passes.def >> @@ -157,9 +157,9 @@ along with GCC; see the file COPYING3. If not see >>NEXT_PASS (pass_ipa_profile); >>NEXT_PASS (pass_ipa_icf); >>NEXT_PASS (pass_ipa_devirt); >> + NEXT_PASS (pass_ipa_cdtor_merge); >>NEXT_PASS (pass_ipa_cp); >>NEXT_PASS (pass_ipa_sra); >> - NEXT_PASS (pass_ipa_cdtor_merge); >>NEXT_PASS (pass_ipa_fn_summary); >>NEXT_PASS (pass_ipa_inline); >>NEXT_PASS (pass_ipa_pure_const); >> -- >> 2.45.2
Re: [PATCH 1/2] ipa: Treat static constructors and destructors as non-local (PR 115815)
Hello, and ping please. Martin On Fri, Aug 09 2024, Martin Jambor wrote: > Hello, > > and ping please. > > Martin > > On Fri, Jul 26 2024, Martin Jambor wrote: >> Hi, >> >> in PR 115815, IPA-SRA thought it had control over all invocations of a >> (recursive) static destructor but it did not see the implied >> invocation which led to the original being left behind and the >> clean-up code encountering uses of SSAs that definitely should have >> been dead. >> >> Fixed by teaching cgraph_node::can_be_local_p about static >> constructors and destructors. Similar test is missing in >> cgraph_node::local_p so I added the check there as well. >> >> Bootstrapped and tested on x86_64-linux. OK for master and after a >> while to gcc14 and gcc13 release branches? >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2024-07-25 Martin Jambor >> >> PR ipa/115815 >> * cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check >> DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR. >> * ipa-visibility.cc (non_local_p): Likewise. >> (cgraph_node::local_p): Delete extraneous line of tabs. >> >> gcc/testsuite/ChangeLog: >> >> 2024-07-25 Martin Jambor >> >> PR ipa/115815 >> * gcc.dg/lto/pr115815_0.c: New test. 
>> --- >> gcc/cgraph.cc | 4 +++- >> gcc/ipa-visibility.cc | 5 +++-- >> gcc/testsuite/gcc.dg/lto/pr115815_0.c | 18 ++ >> 3 files changed, 24 insertions(+), 3 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/lto/pr115815_0.c >> >> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc >> index 473d8410bc9..39a3adbc7c3 100644 >> --- a/gcc/cgraph.cc >> +++ b/gcc/cgraph.cc >> @@ -2434,7 +2434,9 @@ cgraph_node_cannot_be_local_p_1 (cgraph_node *node, >> void *) >> && !node->forced_by_abi >> && !node->used_from_object_file_p () >> && !node->same_comdat_group) >> - || !node->externally_visible)); >> + || !node->externally_visible) >> + && !DECL_STATIC_CONSTRUCTOR (node->decl) >> + && !DECL_STATIC_DESTRUCTOR (node->decl)); >> } >> >> /* Return true if cgraph_node can be made local for API change. >> diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc >> index 501d3c304aa..21f0c47f388 100644 >> --- a/gcc/ipa-visibility.cc >> +++ b/gcc/ipa-visibility.cc >> @@ -102,7 +102,9 @@ non_local_p (struct cgraph_node *node, void *data >> ATTRIBUTE_UNUSED) >> && !node->externally_visible >> && !node->used_from_other_partition >> && !node->in_other_partition >> - && node->get_availability () >= AVAIL_AVAILABLE); >> + && node->get_availability () >= AVAIL_AVAILABLE >> + && !DECL_STATIC_CONSTRUCTOR (node->decl) >> + && !DECL_STATIC_DESTRUCTOR (node->decl)); >> } >> >> /* Return true when function can be marked local. */ >> @@ -116,7 +118,6 @@ cgraph_node::local_p (void) >> return n->callees->callee->local_p (); >> return !n->call_for_symbol_thunks_and_aliases (non_local_p, >>NULL, true); >> - >> } >> >> /* A helper for comdat_can_be_unshared_p. 
*/ >> diff --git a/gcc/testsuite/gcc.dg/lto/pr115815_0.c >> b/gcc/testsuite/gcc.dg/lto/pr115815_0.c >> new file mode 100644 >> index 000..d938ae4c802 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/lto/pr115815_0.c >> @@ -0,0 +1,18 @@ >> +int a; >> +volatile int v; >> +volatile int w; >> + >> +int __attribute__((destructor)) >> +b() { >> + if (v) >> +return a + b(); >> + v = 5; >> + return 0; >> +} >> + >> +int >> +main (int argc, char **argv) >> +{ >> + w = 1; >> + return 0; >> +} >> -- >> 2.45.2
[PATCH] sra: Avoid risking x87 mangling binary representation of a replacement (PR 58416)
Hi,

PR 58416 shows that storing non-floating point data to floating point
scalar registers can lead to miscompilations when the data is
normalized or otherwise processed upon loading to a register.

To avoid that risk, this patch detects situations where we have
multiple types and we decide to represent the data in a type with a
mode that is known not to transfer actual bits reliably, using the new
TARGET_MODE_CAN_TRANSFER_BITS hook.

Bootstrapped and tested on x86_64-linux.  OK for trunk?  Any back-ports
to release branches would of course need a back-port of the hook itself,
unfortunately.

Thanks,

Martin


gcc/ChangeLog:

2024-08-19  Martin Jambor

        PR target/58416
        * tree-sra.cc (types_risk_mangled_binary_repr_p): New function.
        (sort_and_splice_var_accesses): Use it.
        (propagate_subaccesses_from_rhs): Likewise.

gcc/testsuite/ChangeLog:

2024-08-19  Martin Jambor

        PR target/58416
        * gcc.dg/torture/pr58416.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr58416.c | 32 ++
 gcc/tree-sra.cc                        | 28 +-
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr58416.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr58416.c b/gcc/testsuite/gcc.dg/torture/pr58416.c
new file mode 100644
index 000..0922b0e7089
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr58416.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+struct s {
+  char s[sizeof(long double)];
+};
+
+union u {
+  long double d;
+  struct s s;
+};
+
+int main()
+{
+  union u x = {0};
+#if __SIZEOF_LONG_DOUBLE__ == 16
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 12
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 8
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 4
+  x.s = (struct s){""};
+#endif
+
+  union u y = x;
+
+  for (unsigned char *p = (unsigned char *)&y + sizeof y;
+       p-- > (unsigned char *)&y;)
+    if (*p != (unsigned char)'x')
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 8040b0c5645..64e2f007d68 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -2335,6 +2335,19 @@ same_access_path_p (tree exp1, tree exp2)
   return true;
 }
 
+/* Return true when either T1 is a type that, when loaded into a register and
+   stored back to memory will yield the same bits or when both T1 and T2 are
+   compatible.  */
+
+static bool
+types_risk_mangled_binary_repr_p (tree t1, tree t2)
+{
+  if (mode_can_transfer_bits (TYPE_MODE (t1)))
+    return false;
+
+  return !types_compatible_p (t1, t2);
+}
+
 /* Sort all accesses for the given variable, check for partial overlaps and
    return NULL if there are any.  If there are none, pick a representative for
    each combination of offset and size and create a linked list out of them.
@@ -2461,6 +2474,17 @@ sort_and_splice_var_accesses (tree var)
                 }
               unscalarizable_region = true;
             }
+          else if (types_risk_mangled_binary_repr_p (access->type, ac2->type))
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                {
+                  fprintf (dump_file, "Cannot scalarize the following access "
+                           "because data would be held in a mode which is not "
+                           "guaranteed to preserve all bits.\n ");
+                  dump_access (dump_file, access, false);
+                }
+              unscalarizable_region = true;
+            }
 
           if (grp_same_access_path
               && !same_access_path_p (access->expr, ac2->expr))
@@ -3127,7 +3151,9 @@ propagate_subaccesses_from_rhs (struct access *lacc, struct access *racc)
           ret = true;
           subtree_mark_written_and_rhs_enqueue (lacc);
         }
-      if (!lacc->first_child && !racc->first_child)
+      if (!lacc->first_child
+          && !racc->first_child
+          && !types_risk_mangled_binary_repr_p (racc->type, lacc->type))
         {
           /* We are about to change the access type from aggregate to scalar,
              so we need to put the reverse flag onto the access, if any.  */
-- 
2.46.0
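The hazard described in the message above concerns data that is moved as a floating-point value when only its bytes matter.  The following sketch is an illustration of the byte-wise alternative in user code, not of what SRA does internally.  The helpers fill_pattern, copy_representation and same_representation are hypothetical names, and the union only loosely mirrors the one in the new test.

```c
#include <string.h>

/* A long double overlaid with its raw bytes, similar in spirit to the
   union used by the test case.  */
union u
{
  long double d;
  unsigned char bytes[sizeof (long double)];
};

/* Fill every byte of the object, including any padding bytes, with the
   same pattern.  */
void fill_pattern (union u *x, unsigned char pattern)
{
  memset (x->bytes, pattern, sizeof x->bytes);
}

/* Copy the object representation byte by byte, so that the source never
   asks for the data to be interpreted as a long double value.  */
void copy_representation (union u *dst, const union u *src)
{
  memcpy (dst->bytes, src->bytes, sizeof dst->bytes);
}

/* Return nonzero if all bytes of the two objects are identical.  */
int same_representation (const union u *a, const union u *b)
{
  return memcmp (a->bytes, b->bytes, sizeof a->bytes) == 0;
}
```

A caller would fill one object, copy it with copy_representation and then expect same_representation to report a match for every byte, which is the property the new test checks after the aggregate copy.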
Re: [PATCH 2/2] ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra
Hello, and ping please. Martin On Fri, Jul 26 2024, Martin Jambor wrote: > Hi, > > when looking at PR 115815 we realized that it would make sense to make > calls to functions originally declared static constructors and > destructors created by pass_ipa_cdtor_merge visible to IPA-SRA. This > patch does that. > > Bootstrapped and tested on x86_64-linux. OK for master? > > Thanks, > > Martin > > > gcc/ChangeLog: > > 2024-07-25 Martin Jambor > > * passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and > pass_ipa_sra. > --- > gcc/passes.def | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/passes.def b/gcc/passes.def > index b06d6d45f63..33b2c10c9c9 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -157,9 +157,9 @@ along with GCC; see the file COPYING3. If not see >NEXT_PASS (pass_ipa_profile); >NEXT_PASS (pass_ipa_icf); >NEXT_PASS (pass_ipa_devirt); > + NEXT_PASS (pass_ipa_cdtor_merge); >NEXT_PASS (pass_ipa_cp); >NEXT_PASS (pass_ipa_sra); > - NEXT_PASS (pass_ipa_cdtor_merge); >NEXT_PASS (pass_ipa_fn_summary); >NEXT_PASS (pass_ipa_inline); >NEXT_PASS (pass_ipa_pure_const); > -- > 2.45.2
Re: [PATCH 1/2] ipa: Treat static constructors and destructors as non-local (PR 115815)
Hello, and ping please. Martin On Fri, Jul 26 2024, Martin Jambor wrote: > Hi, > > in PR 115815, IPA-SRA thought it had control over all invocations of a > (recursive) static destructor but it did not see the implied > invocation which led to the original being left behind and the > clean-up code encountering uses of SSAs that definitely should have > been dead. > > Fixed by teaching cgraph_node::can_be_local_p about static > constructors and destructors. Similar test is missing in > cgraph_node::local_p so I added the check there as well. > > Bootstrapped and tested on x86_64-linux. OK for master and after a > while to gcc14 and gcc13 release branches? > > Thanks, > > Martin > > > gcc/ChangeLog: > > 2024-07-25 Martin Jambor > > PR ipa/115815 > * cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check > DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR. > * ipa-visibility.cc (non_local_p): Likewise. > (cgraph_node::local_p): Delete extraneous line of tabs. > > gcc/testsuite/ChangeLog: > > 2024-07-25 Martin Jambor > > PR ipa/115815 > * gcc.dg/lto/pr115815_0.c: New test. > --- > gcc/cgraph.cc | 4 +++- > gcc/ipa-visibility.cc | 5 +++-- > gcc/testsuite/gcc.dg/lto/pr115815_0.c | 18 ++ > 3 files changed, 24 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/lto/pr115815_0.c > > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc > index 473d8410bc9..39a3adbc7c3 100644 > --- a/gcc/cgraph.cc > +++ b/gcc/cgraph.cc > @@ -2434,7 +2434,9 @@ cgraph_node_cannot_be_local_p_1 (cgraph_node *node, > void *) > && !node->forced_by_abi > && !node->used_from_object_file_p () > && !node->same_comdat_group) > -|| !node->externally_visible)); > +|| !node->externally_visible) > +&& !DECL_STATIC_CONSTRUCTOR (node->decl) > +&& !DECL_STATIC_DESTRUCTOR (node->decl)); > } > > /* Return true if cgraph_node can be made local for API change. 
> diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc > index 501d3c304aa..21f0c47f388 100644 > --- a/gcc/ipa-visibility.cc > +++ b/gcc/ipa-visibility.cc > @@ -102,7 +102,9 @@ non_local_p (struct cgraph_node *node, void *data > ATTRIBUTE_UNUSED) > && !node->externally_visible > && !node->used_from_other_partition > && !node->in_other_partition > -&& node->get_availability () >= AVAIL_AVAILABLE); > +&& node->get_availability () >= AVAIL_AVAILABLE > +&& !DECL_STATIC_CONSTRUCTOR (node->decl) > +&& !DECL_STATIC_DESTRUCTOR (node->decl)); > } > > /* Return true when function can be marked local. */ > @@ -116,7 +118,6 @@ cgraph_node::local_p (void) > return n->callees->callee->local_p (); > return !n->call_for_symbol_thunks_and_aliases (non_local_p, > NULL, true); > - > } > > /* A helper for comdat_can_be_unshared_p. */ > diff --git a/gcc/testsuite/gcc.dg/lto/pr115815_0.c > b/gcc/testsuite/gcc.dg/lto/pr115815_0.c > new file mode 100644 > index 000..d938ae4c802 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/lto/pr115815_0.c > @@ -0,0 +1,18 @@ > +int a; > +volatile int v; > +volatile int w; > + > +int __attribute__((destructor)) > +b() { > + if (v) > +return a + b(); > + v = 5; > + return 0; > +} > + > +int > +main (int argc, char **argv) > +{ > + w = 1; > + return 0; > +} > -- > 2.45.2
[PATCH 2/2] ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra
Hi,

when looking at PR 115815 we realized that it would make sense to make
calls to functions originally declared static constructors and
destructors created by pass_ipa_cdtor_merge visible to IPA-SRA.  This
patch does that.

Bootstrapped and tested on x86_64-linux.  OK for master?

Thanks,

Martin


gcc/ChangeLog:

2024-07-25  Martin Jambor

        * passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and
        pass_ipa_sra.
---
 gcc/passes.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index b06d6d45f63..33b2c10c9c9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -157,9 +157,9 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_icf);
   NEXT_PASS (pass_ipa_devirt);
+  NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_sra);
-  NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_ipa_fn_summary);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
-- 
2.45.2
[PATCH 1/2] ipa: Treat static constructors and destructors as non-local (PR 115815)
Hi,

in PR 115815, IPA-SRA thought it had control over all invocations of a
(recursive) static destructor, but it did not see the implied
invocation, which led to the original being left behind and the
clean-up code encountering uses of SSAs that definitely should have
been dead.

Fixed by teaching cgraph_node::can_be_local_p about static
constructors and destructors.  A similar test is missing in
cgraph_node::local_p, so I added the check there as well.

Bootstrapped and tested on x86_64-linux.  OK for master and after a
while to gcc14 and gcc13 release branches?

Thanks,

Martin


gcc/ChangeLog:

2024-07-25  Martin Jambor

        PR ipa/115815
        * cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
        DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
        * ipa-visibility.cc (non_local_p): Likewise.
        (cgraph_node::local_p): Delete extraneous line of tabs.

gcc/testsuite/ChangeLog:

2024-07-25  Martin Jambor

        PR ipa/115815
        * gcc.dg/lto/pr115815_0.c: New test.
---
 gcc/cgraph.cc                         |  4 +++-
 gcc/ipa-visibility.cc                 |  5 +++--
 gcc/testsuite/gcc.dg/lto/pr115815_0.c | 18 ++
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr115815_0.c

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 473d8410bc9..39a3adbc7c3 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -2434,7 +2434,9 @@ cgraph_node_cannot_be_local_p_1 (cgraph_node *node, void *)
                 && !node->forced_by_abi
                 && !node->used_from_object_file_p ()
                 && !node->same_comdat_group)
-              || !node->externally_visible));
+              || !node->externally_visible)
+           && !DECL_STATIC_CONSTRUCTOR (node->decl)
+           && !DECL_STATIC_DESTRUCTOR (node->decl));
 }
 
 /* Return true if cgraph_node can be made local for API change.
diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
index 501d3c304aa..21f0c47f388 100644
--- a/gcc/ipa-visibility.cc
+++ b/gcc/ipa-visibility.cc
@@ -102,7 +102,9 @@ non_local_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
            && !node->externally_visible
            && !node->used_from_other_partition
            && !node->in_other_partition
-           && node->get_availability () >= AVAIL_AVAILABLE);
+           && node->get_availability () >= AVAIL_AVAILABLE
+           && !DECL_STATIC_CONSTRUCTOR (node->decl)
+           && !DECL_STATIC_DESTRUCTOR (node->decl));
 }
 
 /* Return true when function can be marked local.  */
@@ -116,7 +118,6 @@ cgraph_node::local_p (void)
     return n->callees->callee->local_p ();
   return !n->call_for_symbol_thunks_and_aliases (non_local_p,
                                                  NULL, true);
-
 }
 
 /* A helper for comdat_can_be_unshared_p.  */
diff --git a/gcc/testsuite/gcc.dg/lto/pr115815_0.c b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
new file mode 100644
index 000..d938ae4c802
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
@@ -0,0 +1,18 @@
+int a;
+volatile int v;
+volatile int w;
+
+int __attribute__((destructor))
+b() {
+  if (v)
+    return a + b();
+  v = 5;
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  w = 1;
+  return 0;
+}
-- 
2.45.2
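The point of the fix above is that the runtime invokes static constructors and destructors even though no such call appears in the program's own call graph.  A minimal sketch of that behaviour, using the GCC constructor and destructor function attributes, follows.  The names setup, teardown and setup_call_count are made up for illustration and do not come from the patch.

```c
/* Incremented only by setup (), which nothing in this file calls.  */
static int setup_calls;

/* Written only by teardown (); volatile so that the store is kept.  */
static volatile int teardown_calls;

/* Invoked by the runtime before main () because of the attribute.  */
__attribute__((constructor))
static void setup (void)
{
  setup_calls++;
}

/* Invoked by the runtime after main () returns or exit () is called.  */
__attribute__((destructor))
static void teardown (void)
{
  teardown_calls++;
}

/* Lets a caller observe that setup () ran despite having no callers.  */
int setup_call_count (void)
{
  return setup_calls;
}
```

By the time main () runs, setup_call_count () already reports one call.  An analysis that looks only at explicit call sites would conclude that setup and teardown are never called, which is the kind of wrong conclusion the patch prevents for IPA-SRA.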
Re: [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]
Hi, can I add myself to the bunch of people who are pinging this? Having this in will make our life easier. Thanks a lot, Martin On Wed, May 08 2024, Kewen.Lin wrote: > Hi, > > As the discussion in PR112980, although the current > implementation for -fpatchable-function-entry* conforms > with the documentation (making N NOPs be consecutive), > it's inefficient for both kernel and userspace livepatching > (see comments in PR for the details). > > So this patch is to change the current implementation by > emitting the "before" NOPs before global entry point and > the "after" NOPs after local entry point. The new behavior > would not keep NOPs to be consecutive, so the documentation > is updated to emphasize this. > > Bootstrapped and regress-tested on powerpc64-linux-gnu > P8/P9 and powerpc64le-linux-gnu P9 and P10. > > Is it ok for trunk? And backporting to active branches > after burn-in time? I guess we should also mention this > change in changes.html? > > BR, > Kewen > - > PR target/112980 > > gcc/ChangeLog: > > * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue): > Adjust the handling on patch area emitting with dual entry, remove > the restriction on "before" NOPs count, not emit "before" NOPs any > more but only emit "after" NOPs. > * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): > Adjust by respecting cfun->machine->stop_patch_area_print. > (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set > cfun->machine->stop_patch_area_print as true. > * config/rs6000/rs6000.h (struct machine_function): Remove member > global_entry_emitted, add new member stop_patch_area_print. > * doc/invoke.texi (option -fpatchable-function-entry): Adjust the > documentation for PowerPC ELFv2 dual entry. > > gcc/testsuite/ChangeLog: > > * c-c++-common/patchable_function_entry-default.c: Adjust. > * gcc.target/powerpc/pr99888-4.c: Likewise. > * gcc.target/powerpc/pr99888-5.c: Likewise. > * gcc.target/powerpc/pr99888-6.c: Likewise. 
> --- > gcc/config/rs6000/rs6000-logue.cc | 40 +-- > gcc/config/rs6000/rs6000.cc | 15 +-- > gcc/config/rs6000/rs6000.h| 10 +++-- > gcc/doc/invoke.texi | 8 ++-- > .../patchable_function_entry-default.c| 3 -- > gcc/testsuite/gcc.target/powerpc/pr99888-4.c | 4 +- > gcc/testsuite/gcc.target/powerpc/pr99888-5.c | 4 +- > gcc/testsuite/gcc.target/powerpc/pr99888-6.c | 4 +- > 8 files changed, 33 insertions(+), 55 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-logue.cc > b/gcc/config/rs6000/rs6000-logue.cc > index 60ba15a8bc3..0eb019b44b3 100644 > --- a/gcc/config/rs6000/rs6000-logue.cc > +++ b/gcc/config/rs6000/rs6000-logue.cc > @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file) > fprintf (file, "\tadd 2,2,12\n"); > } > > - unsigned short patch_area_size = crtl->patch_area_size; > - unsigned short patch_area_entry = crtl->patch_area_entry; > - /* Need to emit the patching area. */ > - if (patch_area_size > 0) > - { > - cfun->machine->global_entry_emitted = true; > - /* As ELFv2 ABI shows, the allowable bytes between the global > - and local entry points are 0, 4, 8, 16, 32 and 64 when > - there is a local entry point. Considering there are two > - non-prefixed instructions for global entry point prologue > - (8 bytes), the count for patchable nops before local entry > - point would be 2, 6 and 14. It's possible to support those > - other counts of nops by not making a local entry point, but > - we don't have clear use cases for them, so leave them > - unsupported for now. 
*/ > - if (patch_area_entry > 0) > - { > - if (patch_area_entry != 2 > - && patch_area_entry != 6 > - && patch_area_entry != 14) > - error ("unsupported number of nops before function entry (%u)", > -patch_area_entry); > - rs6000_print_patchable_function_entry (file, patch_area_entry, > - true); > - patch_area_size -= patch_area_entry; > - } > - } > - >fputs ("\t.localentry\t", file); >assemble_name (file, name); >fputs (",.-", file); >assemble_name (file, name); >fputs ("\n", file); >/* Emit the nops after local entry. */ > - if (patch_area_size > 0) > - rs6000_print_patchable_function_entry (file, patch_area_size, > -patch_area_entry == 0); > + unsigned short patch_area_size = crtl->patch_area_size; > + unsigned short patch_area_entry = crtl->patch_area_entry; > + if (patch_area_size > patch_area_entry) > + { > + cfun->mach
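For readers unfamiliar with the option under discussion, the same kind of patch area can be requested for a single function with the patchable_function_entry attribute, where the first number is the total count of NOPs and the second is how many of them are placed before the function's entry point.  The sketch below uses a made-up function name, traced_add.  Where the NOPs land relative to the global and local entry points on PowerPC ELFv2 is exactly what the quoted patch changes, and the example does not depend on it.

```c
/* Request 4 NOPs in total, 2 of them before the entry label; this
   corresponds to compiling only this function with
   -fpatchable-function-entry=4,2.  */
__attribute__((patchable_function_entry (4, 2)))
int traced_add (int a, int b)
{
  return a + b;
}
```

The NOPs do not change what the function computes.  They only reserve space that a live-patching or tracing tool may later overwrite.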
[committed, gcc13] ipa: Compare jump functions in ICF (PR 113907)
Hi,

This is a manual backport of r14-9840-g1162861439fd3c from master.
Manual because the bits and value range representation in jump
functions have changed during the gcc 14 development cycle.

In PR 113907 comment #58, Honza found a case where ICF thinks bodies
of functions are equivalent but, because of a difference in aliases in
a memory access, different aggregate jump functions are associated
with supposedly equivalent call statements.  This patch adds a way to
compare jump functions and plugs it into ICF to avoid the issue.

Bootstrapped and tested on x86_64-linux.  Committed to the gcc-13
branch.

Martin


gcc/ChangeLog:

2024-05-14  Martin Jambor

        PR ipa/113907
        * ipa-prop.h (ipa_jump_functions_equivalent_p): Declare.
        (values_equal_for_ipcp_p): Likewise.
        * ipa-prop.cc (ipa_agg_pass_through_jf_equivalent_p): New function.
        (ipa_agg_jump_functions_equivalent_p): Likewise.
        (ipa_jump_functions_equivalent_p): Likewise.
        * ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
        * ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h,
        ipa-cp.h and ipa-prop.h.
        (func_checker::compare_gimple_call): Compare jump functions.

gcc/testsuite/ChangeLog:

2024-05-10  Martin Jambor

        PR ipa/113907
        * gcc.dg/lto/pr113907_0.c: New.
        * gcc.dg/lto/pr113907_1.c: Likewise.
        * gcc.dg/lto/pr113907_2.c: Likewise.
--- gcc/ipa-cp.cc | 2 +- gcc/ipa-icf-gimple.cc | 29 + gcc/ipa-prop.cc | 157 ++ gcc/ipa-prop.h| 3 + gcc/testsuite/gcc.dg/lto/pr113907_0.c | 18 +++ gcc/testsuite/gcc.dg/lto/pr113907_1.c | 35 ++ gcc/testsuite/gcc.dg/lto/pr113907_2.c | 11 ++ 7 files changed, 254 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_0.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_1.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_2.c diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index b3e0f62e400..8f36608cf33 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -458,7 +458,7 @@ ipcp_lattice::is_single_const () /* Return true iff X and Y should be considered equal values by IPA-CP. */ -static bool +bool values_equal_for_ipcp_p (tree x, tree y) { gcc_checking_assert (x != NULL_TREE && y != NULL_TREE); diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc index 49302ad56c6..054a557bd58 100644 --- a/gcc/ipa-icf-gimple.cc +++ b/gcc/ipa-icf-gimple.cc @@ -42,7 +42,11 @@ along with GCC; see the file COPYING3. 
If not see #include "tree-sra.h" #include "tree-ssa-alias-compare.h" +#include "alloc-pool.h" +#include "symbol-summary.h" #include "ipa-icf-gimple.h" +#include "sreal.h" +#include "ipa-prop.h" namespace ipa_icf_gimple { @@ -751,6 +755,31 @@ func_checker::compare_gimple_call (gcall *s1, gcall *s2) && !compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2))) return return_false_with_msg ("GIMPLE internal call LHS type mismatch"); + if (!gimple_call_internal_p (s1)) +{ + cgraph_edge *e1 = cgraph_node::get (m_source_func_decl)->get_edge (s1); + cgraph_edge *e2 = cgraph_node::get (m_target_func_decl)->get_edge (s2); + class ipa_edge_args *args1 = ipa_edge_args_sum->get (e1); + class ipa_edge_args *args2 = ipa_edge_args_sum->get (e2); + if ((args1 != nullptr) != (args2 != nullptr)) + return return_false_with_msg ("ipa_edge_args mismatch"); + if (args1) + { + int n1 = ipa_get_cs_argument_count (args1); + int n2 = ipa_get_cs_argument_count (args2); + if (n1 != n2) + return return_false_with_msg ("ipa_edge_args nargs mismatch"); + for (int i = 0; i < n1; i++) + { + struct ipa_jump_func *jf1 = ipa_get_ith_jump_func (args1, i); + struct ipa_jump_func *jf2 = ipa_get_ith_jump_func (args2, i); + if (((jf1 != nullptr) != (jf2 != nullptr)) + || (jf1 && !ipa_jump_functions_equivalent_p (jf1, jf2))) + return return_false_with_msg ("jump function mismatch"); + } + } +} + return compare_operand (t1, t2, get_operand_access_type (&map, t1)); } diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc index 0d816749534..11ba2521b2c 100644 --- a/gcc/ipa-prop.cc +++ b/gcc/ipa-prop.cc @@ -6022,5 +6022,162 @@ ipcp_transform_function (struct cgraph_node *node) return modified_mem_access ? TODO_update_ssa_only_virtuals : 0; } +/* Return true if the two pass_through components of two jump functions are + known to be equivalent. AGG_JF denotes whether they are part of aggregate + functions or not. The function can be used before the IPA phase of IPA-CP + or inlining because
[PATCH] sra: Do not leave work for DSE (that it can sometimes not perform)
Hi,

when looking again at the g++.dg/tree-ssa/pr109849.C testcase we discovered that it generates terrible store-to-load forwarding stalls because SRA was leaving behind aggregate loads but all the stores were by scalar parts and DSE failed to remove the useless load. SRA has all the knowledge to remove the statement even now, so this small patch makes it do so.

With this patch, the g++.dg/tree-ssa/pr109849.C micro-benchmark runs 9 times faster (on an AMD EPYC 75F3 machine).

Bootstrapped and tested on x86_64. OK for master?

Given that the patch is simple but can sometimes have a large benefit, could it possibly be backported to the gcc-14 branch in a few weeks, even though it is not a regression (at least not in the last decade)?

Thanks,

Martin

gcc/ChangeLog:

2024-04-18  Martin Jambor

        * tree-sra.cc (sra_modify_assign): Remove the original statement
        also when dealing with a store to a fully covered aggregate from a
        non-candidate.

gcc/testsuite/ChangeLog:

2024-04-23  Martin Jambor

        * g++.dg/tree-ssa/pr109849.C: Also check that the aggregate store
        to cur disappears.
        * gcc.dg/tree-ssa/ssa-dse-26.c: Instead of relying on DSE, check
        that the unwanted stores were removed at early SRA time.

--- gcc/testsuite/g++.dg/tree-ssa/pr109849.C | 3 ++- gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c | 6 +++--- gcc/tree-sra.cc| 14 -- 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C index cd348c0f590..d06dbb10482 100644 --- a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C +++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-sra" } */ +/* { dg-options "-O2 -fdump-tree-sra -fdump-tree-optimized" } */ #include typedef unsigned int uint32_t; @@ -29,3 +29,4 @@ main() } /* { dg-final { scan-tree-dump "Created a replacement for stack offset" "sra"} } */ +/* { dg-final { scan-tree-dump-not "cur = MEM" "optimized"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c index 43152de5616..1d01392c595 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-dse1-details -fno-short-enums -fno-tree-fre" } */ +/* { dg-options "-O2 -fdump-tree-esra -fno-short-enums -fno-tree-fre" } */ /* { dg-skip-if "we want a BIT_FIELD_REF from fold_truth_andor" { ! 
lp64 } } */ /* { dg-skip-if "temporary variable names are not x and y" { mmix-knuth-mmixware } } */ @@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint b) && constraint_expr_equal (a.rhs, b.rhs); } -/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 2 "dse1" } } */ -/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 2 "dse1" } } */ +/* { dg-final { scan-tree-dump-not "x = " "esra" } } */ +/* { dg-final { scan-tree-dump-not "y = " "esra" } } */ diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc index 32fa28911f2..8040b0c5645 100644 --- a/gcc/tree-sra.cc +++ b/gcc/tree-sra.cc @@ -4854,8 +4854,18 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) But use the RHS aggregate to load from to expose more optimization opportunities. */ if (access_has_children_p (lacc)) - generate_subtree_copies (lacc->first_child, rhs, lacc->offset, -0, 0, gsi, true, true, loc); + { + generate_subtree_copies (lacc->first_child, rhs, lacc->offset, + 0, 0, gsi, true, true, loc); + if (lacc->grp_covered) + { + unlink_stmt_vdef (stmt); + gsi_remove (& orig_gsi, true); + release_defs (stmt); + sra_stats.deleted++; + return SRA_AM_REMOVED; + } + } } return SRA_AM_NONE; -- 2.44.0
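
For readers less familiar with SRA, the effect of the tree-sra.cc change above can be pictured at the source level. The following is only a rough sketch in C, not GIMPLE and not the pr109849.C testcase; struct cursor and both function names are made up for illustration. The first function keeps a local aggregate copy, the second is roughly what is left once every part of the aggregate has a scalar replacement and the aggregate copy itself has been removed.

```c
/* Hypothetical illustration only; struct cursor and both functions are
   invented names, not taken from the patch or the testcase.  */
struct cursor
{
  int pos;
  int len;
};

/* Before: a local aggregate is loaded as a whole and then used by parts.  */
int
remaining_with_aggregate_copy (const struct cursor *p)
{
  struct cursor cur = *p;      /* aggregate copy from a non-candidate */
  return cur.len - cur.pos;
}

/* After: each part has its own scalar replacement.  Since the parts cover
   the whole aggregate, the copy into cur is not needed any more.  */
int
remaining_with_scalar_parts (const struct cursor *p)
{
  int cur_pos = p->pos;        /* scalar replacement for cur.pos */
  int cur_len = p->len;        /* scalar replacement for cur.len */
  return cur_len - cur_pos;
}
```

Both functions compute the same value; the point of the patch is that SRA itself now drops the leftover aggregate statement in the covered case instead of leaving that to DSE.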
Re: [wwwdocs] Add znver5 to GCC 14 changes
Hi Gerald, On Fri, May 03 2024, Gerald Pfeifer wrote: > Hi Martin, > > On Thu, 2 May 2024, Martin Jambor wrote: >> + GCC now supports AMD CPUs based on the znver5 core via >> +-march=znver5. Based on ISA extensions enabled on >> +a znver4 core, the switch further enables the AVXVNNI, MOVDIRI, >> +MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI ISA extensions. > > just two small suggestions: We usually sort extensions alphabetically, > so AVX512VP2INTERSECT, AVXVNNI, MOVDIR64B, MOVDIRI, and PREFETCHI. If > there is a specific reason to do otherwise, that's okay of course. > > And I might write "In addition to the ISA extensions enabled on a znver4 > core, this switch..." to avoid the repetition of "based on" (and make it a > bit more clear even that it is a full superset, not just 'loosely' based". > Thanks for the suggestions, I'll go ahead and commit the following then. Martin diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index 8dfbf7dc..46a0266d 100644 --- a/htdocs/gcc-14/changes.html +++ b/htdocs/gcc-14/changes.html @@ -954,6 +954,12 @@ __asm (".global __flmap_lock" "\n\t" -fsanitize=hwaddress will enable -mlam=u57 by default. + GCC now supports AMD CPUs based on the znver5 core via +-march=znver5. In addition to the ISA extensions +enabled on a znver4 core, this switch further enables the +AVX512VP2INTERSECT, AVXVNNI, MOVDIR64B, MOVDIRI, and PREFETCHI ISA +extensions. + MCore
[wwwdocs] Add znver5 to GCC 14 changes
Hello,

based on input from AMD, I'd like to commit the following to the wwwdocs repo to point out new support for Zen 5 based AMD CPUs in GCC 14. Is it OK? Any suggestions, comments or questions?

Thanks,

Martin

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 8dfbf7dc..d250340b 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -954,6 +954,11 @@ __asm (".global __flmap_lock" "\n\t"
 -fsanitize=hwaddress will enable -mlam=u57 by default.
+ GCC now supports AMD CPUs based on the znver5 core via
+-march=znver5. Based on ISA extensions enabled on
+a znver4 core, the switch further enables the AVXVNNI, MOVDIRI,
+MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI ISA extensions.
+
 MCore
Re: [wwwdocs] Porting-to-14: Mention new pragma GCC Target behavior
Hi, On Wed, May 01 2024, Gerald Pfeifer wrote: > On Tue, 30 Apr 2024, Martin Jambor wrote: >> +Pragma GCC Target now affects preprocessor >> symbols > > Note the id: should be "gcc-target-pragma", though I even suggest to > simplify and say "target-pragma". > >> +The behavior of pragma GCC Target and specifically how it affects ISA > > Seconding Jakub's > > "And here as well, perhaps even #pragma GCC target." > >> +macros has changed in GCC 14. In GCC 13 and older, the GCC >> +target pragma defined and undefined corresponding ISA macros in >> +C when using integrated preprocessor during compilation but not when > > "...the integrated preprocessor..." > >> +preprocessor was invoked as a separate step or when using -save-temps. > > "...the preprocessor..." > > and -save-temps, or better "the -save-temps > option". > >> +This can lead to different behavior, especially in C++. For example, >> +functions the C++ snippet below will be (silently) compiled for an >> +incorrect instruction set by GCC 14. > > "functions" above looks like it's extraneous and should be skipped? > >> + /* With GCC 14, __AVX2__ here will always be defined and pop_options >> + never called. */ >> + #if ! __AVX2__ >> + #pragma GCC pop_options >> + #endif > > Maybe a bit subtle, I would not say a #pragma is called; how about invoked > or activated? > >> + >> +The fix in this case would be to remember >> +whether pop_options needs to be performed in a new >> +user-defined macro. > > "The fix in this case is to remember" (or "...remembering...") > Thanks for your suggestions, this is what I am going to commit in a moment. 
Martin diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html index c825a68e..a20d82c2 100644 --- a/htdocs/gcc-14/porting_to.html +++ b/htdocs/gcc-14/porting_to.html @@ -514,6 +514,48 @@ be included explicitly when compiling with GCC 14: +Pragma GCC target now affects preprocessor symbols + + +The behavior of pragma GCC target and specifically how it affects ISA +macros has changed in GCC 14. In GCC 13 and older, the GCC +target pragma defined and undefined corresponding ISA macros in +C when using the integrated preprocessor during compilation but not +when the preprocessor was invoked as a separate step or when using +the -save-temps option. In C++ the ISA macro definitions +were performed in a way which did not have any actual effect. + +In GCC 14 C++ behaves like C with integrated preprocessing in earlier +versions. Moreover, in both languages ISA macros are defined and +undefined as expected when preprocessing separately from compilation. + + +This can lead to different behavior, especially in C++. For example, +a part of the C++ snippet below will be (silently) compiled for an +incorrect instruction set by GCC 14. + + + #if ! __AVX2__ + #pragma GCC push_options + #pragma GCC target("avx2") + #endif + + /* Code to be compiled for AVX2. */ + + /* With GCC 14, __AVX2__ here will always be defined and pop_options + never invoked. */ + #if ! __AVX2__ + #pragma GCC pop_options + #endif + + /* With GCC 14, all following functions will be compiled for AVX2 + which was not intended. */ + + + +The fix in this case is to remember whether pop_options +needs to be performed in a new user-defined macro. +
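
The fix described in the last paragraph of the snippet above could look roughly like the following. This is only a sketch under stated assumptions: the macro names EXAMPLE_POP_NEEDED and EXAMPLE_TARGET_WAS_TEMPORARY and both functions are invented for illustration, and the x86 guard is an addition so that the file still compiles on other architectures.

```c
/* Hypothetical names: EXAMPLE_POP_NEEDED, EXAMPLE_TARGET_WAS_TEMPORARY and
   both functions are invented for this sketch.  The x86 guard is an
   addition so that the file still compiles on other architectures.  */
#if defined (__x86_64__) || defined (__i386__)
# if ! __AVX2__
#  pragma GCC push_options
#  pragma GCC target("avx2")
/* Remember that a matching pop_options is required later.  */
#  define EXAMPLE_POP_NEEDED 1
# endif
#endif

/* Code to be compiled for AVX2.  */
int
add_in_avx2_region (int a, int b)
{
  return a + b;
}

/* Test the helper macro, not __AVX2__, which the pragma may have defined.  */
#ifdef EXAMPLE_POP_NEEDED
# pragma GCC pop_options
# undef EXAMPLE_POP_NEEDED
# define EXAMPLE_TARGET_WAS_TEMPORARY 1
#endif

/* Returns 1 if AVX2 is still enabled here although it was only meant to be
   enabled for the region above, 0 otherwise.  */
int
avx2_leaked_past_region (void)
{
#if defined (EXAMPLE_TARGET_WAS_TEMPORARY) && defined (__AVX2__)
  return 1;
#else
  return 0;
#endif
}
```

Because the decision to pop is taken from a macro that only the user code controls, it no longer depends on whether the pragma changed __AVX2__ in between.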
Re: [wwwdocs] Porting-to-14: Mention new pragma GCC Target behavior
Hi, On Thu, Apr 25 2024, Jakub Jelinek wrote: > On Thu, Apr 25, 2024 at 02:34:22PM +0200, Martin Jambor wrote: >> when looking at a package build issue with GCC 14, Michal Jireš noted a >> different behavior of pragma GCC Target. This snippet tries to describe >> the gist of the problem. I have left it in the C section even though it >> is not really C specific, but could not think of a good name for a new >> section for it. Ideas (and any other suggestions for improvements) >> welcome, of course. > > The change was more subtle. > We used to define/undefine the ISA macros in C in GCC 13 and older as well, > but only when using integrated preprocessor during compilation, > so it didn't work that way with -save-temps or separate -E and -S/-c > steps. > While in C++ it behaved as if the define/undefines aren't done at all > (they were done, but after preprocessing/lexing everything, so didn't > affect anything). > In GCC 14, it behaves in C++ the same as in C in older versions, and > additionally they are defined/undefined also when using separate > preprocessing, in both C and C++. > I see, thanks for the correction. Would the following then perhaps describe the situation accurately? Note that I have moved the whole thing to C++ section because it seems porting issues in C because of this are quite unlikely. Michal, I assume that the file where this issue happened was written in C++, right? Martin diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html index c825a68e..1e67b0b3 100644 --- a/htdocs/gcc-14/porting_to.html +++ b/htdocs/gcc-14/porting_to.html @@ -514,6 +514,51 @@ be included explicitly when compiling with GCC 14: +Pragma GCC Target now affects preprocessor symbols + + +The behavior of pragma GCC Target and specifically how it affects ISA +macros has changed in GCC 14. 
In GCC 13 and older, the GCC +target pragma defined and undefined corresponding ISA macros in +C when using integrated preprocessor during compilation but not when +preprocessor was invoked as a separate step or when using -save-temps. +In C++ the ISA macro definitions were performed in a way which did not +have any actual effect. + +In GCC 14 C++ behaves like C with integrated preprocessing in earlier +versions. Moreover, in both languages ISA macros are defined and +undefined as expected when preprocessing separately from compilation. + + +This can lead to different behavior, especially in C++. For example, +functions the C++ snippet below will be (silently) compiled for an +incorrect instruction set by GCC 14. + + + #if ! __AVX2__ + #pragma GCC push_options + #pragma GCC target("avx2") + #endif + + /* Code to be compiled for AVX2. */ + + /* With GCC 14, __AVX2__ here will always be defined and pop_options + never called. */ + #if ! __AVX2__ + #pragma GCC pop_options + #endif + + /* With GCC 14, all following functions will be compiled for AVX2 + which was not intended. */ + + + +The fix in this case would be to remember +whether pop_options needs to be performed in a new +user-defined macro. + + +
[wwwdocs] Porting-to-14: Mention new pragma GCC Target behavior
Hello,

when looking at a package build issue with GCC 14, Michal Jireš noted a different behavior of pragma GCC Target. This snippet tries to describe the gist of the problem. I have left it in the C section even though it is not really C specific, but could not think of a good name for a new section for it. Ideas (and any other suggestions for improvements) welcome, of course.

Otherwise, would this be good to go to the wwwdocs?

Thanks,

Martin

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index c825a68e..ae9a3cde 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -490,6 +490,43 @@ in C23. GCC will probably continue to support old-style function
 definitions even once C23 is used as the default language dialect.
+Pragma GCC Target now affects preprocessor symbols
+
+
+The behavior of pragma GCC Target has changed in GCC 14. For example,
+GCC 13 and below defines __AVX2__ only when the target
+is specified on the command line. This has been considered <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87299">a
+bug</a> and since it was fixed in GCC 14, __AVX2__ is now also
+defined with #pragma GCC target("avx2").
+
+
+Therefore, if macros expand to something like the snippet below,
+functions will be (silently) compiled for an incorrect instruction
+set.
+
+
+ #if ! __AVX2__
+ #pragma GCC push_options
+ #pragma GCC target("avx2")
+ #endif
+
+ /* Code to be compiled for AVX2. */
+
+ /* With GCC 14, __AVX2__ here will always be defined and pop_options
+ never called. */
+ #if ! __AVX2__
+ #pragma GCC pop_options
+ #endif
+
+ /* With GCC 14, all following functions will be compiled for AVX2
+ which was not intended. */
+
+
+
+The fix in this case would be to remember
+whether pop_options needs to be performed in a new
+user-defined macro.
+
 C++ language issues
 Header dependency changes
Re: [PATCH] contrib/check-params-in-docs.py: Ignore target-specific params
Hi,

On Fri, Apr 12 2024, Filip Kastl wrote:
> On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
>> Hi!
>>
>> On 2024-04-11T19:52:51+0200, Martin Jambor wrote:
>> > contrib/check-params-in-docs.py is a script that checks that all
>> > options reported with ./gcc/xgcc -Bgcc --help=param are in
>> > gcc/doc/invoke.texi and vice versa.
>>
>> Eh, first time I'm hearing about this one!

It's running as part of our internal buildbot that Martin Liška set up. I must admit I did want to spend the minimum time necessary to fix the failure and did not realize Filip was looking at it too until I committed my simple fix...

>>
>> (a) Shouldn't this be running as part of the GCC build process?
>>
>> > gcn-preferred-vectorization-factor is in the manual but normally not
>> > reported by --help, probably because I do not have gcn offload
>> > configured.
>>
>> No, because you've not been building GCC for GCN target. ;-P
>>
>> > This patch makes the script silently about this particular
>> > fact.
>>
>> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
>> to how that's done for "skip aarch64 params"?
>>
>> (c) ..., and shouldn't we likewise skip any "x86" ones?
>>
>> (d) ..., or in fact any target specific ones, following after the generic
>> section? (Easily achieved with a special marker in
>> 'gcc/doc/invoke.texi', just before:
>>
>> The following choices of @var{name} are available on AArch64 targets:
>>
>> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
>> accordingly?
>
> Hi,
>
> I've made a patch to address (b), (c), (d). I didn't adjust takewhile. I
> chose to do it differently since target-specific params in both invoke.texi
> and --help=params have to be ignored.
>
> The downside of this patch is that the script won't complain if someone adds a
> target-specific param and doesn't document it.
>
> What do you think?

...and this is clearly much better. Thanks!

Martin > > Cheers, > Filip > > -- 8< -- > > contrib/check-params-in-docs.py is a script that checks that all options > reported with gcc --help=params are in gcc/doc/invoke.texi and vice > versa. > gcc/doc/invoke.texi lists target-specific params but gcc --help=params > doesn't. This meant that the script would mistakenly complain about > parms missing from --help=params. Previously, the script was just set > to ignore aarch64 and gcn params which solved this issue only for x86. > This patch sets the script to ignore all target-specific params. > > contrib/ChangeLog: > > * check-params-in-docs.py: Ignore target specific params. > > Signed-off-by: Filip Kastl > --- > contrib/check-params-in-docs.py | 21 + > 1 file changed, 13 insertions(+), 8 deletions(-) > > diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py > index f7879dd8e08..ccdb8d72169 100755 > --- a/contrib/check-params-in-docs.py > +++ b/contrib/check-params-in-docs.py > @@ -38,6 +38,9 @@ def get_param_tuple(line): > description = line[i:].strip() > return (name, description) > > +def target_specific(param): > +return param.split('-')[0] in ('aarch64', 'gcn', 'x86') > + > > parser = argparse.ArgumentParser() > parser.add_argument('texi_file') > @@ -45,13 +48,16 @@ parser.add_argument('params_output') > > args = parser.parse_args() > > -ignored = {'logical-op-non-short-circuit', > 'gcn-preferred-vectorization-factor'} > -params = {} > +ignored = {'logical-op-non-short-circuit'} > +help_params = {} > > for line in open(args.params_output).readlines(): > if line.startswith(' ' * 2) and not line.startswith(' ' * 8): > r = get_param_tuple(line) > -params[r[0]] = r[1] > +help_params[r[0]] = r[1] > + > +# Skip target-specific params > +help_params = [x for x in help_params.keys() if not target_specific(x)] > > # Find section in .texi manual with parameters > texi = ([x.strip() for x in open(args.texi_file).readlines()]) > @@ -66,14 +72,13 @@ for line in texi: > 
texi_params.append(line[len(token):]) > break > > -# skip digits > +# Skip digits > texi_params = [x for x in texi_params if not x[0].isdigit()] > -# skip aarch64 params > -texi_params = [x for x in texi_params if not x.startswith('aarch64')] > -sorted_params = sorted(texi_params) > +# Skip target-specific params > +texi_params = [x for x in texi_params if not target_specific(x)] > > texi_set = set(texi_params) - ignored > -params_set = set(params.keys()) - ignored > +params_set = set(help_params) - ignored > > success = True > extra = texi_set - params_set > -- > 2.43.1
[PATCH] contrib/check-params-in-docs.py: Ignore gcn-preferred-vectorization-factor
Hi,

contrib/check-params-in-docs.py is a script that checks that all options reported with ./gcc/xgcc -Bgcc --help=param are in gcc/doc/invoke.texi and vice versa. gcn-preferred-vectorization-factor is in the manual but normally not reported by --help, probably because I do not have gcn offload configured. This patch makes the script silent about this particular fact.

I'll push the patch as obvious momentarily.

Martin

contrib/ChangeLog:

2024-04-11  Martin Jambor

        * check-params-in-docs.py (ignored): Add
        gcn-preferred-vectorization-factor.
---
 contrib/check-params-in-docs.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index 623c82284e2..f7879dd8e08 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -45,7 +45,7 @@ parser.add_argument('params_output')
 
 args = parser.parse_args()
 
-ignored = {'logical-op-non-short-circuit'}
+ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
 params = {}
 
 for line in open(args.params_output).readlines():
-- 
2.44.0
[wwwdocs, committed] Fix link to "Feature Test Macros" in "Porting to GCC 14" page
Hi,

Michal Jireš found out that the link to Feature Test Macros on the Porting to GCC 14 page was broken: it was missing a "/latest/" directory in the middle of the path. I'll commit the following as obvious.

Thanks,

Martin

---
 htdocs/gcc-14/porting_to.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index 35274691..c825a68e 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -133,7 +133,7 @@
 On GNU systems, headers described in standards (such as the C standard, or POSIX) may require the definition of certain macros at the start of the compilation before all required function declarations are made available.
-See <a href="https://sourceware.org/glibc/manual/html_node/Feature-Test-Macros.html">Feature Test Macros</a>
+See <a href="https://sourceware.org/glibc/manual/latest/html_node/Feature-Test-Macros.html">Feature Test Macros</a>
 in the GNU C Library manual for details.
-- 
2.44.0
Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding
Hello,

On Sun, Apr 07 2024, Xi Ruoyao wrote:
> On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
>> The patch has been approved by Honza in Bugzilla. (I hope. He did write
>> it looked reasonable.) Together with the patch for PR 113907, it has
>> passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
>> x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux. It also
>> passed normal bootstrap on aarch64-linux but there many testcases failed
>> because the compiler timed out. The machine is old and slow and might
>> have been oversubscribed so my plan is to try again on gcc185 from
>> cfarm. If that goes well, I intend to commit the patch and then start
>> working on backports.
>
> I've tried these two patches out on my own 24-core AArch64 machine.
> Bootstrapped (but no LTO or PGO) and regtested fine.
>

Thank you very much, I have pushed the patches to upstream.

Martin
[PATCH] ipa: Force args obtained through pass-through maps to the expected type (PR 113964)
Hi,

interactions of IPA-CP and IPA-SRA on the same data are a rather big source of issues, I'm afraid. PR 113964 is a situation where IPA-CP propagates an unsigned short in a union parameter into a function which itself calls a different function which has the same union parameter, and both these union parameters are split with IPA-SRA. The leaf function however uses a signed short member of the union.

In the calling function, we get the unsigned constant as the replacement for the union and it is then passed in the call without any type compatibility checks. Apparently on riscv64 it matters whether the parameter is signed or unsigned short and so the leaf function can see different values.

Fixed by using useless_type_conversion_p at the appropriate place and, if it fails, using force_value_to_type as elsewhere in similar situations.

Bootstrapped and tested on x86_64-linux. The reporter has also run the testsuite with this patch on riscv64 and reported in Bugzilla that there were no issues.

OK for master and GCC 13?

Thanks,

Martin

gcc/ChangeLog:

2024-04-04  Martin Jambor

        PR ipa/113964
        * ipa-param-manipulation.cc (ipa_param_adjustments::modify_call):
        Force values obtained through pass-through maps to the expected
        split type.

gcc/testsuite/ChangeLog:

2024-04-04  Patrick O'Neill
            Martin Jambor

        PR ipa/113964
        * gcc.dg/ipa/pr114247.c: New test.

--- gcc/ipa-param-manipulation.cc | 6 ++ gcc/testsuite/gcc.dg/ipa/pr114247.c | 31 + 2 files changed, 37 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr114247.c diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc index 3e0df6a6f77..b4ca78b652e 100644 --- a/gcc/ipa-param-manipulation.cc +++ b/gcc/ipa-param-manipulation.cc @@ -740,6 +740,12 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs, } if (repl) { + if (!useless_type_conversion_p(apm->type, repl->typed.type)) + { + repl = force_value_to_type (apm->type, repl); + repl = force_gimple_operand_gsi (&gsi, repl, + true, NULL, true, GSI_SAME_STMT); + } vargs.quick_push (repl); continue; } diff --git a/gcc/testsuite/gcc.dg/ipa/pr114247.c b/gcc/testsuite/gcc.dg/ipa/pr114247.c new file mode 100644 index 000..60aa2bc0122 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr114247.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fsigned-char -fno-strict-aliasing -fwrapv" } */ + +union a { + unsigned short b; + int c; + signed short d; +}; +int e, f = 1, g; +long h; +const int **i; +void j(union a k, int l, unsigned m) { + const int *a[100]; + i = &a[0]; + h = k.d; +} +static int o(union a k) { + k.d = -1; + while (1) +if (f) + break; + j(k, g, e); + return 0; +} +int main() { + union a n = {1}; + o(n); + if (h != -1) +__builtin_abort(); + return 0; +} -- 2.44.0
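
To see why the signedness of the split replacement matters, here is a small standalone C sketch. It is not the fix itself, which lives in ipa_param_adjustments::modify_call; it only shows, assuming a 16-bit two's complement short, that the same 16 bits widen to different values depending on whether they are treated as the unsigned or the signed member of the union. Both function names are made up.

```c
#include <string.h>

/* Hypothetical helpers, not part of GCC.  Both take the same 16-bit
   pattern, as the unsigned short member of the union would hold it.  */

/* Widen the bits as the unsigned short member.  */
long
widen_as_unsigned_short (unsigned short bits)
{
  return bits;
}

/* Widen the same bits as the signed short member.  memcpy reinterprets
   the object representation without relying on a value conversion.  */
long
widen_as_signed_short (unsigned short bits)
{
  signed short s;
  memcpy (&s, &bits, sizeof s);
  return s;
}
```

For the pattern 0xffff the first helper yields 65535 and the second yields -1, which is the kind of difference the h != -1 check in the testcase is looking for.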
[PATCH] ICF&SRA: Make ICF and SRA agree on padding
Hi,

PR 113359 shows that (at least with -fno-strict-aliasing) ICF can unify two functions which copy an aggregate type of the same size. SRA, through its total scalarization, can then copy the aggregate by pieces, skipping padding, even though the padding was not the same in the two original functions that ICF unified.

This patch enhances SRA with the ability to collect padding information which then can be compared from within ICF. Unfortunately SRA uses OPTION_SET_P when determining its limits, so ICF needs to switch cfuns at least once to figure it out too.

The patch has been approved by Honza in Bugzilla. (I hope. He did write it looked reasonable.) Together with the patch for PR 113907, it has passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux. It also passed normal bootstrap on aarch64-linux but many testcases failed there because the compiler timed out. The machine is old and slow and might have been oversubscribed so my plan is to try again on gcc185 from cfarm. If that goes well, I intend to commit the patch and then start working on backports.

Martin

gcc/ChangeLog:

2024-03-27  Martin Jambor

        PR ipa/113359
        * ipa-icf-gimple.h (func_checker): New members
        safe_for_total_scalarization_p,
        m_total_scalarization_limit_known_p and
        m_total_scalarization_limit.
        (func_checker::func_checker): Initialize new member variables.
        * ipa-icf-gimple.cc: Include tree-sra.h.
        (func_checker::func_checker): Initialize new member variables.
        (func_checker::safe_for_total_scalarization_p): New function.
        (func_checker::compare_operand): Use the new function.
        * tree-sra.h (sra_get_max_scalarization_size): Declare.
        (sra_total_scalarization_would_copy_same_data_p): Likewise.
        * tree-sra.cc (prepare_iteration_over_array_elts): New function.
        (class sra_padding_collecting): New.
        (sra_padding_collecting::record_padding): Likewise.
        (scalarizable_type_p): Rename to totally_scalarizable_type_p.
        Add ability to record padding when requested.
        (totally_scalarize_subtree): Split out gathering information
        necessary to iterate over array elements to
        prepare_iteration_over_array_elts.  Fix erroneous early exit.
        (analyze_all_variable_accesses): Adjust the call to
        totally_scalarizable_type_p.  Move determining of total
        scalarization size limit...
        (sra_get_max_scalarization_size): ...here.
        (check_ts_and_push_padding_to_vec): New function.
        (sra_total_scalarization_would_copy_same_data_p): Likewise.

gcc/testsuite/ChangeLog:

2024-03-27  Martin Jambor

        PR ipa/113359
        * gcc.dg/lto/pr113359-1_0.c: New.
        * gcc.dg/lto/pr113359-1_1.c: Likewise.
        * gcc.dg/lto/pr113359-2_0.c: Likewise.
        * gcc.dg/lto/pr113359-2_1.c: Likewise.
        * gcc.dg/lto/pr113359-3_0.c: Likewise.
        * gcc.dg/lto/pr113359-3_1.c: Likewise.
        * gcc.dg/lto/pr113359-4_0.c: Likewise.
        * gcc.dg/lto/pr113359-4_1.c: Likewise.
        * gcc.dg/lto/pr113359-5_0.c: Likewise.
        * gcc.dg/lto/pr113359-5_1.c: Likewise.
---
 gcc/ipa-icf-gimple.cc                   |  41 +++-
 gcc/ipa-icf-gimple.h                    |  15 +-
 gcc/testsuite/gcc.dg/lto/pr113359-1_0.c |  86
 gcc/testsuite/gcc.dg/lto/pr113359-1_1.c |  38
 gcc/testsuite/gcc.dg/lto/pr113359-2_0.c |  87
 gcc/testsuite/gcc.dg/lto/pr113359-2_1.c |  38
 gcc/testsuite/gcc.dg/lto/pr113359-3_0.c | 114 +++
 gcc/testsuite/gcc.dg/lto/pr113359-3_1.c |  49 +
 gcc/testsuite/gcc.dg/lto/pr113359-4_0.c | 114 +++
 gcc/testsuite/gcc.dg/lto/pr113359-4_1.c |  49 +
 gcc/testsuite/gcc.dg/lto/pr113359-5_0.c | 118 +++
 gcc/testsuite/gcc.dg/lto/pr113359-5_1.c |  50 +
 gcc/tree-sra.cc                         | 252 +++-
 gcc/tree-sra.h                          |   3 +
 14 files changed, 999 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-1_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-1_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-2_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-2_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-3_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-3_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-4_0.c
create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-4_1.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-5_0.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-5_1.c diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc index 17f62bec068..c25eb24710f 100644 --- a/gcc/ipa-icf-gimple.cc +++ b/gcc/ipa-icf-gimple.cc @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" #include "attribs.h" #include "gimple-walk.h" +#in
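
The padding problem described in the message above can be illustrated outside of GCC with a small C sketch. It makes several assumptions: the two struct types and both function names are invented and are not taken from the pr113359 testcases, and the piecewise copy is only a rough model of what a copy by scalar pieces does. On common ABIs both types have the same size, but their padding sits in different places.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types, not taken from the pr113359 testcases.  On common
   ABIs both have the same size, but the padding sits in different places:
   after member a in the first, after member c in the second.  */
struct pad_in_middle
{
  char a;
  int b;
  char c[4];
};

struct pad_at_end
{
  char a[4];
  int b;
  char c;
};

/* Copy only the bytes that belong to members of struct pad_in_middle,
   skipping its padding, roughly as a copy by scalar pieces would.  */
void
copy_members_of_pad_in_middle (unsigned char *dst, const unsigned char *src)
{
  memcpy (dst + offsetof (struct pad_in_middle, a),
          src + offsetof (struct pad_in_middle, a), sizeof (char));
  memcpy (dst + offsetof (struct pad_in_middle, b),
          src + offsetof (struct pad_in_middle, b), sizeof (int));
  memcpy (dst + offsetof (struct pad_in_middle, c),
          src + offsetof (struct pad_in_middle, c), 4);
}

/* Count the bytes of a fully meaningful source object that do not reach
   a zero-initialized destination when copied member by member.  */
size_t
bytes_skipped_by_piecewise_copy (void)
{
  unsigned char src[sizeof (struct pad_in_middle)];
  unsigned char dst[sizeof (struct pad_in_middle)];
  size_t i, skipped = 0;

  memset (src, 0xaa, sizeof src);
  memset (dst, 0, sizeof dst);
  copy_members_of_pad_in_middle (dst, src);
  for (i = 0; i < sizeof src; i++)
    if (dst[i] != src[i])
      skipped++;
  return skipped;
}
```

If a function copying struct pad_at_end were unified with one copying struct pad_in_middle and the copy were then done by the members of the latter, the skipped bytes would be exactly where pad_at_end keeps part of its member a. This is why the two sides have to agree on padding before unification.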
[PATCH] ipa: Compare jump functions in ICF (PR 113907)
Hello,

In PR 113907 comment #58, Honza found a case where ICF thinks bodies of functions are equivalent but, because of a difference in aliases in a memory access, different aggregate jump functions are associated with supposedly equivalent call statements.

This patch adds a way to compare jump functions and plugs it into ICF to avoid the issue.

The patch has been approved by Honza in Bugzilla. Together with the patch for PR 113359, it has passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux. It also passed normal bootstrap on aarch64-linux but many testcases failed there because the compiler timed out. The machine is old and slow and might have been oversubscribed so my plan is to try again on gcc185 from cfarm. If that goes well, I intend to commit the patch and then start working on backports.

Martin

gcc/ChangeLog:

2024-03-20  Martin Jambor

        PR ipa/113907
        * ipa-prop.h (class ipa_vr): Declare new overload of a member
        function equal_p.
        (ipa_jump_functions_equivalent_p): Declare.
        * ipa-prop.cc (ipa_vr::equal_p): New function.
        (ipa_agg_pass_through_jf_equivalent_p): Likewise.
        (ipa_agg_jump_functions_equivalent_p): Likewise.
        (ipa_jump_functions_equivalent_p): Likewise.
        * ipa-cp.h (values_equal_for_ipcp_p): Declare.
        * ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
        * ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h,
        sreal.h, ipa-cp.h and ipa-prop.h.
        (func_checker::compare_gimple_call): Compare jump functions.

gcc/testsuite/ChangeLog:

2024-03-20  Martin Jambor

        PR ipa/113907
        * gcc.dg/lto/pr113907_0.c: New.
        * gcc.dg/lto/pr113907_1.c: Likewise.
        * gcc.dg/lto/pr113907_2.c: Likewise.

--- gcc/ipa-cp.cc | 2 +- gcc/ipa-cp.h | 2 + gcc/ipa-icf-gimple.cc | 30 + gcc/ipa-prop.cc | 167 ++ gcc/ipa-prop.h| 3 + gcc/testsuite/gcc.dg/lto/pr113907_0.c | 18 +++ gcc/testsuite/gcc.dg/lto/pr113907_1.c | 35 ++ gcc/testsuite/gcc.dg/lto/pr113907_2.c | 11 ++ 8 files changed, 267 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_0.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_1.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_2.c diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index 2a1da631e9c..b7add455bd5 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -201,7 +201,7 @@ ipcp_lattice::is_single_const () /* Return true iff X and Y should be considered equal values by IPA-CP. */ -static bool +bool values_equal_for_ipcp_p (tree x, tree y) { gcc_checking_assert (x != NULL_TREE && y != NULL_TREE); diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h index 0b3cfe4b526..7ff74fb5c98 100644 --- a/gcc/ipa-cp.h +++ b/gcc/ipa-cp.h @@ -289,4 +289,6 @@ public: bool virt_call = false; }; +bool values_equal_for_ipcp_p (tree x, tree y); + #endif /* IPA_CP_H */ diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc index 8c2df7a354e..17f62bec068 100644 --- a/gcc/ipa-icf-gimple.cc +++ b/gcc/ipa-icf-gimple.cc @@ -41,7 +41,12 @@ along with GCC; see the file COPYING3. 
If not see #include "gimple-walk.h" #include "tree-ssa-alias-compare.h" +#include "alloc-pool.h" +#include "symbol-summary.h" #include "ipa-icf-gimple.h" +#include "sreal.h" +#include "ipa-cp.h" +#include "ipa-prop.h" namespace ipa_icf_gimple { @@ -714,6 +719,31 @@ func_checker::compare_gimple_call (gcall *s1, gcall *s2) && !compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2))) return return_false_with_msg ("GIMPLE internal call LHS type mismatch"); + if (!gimple_call_internal_p (s1)) +{ + cgraph_edge *e1 = cgraph_node::get (m_source_func_decl)->get_edge (s1); + cgraph_edge *e2 = cgraph_node::get (m_target_func_decl)->get_edge (s2); + class ipa_edge_args *args1 = ipa_edge_args_sum->get (e1); + class ipa_edge_args *args2 = ipa_edge_args_sum->get (e2); + if ((args1 != nullptr) != (args2 != nullptr)) + return return_false_with_msg ("ipa_edge_args mismatch"); + if (args1) + { + int n1 = ipa_get_cs_argument_count (args1); + int n2 = ipa_get_cs_argument_count (args2); + if (n1 != n2) + return return_false_with_msg ("ipa_edge_args nargs mismatch"); + for (int i = 0; i < n1; i++) + { + struct ipa_jump_func *jf1 = ipa_get_ith_jump_func (args1, i); + struct ipa_jump_func *jf2 = ipa_get_ith_jump_func (args2, i); + if (((jf1 != nullptr) != (jf2 != nullptr)) + || (jf1 && !ipa_jump_functions_equivalent_p (jf
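As a rough illustration of the check that the compare_gimple_call hunk above adds, here is a minimal sketch using hypothetical stand-in types. JumpFunc, EdgeArgs and both helper functions are names invented for this example; they are not GCC's ipa_jump_func or ipa_edge_args API. The idea is that two call statements only count as equivalent if both have or both lack argument summaries, the argument counts match, and every pair of jump functions is equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a jump function: a kind plus one payload value.
struct JumpFunc {
  enum Kind { UNKNOWN, CONSTANT, PASS_THROUGH } kind;
  long value;  // constant value or formal parameter index, depending on kind
};

// Hypothetical stand-in for the per-call-site argument summary.  A null
// entry models an argument for which no jump function was recorded.
struct EdgeArgs {
  std::vector<const JumpFunc *> jfs;
};

// Two jump functions are equivalent if they have the same kind and, where
// the kind carries a payload, the same payload.
bool jump_funcs_equivalent(const JumpFunc &a, const JumpFunc &b) {
  if (a.kind != b.kind)
    return false;
  return a.kind == JumpFunc::UNKNOWN || a.value == b.value;
}

// Mirrors the shape of the check in the patch: presence must match, the
// argument counts must match, and each pair must be equivalent.
bool call_args_equivalent(const EdgeArgs *a1, const EdgeArgs *a2) {
  if ((a1 != nullptr) != (a2 != nullptr))
    return false;  // one call has a summary, the other does not
  if (!a1)
    return true;   // neither call has a summary
  if (a1->jfs.size() != a2->jfs.size())
    return false;
  for (std::size_t i = 0; i < a1->jfs.size(); i++) {
    const JumpFunc *j1 = a1->jfs[i];
    const JumpFunc *j2 = a2->jfs[i];
    if ((j1 != nullptr) != (j2 != nullptr))
      return false;
    if (j1 && !jump_funcs_equivalent(*j1, *j2))
      return false;
  }
  return true;
}
```

The real ipa_jump_functions_equivalent_p in the patch has to compare considerably more state (value ranges, aggregate items and so on); the sketch only shows the control flow of the comparison.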
Re: [PATCH] ipa: Avoid duplicate replacements in IPA-SRA transformation phase
Hello, and ping, please. (In my copy I have fixed the formatting issue spotted by Jakub.) Martin On Fri, Mar 15 2024, Martin Jambor wrote: > Hi, > > when the analysis part of IPA-SRA figures out that it would split out > a scalar part of an aggregate which is known by IPA-CP to contain a > known constant, it skips it knowing that the transformation part looks > at IPA-CP aggregate results too and does the right thing (which can > include doing the propagation in GIMPLE because that is the last > moment the parameter exists). > > However, when IPA-SRA wants to split out a smaller non-aggregate out > of an aggregate, which happens to be of the same size as a known > scalar constant at the same offset, the transformation bit fails to > recognize the situation, tries to do both splitting and constant > propagation and in PR 111571 testcase creates a nonsensical call > statement on which the call redirection then ICEs. > > Fixed by making sure we don't try to do two replacements of the same > part of the same parameter. > > The look-up among replacements requires these are sorted and this > patch just sorts them if they are not already sorted before each new > look-up. The worst number of sortings that can happen is number of > parameters which are both split and have aggregate constants times > param_ipa_max_agg_items (default 16). I don't think complicating the > source code to optimize for this unlikely case is worth it but if need > be, it can of course be done. > > Bootstrapped and tested on x86_64-linux. OK for master and eventually > also the gcc-13 branch? > > Thanks, > > Martin > > > > gcc/ChangeLog: > > 2024-03-15 Martin Jambor > > PR ipa/111571 > * ipa-param-manipulation.cc > (ipa_param_body_adjustments::common_initialization): Avoid creating > duplicate replacement entries. > > gcc/testsuite/ChangeLog: > > 2024-03-15 Martin Jambor > > PR ipa/111571 > * gcc.dg/ipa/pr111571.c: New test. 
> --- > gcc/ipa-param-manipulation.cc | 16 > gcc/testsuite/gcc.dg/ipa/pr111571.c | 29 + > 2 files changed, 45 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/ipa/pr111571.c > > diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc > index 3e0df6a6f77..4c6337cc563 100644 > --- a/gcc/ipa-param-manipulation.cc > +++ b/gcc/ipa-param-manipulation.cc > @@ -1525,6 +1525,22 @@ ipa_param_body_adjustments::common_initialization > (tree old_fndecl, >replacement with a constant (for split aggregates passed >by value). */ > > + if (split[parm_num]) > + { > + /* We must be careful not to add a duplicate > + replacement. */ > + sort_replacements (); > + ipa_param_body_replacement *pbr = > + lookup_replacement_1 (m_oparms[parm_num], > + av.unit_offset); > + if (pbr) > + { > + /* Otherwise IPA-SRA should have bailed out. */ > + gcc_assert (AGGREGATE_TYPE_P (TREE_TYPE (pbr->repl))); > + continue; > + } > + } > + > tree repl; > if (av.by_ref) > repl = av.value; > diff --git a/gcc/testsuite/gcc.dg/ipa/pr111571.c > b/gcc/testsuite/gcc.dg/ipa/pr111571.c > new file mode 100644 > index 000..2a4adc608db > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/ipa/pr111571.c > @@ -0,0 +1,29 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +struct a { > + int b; > +}; > +struct c { > + long d; > + struct a e; > + long f; > +}; > +int g, h, i; > +int j() {return 0;} > +static void k(struct a l, int p) { > + if (h) > +g = 0; > + for (; g; g = j()) > +if (l.b) > + break; > +} > +static void m(struct c l) { > + k(l.e, l.f); > + for (;; --i) > +; > +} > +int main() { > + struct c n = {10, 9}; > + m(n); > +} > -- > 2.44.0
Re: [PATCH] tree-optimization/113727 - bogus SRA with BIT_FIELD_REF
Hello, On Tue, Mar 19 2024, Richard Biener wrote: > When SRA analyzes BIT_FIELD_REFs it handles writes and not byte > aligned reads differently from byte aligned reads. Instead of > trying to create replacements for the loaded portion the former > cases try to replace the base object while keeping the wrapping > BIT_FIELD_REFs. This breaks when we have both kinds operating > on the same base object if there's no apparent overlap conflict > as the conflict that then nevertheless exists isn't handled. > The fix is to enforce what I think is part of the design handling > the former case - that only the full base object gets replaced > and no further sub-objects are created within as otherwise > keeping the wrapping BIT_FIELD_REF cannot work. The patch > enforces this within analyze_access_subtree. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > OK? I agree this is the best thing to do. Thanks, Martin > > Thanks, > Richard. > > PR tree-optimization/113727 > * tree-sra.cc (analyze_access_subtree): Do not allow > replacements in subtrees when grp_partial_lhs. > > * gcc.dg/torture/pr113727.c: New testcase.
> --- > gcc/testsuite/gcc.dg/torture/pr113727.c | 26 + > gcc/tree-sra.cc | 3 ++- > 2 files changed, 28 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr113727.c > > diff --git a/gcc/testsuite/gcc.dg/torture/pr113727.c > b/gcc/testsuite/gcc.dg/torture/pr113727.c > new file mode 100644 > index 000..f92ddad5c8e > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr113727.c > @@ -0,0 +1,26 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target int32plus } */ > + > +struct f { > + unsigned au : 5; > + unsigned f3 : 21; > +} g_994; > + > +int main() > +{ > + struct f aq1 = {}; > +{ > + struct f aq = {9, 5}; > + struct f as = aq; > + for (int y = 0 ; y <= 4; y += 1) > + if (as.au) > + { > + struct f aa[5] = {{2, 154}, {2, 154}, {2, 154}, {2, 154}, {2, 154}}; > + as = aa[0]; > + } > + aq1 = as; > +} > + if (aq1.f3 != 0x9a) > +__builtin_abort(); > + return 0; > +} > diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc > index f8e71ec48b9..dbfae5e7fdd 100644 > --- a/gcc/tree-sra.cc > +++ b/gcc/tree-sra.cc > @@ -2735,7 +2735,8 @@ analyze_access_subtree (struct access *root, struct > access *parent, > { >hole |= covered_to < child->offset; >sth_created |= analyze_access_subtree (child, root, > - allow_replacements && !scalar, > + allow_replacements && !scalar > + && !root->grp_partial_lhs, >totally); > >root->grp_unscalarized_data |= child->grp_unscalarized_data; > -- > 2.35.3
[PATCH] ipa: Avoid duplicate replacements in IPA-SRA transformation phase
Hi, when the analysis part of IPA-SRA figures out that it would split out a scalar part of an aggregate which is known by IPA-CP to contain a known constant, it skips it knowing that the transformation part looks at IPA-CP aggregate results too and does the right thing (which can include doing the propagation in GIMPLE because that is the last moment the parameter exists). However, when IPA-SRA wants to split out a smaller non-aggregate out of an aggregate, which happens to be of the same size as a known scalar constant at the same offset, the transformation bit fails to recognize the situation, tries to do both splitting and constant propagation and in PR 111571 testcase creates a nonsensical call statement on which the call redirection then ICEs. Fixed by making sure we don't try to do two replacements of the same part of the same parameter. The look-up among replacements requires these are sorted and this patch just sorts them if they are not already sorted before each new look-up. The worst number of sortings that can happen is number of parameters which are both split and have aggregate constants times param_ipa_max_agg_items (default 16). I don't think complicating the source code to optimize for this unlikely case is worth it but if need be, it can of course be done. Bootstrapped and tested on x86_64-linux. OK for master and eventually also the gcc-13 branch? Thanks, Martin gcc/ChangeLog: 2024-03-15 Martin Jambor PR ipa/111571 * ipa-param-manipulation.cc (ipa_param_body_adjustments::common_initialization): Avoid creating duplicate replacement entries. gcc/testsuite/ChangeLog: 2024-03-15 Martin Jambor PR ipa/111571 * gcc.dg/ipa/pr111571.c: New test. 
--- gcc/ipa-param-manipulation.cc | 16 gcc/testsuite/gcc.dg/ipa/pr111571.c | 29 + 2 files changed, 45 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr111571.c diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc index 3e0df6a6f77..4c6337cc563 100644 --- a/gcc/ipa-param-manipulation.cc +++ b/gcc/ipa-param-manipulation.cc @@ -1525,6 +1525,22 @@ ipa_param_body_adjustments::common_initialization (tree old_fndecl, replacement with a constant (for split aggregates passed by value). */ + if (split[parm_num]) + { + /* We must be careful not to add a duplicate +replacement. */ + sort_replacements (); + ipa_param_body_replacement *pbr = + lookup_replacement_1 (m_oparms[parm_num], + av.unit_offset); + if (pbr) + { + /* Otherwise IPA-SRA should have bailed out. */ + gcc_assert (AGGREGATE_TYPE_P (TREE_TYPE (pbr->repl))); + continue; + } + } + tree repl; if (av.by_ref) repl = av.value; diff --git a/gcc/testsuite/gcc.dg/ipa/pr111571.c b/gcc/testsuite/gcc.dg/ipa/pr111571.c new file mode 100644 index 000..2a4adc608db --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr111571.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +struct a { + int b; +}; +struct c { + long d; + struct a e; + long f; +}; +int g, h, i; +int j() {return 0;} +static void k(struct a l, int p) { + if (h) +g = 0; + for (; g; g = j()) +if (l.b) + break; +} +static void m(struct c l) { + k(l.e, l.f); + for (;; --i) +; +} +int main() { + struct c n = {10, 9}; + m(n); +} -- 2.44.0
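The "sort if needed, then look up, then skip duplicates" approach described in the message above can be sketched as follows. This is only an illustration under stated assumptions: Replacement and ReplacementTable are hypothetical names made up for the example, and the real code uses ipa_param_body_replacement, sort_replacements and lookup_replacement_1 on GCC's own vec type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement record: which parameter, at which offset.
struct Replacement {
  int parm;
  unsigned offset;
  int id;  // payload, only used to tell entries apart in the example
};

class ReplacementTable {
  std::vector<Replacement> m_repls;
  bool m_sorted = true;  // an empty table is trivially sorted

  static bool less(const Replacement &a, const Replacement &b) {
    return a.parm != b.parm ? a.parm < b.parm : a.offset < b.offset;
  }

public:
  // Sort only when something was appended since the last sort.
  void sort_if_needed() {
    if (!m_sorted) {
      std::sort(m_repls.begin(), m_repls.end(), less);
      m_sorted = true;
    }
  }

  // Binary search; requires sorted entries, hence the call above it.
  // The returned pointer is invalidated by the next add_unique.
  const Replacement *lookup(int parm, unsigned offset) {
    sort_if_needed();
    Replacement key{parm, offset, 0};
    auto it = std::lower_bound(m_repls.begin(), m_repls.end(), key, less);
    if (it != m_repls.end() && it->parm == parm && it->offset == offset)
      return &*it;
    return nullptr;
  }

  // Add R unless the same part of the same parameter is already replaced.
  bool add_unique(const Replacement &r) {
    if (lookup(r.parm, r.offset))
      return false;
    m_repls.push_back(r);
    m_sorted = false;
    return true;
  }

  std::size_t size() const { return m_repls.size(); }
};
```

Each add_unique may trigger one sort, which matches the worst case the message mentions: the number of sorts is bounded by the number of candidate entries, and that was judged not worth optimizing.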
[PATCH] ipa: Fix C++ member ptr indirect inlining (PR 114254, PR 108802)
Hi,

Even though we have had code to handle creation of indirect call graph edges (so that these calls can then be made direct as part of IPA-CP and inlining and eventually also inlined) for C++ member pointers for many years, it turns out that it does not work for lambdas and that it has been severely broken since GCC 10 when the base class has virtual functions.

Lambdas don't work because the code cannot work with structures representing member function pointers when they are passed by reference instead of by value, and the code was not ready for that.

The presence of virtual methods broke things because at some point the C++ FE got clever and stopped emitting the check for virtual methods when the base class does not have any, and that in turn made our existing testcases not test the necessary pattern matching code. The pattern matcher had a small bug which did not matter before r10-917-g3b47da42de621c but did afterwards.

This patch changes the pattern matcher to match both of these cases.

Special thanks to the Linaro automated checker of patches, which reported that the earlier version of my PR 108802 fix was not working on AArch64, which in turn made me discover PR 114254.

The patch has passed bootstrap and testing on x86_64-linux, aarch64-linux and ppc64-linux and I have also LTO bootstrapped it on x86_64-linux.

I understand we have been living with these deficiencies for a while now but both are technically regressions. If Honza agrees (and manages to review the patch quickly), I'm fine with pushing them to master now but I can also wait until the next stage 1.

Thanks,

Martin

gcc/ChangeLog:

2024-03-06  Martin Jambor

	PR ipa/108802
	PR ipa/114254
	* ipa-prop.cc (ipa_get_stmt_member_ptr_load_param): Fix case looking at COMPONENT_REFs directly from a PARM_DECL, also recognize loads from a pointer parameter.
	(ipa_analyze_indirect_call_uses): Also recognize loads from a pointer parameter, also recognize the case when pfn pointer is loaded in its own BB.
gcc/testsuite/ChangeLog: 2024-03-06 Martin Jambor PR ipa/108802 PR ipa/114254 * g++.dg/ipa/iinline-4.C: New test. * g++.dg/ipa/pr108802.C: Likewise. --- gcc/ipa-prop.cc | 110 +++ gcc/testsuite/g++.dg/ipa/iinline-4.C | 61 +++ gcc/testsuite/g++.dg/ipa/pr108802.C | 14 3 files changed, 154 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/g++.dg/ipa/iinline-4.C create mode 100644 gcc/testsuite/g++.dg/ipa/pr108802.C diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc index e22c4f78405..e8e4918d5a8 100644 --- a/gcc/ipa-prop.cc +++ b/gcc/ipa-prop.cc @@ -2500,7 +2500,9 @@ static tree ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool use_delta, HOST_WIDE_INT *offset_p) { - tree rhs, rec, ref_field, ref_offset, fld, ptr_field, delta_field; + tree rhs, fld, ptr_field, delta_field; + tree ref_field = NULL_TREE; + tree ref_offset = NULL_TREE; if (!gimple_assign_single_p (stmt)) return NULL_TREE; @@ -2511,35 +2513,53 @@ ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool use_delta, ref_field = TREE_OPERAND (rhs, 1); rhs = TREE_OPERAND (rhs, 0); } + + if (TREE_CODE (rhs) == MEM_REF) +{ + ref_offset = TREE_OPERAND (rhs, 1); + if (ref_field && integer_nonzerop (ref_offset)) + return NULL_TREE; +} + else if (!ref_field) +return NULL_TREE; + + if (TREE_CODE (rhs) == MEM_REF + && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (rhs, 0))) +{ + rhs = TREE_OPERAND (rhs, 0); + if (TREE_CODE (SSA_NAME_VAR (rhs)) != PARM_DECL + || !type_like_member_ptr_p (TREE_TYPE (TREE_TYPE (rhs)), &ptr_field, + &delta_field)) + return NULL_TREE; +} else -ref_field = NULL_TREE; - if (TREE_CODE (rhs) != MEM_REF) -return NULL_TREE; - rec = TREE_OPERAND (rhs, 0); - if (TREE_CODE (rec) != ADDR_EXPR) -return NULL_TREE; - rec = TREE_OPERAND (rec, 0); - if (TREE_CODE (rec) != PARM_DECL - || !type_like_member_ptr_p (TREE_TYPE (rec), &ptr_field, &delta_field)) -return NULL_TREE; - ref_offset = TREE_OPERAND (rhs, 1); +{ + if (TREE_CODE (rhs) == MEM_REF + && 
TREE_CODE (TREE_OPERAND (rhs, 0)) == ADDR_EXPR) + rhs = TREE_OPERAND (TREE_OPERAND (rhs, 0), 0); + if (TREE_CODE (rhs) != PARM_DECL + || !type_like_member_ptr_p (TREE_TYPE (rhs), &ptr_field, + &delta_field)) + return NULL_TREE; +} if (use_delta) fld = delta_field; else fld = ptr_field; - if (offset_p) -*offset_p = int_bit_position (fld); if (ref_field) { - if (integer_nonzerop (ref_offset)) + if (ref_field != fld)
Re: [PATCH] ipa: Avoid excessive removing of SSAs (PR 113757)
Hello, and ping please. Martin On Thu, Feb 08 2024, Martin Jambor wrote: > Hi, > > PR 113757 shows that the code which was meant to debug-reset and > remove SSAs defined by LHSs of calls redirected to > __builtin_unreachable can trigger also when speculative > devirtualization creates a call to a noreturn function (and since it > is noreturn, it does not bother dealing with its return value). > > What is more, it seems that the code handling this case is not really > necessary. I feel slightly idiotic about this because I have a > feeling that I added it because of a failing test-case but I can > neither find the testcase nor a reason why the code in > cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it > turns the SSA name into a default-def, a bit like IPA-SRA, but any > code dominated by a call to a noreturn is not dangerous when it comes > to its side-effects). So this patch just removes the handling. > > Bootstrapped and tested on x86_64-linux and ppc64le-linux. I have also > LTO-bootstrapped and LTO-profilebootstrapped the patch on x86_64-linux. > > OK for master? > > Thanks, > > Martin > > > gcc/ChangeLog: > > 2024-02-07 Martin Jambor > > PR ipa/113757 > * tree-inline.cc (redirect_all_calls): Remove code adding SSAs to > id->killed_new_ssa_names. > > gcc/testsuite/ChangeLog: > > 2024-02-07 Martin Jambor > > PR ipa/113757 > * g++.dg/ipa/pr113757.C: New test. 
> --- > gcc/testsuite/g++.dg/ipa/pr113757.C | 14 ++ > gcc/tree-inline.cc | 14 ++ > 2 files changed, 16 insertions(+), 12 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/ipa/pr113757.C > > diff --git a/gcc/testsuite/g++.dg/ipa/pr113757.C > b/gcc/testsuite/g++.dg/ipa/pr113757.C > new file mode 100644 > index 000..885d4010a10 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/ipa/pr113757.C > @@ -0,0 +1,14 @@ > +// { dg-do compile } > +// { dg-options "-O2 -fPIC" } > +// { dg-require-effective-target fpic } > + > +long size(); > +struct ll { virtual int hh(); }; > +ll *slice_owner; > +int ll::hh() { __builtin_exit(0); } > +int nn() { > + if (size()) > +return 0; > + return slice_owner->hh(); > +} > +int (*a)() = nn; > diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc > index 75c10eb7dfc..cac41b4f031 100644 > --- a/gcc/tree-inline.cc > +++ b/gcc/tree-inline.cc > @@ -2984,23 +2984,13 @@ redirect_all_calls (copy_body_data * id, basic_block > bb) >gimple *stmt = gsi_stmt (si); >if (is_gimple_call (stmt)) > { > - tree old_lhs = gimple_call_lhs (stmt); > struct cgraph_edge *edge = id->dst_node->get_edge (stmt); > if (edge) > { > if (!id->killed_new_ssa_names) > id->killed_new_ssa_names = new hash_set (16); > - gimple *new_stmt > - = cgraph_edge::redirect_call_stmt_to_callee (edge, > - id->killed_new_ssa_names); > - if (old_lhs > - && TREE_CODE (old_lhs) == SSA_NAME > - && !gimple_call_lhs (new_stmt)) > - /* In case of IPA-SRA removing the LHS, the name should have > -been already added to the hash. But in case of redirecting > -to builtin_unreachable it was not and the name still should > -be pruned from debug statements. */ > - id->killed_new_ssa_names->add (old_lhs); > + cgraph_edge::redirect_call_stmt_to_callee (edge, > + id->killed_new_ssa_names); > > if (stmt == last && id->call_stmt && maybe_clean_eh_stmt (stmt)) > gimple_purge_dead_eh_edges (bb); > -- > 2.43.0
[PATCH] ipa: Create indirect call edges also for lambdas
Hi,

Even though we have had code to handle creation of indirect call graph edges (so that these calls can then be made direct as part of IPA-CP and inlining and eventually also inlined) for C++ member pointers for many years, this code expects the member pointers to be structures passed by value. In PR 108802 it turned out that for lambdas these are passed by reference. This patch adjusts the code for that so that small lambdas are also inlineable without depending on early inlining.

Bootstrapped and LTO bootstrapped on x86_64-linux. This is technically a regression against GCC 10. OK for master even now?

Thanks,

Martin

gcc/ChangeLog:

2024-02-20  Martin Jambor

	PR ipa/108802
	* ipa-prop.cc (ipa_get_stmt_member_ptr_load_param): Also recognize loads from a pointer parameter.
	(ipa_analyze_indirect_call_uses): Likewise.

gcc/testsuite/ChangeLog:

2024-02-20  Martin Jambor

	PR ipa/108802
	* g++.dg/ipa/pr108802.C: New test.

--- gcc/ipa-prop.cc | 56 + gcc/testsuite/g++.dg/ipa/pr108802.C | 14 2 files changed, 55 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/g++.dg/ipa/pr108802.C diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc index bec0ebd210c..25d252fd57c 100644 --- a/gcc/ipa-prop.cc +++ b/gcc/ipa-prop.cc @@ -2514,14 +2514,26 @@ ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool use_delta, if (TREE_CODE (rhs) != MEM_REF) return NULL_TREE; rec = TREE_OPERAND (rhs, 0); - if (TREE_CODE (rec) != ADDR_EXPR) -return NULL_TREE; - rec = TREE_OPERAND (rec, 0); - if (TREE_CODE (rec) != PARM_DECL - || !type_like_member_ptr_p (TREE_TYPE (rec), &ptr_field, &delta_field)) + if (TREE_CODE (rec) == ADDR_EXPR) +{ + rec = TREE_OPERAND (rec, 0); + if (TREE_CODE (rec) != PARM_DECL + || !type_like_member_ptr_p (TREE_TYPE (rec), &ptr_field, + &delta_field)) + return NULL_TREE; +} + else if (TREE_CODE (rec) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (rec)) +{ + if (TREE_CODE (SSA_NAME_VAR (rec)) != PARM_DECL + || !type_like_member_ptr_p (TREE_TYPE (TREE_TYPE (rec)), &ptr_field, +
&delta_field)) + return NULL_TREE; +} + else return NULL_TREE; - ref_offset = TREE_OPERAND (rhs, 1); + ref_offset = TREE_OPERAND (rhs, 1); if (use_delta) fld = delta_field; else @@ -2757,17 +2769,31 @@ ipa_analyze_indirect_call_uses (struct ipa_func_body_info *fbi, gcall *call, if (rec != rec2) return; - index = ipa_get_param_decl_index (info, rec); - if (index >= 0 - && parm_preserved_before_stmt_p (fbi, index, call, rec)) + if (TREE_CODE (rec) == SSA_NAME) { - struct cgraph_edge *cs = ipa_note_param_call (fbi->node, index, - call, false); - cs->indirect_info->offset = offset; - cs->indirect_info->agg_contents = 1; - cs->indirect_info->member_ptr = 1; - cs->indirect_info->guaranteed_unmodified = 1; + index = ipa_get_param_decl_index (info, SSA_NAME_VAR (rec)); + if (index < 0 + || !parm_ref_data_preserved_p (fbi, index, call, +gimple_assign_rhs1 (def))) + return; + by_ref = true; } + else +{ + index = ipa_get_param_decl_index (info, rec); + if (index < 0 + || !parm_preserved_before_stmt_p (fbi, index, call, rec)) + return; + by_ref = false; +} + + struct cgraph_edge *cs = ipa_note_param_call (fbi->node, index, + call, false); + cs->indirect_info->offset = offset; + cs->indirect_info->agg_contents = 1; + cs->indirect_info->member_ptr = 1; + cs->indirect_info->by_ref = by_ref; + cs->indirect_info->guaranteed_unmodified = 1; return; } diff --git a/gcc/testsuite/g++.dg/ipa/pr108802.C b/gcc/testsuite/g++.dg/ipa/pr108802.C new file mode 100644 index 000..2e2b6c66b64 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/pr108802.C @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -std=c++14 -fdump-ipa-inline -fno-early-inlining" } */ +/* { dg-add-options bind_pic_locally } */ + +struct A { +int interesting(int x) { return 2 * x; } +}; + +int f1() { +A a; +return [&](auto&& f) { return (a.*f)(42); } (&A::interesting); +} + +/* { dg-final { scan-ipa-dump "A::interesting\[^\\n\]*inline copy in int f1" "inline" } } */ -- 2.43.0
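For readers less familiar with the two source-level patterns the message distinguishes, here is a small sketch modelled on the pr108802.C testcase above. The function names call_by_value and call_via_lambda are made up for this example. In the first function the member pointer is an ordinary by-value parameter; in the second, the generic lambda takes a forwarding reference, so the member pointer reaches the lambda body by reference, which is the case the message says the old pattern matcher did not handle. The sketch only shows the C++ side and makes no claim about the exact GIMPLE either form produces.

```cpp
#include <cassert>

struct A {
  int interesting(int x) { return 2 * x; }
};

// Member function pointer passed by value: the long-supported pattern.
int call_by_value(A &a, int (A::*f)(int)) {
  return (a.*f)(42);
}

// Generic lambda (C++14) with a forwarding-reference parameter: inside the
// lambda, f refers to the temporary member pointer instead of copying it.
int call_via_lambda() {
  A a;
  return [&](auto &&f) { return (a.*f)(42); }(&A::interesting);
}
```

Both functions compute the same value; the difference that matters to IPA is only how the member pointer is passed to the code performing the indirect call.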
[PATCH] ipa: Convert lattices from pure array to vector (PR 113476)
On Tue, Feb 13 2024, Martin Jambor wrote: > On Mon, Feb 12 2024, Jan Hubicka wrote: >>> Believe it or not, even though I have re-worked the internals of the >>> lattices completely, the array itself is older than my involvement with >>> GCC (or at least with ipa-cp.c ;-). >>> >>> So it being an array and not a vector is historical coincidence, as far >>> as I am concerned :-). But that may be the reason, or because vector >>> macros at that time looked scary, or perhaps the initialization by >>> XCNEWVEC zeroing everything out was considered attractive (I kind of >>> like that but constructors would probably be cleaner), I don't know. >> >> If your class is no longer a POD, then the clearing before construction >> is dead and GCC may optimize it out. So fixing this may solve some >> surprises in the foreseeable future when we will try to compile older GCC's >> with newer ones. >> > > That's a good point. I'll prepare a patch converting the whole thing to > use constructors and vectors. >

In PR 113476 we have discovered that ipcp_param_lattices is no longer a POD and should be destructed. In a follow-up discussion it transpired that their initialization done by memsetting their backing memory to zero is also invalid because now any write there before construction can be considered dead. Plus, having them in an array is a little bit old-school and does not get the extra checking offered by vector along with automatic construction and destruction when necessary.

So this patch converts the array to a vector. That however means that ipcp_param_lattices cannot be just a forward declared type but must be known to all code that deals with ipa_node_params and thus to all code that includes ipa-prop.h. Therefore I have moved ipcp_param_lattices and the types it depends on to a new header ipa-cp.h which now ipa-prop.h depends on. Because we have the (IMHO not a very wise) rule that headers don't include what they need themselves, I had to add inclusions of ipa-cp.h and sreal.h (on which it depends) to very many files, which made the patch rather ugly.

Bootstrapped and tested on x86_64-linux. I also had it checked by our script which builds more than a hundred cross-compilers, so other targets are hopefully also fine. OK for master?

Martin

gcc/lto/ChangeLog: 2024-02-16 Martin Jambor * lto-common.cc: Include sreal.h and ipa-cp.h. * lto-partition.cc: Include ipa-cp.h, move inclusion of sreal higher. * lto.cc: Include sreal.h and ipa-cp.h. gcc/ChangeLog: 2024-02-16 Martin Jambor * ipa-prop.h (ipa_node_params): Convert lattices to a vector, adjust initializers in the constructor. (ipa_node_params::~ipa_node_params): Release lattices as a vector. * ipa-cp.h: New file. * ipa-cp.cc: Include sreal.h and ipa-cp.h. (ipcp_value_source): Move to ipa-cp.h. (ipcp_value_base): Likewise. (ipcp_value): Likewise. (ipcp_lattice): Likewise. (ipcp_agg_lattice): Likewise. (ipcp_bits_lattice): Likewise. (ipcp_vr_lattice): Likewise. (ipcp_param_lattices): Likewise. (ipa_get_parm_lattices): Remove assert that lattices is non-NULL. (ipa_value_from_jfunc): Adjust a check for empty lattices. (ipa_context_from_jfunc): Likewise. (ipa_agg_value_from_jfunc): Likewise. (merge_agg_lats_step): Do not memset new aggregate lattices to zero. (ipcp_propagate_stage): Allocate lattices in a vector as opposed to just in contiguous memory. (ipcp_store_vr_results): Adjust a check for empty lattices. * auto-profile.cc: Include sreal.h and ipa-cp.h. * cgraph.cc: Likewise. * cgraphclones.cc: Likewise. * cgraphunit.cc: Likewise. * config/aarch64/aarch64.cc: Likewise. * config/i386/i386-builtins.cc: Likewise. * config/i386/i386-expand.cc: Likewise. * config/i386/i386-features.cc: Likewise. * config/i386/i386-options.cc: Likewise. * config/i386/i386.cc: Likewise. * config/rs6000/rs6000.cc: Likewise. * config/s390/s390.cc: Likewise.
* gengtype.cc (open_base_files): Added sreal.h and ipa-cp.h to the files to be included in gtype-desc.cc. * gimple-range-fold.cc: Include sreal.h and ipa-cp.h. * ipa-devirt.cc: Likewise. * ipa-fnsummary.cc: Likewise. * ipa-icf.cc: Likewise. * ipa-inline-analysis.cc: Likewise. * ipa-inline-transform.cc: Likewise. * ipa-inline.cc: Include ipa-cp.h, move inclusion of sreal.h higher. * ipa-modref.cc: Include sreal.h and ipa-cp.h. * ipa-param-manipulation.cc: Likewise. * ipa-predicate.cc: Likewise. * ipa-profile.cc: Likewise. * ipa-prop.cc: Likewise. (ipa_n
[PATCH] testsuite: Fix guality/ipa-sra-1.c to work with return IPA-VRP
Hi,

the test guality/ipa-sra-1.c stopped working after r14-5628-g53ba8d669550d3 because the variable from which the values of removed parameters could be calculated is also removed with it. Fixed with this patch, which stops a function from returning a constant.

I have also noticed that the XFAILed test passes at -O0, -O1 and -Og on all (three) targets I have tried, not just aarch64, so I extended the xfail exception accordingly.

Tested by running make -k check-gcc RUNTESTFLAGS="guality.exp=ipa-sra-1.c" on x86_64-linux, aarch64-linux and ppc64le-linux.

I hope this is an obvious enough change for me to commit without approval, which I will do later today.

Thanks,

Martin

gcc/testsuite/ChangeLog:

2024-02-14  Martin Jambor

	* gcc.dg/guality/ipa-sra-1.c (get_val1): Move up in the file.
	(get_val2): Likewise.
	(bar): Do not return a constant. Extend xfail exception for all targets.

--- gcc/testsuite/gcc.dg/guality/ipa-sra-1.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c b/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c index 9ef4eac93a7..55267c6f838 100644 --- a/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c +++ b/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c @@ -1,6 +1,10 @@ /* { dg-do run } */ /* { dg-options "-g -fno-ipa-icf" } */ +int __attribute__((noipa)) +get_val1 (void) {return 20;} +int __attribute__((noipa)) +get_val2 (void) {return 7;} void __attribute__((noipa)) use (int x) @@ -12,8 +16,8 @@ static int __attribute__((noinline)) bar (int i, int k) { asm ("" : "+r" (i)); - use (i); /* { dg-final { gdb-test . "k" "3" { xfail { ! { aarch64*-*-* && { any-opts "-O0" "-O1" "-Og" } } } } } } */ - return 6; + use (i); /* { dg-final { gdb-test . "k" "3" { xfail { !
{ *-*-*-* && { any-opts "-O0" "-O1" "-Og" } } } } } } */ + return 6 + get_val1(); } volatile int v; @@ -30,11 +34,6 @@ foo (int i, int k) volatile int v; -int __attribute__((noipa)) -get_val1 (void) {return 20;} -int __attribute__((noipa)) -get_val2 (void) {return 7;} - int main (void) { -- 2.43.0
Re: [PATCH] ipa: call destructors on lattices before freeing them (PR 113476)
Hi, On Mon, Feb 12 2024, Jan Hubicka wrote: >> Hi, >> >> In PR 113476 we have discovered that ipcp_param_lattices is no longer >> a POD and should be destructed. This patch does that, calling >> destructor on each element of the array containing them when the >> corresponding summary of a node is freed. An alternative would be to >> change the XCNEWVEC-and-placement-new to initializations in >> constructors of all things in ipcp_param_lattices and then simply use >> normal operators new and delete. I am not sure, the initialization >> through XCNEWVEC may be a bit more efficient although that is probably >> not a big concern. In the end, I opted for a simpler solution for >> stage 4. >> >> I have verified that valgrind no longer reports lost memory blocks >> allocated within ipcp_vr_lattice::meet_with_1 on the preprocessed source >> (dwarf2out.i) attached to Bugzilla. The patch also passes bootstrap and >> LTO bootstrap and testing on x86_64-linux. >> >> OK for master? >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2024-02-09 Martin Jambor >> >> PR tree-optimization/113476 >> * ipa-prop.h (ipa_node_params::~ipa_node_params): Moved... >> * ipa-cp.cc (ipa_node_params::~ipa_node_params): ...here. Added >> destruction of lattices. > > OK. > So you do not use vectors (which would also handle the destruction) > basically to save space needed to keep the > size of the vector since that is known from the parameter count? > OK, so when I started looking at converting lattices to vector, it immediately became clear why it is an array. The type of the element of the array (ipcp_param_lattices and all it contains) is only forward declared in ipa-prop.h where ipa_node_params is defined which can therefore just contain a pointer. The actual definition of ipcp_param_lattices is then done only in ipa-cp.c. 
Converting the array to a vector would mean moving ipcp_param_lattices together with ipcp_lattice, ipcp_value, ipcp_value_base, ipcp_agg_lattice, ipcp_bits_lattice and ipcp_vr_lattice from ipa-cp.c to ipa-prop.h, or to an ipa-cp.h which ipa-prop.h would require/include. But perhaps that is the proper C++ thing to do :-/ Martin
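To make the trade-off discussed in this thread concrete, here is a small sketch contrasting the two lifetimes. It rests on several assumptions: Lattice is a hypothetical stand-in for ipcp_param_lattices, std::calloc stands in for XCNEWVEC, and std::vector stands in for GCC's vec. With zeroed raw memory and placement new, every element must be destructed by hand before the memory is freed (the fix applied for PR 113476); with a vector, construction and destruction happen automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical lattice type.  The owning member makes it non-trivial, so
// skipping its destructor would leak the vector's heap storage.
struct Lattice {
  static int live;          // number of currently constructed objects
  std::vector<int> values;
  Lattice() { ++live; }
  Lattice(const Lattice &o) : values(o.values) { ++live; }
  ~Lattice() { --live; }
};
int Lattice::live = 0;

// Old style: zeroed raw memory plus placement new.  Destructors must be
// called explicitly before the memory is released.  Returns the number of
// live objects observed while the array was in use.
int old_style(std::size_t n) {
  assert(n > 0);
  Lattice *arr = static_cast<Lattice *>(std::calloc(n, sizeof(Lattice)));
  assert(arr != nullptr);
  for (std::size_t i = 0; i < n; i++)
    new (&arr[i]) Lattice();
  arr[0].values.push_back(1);
  int peak = Lattice::live;
  for (std::size_t i = 0; i < n; i++)
    arr[i].~Lattice();      // omitting this loop is the leak from the PR
  std::free(arr);
  return peak;
}

// Vector style: elements are constructed on creation and destructed when
// the vector goes out of scope.
int vector_style(std::size_t n) {
  assert(n > 0);
  int peak;
  {
    std::vector<Lattice> lats(n);
    lats[0].values.push_back(1);
    peak = Lattice::live;
  }
  return peak;
}
```

The cost of the vector form is the one this message points out: the element type must be complete wherever the containing class is defined, which is why the type definitions would have to move into a header.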
Re: [PATCH] ipa: call destructors on lattices before freeing them (PR 113476)
On Mon, Feb 12 2024, Jan Hubicka wrote: >> Believe it or not, even though I have re-worked the internals of the >> lattices completely, the array itself is older than my involvement with >> GCC (or at least with ipa-cp.c ;-). >> >> So it being an array and not a vector is historical coincidence, as far >> as I am concerned :-). But that may be the reason, or because vector >> macros at that time looked scary, or perhaps the initialization by >> XCNEWVEC zeroing everything out was considered attractive (I kind of >> like that but constructors would probably be cleaner), I don't know. > > If your class is no longer a POD, then the clearing before construction > is dead and GCC may optimize it out. So fixing this may solve some > surprises in the foreseeable future when we will try to compile older GCC's > with newer ones. > That's a good point. I'll prepare a patch converting the whole thing to use constructors and vectors. Thanks, Martin
Re: [PATCH] ipa: call destructors on lattices before freeing them (PR 113476)
On Mon, Feb 12 2024, Jan Hubicka wrote: >> Hi, >> >> In PR 113476 we have discovered that ipcp_param_lattices is no longer >> a POD and should be destructed. This patch does that, calling >> destructor on each element of the array containing them when the >> corresponding summary of a node is freed. An alternative would be to >> change the XCNEWVEC-and-placement-new to initializations in >> constructors of all things in ipcp_param_lattices and then simply use >> normal operators new and delete. I am not sure, the initialization >> through XCNEWVEC may be a bit more efficient although that is probably >> not a big concern. In the end, I opted for a simpler solution for >> stage 4. >> >> I have verified that valgrind no longer reports lost memory blocks >> allocated within ipcp_vr_lattice::meet_with_1 on the preprocessed source >> (dwarf2out.i) attached to Bugzilla. The patch also passes bootstrap and >> LTO bootstrap and testing on x86_64-linux. >> >> OK for master? >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2024-02-09 Martin Jambor >> >> PR tree-optimization/113476 >> * ipa-prop.h (ipa_node_params::~ipa_node_params): Moved... >> * ipa-cp.cc (ipa_node_params::~ipa_node_params): ...here. Added >> destruction of lattices. > > OK. > So you do not use vectors (which would also handle the destruction) > basically to save space needed to keep the > size of the vector since that is known from the parameter count? > Believe it or not, even though I have re-worked the internals of the lattices completely, the array itself is older than my involvement with GCC (or at least with ipa-cp.c ;-). So it being an array and not a vector is historical coincidence, as far as I am concerned :-). But that may be the reason, or because vector macros at that time looked scary, or perhaps the initialization by XCNEWVEC zeroing everything out was considered attractive (I kind of like that but constructors would probably be cleaner), I don't know. Martin
[PATCH] ipa: call destructors on lattices before freeing them (PR 113476)
Hi, In PR 113476 we have discovered that ipcp_param_lattices is no longer a POD and should be destructed. This patch does that, calling destructor on each element of the array containing them when the corresponding summary of a node is freed. An alternative would be to change the XCNEWVEC-and-placement-new to initializations in constructors of all things in ipcp_param_lattices and then simply use normal operators new and delete. I am not sure, the initialization through XCNEWVEC may be a bit more efficient although that is probably not a big concern. In the end, I opted for a simpler solution for stage 4. I have verified that valgrind no longer reports lost memory blocks allocated within ipcp_vr_lattice::meet_with_1 on the preprocessed source (dwarf2out.i) attached to Bugzilla. The patch also passes bootstrap and LTO bootstrap and testing on x86_64-linux. OK for master? Thanks, Martin gcc/ChangeLog: 2024-02-09 Martin Jambor PR tree-optimization/113476 * ipa-prop.h (ipa_node_params::~ipa_node_params): Moved... * ipa-cp.cc (ipa_node_params::~ipa_node_params): ...here. Added destruction of lattices. --- gcc/ipa-cp.cc | 17 + gcc/ipa-prop.h | 9 - 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index e85477df32d..9864ff052de 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -399,6 +399,23 @@ public: bool virt_call; }; +/* Destructor of node function summary, placed here because it mainly must + destruct value range lattices not known outside of this source file. */ + +ipa_node_params::~ipa_node_params () +{ + if (lattices) +{ + int count = ipa_get_param_count (this); + for (int i = 0; i < count; i++) + lattices[i].~ipcp_param_lattices (); + free (lattices); +} + vec_free (descriptors); + known_csts.release (); + known_contexts.release (); +} + /* Allocation pools for values and their sources in ipa-cp. 
*/ object_allocator > ipcp_cst_values_pool diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h index 9c78dc9f486..fe401640824 100644 --- a/gcc/ipa-prop.h +++ b/gcc/ipa-prop.h @@ -670,15 +670,6 @@ ipa_node_params::ipa_node_params () { } -inline -ipa_node_params::~ipa_node_params () -{ - free (lattices); - vec_free (descriptors); - known_csts.release (); - known_contexts.release (); -} - /* Intermediate information that we get from alias analysis about a particular parameter in a particular basic_block. When a parameter or the memory it references is marked modified, we use that information in all dominated -- 2.43.0
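The pattern the patch above deals with (zeroed raw storage, placement new, and then explicit destructor calls before free) can be sketched outside of GCC like this. It is a minimal illustration, assuming std::calloc in place of XCNEWVEC and a hypothetical element type that owns heap memory through a std::vector; none of the names below come from the GCC sources.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <vector>

// Counts live element objects so that missing destruction would be visible.
static int live_elements = 0;

// Hypothetical non-POD element: it owns heap memory through its vector
// member, so freeing the storage without running the destructor leaks.
struct element
{
  element () { ++live_elements; }
  ~element () { --live_elements; }
  std::vector<int> ranges;
};

// Allocate zeroed raw storage and construct each element in place.
element *
allocate_elements (int count)
{
  void *raw = std::calloc (count, sizeof (element));
  if (!raw)
    return nullptr;
  element *elts = static_cast<element *> (raw);
  for (int i = 0; i < count; i++)
    new (&elts[i]) element ();
  return elts;
}

// Run the destructor of each element explicitly, then free the storage.
// Calling only std::free would skip the destructors.
void
release_elements (element *elts, int count)
{
  if (!elts)
    return;
  for (int i = 0; i < count; i++)
    elts[i].~element ();
  std::free (elts);
}

// Build COUNT elements, make each of them own some heap memory, release
// them, and return the number of elements still alive (-1 on failure).
int
demo (int count)
{
  element *elts = allocate_elements (count);
  if (!elts)
    return -1;
  for (int i = 0; i < count; i++)
    elts[i].ranges.assign (100, i);
  release_elements (elts, count);
  return live_elements;
}
```

If release_elements called only std::free, the counter would stay at COUNT and a tool such as valgrind would report the vectors' buffers as lost, which matches the symptom described in the PR.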
Re: [RFC] GCC Security policy
Hi, On Fri, Feb 09 2024, Siddhesh Poyarekar wrote: > On 2024-02-09 10:38, Martin Jambor wrote: >> If anyone is interested in scoping this and then mentoring this as a >> Google Summer of Code project this year then now is the right time to >> speak up! > > I can help with mentoring and reviews, although I'll need someone to > assist with actual approvals. I'm sure that we could manage that. The project does not look like it would be a huge one. > > There are two distinct sets of ideas to explore, one is privilege > management and the other sandboxing. > > For privilege management we could add a --allow-root driver flag that > allows gcc to run as root. Without the flag one could either outright > refuse to run or drop privileges and run. Dropping privileges will be a > bit tricky to implement because it would need a user to drop privileges > to and then there would be the question of how to manage file access to > read the compiler input and write out the compiler output. If there's > no such user, gcc could refuse to run as root by default. I wonder > though if from a security posture perspective it makes sense to simply > discourage running as root all the time and not bother trying to make it > work with dropped privileges and all that. Of course it would mean that > this would be less of a "project"; it'll be a simple enough patch to > refuse to run until --allow-root is specified. Yeah, this would not be enough for a GSoC project, not even for their new small project category. Additionally, I think that many, if not all, Linux distributions that build binary packages do it in a VM/container/chroot where they do it simply under root because the whole environment is there just for the build. So this would complicate lives for an important set of our users. > > This probably ties in somewhat with an idea David Malcolm had riffed on > with me earlier, of caching files for diagnostics. If we could unify > file accesses somehow, we could make this happen, i.e. 
open/read files > as root and then do all execution as non-root. > > Sandboxing will have similar requirements, i.e. map in input files and > an output file handle upfront and then unshare() into a sandbox to do > the actual compilation. This will make sure that at least the > processing of inputs does not affect the system on which the compilation > is being run. Right. As we often just download some (sometimes large) pre-processed source from Bugzilla and then happily run GCC on it on our computers, this feature might be actually useful for us (still, we'd probably need a more concrete description of what we want, would e.g. using "-wrapper gdb,--args" work in such a sandbox?). I agree that for some even semi-complex builds, a more general sandboxing solution is probably better. Martin
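The "refuse to run as root unless explicitly allowed" idea discussed above could look roughly like the following check. This is purely a sketch of the proposal in the thread: --allow-root is only a suggested flag and does not exist in GCC, and may_run is a hypothetical helper, not driver code. A real driver would presumably pass the result of geteuid () as the first argument.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical policy check: a non-root user may always run; root may run
// only if the (proposed, not existing) --allow-root flag is present.
bool
may_run (unsigned effective_uid, int argc, const char *const *argv)
{
  if (effective_uid != 0)
    return true;
  for (int i = 1; i < argc; i++)
    if (std::strcmp (argv[i], "--allow-root") == 0)
      return true;
  return false;
}
```

As the reply notes, build environments that run everything as root inside a throw-away VM or container would have to add the flag everywhere, which is the main cost of such a default.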
Re: [RFC] GCC Security policy
Hi, On Tue, Aug 08 2023, Richard Biener via Gcc-patches wrote: > On Tue, Aug 8, 2023 at 2:33 PM Siddhesh Poyarekar wrote: >> >> On 2023-08-08 04:16, Richard Biener wrote: >> > On Mon, Aug 7, 2023 at 7:30 PM David Edelsohn via Gcc-patches >> > wrote: >> >> >> >> FOSS Best Practices recommends that projects have an official Security >> >> policy stated in a SECURITY.md or SECURITY.txt file at the root of the >> >> repository. GLIBC and Binutils have added such documents. >> >> >> >> Appended is a prototype for a Security policy file for GCC based on the >> >> Binutils document because GCC seems to have more affinity with Binutils as >> >> a tool. Do the runtime libraries distributed with GCC, especially libgcc, >> >> require additional security policies? >> >> >> >> [ ] Is it appropriate to use the Binutils SECURITY.txt as the starting >> >> point or should GCC use GLIBC SECURITY.md as the starting point for the >> >> GCC >> >> Security policy? >> >> >> >> [ ] Does GCC, or some components of GCC, require additional care because >> >> of >> >> runtime libraries like libgcc and libstdc++, and because of gcov and >> >> profile-directed feedback? >> > >> > I do think that the runtime libraries should at least be explicitly >> > mentioned >> > because they fall into the "generated output" category and bugs in the >> > runtime are usually more severe as affecting a wider class of inputs. >> >> Ack, I'd expect libstdc++ and libgcc to be aligned with glibc's >> policies. libiberty and others on the other hand, would probably be >> more suitably aligned with binutils libbfd, where we assume trusted input. >> >> >> Thoughts? >> >> >> >> Thanks, David >> >> >> >> GCC Security Process >> >> >> >> >> >> What is a GCC security bug? >> >> === >> >> >> >> A security bug is one that threatens the security of a system or >> >> network, or might compromise the security of data stored on it. >> >> In the context of GCC there are two ways in which such >> >> bugs might occur. 
In the first, the programs themselves might be >> >> tricked into a direct compromise of security. In the second, the >> >> tools might introduce a vulnerability in the generated output that >> >> was not already present in the files used as input. >> >> >> >> Other than that, all other bugs will be treated as non-security >> >> issues. This does not mean that they will be ignored, just that >> >> they will not be given the priority that is given to security bugs. >> >> >> >> This stance applies to the creation tools in the GCC (e.g., >> >> gcc, g++, gfortran, gccgo, gccrs, gnat, cpp, gcov, etc.) and the >> >> libraries that they use. >> >> >> >> Notes: >> >> == >> >> >> >> None of the programs in GCC need elevated privileges to operate and >> >> it is recommended that users do not use them from accounts where such >> >> privileges are automatically available. >> > >> > I'll note that we could ourselves mitigate some of that by handling >> > privileged >> > invocation of the driver specially, dropping privs on exec of the sibling >> > tools >> > and possibly using temporary files or pipes to do the parts of the I/O that >> > need to be privileged. >> >> It's not a bad idea, but it ends up giving legitimizing running the >> compiler as root, pushing the responsibility of privilege management to >> the driver. How about rejecting invocation as root altogether by >> default, bypassed with a --run-as-root flag instead? >> >> I've also been thinking about a --sandbox flag that isolates the build >> process (for gcc as well as binutils) into a separate namespace so that >> it's usable in a restricted mode on untrusted sources without exposing >> the rest of the system to it. > > There's probably external tools to do this, not sure if we should replicate > things in the driver for this. > > But sure, I think the driver is the proper point to address any of such > issues - iff we want to address them at all. 
Maybe a nice little > google summer-of-code project ;) > If anyone is interested in scoping this and then mentoring this as a Google Summer of Code project this year then now is the right time to speak up! Thanks, Martin
[PATCH] ipa: Avoid excessive removing of SSAs (PR 113757)
Hi, PR 113757 shows that the code which was meant to debug-reset and remove SSAs defined by LHSs of calls redirected to __builtin_unreachable can trigger also when speculative devirtualization creates a call to a noreturn function (and since it is noreturn, it does not bother dealing with its return value). What is more, it seems that the code handling this case is not really necessary. I feel slightly idiotic about this because I have a feeling that I added it because of a failing test-case but I can neither find the testcase nor a reason why the code in cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it turns the SSA name into a default-def, a bit like IPA-SRA, but any code dominated by a call to a noreturn is not dangerous when it comes to its side-effects). So this patch just removes the handling. Bootstrapped and tested on x86_64-linux and ppc64le-linux. I have also LTO-bootstrapped and LTO-profilebootstrapped the patch on x86_64-linux. OK for master? Thanks, Martin gcc/ChangeLog: 2024-02-07 Martin Jambor PR ipa/113757 * tree-inline.cc (redirect_all_calls): Remove code adding SSAs to id->killed_new_ssa_names. gcc/testsuite/ChangeLog: 2024-02-07 Martin Jambor PR ipa/113757 * g++.dg/ipa/pr113757.C: New test. 
--- gcc/testsuite/g++.dg/ipa/pr113757.C | 14 ++ gcc/tree-inline.cc | 14 ++ 2 files changed, 16 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/g++.dg/ipa/pr113757.C diff --git a/gcc/testsuite/g++.dg/ipa/pr113757.C b/gcc/testsuite/g++.dg/ipa/pr113757.C new file mode 100644 index 000..885d4010a10 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/pr113757.C @@ -0,0 +1,14 @@ +// { dg-do compile } +// { dg-options "-O2 -fPIC" } +// { dg-require-effective-target fpic } + +long size(); +struct ll { virtual int hh(); }; +ll *slice_owner; +int ll::hh() { __builtin_exit(0); } +int nn() { + if (size()) +return 0; + return slice_owner->hh(); +} +int (*a)() = nn; diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc index 75c10eb7dfc..cac41b4f031 100644 --- a/gcc/tree-inline.cc +++ b/gcc/tree-inline.cc @@ -2984,23 +2984,13 @@ redirect_all_calls (copy_body_data * id, basic_block bb) gimple *stmt = gsi_stmt (si); if (is_gimple_call (stmt)) { - tree old_lhs = gimple_call_lhs (stmt); struct cgraph_edge *edge = id->dst_node->get_edge (stmt); if (edge) { if (!id->killed_new_ssa_names) id->killed_new_ssa_names = new hash_set (16); - gimple *new_stmt - = cgraph_edge::redirect_call_stmt_to_callee (edge, - id->killed_new_ssa_names); - if (old_lhs - && TREE_CODE (old_lhs) == SSA_NAME - && !gimple_call_lhs (new_stmt)) - /* In case of IPA-SRA removing the LHS, the name should have - been already added to the hash. But in case of redirecting - to builtin_unreachable it was not and the name still should - be pruned from debug statements. */ - id->killed_new_ssa_names->add (old_lhs); + cgraph_edge::redirect_call_stmt_to_callee (edge, + id->killed_new_ssa_names); if (stmt == last && id->call_stmt && maybe_clean_eh_stmt (stmt)) gimple_purge_dead_eh_edges (bb); -- 2.43.0
Re: [PATCH] ipa-cp: Fix check for exceeding param_ipa_cp_value_list_size (PR 113490)
Hi, On Mon, Jan 22 2024, Jan Hubicka wrote: >> Hi, >> >> When the check for exceeding param_ipa_cp_value_list_size limit was >> modified to be ignored for generating values from self-recursive >> calls, it should have been changed from equal to, to equal to or >> greater than. This omission manifests itself as PR 113490. >> >> When I examined the condition I also noticed that the parameter should >> come from the callee rather than the caller, since the value list is >> associated with the former and not the latter. In practice the limit >> is of course very likely to be the same, but I fixed this aspect of >> the condition too. I briefly audited all other uses of opt_for_fn in >> ipa-cp.cc and all the others looked OK. >> >> Bootstrapped and tested on x86_64-linux. OK for master? >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2024-01-19 Martin Jambor >> >> PR ipa/113490 >> * ipa-cp.cc (ipcp_lattice::add_value): Bail out if value >> count is equal or greater than the limit. Use the limit from the >> callee. >> >> gcc/testsuite/ChangeLog: >> >> 2024-01-19 Martin Jambor >> >> PR ipa/113490 >> * gcc.dg/ipa/pr113490.c: New test. > OK, > thanks! Thank you, I have pushed the following, which has a tweak in the added test so that it is only run on targets which support the required vectors. Martin When the check for exceeding param_ipa_cp_value_list_size limit was modified to be ignored for generating values from self-recursive calls, it should have been changed from equal to, to equal to or greater than. This omission manifests itself as PR 113490. When I examined the condition I also noticed that the parameter should come from the callee rather than the caller, since the value list is associated with the former and not the latter. In practice the limit is of course very likely to be the same, but I fixed this aspect of the condition too. I briefly audited all other uses of opt_for_fn in ipa-cp.cc and all the others looked OK. 
gcc/ChangeLog: 2024-01-19 Martin Jambor PR ipa/113490 * ipa-cp.cc (ipcp_lattice::add_value): Bail out if value count is equal or greater than the limit. Use the limit from the callee. gcc/testsuite/ChangeLog: 2024-01-22 Martin Jambor PR ipa/113490 * gcc.dg/ipa/pr113490.c: New test. --- gcc/ipa-cp.cc | 2 +- gcc/testsuite/gcc.dg/ipa/pr113490.c | 31 + 2 files changed, 32 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr113490.c diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index b1e2a3a829a..e85477df32d 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -2298,7 +2298,7 @@ ipcp_lattice::add_value (valtype newval, cgraph_edge *cs, return false; } - if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl, + if (!same_lat_gen_level && values_count >= opt_for_fn (cs->callee->decl, param_ipa_cp_value_list_size)) { /* We can only free sources, not the values themselves, because sources diff --git a/gcc/testsuite/gcc.dg/ipa/pr113490.c b/gcc/testsuite/gcc.dg/ipa/pr113490.c new file mode 100644 index 000..526e22b3787 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr113490.c @@ -0,0 +1,31 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -Wno-psabi" } */ + +typedef char A __attribute__((vector_size (64))); +typedef short B __attribute__((vector_size (64))); +typedef unsigned C __attribute__((vector_size (64))); +typedef long D __attribute__((vector_size (64))); +typedef __int128 E __attribute__((vector_size (64))); + +D bar1_D_0; +E bar4 (A, D); + +E +bar1 (C C_0) +{ + C_0 >>= 1; + bar4 ((A) C_0, bar1_D_0); + bar4 ((A) (E) {~0 }, (D) (A){ ~0 }); + bar4 ((A) (B) { ~0 }, (D) (C) { ~0 }); + bar1 ((C) (D){ 0, ~0}); + bar4 ((A) C_0, bar1_D_0); + (A) { bar1 ((C) { 7})[5] - C_0[63], bar4 ((A) (D) {~0}, (D) (C) { 0, ~0})[3]}; +} + +E +bar4 (A A_0, D D_0) +{ + bar1 ((C) A_0); + bar1 ((C) {5}); + bar1 ((C) D_0); +} -- 2.43.0
[PATCH] ipa-cp: Fix check for exceeding param_ipa_cp_value_list_size (PR 113490)
Hi, When the check for exceeding param_ipa_cp_value_list_size limit was modified to be ignored for generating values from self-recursive calls, it should have been changed from equal to, to equal to or greater than. This omission manifests itself as PR 113490. When I examined the condition I also noticed that the parameter should come from the callee rather than the caller, since the value list is associated with the former and not the latter. In practice the limit is of course very likely to be the same, but I fixed this aspect of the condition too. I briefly audited all other uses of opt_for_fn in ipa-cp.cc and all the others looked OK. Bootstrapped and tested on x86_64-linux. OK for master? Thanks, Martin gcc/ChangeLog: 2024-01-19 Martin Jambor PR ipa/113490 * ipa-cp.cc (ipcp_lattice::add_value): Bail out if value count is equal or greater than the limit. Use the limit from the callee. gcc/testsuite/ChangeLog: 2024-01-19 Martin Jambor PR ipa/113490 * gcc.dg/ipa/pr113490.c: New test. 
--- gcc/ipa-cp.cc | 2 +- gcc/testsuite/gcc.dg/ipa/pr113490.c | 31 + 2 files changed, 32 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr113490.c diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index b1e2a3a829a..e85477df32d 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -2298,7 +2298,7 @@ ipcp_lattice::add_value (valtype newval, cgraph_edge *cs, return false; } - if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl, + if (!same_lat_gen_level && values_count >= opt_for_fn (cs->callee->decl, param_ipa_cp_value_list_size)) { /* We can only free sources, not the values themselves, because sources diff --git a/gcc/testsuite/gcc.dg/ipa/pr113490.c b/gcc/testsuite/gcc.dg/ipa/pr113490.c new file mode 100644 index 000..cffb0c5f639 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr113490.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -Wno-psabi" } */ + +typedef char A __attribute__((vector_size (64))); +typedef short B __attribute__((vector_size (64))); +typedef unsigned C __attribute__((vector_size (64))); +typedef long D __attribute__((vector_size (64))); +typedef __int128 E __attribute__((vector_size (64))); + +D bar1_D_0; +E bar4 (A, D); + +E +bar1 (C C_0) +{ + C_0 >>= 1; + bar4 ((A) C_0, bar1_D_0); + bar4 ((A) (E) {~0 }, (D) (A){ ~0 }); + bar4 ((A) (B) { ~0 }, (D) (C) { ~0 }); + bar1 ((C) (D){ 0, ~0}); + bar4 ((A) C_0, bar1_D_0); + (A) { bar1 ((C) { 7})[5] - C_0[63], bar4 ((A) (D) {~0}, (D) (C) { 0, ~0})[3]}; +} + +E +bar4 (A A_0, D D_0) +{ + bar1 ((C) A_0); + bar1 ((C) {5}); + bar1 ((C) D_0); +} -- 2.43.0
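Why an equality test stops working once some additions are exempt from the limit can be shown with a toy counter. This is a simplified model, not the code of ipcp_lattice::add_value: value_list, its members and the "exempt" flag (standing in for values generated from self-recursive calls) are all hypothetical.

```cpp
#include <cassert>

// Toy model of a value list with a size limit and an exemption.
struct value_list
{
  value_list (int lim, bool ge) : count (0), limit (lim), use_ge (ge) {}

  // Try to add one value.  Exempt additions ignore the limit.  Returns
  // true if the value was added.
  bool add (bool exempt)
  {
    bool at_limit = use_ge ? count >= limit : count == limit;
    if (!exempt && at_limit)
      return false;
    ++count;
    return true;
  }

  int count;
  int limit;
  bool use_ge;  // true: check with ">=", false: check with "=="
};

// Fill the list up to a limit of 2, add one exempt value, then try five
// more ordinary additions and return the final number of values.
int
final_count (bool use_ge)
{
  value_list l (2, use_ge);
  l.add (false);
  l.add (false);          // count is now exactly at the limit
  l.add (true);           // exempt addition pushes count past the limit
  for (int i = 0; i < 5; i++)
    l.add (false);        // with "==" these are no longer rejected
  return l.count;
}
```

With the ">=" check the list stays at three values; with "==" the count has already stepped over the limit, so every later addition is accepted and the list grows to eight.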
[PATCH] sra: Disqualify bases of operands of asm gotos
Hi, PR 110422 shows that SRA can ICE assuming there is a single edge outgoing from a block terminated with an asm goto. We need that for BB-terminating statements so that any adjustments they make to the aggregates can be copied over to their replacements. Because we can't have that after ASM gotos, we need to punt. Bootstrapped and tested on x86_64-linux, OK for master? It will need some tweaking for release branches, is it in principle OK for them too (after testing)? Thanks, Martin gcc/ChangeLog: 2024-01-17 Martin Jambor PR tree-optimization/110422 * tree-sra.cc (scan_function): Disqualify bases of operands of asm gotos. gcc/testsuite/ChangeLog: 2024-01-17 Martin Jambor PR tree-optimization/110422 * gcc.dg/torture/pr110422.c: New test. --- gcc/testsuite/gcc.dg/torture/pr110422.c | 10 + gcc/tree-sra.cc | 29 - 2 files changed, 33 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr110422.c diff --git a/gcc/testsuite/gcc.dg/torture/pr110422.c b/gcc/testsuite/gcc.dg/torture/pr110422.c new file mode 100644 index 000..2e171a7a19e --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr110422.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ + +struct T { int x; }; +int foo(void) { + struct T v; + asm goto("" : "+r"(v.x) : : : lab); + return 0; +lab: + return -5; +} diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc index 6a1141b7377..f8e71ec48b9 100644 --- a/gcc/tree-sra.cc +++ b/gcc/tree-sra.cc @@ -1559,15 +1559,32 @@ scan_function (void) case GIMPLE_ASM: { gasm *asm_stmt = as_a (stmt); - for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++) + if (stmt_ends_bb_p (asm_stmt) + && !single_succ_p (gimple_bb (asm_stmt))) { - t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i)); - ret |= build_access_from_expr (t, asm_stmt, false); + for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++) + { + t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i)); + disqualify_base_of_expr (t, "OP of asm goto."); + } + for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++) + { + t = 
TREE_VALUE (gimple_asm_output_op (asm_stmt, i)); + disqualify_base_of_expr (t, "OP of asm goto."); + } } - for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++) + else { - t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i)); - ret |= build_access_from_expr (t, asm_stmt, true); + for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++) + { + t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i)); + ret |= build_access_from_expr (t, asm_stmt, false); + } + for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++) + { + t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i)); + ret |= build_access_from_expr (t, asm_stmt, true); + } } } break; -- 2.43.0
Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]
Hi, On Wed, Jan 10 2024, Jakub Jelinek wrote: > Hi! > > As changed in other parts of the compiler, using > build_nonstandard_integer_type is not appropriate for arbitrary precisions, > especially if the precision comes from a BITINT_TYPE or something based on > that, build_nonstandard_integer_type relies on some integral mode being > supported that can support the precision. > > The following patch uses build_bitint_type instead for BITINT_TYPE > precisions. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Note, it would be good if we were able to punt on the optimization > (but this code doesn't seem to be able to punt, so it needs to be done > somewhere earlier) at least in cases where building it would be invalid. > E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive), > but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION). > I've tried to replace 513 with 65532 in the testcase and it didn't ICE, > so maybe it ran into some other SRA limit. Thank you very much for the patch. Regarding punting, did you mean for all BITINT_TYPEs or just for big ones, like you did when you fixed PR 11333 (thanks for that too), or something else entirely? Martin > > 2024-01-10 Jakub Jelinek > > PR tree-optimization/113120 > * tree-sra.cc (analyze_access_subtree): For BITINT_TYPE > with root->size TYPE_PRECISION don't build anything new. > Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type > rather than build_nonstandard_integer_type. > > * gcc.dg/bitint-63.c: New test.
[PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)
Hi, PR 108007 is another manifestation where we rely on DCE to clean up after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA can leave behind statements which are fed uninitialized values and trap, even though their results are themselves never used. I have already fixed this for unused parameters in callees, this bug shows that almost the same thing can happen for removed returns, on the side of callers. This means that the issue has to be fixed elsewhere, in call redirection. This patch adds a function which looks for (and through, using a work-list) uses of operations fed specific SSA names and removes them all. That would have been easy if it wasn't for debug statements during tree-inline (from which call redirection is also invoked). Debug statements are decoupled from the rest at this point and iterating over uses of SSAs does not bring them up. During tree-inline they are handled specially at the end, I assume in order to make sure that the relative ordering of UIDs is the same with and without debug info. This means that during tree-inline we need to make a hash of killed SSAs, that we already have in copy_body_data, available to the function making the purging. So the patch duly does also that, making the interface slightly ugly. Moreover, all newly unused SSA names need to be freed and as PR 112616 showed, it must be done in a defined order, which is what newly added ipa_release_ssas_in_hash does. The only difference from the patch which has already been approved in September but which I later had to revert is (one function name and) that SSAs that are to be released are first put into an auto_vec and sorted according to their version number to avoid issues like PR 112616. The patch has passed bootstrap, LTO-bootstrap and profiled-LTO-bootstrap and testing on x86_64-linux, bootstrap, LTO-bootstrap and testing on ppc64le-linux and bootstrap and LTO-bootstrap on AArch64, testsuite there is still running, OK if it passes? 
Thanks Martin gcc/ChangeLog: 2024-01-12 Martin Jambor PR ipa/108007 PR ipa/112616 * cgraph.h (cgraph_edge): Add a parameter to redirect_call_stmt_to_callee. * ipa-param-manipulation.h (ipa_param_adjustments): Add a parameter to modify_call. (ipa_release_ssas_in_hash): Declare. * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New parameter killed_ssas, pass it to padjs->modify_call. * ipa-param-manipulation.cc (purge_all_uses): New function. (ipa_param_adjustments::modify_call): New parameter killed_ssas. Instead of substituting uses, invoke purge_all_uses. If hash of killed SSAs has not been provided, create a temporary one and release SSAs that have been added to it. (compare_ssa_versions): New function. (ipa_release_ssas_in_hash): Likewise. * tree-inline.cc (redirect_all_calls): Create id->killed_new_ssa_names earlier, pass it to edge redirection, adjust a comment. (copy_body): Release SSAs in id->killed_new_ssa_names. gcc/testsuite/ChangeLog: 2024-01-15 Martin Jambor PR ipa/108007 PR ipa/112616 * gcc.dg/ipa/pr108007.c: New test. * gcc.dg/ipa/pr112616.c: Likewise. --- gcc/cgraph.cc | 10 ++- gcc/cgraph.h| 9 ++- gcc/ipa-param-manipulation.cc | 112 ++-- gcc/ipa-param-manipulation.h| 5 +- gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 gcc/testsuite/gcc.dg/ipa/pr112616.c | 28 +++ gcc/tree-inline.cc | 27 --- 7 files changed, 184 insertions(+), 39 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c create mode 100644 gcc/testsuite/gcc.dg/ipa/pr112616.c diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc index d565c005f62..0ac8f73204b 100644 --- a/gcc/cgraph.cc +++ b/gcc/cgraph.cc @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n) speculative indirect call, remove "speculative" of the indirect call and also redirect stmt to it's final direct target. 
+ When called from within tree-inline, KILLED_SSAs has to contain the pointer + to killed_new_ssa_names within the copy_body_data structure and SSAs + discovered to be useless (if LHS is removed) will be added to it, otherwise + it needs to be NULL. + It is up to caller to iteratively transform each "speculative" direct call as appropriate. */ gimple * -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e) +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e, + hash_set *killed_ssas) { tree decl = gimple_call_fndecl (e->call_stmt); gcall *new_stmt; @@ -1527,7 +1533,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e) remove_stmt_from_eh_lp (e->call_stmt); tree old_fntype = gimple_call_f
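The two ideas from the PR 108007 message above, a work-list walk that removes everything transitively fed by a killed name, and releasing the killed names in sorted order rather than in hash iteration order, can be sketched on a toy representation. This is not GIMPLE and not the code of purge_all_uses or ipa_release_ssas_in_hash: the stmt struct, the integer "names" and both function names are hypothetical, and debug statements are not modelled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy statement: defines one name from some operand names.
struct stmt
{
  int def;
  std::vector<int> uses;
  bool removed;
};

// Remove every statement that transitively uses NAME and record NAME and
// all names defined by removed statements in KILLED.
void
purge_all_uses_sketch (std::vector<stmt> &stmts, int name,
                       std::unordered_set<int> &killed)
{
  std::vector<int> worklist;
  worklist.push_back (name);
  killed.insert (name);
  while (!worklist.empty ())
    {
      int cur = worklist.back ();
      worklist.pop_back ();
      for (stmt &s : stmts)
        {
          if (s.removed
              || std::find (s.uses.begin (), s.uses.end (), cur)
                 == s.uses.end ())
            continue;
          s.removed = true;
          // Queue the defined name only the first time it is killed.
          if (killed.insert (s.def).second)
            worklist.push_back (s.def);
        }
    }
}

// Hash sets iterate in an unspecified order, so copy the names out and
// sort them to get a deterministic release order.
std::vector<int>
release_order_sketch (const std::unordered_set<int> &killed)
{
  std::vector<int> order (killed.begin (), killed.end ());
  std::sort (order.begin (), order.end ());
  return order;
}

// Name 0 plays the role of the removed call result; statement 0 uses it,
// statement 1 uses the result of statement 0, statement 2 is unrelated.
std::vector<int>
demo ()
{
  std::vector<stmt> stmts = { {1, {0}, false},
                              {2, {1, 5}, false},
                              {3, {7}, false} };
  std::unordered_set<int> killed;
  purge_all_uses_sketch (stmts, 0, killed);
  assert (stmts[0].removed && stmts[1].removed && !stmts[2].removed);
  return release_order_sketch (killed);
}
```

Sorting before release makes the result independent of how the hash set happens to lay out its entries, which is the property the message says was needed to avoid problems like PR 112616.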
Re: [wwwdocs] gcc-14/changes.html: OpenMP - improve wording
Hi Tobias, On Mon, Jan 08 2024, Tobias Burnus wrote: > The attached patch there was no patch attached to your message. Martin > does a tiny updated to the OpenMP features (AMD GCN > now also has an optimized memcpy_rect not only nvptx), but the main > change is some shifting around to make it more consistent and better > readable. > > I intend to commit this relatively soon; like always, comments and > suggestions are welcome - be it before or after the commit. > > Current version: http://gcc.gnu.org/gcc-14/changes.html > > Thanks, > > Tobias
Re: [PATCH] tree-optimization/111807 - ICE in verify_sra_access_forest
DAG 1577701 are aritificially in conflict with void * Modref stats: modref kill: 832 kills, 19399 queries modref use: 50760 disambiguations, 1825109 queries modref clobber: 1371014 disambiguations, 40152535 queries 5190238 tbaa queries (0.129263 per modref query) 1341663 base compares (0.033414 per modref query) PTA query stats: pt_solution_includes: 36784427 disambiguations, 46141175 queries pt_solutions_intersect: 4519387 disambiguations, 17081996 queries to: Alias oracle query stats: refs_may_alias_p: 94354083 disambiguations, 106278948 queries ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries call_may_clobber_ref_p: 649273 disambiguations, 659371 queries stmt_kills_ref_p: 142342 kills, 8407310 queries nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries aliasing_component_refs_p: 67104 disambiguations, 3081781 queries TBAA oracle: 22676608 disambiguations 61782455 queries 14044948 are in alias set 0 10998619 queries asked about the same object 153 queries asked about the same alias set 0 access volatile 12484882 are dependent in the DAG 1577245 are aritificially in conflict with void * Modref stats: modref kill: 832 kills, 19399 queries modref use: 50760 disambiguations, 1825106 queries modref clobber: 1371028 disambiguations, 40152504 queries 5190319 tbaa queries (0.129265 per modref query) 1341403 base compares (0.033408 per modref query) PTA query stats: pt_solution_includes: 36784449 disambiguations, 46141210 queries pt_solutions_intersect: 4519320 disambiguations, 17082083 queries gcc/ChangeLog: 2023-12-13 Martin Jambor PR tree-optimization/111807 * tree-sra.cc (build_ref_for_model): Allow offset smaller than model->offset when gsi is non-NULL. Adjust function comment. 
--- gcc/tree-sra.cc | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc index 3bd0c7a9af0..1dba721be11 100644 --- a/gcc/tree-sra.cc +++ b/gcc/tree-sra.cc @@ -1843,8 +1843,11 @@ build_reconstructed_reference (location_t, tree base, struct access *model) /* Construct a memory reference to a part of an aggregate BASE at the given OFFSET and of the same type as MODEL. In case this is a reference to a bit-field, the function will replicate the last component_ref of model's - expr to access it. GSI and INSERT_AFTER have the same meaning as in - build_ref_for_offset. */ + expr to access it. INSERT_AFTER and GSI have the same meaning as in + build_ref_for_offset, furthermore, when GSI is NULL, the function expects + that it re-builds the entire reference from a DECL to the final access and + so will create a MEM_REF when OFFSET does not exactly match offset of + MODEL. */ static tree build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, @@ -1874,7 +1877,8 @@ build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, && !TREE_THIS_VOLATILE (base) && (TYPE_ADDR_SPACE (TREE_TYPE (base)) == TYPE_ADDR_SPACE (TREE_TYPE (model->expr))) - && offset == model->offset + && (offset == model->offset + || (gsi && offset <= model->offset)) /* build_reconstructed_reference can still fail if we have already massaged BASE because of another type incompatibility. */ && (res = build_reconstructed_reference (loc, base, model))) -- 2.43.0
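The offset part of the changed condition can be restated in isolation. The following is only a minimal sketch and not GCC code: `can_reuse_model_reference` and its plain-integer parameters are hypothetical stand-ins for the `gsi`, `offset` and `model->offset` operands in the hunk above, and the other checks in the same condition are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the offset test in build_ref_for_model.
// has_gsi models "gsi is non-NULL"; offsets are plain integers here.
bool
can_reuse_model_reference (bool has_gsi, std::int64_t offset,
                           std::int64_t model_offset)
{
  // An exact match was the only accepted case before the patch.
  if (offset == model_offset)
    return true;
  // With the patch, a smaller offset is also accepted, but only when a
  // statement iterator is available.
  return has_gsi && offset <= model_offset;
}
```

Under these assumptions, a call without a statement iterator still requires the exact match, which corresponds to the case the updated comment describes as rebuilding the whole reference.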
[PATCH] SRA: Force gimple operand in an additional corner case (PR 112822)
Hi,

PR 112822 revealed a corner case in load_assign_lhs_subreplacements
where it creates invalid gimple: an assignment where on the LHS there
is a complex variable which however is not a gimple register because
it has partial defs and on the right hand side there is a
VIEW_CONVERT_EXPR. This patch invokes force_gimple_operand_gsi on
such statements (like it already does when both sides of a generated
assignment have partial definitions).

I've made sure the patch passes bootstrap and testsuite on
x86_64-linux, the bug reporter was kind enough to also check the same
on a powerpc64le-linux (see bugzilla comment #8).

The testcase has reasonable size but it is specific to ppc64le and its
altivec vectors. My plan is to ask the bug reporter to massage it
into a target specific testcase in bugzilla. Alternatively I can try
to craft a testcase from scratch but that will take time.

Despite the above, is the patch OK for master?

Thanks,

Martin

gcc/ChangeLog:

2023-12-12  Martin Jambor

        PR tree-optimization/112822
        * tree-sra.cc (load_assign_lhs_subreplacements): Invoke
        force_gimple_operand_gsi also when LHS has partial stores and RHS is a
        VIEW_CONVERT_EXPR.
---
 gcc/tree-sra.cc | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 3bd0c7a9af0..99a1b0a6d17 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -4219,11 +4219,15 @@ load_assign_lhs_subreplacements (struct access *lacc,
           if (racc && racc->grp_to_be_replaced)
             {
               rhs = get_access_replacement (racc);
+              bool vce = false;
               if (!useless_type_conversion_p (lacc->type, racc->type))
-                rhs = fold_build1_loc (sad->loc, VIEW_CONVERT_EXPR,
-                                       lacc->type, rhs);
+                {
+                  rhs = fold_build1_loc (sad->loc, VIEW_CONVERT_EXPR,
+                                         lacc->type, rhs);
+                  vce = true;
+                }
 
-              if (racc->grp_partial_lhs && lacc->grp_partial_lhs)
+              if (lacc->grp_partial_lhs && (vce || racc->grp_partial_lhs))
                 rhs = force_gimple_operand_gsi (&sad->old_gsi, rhs, true,
                                                 NULL_TREE, true, GSI_SAME_STMT);
             }
-- 
2.43.0
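To see exactly which case the new condition adds, the old and new tests can be compared over all inputs. This is a small sketch under stated assumptions: the three booleans are hypothetical stand-ins for `lacc->grp_partial_lhs`, `racc->grp_partial_lhs` and the new `vce` variable, and nothing else from tree-sra.cc is modelled.

```cpp
#include <cassert>

// Old decision: force a gimple operand only if both sides have
// partial definitions.  The conversion was not considered.
bool
needs_force_gimple_old (bool lacc_partial, bool racc_partial, bool vce)
{
  (void) vce;
  return racc_partial && lacc_partial;
}

// New decision from the patch: a VIEW_CONVERT_EXPR on the right hand
// side is enough when the left hand side has partial definitions.
bool
needs_force_gimple_new (bool lacc_partial, bool racc_partial, bool vce)
{
  return lacc_partial && (vce || racc_partial);
}

// Count the input combinations on which the two conditions disagree.
int
count_differences ()
{
  int n = 0;
  for (int bits = 0; bits < 8; ++bits)
    {
      bool l = bits & 1, r = bits & 2, v = bits & 4;
      if (needs_force_gimple_old (l, r, v) != needs_force_gimple_new (l, r, v))
        ++n;
    }
  return n;
}
```

In this model the two conditions differ only when the LHS is partial, the RHS is not, and a conversion was built, which matches the corner case described in the message.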
Re: [PATCH] tree-sra: Avoid returns of references to SRA candidates
Hi,

On Tue, Nov 28 2023, Jan Hubicka wrote:
>> On Tue, 28 Nov 2023, Martin Jambor wrote:
>>
>> > On Tue, Nov 28 2023, Richard Biener wrote:
>> > > On Mon, 27 Nov 2023, Martin Jambor wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> The enhancement to address PR 109849 contained an important thinko,
>> > >> and that any reference that is passed to a function and does not
>> > >> escape, must also not happen to be aliased by the return value of the
>> > >> function. This has quickly transpired as bugs PR 112711 and PR
>> > >> 112721.
>> > >>
>> > >> Just as IPA-modref does a good enough job to allow us to rely on the
>> > >> escaped set of variables, it seems to be doing well also on updating
>> > >> EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
>> > >> exactly the situation we need to avoid. Of course, if a call
>> > >> statement ignores any returned value, we also do not need to check the
>> > >> flag.
>> > >
>> > > But what about EAF_NOT_RETURNED_INDIRECTLY? Don't you need to
>> > > verify the parameter doesn't escape through the return at all?
>> > >
>> >
>> > I thought EAF_NOT_RETURNED_INDIRECTLY prohibits things like "return
>> > param->next" but those are not a problem (whatever next points to cannot
>> > be an SRA candidate and any ADDR_EXPR storing its address there would
>> > trigger a disqualification or at least an assert). But I guess I am
>> > wrong, what is actually the exact meaning of the flag?
>>
>> I thought it's return (x.ptr = param, &x);
>>
>> so the parameter is reachable from the return value.
>>
>> But let's Honza answer...
> It is same difference as direct/indirect escape. so it check whether
> values pointed to by arg can be possibly returned. Indeed maybe we
> should think of better name - the other interpretation did not even
> occur to me, but it makes sense.
>

Is my patch OK then? (Apart from making one of the testcases
x86_64-only, as Andrew pointed out, which I wanted to do but the line
somehow got lost. Making the testcase more general is fairly low on my
contested TODO list and the testing depends on a specific instruction
trapping.)

Thanks,

Martin
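The three shapes of "returning a parameter" that come up in this thread can be written down side by side. This is a hedged illustration only: the types and function names are made up, the comments restate what the participants say above, and it does not claim to define how GCC computes the EAF flags.

```cpp
#include <cassert>

struct node
{
  int value;
  node *next;
};

struct holder
{
  node *ptr;
};

// The argument itself comes back as the return value.  This is the
// shape of f() in pr112721.c, i.e. a direct return.
node *
return_directly (node *param)
{
  return param;
}

// A value loaded from memory the argument points to is returned
// ("return param->next" in the thread).
node *
return_pointed_to_value (node *param)
{
  return param->next;
}

// The argument is stored into another object whose address is
// returned, so it is reachable from the return value
// ("x.ptr = param, &x" in the thread).  Passing x in is an assumption
// made here to keep the sketch self-contained.
holder *
return_via_holder (node *param, holder *x)
{
  x->ptr = param;
  return x;
}
```

In the first shape a caller that writes through the result modifies the very object it passed in, which is the aliasing that the patch makes SRA guard against.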
Re: [PATCH] tree-sra: Avoid returns of references to SRA candidates
On Tue, Nov 28 2023, Richard Biener wrote:
> On Mon, 27 Nov 2023, Martin Jambor wrote:
>
>> Hi,
>>
>> The enhancement to address PR 109849 contained an important thinko,
>> and that any reference that is passed to a function and does not
>> escape, must also not happen to be aliased by the return value of the
>> function. This has quickly transpired as bugs PR 112711 and PR
>> 112721.
>>
>> Just as IPA-modref does a good enough job to allow us to rely on the
>> escaped set of variables, it seems to be doing well also on updating
>> EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
>> exactly the situation we need to avoid. Of course, if a call
>> statement ignores any returned value, we also do not need to check the
>> flag.
>
> But what about EAF_NOT_RETURNED_INDIRECTLY? Don't you need to
> verify the parameter doesn't escape through the return at all?
>

I thought EAF_NOT_RETURNED_INDIRECTLY prohibits things like "return
param->next" but those are not a problem (whatever next points to cannot
be an SRA candidate and any ADDR_EXPR storing its address there would
trigger a disqualification or at least an assert). But I guess I am
wrong, what is actually the exact meaning of the flag?

Thanks,

Martin
[PATCH] tree-sra: Avoid returns of references to SRA candidates
Hi,

The enhancement to address PR 109849 contained an important thinko,
and that any reference that is passed to a function and does not
escape, must also not happen to be aliased by the return value of the
function. This has quickly transpired as bugs PR 112711 and PR
112721.

Just as IPA-modref does a good enough job to allow us to rely on the
escaped set of variables, it seems to be doing well also on updating
EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
exactly the situation we need to avoid. Of course, if a call
statement ignores any returned value, we also do not need to check the
flag.

Hopefully this does not pessimize things too much, I have verified
that the PR 109849 testcase remains quick and so should also the
benchmark it is derived from.

The patch has passed bootstrap and testing on x86_64-linux, OK for
master?

Thanks,

Martin

gcc/ChangeLog:

2023-11-27  Martin Jambor

        PR tree-optimization/112711
        PR tree-optimization/112721
        * tree-sra.cc (build_access_from_call_arg): New parameter
        CAN_BE_RETURNED, disqualify any candidate passed by reference if it is
        true. Adjust leading comment.
        (scan_function): Pass appropriate value to CAN_BE_RETURNED of
        build_access_from_call_arg.

gcc/testsuite/ChangeLog:

2023-11-27  Martin Jambor

        PR tree-optimization/112711
        PR tree-optimization/112721
        * g++.dg/tree-ssa/pr112711.C: New test.
        * gcc.dg/tree-ssa/pr112721.c: Likewise.
--- gcc/testsuite/g++.dg/tree-ssa/pr112711.C | 31 ++ gcc/testsuite/gcc.dg/tree-ssa/pr112721.c | 26 +++ gcc/tree-sra.cc | 40 ++-- 3 files changed, 88 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr112711.C create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112721.c diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr112711.C b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C new file mode 100644 index 000..c04524b04a7 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-options "-O1" } */ + +typedef int i32; +typedef unsigned int u32; + +static inline void write_i32(void *memory, i32 value) { + // swap i32 bytes as if it was u32: + u32 u_value = value; + value = __builtin_bswap32(u_value); + + // llvm infers '1' alignment from destination type + __builtin_memcpy(__builtin_assume_aligned(memory, 1), &value, sizeof(value)); +} + +__attribute__((noipa)) +static void bug (void) { + #define assert_eq(lhs, rhs) if (lhs != rhs) __builtin_trap() + + unsigned char data[5]; + write_i32(data, -1362446643); + assert_eq(data[0], 0xAE); + assert_eq(data[1], 0xCA); + write_i32(data + 1, -1362446643); + assert_eq(data[1], 0xAE); +} + +int main() { +bug(); +return 0; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c new file mode 100644 index 000..adf62613266 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-O1" } */ + +unsigned * volatile gv; + +struct a { + int b; +}; +int c, e; +long d; +unsigned * __attribute__((noinline)) +f(unsigned *g) { + for (; c;) +e = d; + return gv ? 
gv : g; +} +int main() { + int *h; + struct a i = {8}; + int *j = &i.b; + h = (unsigned *) f(j); + *h = 0; + if (i.b != 0) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc index 3a0d52675fe..6a759783990 100644 --- a/gcc/tree-sra.cc +++ b/gcc/tree-sra.cc @@ -1268,18 +1268,27 @@ abnormal_edge_after_stmt_p (gimple *stmt, enum out_edge_check *oe_check) } /* Scan expression EXPR which is an argument of a call and create access - structures for all accesses to candidates for scalarization. Return true if - any access has been inserted. STMT must be the statement from which the - expression is taken. */ + structures for all accesses to candidates for scalarization. Return true + if any access has been inserted. STMT must be the statement from which the + expression is taken. CAN_BE_RETURNED must be true if call argument flags + do not rule out that the argument is directly returned. OE_CHECK is used + to remember result of a test for abnormal outgoing edges after this + statement. */ static bool -build_access_from_call_arg (tree expr, gimple *stmt, +build_access_from_call_arg (tree expr, gimple *stmt, bool can_be_returned, enum out_edge_check *oe_check) { if (TREE_CODE (expr) == ADDR_EXPR) { tree base = get_base_address (TREE_OPERAND (expr, 0)); + if (can_be_returned) + { + disqualify_base_of_expr (base, "Address possibly returned, " + "leading to an alis SRA may not know."); + return false; +
Re: [PATCH] sra: SRA of non-escaped aggregates passed by reference to calls
Hello, thanks a lot for your review. On Fri, Nov 17 2023, Richard Biener wrote: > On Thu, 16 Nov 2023, Martin Jambor wrote: > >> Hello, >> >> PR109849 shows that a loop that heavily pushes and pops from a stack >> implemented by a C++ std::vec results in slow code, mainly because the >> vector structure is not split by SRA and so we end up in many loads >> and stores into it. This is because it is passed by reference >> to (re)allocation methods and so needs to live in memory, even though >> it does not escape from them and so we could SRA it if we >> re-constructed it before the call and then separated it to distinct >> replacements afterwards. >> >> This patch does exactly that, first relaxing the selection of >> candidates to also include those which are addressable but do not >> escape and then adding code to deal with the calls. The >> micro-benchmark that is also the (scan-dump) testcase in this patch >> runs twice as fast with it than with current trunk. Honza measured >> its effect on the libjxl benchmark and it almost closes the >> performance gap between Clang and GCC while not requiring excessive >> inlining and thus code growth. >> >> The patch disallows creation of replacements for such aggregates which >> are also accessed with a precision smaller than their size because I >> have observed that this led to excessive zero-extending of data >> leading to slow-downs of perlbench (on some CPUs). Apart from this >> case I have not noticed any regressions, at least not so far. >> >> Gimple call argument flags can tell if an argument is unused (and then >> we do not need to generate any statements for it) or if it is not >> written to and then we do not need to generate statements loading >> replacements from the original aggregate after the call statement. 
>> Unfortunately, we cannot symmetrically use flags that an aggregate is >> not read because to avoid re-constructing the aggregate before the >> call because flags don't tell which what parts of aggregates were not >> written to, so we load all replacements, and so all need to have the >> correct value before the call. >> >> The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on >> x86_64-linux and a very similar patch has also passed bootstrap and >> testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these >> two architectures but as I'm sending this). OK for master? >> >> Thanks, >> >> Martin >> [...] >> @@ -1920,10 +1981,19 @@ maybe_add_sra_candidate (tree var) >>reject (var, "not aggregate"); >>return false; >> } >> - /* Allow constant-pool entries that "need to live in memory". */ >> - if (needs_to_live_in_memory (var) && !constant_decl_p (var)) >> + >> + if ((is_global_var (var) >> + /* There are cases where non-addressable variables fail the >> + pt_solutions_check test, e.g in gcc.dg/uninit-40.c. */ >> + || (TREE_ADDRESSABLE (var) >> + && pt_solution_includes (&cfun->gimple_df->escaped, var)) > > so it seems this is the "correctness" test rather than using > the call argument flags? I'll note that this may be overly > conservative. It is but with ipa-modref it seems to work fairly well. > > For the call handling above I wondered about return-slot-opt calls > where the address of the LHS escapes to the call - if the points-to > check is the correctness check that should still work out of course > (but subject to PR109945). Hm, I wonder what the implications of that PR is for SRA, but it is a different issue, this patch does not touch handling LHSs. > >> + || (TREE_CODE (var) == RESULT_DECL >> + && !DECL_BY_REFERENCE (var) >> + && aggregate_value_p (var, current_function_decl))) >> + /* Allow constant-pool entries that "need to live in memory". 
*/ >> + && !constant_decl_p (var)) >> { >> - reject (var, "needs to live in memory"); >> + reject (var, "needs to live in memory and escapes or global"); >>return false; >> } >>if (TREE_THIS_VOLATILE (var)) >> @@ -2122,6 +2192,21 @@ sort_and_splice_var_accesses (tree var) >> gcc_assert (access->offset >= low >> && access->offset + access->size <= high); >> >> + if (INTEGRAL_TYPE_P (access->type) >> + && TYPE_PRECISION (access->type) != access->size >>
Re: libstdc++: Speed up push_back
Hello,

On Thu, Nov 23 2023, Jonathan Wakely wrote:
> On Thu, 23 Nov 2023 at 15:34, Jan Hubicka wrote:
>> [...]
>>
>> I also wonder, if default operator new and malloc can be handled as not
>> reading/modifying anything visible to the user code.
>
> No, there's no way to know if the default operator new is being used.
> A replacement operator new could be provided at link-time.
>
> That's why we need -fsane-operator-new
>

Would it make sense to add -fsane-operator-new to -Ofast?

Martin
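Jonathan's point can be made concrete with a replacement allocation function that has a side effect user code can observe. This is a minimal sketch: the counter `g_new_calls`, the sink `g_sink` and the helper `calls_for_one_allocation` are invented for the illustration, and the helper calls `::operator new` directly rather than through a new-expression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Observable state touched by the replacement allocator.
static int g_new_calls = 0;

// Keeps the allocated pointer visibly used.
static void *volatile g_sink = nullptr;

// Replacement for the global allocation function.  A program may
// define this in any translation unit, so the compiler cannot in
// general tell at compile time whether the default version is in use.
void *
operator new (std::size_t size)
{
  ++g_new_calls;
  if (void *p = std::malloc (size ? size : 1))
    return p;
  throw std::bad_alloc ();
}

void
operator delete (void *p) noexcept
{
  std::free (p);
}

void
operator delete (void *p, std::size_t) noexcept
{
  std::free (p);
}

// Returns how many times the replacement ran for one explicit call.
int
calls_for_one_allocation ()
{
  int before = g_new_calls;
  g_sink = ::operator new (sizeof (int));
  ::operator delete (g_sink);
  return g_new_calls - before;
}
```

Because the replacement updates a global, treating every allocation as not touching user-visible memory would not be safe for a program like this, which is the kind of guarantee an option such as the proposed -fsane-operator-new would have to ask the user for.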
[PATCH] Bump LTO_minor_version
Hi Richi, On Wed, Sep 20 2023, Richard Biener wrote: > The following turns MAX_NUM_CHAINS and MAX_CHAIN_LEN to params which > allows to experiment with raising them. For the testcase in PR111489 > raising MAX_CHAIN_LEN from 5 to 8 avoids the bogus diagnostics > at -O2, at -O3 we need a MAX_CHAIN_LEN of 6. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. > > PR tree-optimization/111489 > * doc/invoke.texi (--param uninit-max-chain-len): Document. > (--param uninit-max-num-chains): Likewise. > * params.def (-param=uninit-max-chain-len=): New. > (-param=uninit-max-num-chains=): Likewise. > * gimple-predicate-analysis.cc (MAX_NUM_CHAINS): Define to > param_uninit_max_num_chains. > (MAX_CHAIN_LEN): Define to param_uninit_max_chain_len. > (uninit_analysis::init_use_preds): Avoid VLA. > (uninit_analysis::init_from_phi_def): Likewise. > (compute_control_dep_chain): Avoid using MAX_CHAIN_LEN in > template parameter. our test attempting to detect that LTO_minor_version should have been bumped but wasn't is failing and eye-balling backports to the gcc-13 branch, this looks like a likely culprit? Unless I am mistaken, params are streamed and therefore they alter the LTO format? If so, I'd like to propose the obvious fix, OK for gcc-13 (after some testing)? Thanks, Martin [PATCH] Bump LTO_minor_version I believe r13-8039-g06ee3438a4fcf9 has changed LTO format and therefore we should bump the minor version of the GCC 13 LTO format. gcc/ChangeLog: 2023-11-20 Martin Jambor * lto-streamer.h (LTO_minor_version): Bump. --- gcc/lto-streamer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index fc7133d07ba..75cebcd02d3 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -122,7 +122,7 @@ along with GCC; see the file COPYING3. If not see form followed by the data for the string. 
*/ #define LTO_major_version GCC_major_version -#define LTO_minor_version 0 +#define LTO_minor_version 1 typedef unsigned char lto_decl_flags_t; -- 2.42.0
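Why a format change calls for a version bump can be sketched with a reader that checks a version pair before trusting the rest of the stream. The struct and function below are hypothetical and deliberately much simpler than the real LTO streamer; the exact-match policy is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical header carrying the format version of a streamed object.
struct stream_version
{
  int major_version;
  int minor_version;
};

// A reader built for EXPECTED accepts FOUND only on an exact match, so
// bumping the minor number makes objects written in the older layout
// get rejected up front instead of being misread.
bool
version_matches (stream_version expected, stream_version found)
{
  return expected.major_version == found.major_version
         && expected.minor_version == found.minor_version;
}
```

With such a check, an object written before the bump (minor 0) and a reader built after it (minor 1) fail the comparison even though the major version is the same.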
Re: Propagate value ranges of return values
Hi, thanks for working on this. On Sun, Nov 19 2023, Jan Hubicka wrote: > Hi, > this is updated version which also adds testuiste compensation > I lost earlier while maintaining the patch in my testing tree. > There are quite few testcases that use constant return values to hide > something from optimizer. > > Bootstrapped/regtested x86_64-linux. > gcc/ChangeLog: > > * cgraph.cc (add_detected_attribute_1): New function. > (cgraph_node::add_detected_attribute): Likewise. > * cgraph.h (cgraph_node::add_detected_attribute): Declare. > * common.opt: Add -Wsuggest-attribute=returns_nonnull. > * doc/invoke.texi: Document new flag. > * gimple-range-fold.cc (fold_using_range::range_of_call): > Use known reutrn value ranges. > * ipa-prop.cc (struct ipa_return_value_summary): New type. > (class ipa_return_value_sum_t): New type. > (ipa_return_value_sum): New summary. > (ipa_record_return_value_range): New function. > (ipa_return_value_range): New function. > * ipa-prop.h (ipa_return_value_range): Declare. > (ipa_record_return_value_range): Declare. > * ipa-pure-const.cc (warn_function_returns_nonnull): New funcion. > * ipa-utils.h (warn_function_returns_nonnull): Declare. > * symbol-summary.h: Fix comment. > * tree-vrp.cc (execute_ranger_vrp): Record return values. > > gcc/testsuite/ChangeLog: > > * g++.dg/ipa/devirt-2.C: Add noipa attribute to prevent ipa-vrp. > * g++.dg/ipa/devirt-7.C: Disable ipa-vrp. > * g++.dg/ipa/ipa-icf-2.C: Disable ipa-vrp. > * g++.dg/ipa/ipa-icf-3.C: Disable ipa-vrp. > * g++.dg/ipa/ivinline-1.C: Disable ipa-vrp. > * g++.dg/ipa/ivinline-3.C: Disable ipa-vrp. > * g++.dg/ipa/ivinline-5.C: Disable ipa-vrp. > * g++.dg/ipa/ivinline-8.C: Disable ipa-vrp. > * g++.dg/ipa/nothrow-1.C: Disable ipa-vrp. > * g++.dg/ipa/pure-const-1.C: Disable ipa-vrp. > * g++.dg/ipa/pure-const-2.C: Disable ipa-vrp. > * g++.dg/lto/inline-crossmodule-1_0.C: Disable ipa-vrp. > * gcc.c-torture/compile/pr106433.c: Add noipa attribute to prevent > ipa-vrp. 
> * gcc.c-torture/execute/frame-address.c: Likewise. > * gcc.dg/ipa/fopt-info-inline-1.c: Disable ipa-vrp. > * gcc.dg/ipa/ipa-icf-25.c: Disable ipa-vrp. > * gcc.dg/ipa/ipa-icf-38.c: Disable ipa-vrp. > * gcc.dg/ipa/pure-const-1.c: Disable ipa-vrp. > * gcc.dg/ipa/remref-0.c: Add noipa attribute to prevent ipa-vrp. > * gcc.dg/tree-prof/time-profiler-1.c: Disable ipa-vrp. > * gcc.dg/tree-prof/time-profiler-2.c: Disable ipa-vrp. > * gcc.dg/tree-ssa/pr110269.c: Disable ipa-vrp. > * gcc.dg/tree-ssa/pr20701.c: Disable ipa-vrp. > * gcc.dg/tree-ssa/vrp05.c: Disable ipa-vrp. > * gcc.dg/tree-ssa/return-value-range-1.c: New test. > > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc > index e41e5ad3ae7..71dacf23ce1 100644 > --- a/gcc/cgraph.cc > +++ b/gcc/cgraph.cc > @@ -2629,6 +2629,54 @@ cgraph_node::set_malloc_flag (bool malloc_p) >return changed; > } > > +/* Worker to set malloc flag. */ I think the comment must be stale, and the name of the function also, it does not add anything, does it? > +static void > +add_detected_attribute_1 (cgraph_node *node, const char *attr, bool *changed) > +{ > + if (!lookup_attribute (attr, DECL_ATTRIBUTES (node->decl))) > +{ > + DECL_ATTRIBUTES (node->decl) = tree_cons (get_identifier (attr), > + NULL_TREE, DECL_ATTRIBUTES > (node->decl)); > + *changed = true; > +} > + > + ipa_ref *ref; > + FOR_EACH_ALIAS (node, ref) > +{ > + cgraph_node *alias = dyn_cast (ref->referring); > + if (alias->get_availability () > AVAIL_INTERPOSABLE) > + add_detected_attribute_1 (alias, attr, changed); > +} > + > + for (cgraph_edge *e = node->callers; e; e = e->next_caller) > +if (e->caller->thunk > + && (e->caller->get_availability () > AVAIL_INTERPOSABLE)) > + add_detected_attribute_1 (e->caller, attr, changed); > +} > + > +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any. */ Likewise. 
> + > +bool > +cgraph_node::add_detected_attribute (const char *attr) > +{ > + bool changed = false; > + > + if (get_availability () > AVAIL_INTERPOSABLE) > +add_detected_attribute_1 (this, attr, &changed); > + else > +{ > + ipa_ref *ref; > + > + FOR_EACH_ALIAS (this, ref) > + { > + cgraph_node *alias = dyn_cast (ref->referring); > + if (alias->get_availability () > AVAIL_INTERPOSABLE) > + add_detected_attribute_1 (alias, attr, &changed); > + } > +} > + return changed; > +} > + > /* Worker to set noreturng flag. */ > static void > set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed) [...] > diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc > index 6e9530c3d7f..998b7608d78 100644 > --- a/gcc/gimple-range-fold.cc > +++ b/gcc/g
[PATCH] sra: SRA of non-escaped aggregates passed by reference to calls
Hello,

PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because
the vector structure is not split by SRA and so we end up in many loads
and stores into it. This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.

This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls. The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it than with current trunk. Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.

The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs). Apart from this
case I have not noticed any regressions, at least not so far.

Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to and then we do not need to generate statements loading
replacements from the original aggregate after the call statement.
Unfortunately, we cannot symmetrically use flags that an aggregate is
not read to avoid re-constructing the aggregate before the call,
because flags don't tell which parts of aggregates were not written
to, so we load all replacements, and so all need to have the correct
value before the call.
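The transformation described above can be pictured by writing the result by hand. The following is a sketch under stated assumptions and not compiler output: `stack_like`, `grow` and `push_n` are invented names, and the caller is written the way the description says SRA would leave it, with scalars standing in for the fields and the aggregate materialized only around the call.

```cpp
#include <cassert>

struct stack_like
{
  int size;
  int capacity;
};

// Hypothetical callee: receives the aggregate by reference but does
// not keep the address anywhere after it returns.
void
grow (stack_like *s)
{
  s->capacity = s->capacity ? s->capacity * 2 : 4;
}

// Hand-written picture of the transformed caller.
int
push_n (int n)
{
  int size = 0, capacity = 0;        // scalar replacements
  for (int i = 0; i < n; ++i)
    {
      if (size == capacity)
        {
          stack_like tmp;
          tmp.size = size;           // re-construct before the call
          tmp.capacity = capacity;
          grow (&tmp);
          size = tmp.size;           // load replacements after the call
          capacity = tmp.capacity;
        }
      ++size;
    }
  return capacity;
}
```

The common path of the loop touches only `size` and `capacity`, which can stay in registers; the stores and loads of the aggregate happen only on the path that actually makes the call.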
The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
x86_64-linux and a very similar patch has also passed bootstrap and
testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
two architectures as I'm sending this). OK for master?

Thanks,

Martin

gcc/ChangeLog:

2023-11-16  Martin Jambor

        PR middle-end/109849
        * tree-sra.cc (passed_by_ref_in_call): New.
        (sra_initialize): Allocate passed_by_ref_in_call.
        (sra_deinitialize): Free passed_by_ref_in_call.
        (create_access): Add decl pool candidates only if they are not
        already candidates.
        (build_access_from_expr_1): Bail out on ADDR_EXPRs.
        (build_access_from_call_arg): New function.
        (asm_visit_addr): Rename to scan_visit_addr, change the
        disqualification dump message.
        (scan_function): Check taken addresses for all non-call statements,
        including phi nodes. Process all call arguments, including the static
        chain, build_access_from_call_arg.
        (maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow
        non-escaped local variables.
        (sort_and_splice_var_accesses): Disallow smaller-than-precision
        replacements for aggregates passed by reference to functions.
        (sra_modify_expr): Use a separate stmt iterator for adding statements
        before the processed statement and after it.
        (sra_modify_call_arg): New function.
        (sra_modify_assign): Adjust calls to sra_modify_expr.
        (sra_modify_function_body): Likewise, use sra_modify_call_arg to
        process call arguments, including the static chain.

gcc/testsuite/ChangeLog:

2023-11-03  Martin Jambor

        PR middle-end/109849
        * g++.dg/tree-ssa/pr109849.C: New test.
        * gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
--- gcc/testsuite/g++.dg/tree-ssa/pr109849.C | 31 +++ gcc/testsuite/gfortran.dg/pr43984.f90| 2 +- gcc/tree-sra.cc | 244 ++- 3 files changed, 231 insertions(+), 46 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr109849.C diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C new file mode 100644 index 000..cd348c0f590 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sra" } */ + +#include +typedef unsigned int uint32_t; +std::pair pair; +void +test() +{ +std::vector > stack; +stack.push_back (pair); +while (!stack.empty()) { +std::pair cur = stack.back(); +stack.pop_back(); +if (!cur.first) +{ +cur.second++; +stack.push_back (cur); +} +if (cur.second > 1) +break; +} +} +int +main() +{ +for (int i = 0; i < 1; i++) + test(); +} + +/* { dg
[PATCH] gcc/configure: Regenerate
On Mon, Nov 06 2023, Martin Jambor wrote:
> [...]
>
> I'm not sure what that means, whether a wrong version of
> autoconf/automake was used (though when I accidentally tried that, it
> has always complained loudly) or if some environment difference can
> cause this. Perhaps I should change the script not to care about
> commits though that won't happen soon (or perhaps I should drop the
> checks completely) but would people be OK with me checking in the patch
> above (with appropriate ChangeLog) to silence buildbot for a while
> again?
>

I have committed the following to silence the tester.

Probably because of a re-base of changes to gcc/configure there are
line comment mismatches in between what we have and what would be
generated. This patch brings them in line so that consistency checkers
are happy.

gcc/ChangeLog:

2023-11-07  Martin Jambor

        * configure: Regenerate.
---
 gcc/configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 4d0357cbc28..0d818ae6850 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -2,7 +2,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19995 "configure"
+#line 20003 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -20106,7 +20106,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 20101 "configure"
+#line 20109 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
-- 
2.42.0
[PATCH] Fix configure script comments(!?!) (Was: Re: [PATCH] genemit: Split insn-emit.cc into ten files)
Hello,

On Thu, Oct 12 2023, Robin Dapp wrote:
> [...]
> gcc/ChangeLog:
>
>       PR bootstrap/84402
>       PR target/111600
>
>       * Makefile.in: Handle split insn-emit.cc.
>       * configure: Regenerate.
>       * configure.ac: Add --with-insnemit-partitions.
>       * genemit.cc (output_peephole2_scratches): Print to file instead
>       of stdout.
>       (print_code): Ditto.
>       (gen_rtx_scratch): Ditto.
>       (gen_exp): Ditto.
>       (gen_emit_seq): Ditto.
>       (emit_c_code): Ditto.
>       (gen_insn): Ditto.
>       (gen_expand): Ditto.
>       (gen_split): Ditto.
>       (output_add_clobbers): Ditto.
>       (output_added_clobbers_hard_reg_p): Ditto.
>       (print_overload_arguments): Ditto.
>       (print_overload_test): Ditto.
>       (handle_overloaded_code_for): Ditto.
>       (handle_overloaded_gen): Ditto.
>       (print_header): New function.
>       (handle_arg): New function.
>       (main): Split output into 10 files.
>       * gensupport.cc (count_patterns): New function.
>       * gensupport.h (count_patterns): Define.
>       * read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
>       * read-md.h (class md_reader): Change definition.

Following this commit, our buildbot script which checks that configure
scripts were regenerated correctly is unhappy because it insists
comments are wrong, it wants them to be like this:

diff --git a/gcc/configure b/gcc/configure
index 4d0357cbc28..0d818ae6850 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -2,7 +2,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19995 "configure"
+#line 20003 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -20106,7 +20106,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 20101 "configure"
+#line 20109 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

I'm not sure what that means, whether a wrong version of
autoconf/automake was used (though when I accidentally tried that, it
has always complained loudly) or if some environment difference can
cause this. Perhaps I should change the script not to care about
commits though that won't happen soon (or perhaps I should drop the
checks completely) but would people be OK with me checking in the patch
above (with appropriate ChangeLog) to silence buildbot for a while
again?

Thanks,

Martin
[PATCH] Fortran: Fix generate_error library function fnspec
Hi,

when developing an otherwise unrelated patch I've discovered that the
fnspec for the Fortran library function generate_error is wrong. It is
currently ". R . R " where the first R describes the first parameter
and means that it "is only read and does not escape." The function
itself, however, with signature:

  bool
  generate_error_common (st_parameter_common *cmp, int family,
                         const char *message)

contains the following:

  /* Report status back to the compiler. */
  cmp->flags &= ~IOPARM_LIBRETURN_MASK;

which does not correspond to the fnspec and breaks testcase
gfortran.dg/large_unit_2.f90 when my patch is applied, since it tries
to re-use the flags from before the call.

This patch replaces the "R" with "W" which stands for "specifies that
the memory pointed to by the parameter does not escape."

Bootstrapped and tested on x86_64-linux. OK for master?

2023-11-02  Martin Jambor

        * trans-decl.cc (gfc_build_builtin_function_decls): Fix fnspec of
        generate_error.
---
 gcc/fortran/trans-decl.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index a3f037bd07b..b86cfec7d49 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -3821,7 +3821,7 @@ gfc_build_builtin_function_decls (void)
         void_type_node, -2, pchar_type_node, pchar_type_node);
 
   gfor_fndecl_generate_error = gfc_build_library_function_decl_with_spec (
-        get_identifier (PREFIX("generate_error")), ". R . R ",
+        get_identifier (PREFIX("generate_error")), ". W . R ",
         void_type_node, 3, pvoid_type_node, integer_type_node,
         pchar_type_node);
 
-- 
2.42.0
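The mismatch between a "read only" description and a callee that writes through its argument can be reproduced in miniature. This is a hedged sketch: `common_block`, `libreturn_mask` and `report_error` are invented stand-ins, and the mask value is arbitrary rather than the libgfortran one.

```cpp
#include <cassert>

// Illustrative mask; not the real IOPARM_LIBRETURN_MASK value.
constexpr unsigned libreturn_mask = 3u;

struct common_block
{
  unsigned flags;
};

// Like the function quoted above, this writes through its first
// argument, so describing that argument as read-only would be wrong.
void
report_error (common_block *cmp)
{
  cmp->flags &= ~libreturn_mask;
}

// Shows that a value read before the call cannot simply be reused
// afterwards: returns true if the call changed the flags.
bool
flags_changed_by_call (unsigned initial)
{
  common_block cmp = { initial };
  unsigned before = cmp.flags;
  report_error (&cmp);
  return cmp.flags != before;
}
```

An optimizer that trusted a read-only annotation for the argument could keep using `before` in place of `cmp.flags` after the call, which is the kind of reuse the message reports as breaking the testcase.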
Re: [PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)
Hello, On Thu, Oct 05 2023, Jan Hubicka wrote: >> gcc/ChangeLog: >> >> 2023-09-19 Martin Jambor >> >> PR ipa/57 >> * ipa-prop.h (struct ipa_argagg_value): Newf flag killed. >> * ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function. >> (update_signature): Mark any any IPA-CP aggregate constants at >> positions known to be killed as killed. Move check that there is >> clone_info after this pruning. >> * ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag. >> (ipa_argagg_value_list::push_adjusted_values): Clear the new flag. >> (push_agg_values_from_plats): Likewise. >> (ipa_push_agg_values_from_jfunc): Likewise. >> (estimate_local_effects): Likewise. >> (push_agg_values_for_index_from_edge): Likewise. >> * ipa-prop.cc (write_ipcp_transformation_info): Stream the killed >> flag. >> (read_ipcp_transformation_info): Likewise. >> (ipcp_get_aggregate_const): Update comment, assert that encountered >> record does not have killed flag set. >> (ipcp_transform_function): Prune all aggregate constants with killed >> set. >> >> gcc/testsuite/ChangeLog: >> >> 2023-09-18 Martin Jambor >> >> PR ipa/57 >> * gcc.dg/lto/pr57_0.c: New test. >> * gcc.dg/lto/pr57_1.c: Second file of the same new test. > >> diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc >> index c04f9f44c06..a8fcf159259 100644 >> --- a/gcc/ipa-modref.cc >> +++ b/gcc/ipa-modref.cc >> @@ -4065,21 +4065,71 @@ remap_kills (vec &kills, const >> vec &map) >>i++; >> } >> >> +/* Return true if the V can overlap with KILL. 
*/ >> + >> +static bool >> +ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value &v, >> +const modref_access_node &kill) >> +{ >> + if (kill.parm_index == v.index) >> +{ >> + gcc_assert (kill.parm_offset_known); >> + gcc_assert (known_eq (kill.max_size, kill.size)); >> + poly_int64 repl_size; >> + bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)), >> + &repl_size); >> + gcc_assert (ok); >> + poly_int64 repl_offset (v.unit_offset); >> + repl_offset <<= LOG2_BITS_PER_UNIT; >> + poly_int64 combined_offset >> += (kill.parm_offset << LOG2_BITS_PER_UNIT) + kill.offset; > parm_offset may be negative which I think will confuse > ranges_maybe_overlap_p. > I think you need to test for this and if it is negative adjust > repl_offset instead of kill.offset After a discussion with Honza about this in person, we came to the conclusion that the patch works as intended even in presence of negative parm_offsets (I even have a testcase but I need to enhance IPA-CP a bit in order for it to be useful also outside a debugger). >> + if (ranges_maybe_overlap_p (repl_offset, repl_size, >> + combined_offset, kill.size)) >> +return true; >> +} >> + return false; >> +} >> + >> /* If signature changed, update the summary. */ >> >> static void >> update_signature (struct cgraph_node *node) >> { >> - clone_info *info = clone_info::get (node); >> - if (!info || !info->param_adjustments) >> -return; >> - >>modref_summary *r = optimization_summaries >>? optimization_summaries->get (node) : NULL; >>modref_summary_lto *r_lto = summaries_lto >>? summaries_lto->get (node) : NULL; >>if (!r && !r_lto) >> return; >> + >> + ipcp_transformation *ipcp_ts = ipcp_get_transformation_summary (node); > Please add comment on why this is necessary. 
>> + if (ipcp_ts) >> +{ >> +for (auto &v : ipcp_ts->m_agg_values) >> + { >> +if (!v.by_ref) >> + continue; >> +if (r) >> + for (const modref_access_node &kill : r->kills) >> +if (ipcp_argagg_and_kill_overlap_p (v, kill)) >> + { >> +v.killed = true; >> +break; >> + } >> +if (!v.killed && r_lto) >> + for (const modref_access_node &kill : r_lto->kills) >> +if (ipcp_argagg_and_kill_overlap_p (v, kill)) >> + { >> +v.killed = 1; > = true? >> +break; >> + } >> + } >> +} >> + >> + c
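The question about negative parm_offset above can be illustrated with plain integers. The following is only a sketch under stated assumptions: it uses int64_t instead of poly_int64, assumes 8 bits per unit, and ranges_overlap_p is a simple stand-in written for this example, not GCC's ranges_maybe_overlap_p. The function constant_and_kill_overlap_p and its parameter names are likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: 8 bits per unit (LOG2_BITS_PER_UNIT == 3).
constexpr int64_t bits_per_unit = 8;

// Plain-integer stand-in for an overlap test: do the half-open bit ranges
// [pos1, pos1 + size1) and [pos2, pos2 + size2) intersect?
bool ranges_overlap_p (int64_t pos1, int64_t size1,
                       int64_t pos2, int64_t size2)
{
  return pos1 < pos2 + size2 && pos2 < pos1 + size1;
}

// Hypothetical simplification of the check discussed above.  UNIT_OFFSET
// and PARM_OFFSET are in bytes; KILL_OFFSET and both sizes are in bits.
// Multiplication is used instead of a shift so that negative byte offsets
// are well defined in every C++ standard.
bool constant_and_kill_overlap_p (int64_t unit_offset, int64_t const_size,
                                  int64_t parm_offset, int64_t kill_offset,
                                  int64_t kill_size)
{
  int64_t repl_offset = unit_offset * bits_per_unit;
  int64_t combined_offset = parm_offset * bits_per_unit + kill_offset;
  return ranges_overlap_p (repl_offset, const_size,
                           combined_offset, kill_size);
}
```

In this simplified model a negative parm_offset merely shifts the start of the killed range. A kill with parm_offset -4 and bit offset 64 starts at bit 32 and overlaps a 32-bit constant at byte offset 4, whereas the same kill with bit offset 0 covers bits -32 to 0 and overlaps nothing the constant can describe.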
Re: Darwin: Replace environment runpath with embedded [PR88590]
Hello Iain, On Tue, Aug 15 2023, FX Coudert via Gcc-patches wrote: > [...] > From e1cf04cadb9fa065fb3f7d6bccf9ed6f1e9e3fc1 Mon Sep 17 00:00:00 2001 > From: Iain Sandoe > Date: Sun, 28 Mar 2021 14:48:17 +0100 > Subject: [PATCH 2/4] Darwin: Allow for configuring Darwin to use embedded > runpath. our buildbot checker found that after this patch, there is an uncommitted auto(re)conf generated hunk in fixincludes/configure: diff --git a/fixincludes/configure b/fixincludes/configure index b9770489adc..1bb547a1724 100755 --- a/fixincludes/configure +++ b/fixincludes/configure @@ -3027,6 +3027,7 @@ ac_configure="$SHELL $ac_aux_dir/configure" # Please don't use this var. # --- # _LT_COMPILER_PIC +enable_darwin_at_rpath_$1=no # _LT_LINKER_SHLIBS([TAGNAME]) # @@ -3049,7 +3050,6 @@ ac_configure="$SHELL $ac_aux_dir/configure" # Please don't use this var. # the compiler configuration to `libtool'. # _LT_LANG_CXX_CONFIG - # _LT_SYS_HIDDEN_LIBDEPS([TAGNAME]) # - # Figure out "hidden" library dependencies from verbose Can I commit it (with an appropriate ChangeLog message) or do you want to take care of it yourself? Thanks, Martin > > Recent Darwin versions place contraints on the use of run paths > specified in environment variables. This breaks some assumptions > in the GCC build. > > This change allows the user to configure a Darwin build to use > '@rpath/libraryname.dylib' in library names and then to add an > embedded runpath to executables (and libraries with dependents). > > The embedded runpath is added by default unless the user adds > '-nodefaultrpaths' to the link line. > > For an installed compiler, it means that any executable built with > that compiler will reference the runtimes installed with the > compiler (equivalent to hard-coding the library path into the name > of the library). > > During build-time configurations any "-B" entries will be added to > the runpath thus the newly-built libraries will be found by exes. 
> > Since the install name is set in libtool, that decision needs to be > available here (but might also cause dependent ones in Makefiles, > so we need to export a conditional). > > This facility is not available for Darwin 8 or earlier, however the > existing environment variable runpath does work there. > > We default this on for systems where the external DYLD_LIBRARY_PATH > does not work and off for Darwin 8 or earlier. For systems that can > use either method, if the value is unset, we use the default (which > is currently DYLD_LIBRARY_PATH). > > > > Ada changes: > add paths relative to @loader-path > > JIT changes: > > This patch expects DARWIN_RPATH to be computed and available; which > means that we will use @rpath or ${libdir} as the name prefix > depending on the system version and the setting of > --enable-darwin-at-rpath. For branches that do not have this > available, the value should be set to ${libdir}. > > added m2 library changes. > > ChangeLog: > > * configure: Regenerate. > * configure.ac: Do not add default runpaths to GCC exes > when we are building -static-libstdc++/-static-libgcc (the > default). > * libtool.m4: Add 'enable-darwin-at-runpath'. Act on the > enable flag to alter Darwin libraries to use @rpath names. > > fixincludes/ChangeLog: > > * configure: Regenerate. > > gcc/ChangeLog: > > * aclocal.m4: Regenerate. > * configure: Regenerate. > * configure.ac: Handle Darwin rpaths. > * config/darwin-driver.cc: Handle Darwin rpaths. > * config/darwin.h: Handle Darwin rpaths. > * config/darwin.opt: Handle Darwin rpaths. > * Makefile.in: Handle Darwin rpaths. > > gcc/ada/ChangeLog: > > * gcc-interface/Makefile.in: Handle Darwin rpaths. > > gcc/jit/ChangeLog: > * Make-lang.in: Handle Darwin rpaths. > > libatomic/ChangeLog: > > * Makefile.am: Handle Darwin rpaths. > * Makefile.in: Regenerate. > * configure: Regenerate. > * configure.ac: Handle Darwin rpaths. > > libbacktrace/ChangeLog: > > * configure: Regenerate. 
> * configure.ac: Handle Darwin rpaths. > > libcc1/ChangeLog: > > * configure: Regenerate. > > libffi/ChangeLog: > > * Makefile.am: Handle Darwin rpaths. > * Makefile.in: Regenerate. > * configure: Regenerate. > > libgcc/ChangeLog: > > * config/t-slibgcc-darwin: Generate libgcc_s > with an @rpath name. > * config.host: Handle Darwin rpaths. > > libgfortran/ChangeLog: > > * Makefile.am: Handle Darwin rpaths. > * Makefile.in: Regenerate. > * configure: Regenerate. > * configure.ac: Handle Darwin rpaths > > libgm2/ChangeLog: > > * Makefile.am: Handle Darwin rpaths. > * Makefile.in: Regenerate. > * aclocal.m4: Regenerate. > * configure: Regenerate. > * configure.ac: Handle Darwin rpaths. > * libm2cor/Makefile.
[PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)
PR 111157 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and, by eliminating both, create a miscompilation.

This patch fixes that by pruning from the list of IPA-CP aggregate
value constants any constant for which it knows the contents of the
memory can be "killed."

Unfortunately, doing so is tricky. First, IPA-modref loads override
kills and so only stores which are not loaded are truly unnecessary.
Looking things up there means doing most of what modref_may_alias
does, but doing exactly what it does is tricky because it also takes
aliasing into account and has bail-out counters.

To err on the side of caution and avoid this miscompilation, we have
to prune a constant when in doubt. However, pruning can interfere with
the mechanism by which clone materialization distinguishes between the
case when a parameter was entirely removed and the case when it was
both IPA-CPed and IPA-SRAed (in order to make up for the removal in
debug info, which can bump into an assert when compiling
g++.dg/torture/pr103669.C when we are not careful).

Therefore this patch:

  1) marks constants that IPA-modref has in its kill list with a new
     "killed" flag, and
  2) prunes the entries with this flag from the list after
     materialization and IPA-CP transformation are done, using the
     template introduced in the previous patch.

It does not try to look up anything in the load lists; this will be
done as a follow-up in order to ease review.

gcc/ChangeLog:

2023-09-19  Martin Jambor

        PR ipa/111157
        * ipa-prop.h (struct ipa_argagg_value): New flag killed.
        * ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
        (update_signature): Mark any IPA-CP aggregate constants at
        positions known to be killed as killed. Move check that there is
        clone_info after this pruning.
        * ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
        (ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
        (push_agg_values_from_plats): Likewise.
        (ipa_push_agg_values_from_jfunc): Likewise.
        (estimate_local_effects): Likewise.
        (push_agg_values_for_index_from_edge): Likewise.
        * ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
        flag.
        (read_ipcp_transformation_info): Likewise.
        (ipcp_get_aggregate_const): Update comment, assert that encountered
        record does not have killed flag set.
        (ipcp_transform_function): Prune all aggregate constants with killed
        set.

gcc/testsuite/ChangeLog:

2023-09-18  Martin Jambor

        PR ipa/111157
        * gcc.dg/lto/pr111157_0.c: New test.
        * gcc.dg/lto/pr111157_1.c: Second file of the same new test.
---
 gcc/ipa-cp.cc                         |  8
 gcc/ipa-modref.cc                     | 58 +--
 gcc/ipa-prop.cc                       | 17 +++-
 gcc/ipa-prop.h                        |  4 ++
 gcc/testsuite/gcc.dg/lto/pr111157_0.c | 24 +++
 gcc/testsuite/gcc.dg/lto/pr111157_1.c | 10 +
 6 files changed, 115 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr111157_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr111157_1.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 071c607fbe8..bb49a1b2959 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1271,6 +1271,8 @@ ipa_argagg_value_list::dump (FILE *f)
       print_generic_expr (f, av.value);
       if (av.by_ref)
         fprintf (f, "(by_ref)");
+      if (av.killed)
+        fprintf (f, "(killed)");
       comma = true;
     }
   fprintf (f, "\n");
@@ -1437,6 +1439,8 @@ ipa_argagg_value_list::push_adjusted_values (unsigned src_index,
       new_av.unit_offset = av->unit_offset - unit_delta;
       new_av.index = dest_index;
       new_av.by_ref = av->by_ref;
+      gcc_assert (!av->killed);
+      new_av.killed = false;
 
       /* Quick check that the offsets we push are indeed increasing.
*/ gcc_assert (first @@ -1473,6 +1477,7 @@ push_agg_values_from_plats (ipcp_param_lattices *plats, int dest_index, iav.unit_offset = aglat->offset / BITS_PER_UNIT - unit_delta; iav.index = dest_index; iav.by_ref = plats->aggs_by_ref; + iav.killed = false; gcc_assert (first || iav.unit_offset > prev_unit_offset); @@ -2139,6 +2144,7 @@ ipa_push_agg_values_from_jfunc (ipa_node_params *info, cgraph_node *node, iav.unit_offset = item.offset / BITS_PER_UNIT; iav.index = dst_index; iav.by_ref = agg_jfunc->by_ref; + iav.killed = 0; gcc_assert (first || iav.unit_offset > prev_unit_offset); @@ -3970,6 +3976,7 @@ estimate_local_effects (struct cgraph
[PATCH 3/3] ipa: Limit pruning of IPA-CP aggregate constants if there are loads
This patch makes the previous one less conservative by checking
whether there are known ipa-modref loads from areas covered by the
IPA-CP aggregate constant entry in question. Because ipa-modref relies
on alias information which IPA-CP does not have (yet), the test is
much more crude and only reports overlapping accesses with known
offsets and max_size.

However, I was not able to put together a testcase which would fail
without this patch. It basically needs to be a combination of
testcases for PR 92497 (so that IPA-CP transformation phase is not
enough), PR 111157 (to get a load) and PR 103669 (to get a
clobber/kill) in a way that ipa-modref can still track things.
Therefore I am not sure if we actually want this patch.

gcc/ChangeLog:

2023-10-04  Martin Jambor

        * ipa-modref.cc (ipcp_argagg_and_access_must_overlap_p): New function.
        (ipcp_argagg_and_modref_tree_must_overlap_p): Likewise.
        (update_signature): Use ipcp_argagg_and_modref_tree_must_overlap_p.

Combined third step
---
 gcc/ipa-modref.cc | 65 +--
 1 file changed, 63 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index a8fcf159259..d2bfca3445d 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -4090,6 +4090,64 @@ ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value &v,
   return false;
 }
 
+/* Return true if V overlaps with ACCESS_NODE.  When in doubt, return
+   false.
*/ + +static bool +ipcp_argagg_and_access_must_overlap_p (const ipa_argagg_value &v, + const modref_access_node &access_node) +{ + if (access_node.parm_index == MODREF_GLOBAL_MEMORY_PARM + || access_node.parm_index == MODREF_UNKNOWN_PARM + || access_node.parm_index == MODREF_GLOBAL_MEMORY_PARM) + return false; + + if (access_node.parm_index == v.index) +{ + if (!access_node.parm_offset_known) + return false; + + poly_int64 repl_size; + bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)), +&repl_size); + gcc_assert (ok); + poly_int64 repl_offset (v.unit_offset); + repl_offset <<= LOG2_BITS_PER_UNIT; + poly_int64 combined_offset + = (access_node.parm_offset << LOG2_BITS_PER_UNIT) + access_node.offset; + if (ranges_maybe_overlap_p (repl_offset, repl_size, + combined_offset, access_node.max_size)) + return true; +} + return false; +} + +/* Return true if MT contains an access that certainly overlaps with V even + when we cannot evaluate alias references. When in doubt, return false. */ + +template +static bool +ipcp_argagg_and_modref_tree_must_overlap_p (const ipa_argagg_value &v, + const modref_tree &mt) +{ + for (auto base_node : mt.bases) +{ + if (base_node->every_ref) + return false; + for (auto ref_node : base_node->refs) + { + if (ref_node->every_access) + return false; + for (auto access_node : ref_node->accesses) + { + if (ipcp_argagg_and_access_must_overlap_p (v, access_node)) + return true; + } + } +} + return false; +} + /* If signature changed, update the summary. 
*/ static void @@ -4111,14 +4169,17 @@ update_signature (struct cgraph_node *node) continue; if (r) for (const modref_access_node &kill : r->kills) - if (ipcp_argagg_and_kill_overlap_p (v, kill)) + if (ipcp_argagg_and_kill_overlap_p (v, kill) + && !ipcp_argagg_and_modref_tree_must_overlap_p (v, *r->loads)) { v.killed = true; break; } if (!v.killed && r_lto) for (const modref_access_node &kill : r_lto->kills) - if (ipcp_argagg_and_kill_overlap_p (v, kill)) + if (ipcp_argagg_and_kill_overlap_p (v, kill) + && !ipcp_argagg_and_modref_tree_must_overlap_p (v, + *r_lto->loads)) { v.killed = 1; break; -- 2.42.0
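The decision that this patch changes in update_signature can be modelled in a few lines. The sketch below is only an illustration under simplifying assumptions: ranges are plain bit intervals, only loads with fully known offset and size are represented (anything the real code is unsure about is simply left out of KNOWN_LOADS, which matches returning false "when in doubt"), and the names bit_range, overlap_p and should_mark_killed_p are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A half-open range of bits [offset, offset + size).
struct bit_range
{
  int64_t offset;
  int64_t size;
};

bool overlap_p (const bit_range &a, const bit_range &b)
{
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// Hypothetical model of the decision: a constant is marked as killed only
// when some kill overlaps it and no load that is known with certainty
// overlaps it.  Loads that cannot be tracked are not in KNOWN_LOADS, so
// they do not rescue the constant and it is pruned.
bool should_mark_killed_p (const bit_range &constant,
                           const std::vector<bit_range> &kills,
                           const std::vector<bit_range> &known_loads)
{
  for (const bit_range &load : known_loads)
    if (overlap_p (constant, load))
      return false;
  for (const bit_range &kill : kills)
    if (overlap_p (constant, kill))
      return true;
  return false;
}
```

The asymmetry is the point of the design: uncertainty about kills and uncertainty about loads both push towards pruning the constant, which loses an optimization opportunity but cannot cause a miscompilation.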
[PATCH 1/3] ipa-cp: Templatize filtering of m_agg_values
PR 111157 points to another place where aggregate compile-time
constants collected by IPA-CP need to be filtered, in addition to the
one place that already does this in ipa-sra. In order to re-use code,
this patch turns the common bit into a template.

The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.

gcc/ChangeLog:

2023-09-13  Martin Jambor

        PR ipa/111157
        * ipa-prop.h (ipcp_transformation): New member function template
        remove_argaggs_if.
        * ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
        filter aggregate constants.
---
 gcc/ipa-prop.h | 33 +++++++++++++++++++++++++++++++++
 gcc/ipa-sra.cc | 33 ++++-----------------------------
 2 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 7e033d2a7b8..815855006e8 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -966,6 +966,39 @@ struct GTY(()) ipcp_transformation
 
   void maybe_create_parm_idx_map (tree fndecl);
 
+  /* Remove all elements in m_agg_values on which PREDICATE returns true.  */
+
+  template<typename pred_function>
+  void remove_argaggs_if (pred_function &&predicate)
+  {
+    unsigned ts_len = vec_safe_length (m_agg_values);
+    if (ts_len == 0)
+      return;
+
+    bool removed_item = false;
+    unsigned dst_index = 0;
+
+    for (unsigned i = 0; i < ts_len; i++)
+      {
+        ipa_argagg_value *v = &(*m_agg_values)[i];
+        if (!predicate (*v))
+          {
+            if (removed_item)
+              (*m_agg_values)[dst_index] = *v;
+            dst_index++;
+          }
+        else
+          removed_item = true;
+      }
+    if (dst_index == 0)
+      {
+        ggc_free (m_agg_values);
+        m_agg_values = NULL;
+      }
+    else if (removed_item)
+      m_agg_values->truncate (dst_index);
+  }
+
   /* Known aggregate values.  */
   vec<ipa_argagg_value, va_gc> *m_agg_values;
   /* Known bits information.  */
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index edba364f56e..1551b694679 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -4047,35 +4047,10 @@ mark_callers_calls_comdat_local (struct cgraph_node *node, void *)
 static void
 zap_useless_ipcp_results (const isra_func_summary *ifs, ipcp_transformation *ts)
 {
-  unsigned ts_len = vec_safe_length (ts->m_agg_values);
-
-  if (ts_len == 0)
-    return;
-
-  bool removed_item = false;
-  unsigned dst_index = 0;
-
-  for (unsigned i = 0; i < ts_len; i++)
-    {
-      ipa_argagg_value *v = &(*ts->m_agg_values)[i];
-      const isra_param_desc *desc = &(*ifs->m_parameters)[v->index];
-
-      if (!desc->locally_unused)
-        {
-          if (removed_item)
-            (*ts->m_agg_values)[dst_index] = *v;
-          dst_index++;
-        }
-      else
-        removed_item = true;
-    }
-  if (dst_index == 0)
-    {
-      ggc_free (ts->m_agg_values);
-      ts->m_agg_values = NULL;
-    }
-  else if (removed_item)
-    ts->m_agg_values->truncate (dst_index);
+  ts->remove_argaggs_if ([ifs](const ipa_argagg_value &v)
+  {
+    return (*ifs->m_parameters)[v.index].locally_unused;
+  });
 
   bool useful_bits = false;
   unsigned count = vec_safe_length (ts->bits);
-- 
2.42.0
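The copy-down filtering that the patch turns into a template can be tried outside of GCC with a standard container. This is a minimal sketch, assuming std::vector in place of GCC's garbage-collected vec and a cut-down agg_value struct in place of ipa_argagg_value; remove_values_if is a hypothetical name chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ipa_argagg_value with only the fields the sketch needs.
struct agg_value
{
  int index;
  bool killed;
};

// Remove every element on which PREDICATE returns true while keeping the
// relative order of the survivors, in the same copy-down style as the
// member function template in the patch.
template<typename pred_function>
void remove_values_if (std::vector<agg_value> &values,
                       pred_function &&predicate)
{
  std::size_t dst_index = 0;
  for (std::size_t i = 0; i < values.size (); i++)
    if (!predicate (values[i]))
      {
        // Copy only once something before this element has been dropped.
        if (dst_index != i)
          values[dst_index] = values[i];
        dst_index++;
      }
  values.resize (dst_index);
}
```

A caller passes a lambda, much as zap_useless_ipcp_results does after the patch, for example `remove_values_if (values, [] (const agg_value &v) { return v.killed; });`. Keeping the order matters here because other code in the series asserts that the offsets of pushed values are increasing.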
[PATCH] Revert "ipa: Self-DCE of uses of removed call LHSs (PR 108007)"
Hello, I am going to commit the following patch to fix PR 111688 (bootstrap on ppc64le broken) and will re-fix 108007 (issues with IPA-SRA when user explicitely turns off DCE) when I figure out what's going wrong. Sorry for the breakage, Martin [PATCH] Revert "ipa: Self-DCE of uses of removed call LHSs (PR 108007)" This reverts commit 1be18ea110a2d69570dbc494588a7c73173883be. As reported in PR bootstrap/111688, it broke ppc64le bootstrap because of a debug-compare failure. --- gcc/cgraph.cc | 10 +--- gcc/cgraph.h| 9 +-- gcc/ipa-param-manipulation.cc | 88 - gcc/ipa-param-manipulation.h| 3 +- gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 --- gcc/tree-inline.cc | 28 - 6 files changed, 38 insertions(+), 132 deletions(-) delete mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc index b82367ac342..e41e5ad3ae7 100644 --- a/gcc/cgraph.cc +++ b/gcc/cgraph.cc @@ -1403,17 +1403,11 @@ cgraph_edge::redirect_callee (cgraph_node *n) speculative indirect call, remove "speculative" of the indirect call and also redirect stmt to it's final direct target. - When called from within tree-inline, KILLED_SSAs has to contain the pointer - to killed_new_ssa_names within the copy_body_data structure and SSAs - discovered to be useless (if LHS is removed) will be added to it, otherwise - it needs to be NULL. - It is up to caller to iteratively transform each "speculative" direct call as appropriate. 
*/ gimple * -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e, - hash_set *killed_ssas) +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e) { tree decl = gimple_call_fndecl (e->call_stmt); gcall *new_stmt; @@ -1533,7 +1527,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e, remove_stmt_from_eh_lp (e->call_stmt); tree old_fntype = gimple_call_fntype (e->call_stmt); - new_stmt = padjs->modify_call (e, false, killed_ssas); + new_stmt = padjs->modify_call (e, false); cgraph_node *origin = e->callee; while (origin->clone_of) origin = origin->clone_of; diff --git a/gcc/cgraph.h b/gcc/cgraph.h index d7162efeeb4..cedaaac3a45 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1833,16 +1833,9 @@ public: speculative indirect call, remove "speculative" of the indirect call and also redirect stmt to it's final direct target. - When called from within tree-inline, KILLED_SSAs has to contain the - pointer to killed_new_ssa_names within the copy_body_data structure and - SSAs discovered to be useless (if LHS is removed) will be added to it, - otherwise it needs to be NULL. - It is up to caller to iteratively transform each "speculative" direct call as appropriate. */ - static gimple *redirect_call_stmt_to_callee (cgraph_edge *e, - hash_set - *killed_ssas = nullptr); + static gimple *redirect_call_stmt_to_callee (cgraph_edge *e); /* Create clone of edge in the node N represented by CALL_EXPR the callgraph. */ diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc index 014939bf754..ae52f17b2c9 100644 --- a/gcc/ipa-param-manipulation.cc +++ b/gcc/ipa-param-manipulation.cc @@ -593,66 +593,14 @@ isra_get_ref_base_and_offset (tree expr, tree *base_p, unsigned *unit_offset_p) return true; } -/* Remove all statements that use NAME and transitively those that use the - result of such statements. KILLED_SSAS contains the SSA_NAMEs that are - already being or have been processed and new ones need to be added to it. 
- The funtction only has to process situations handled by - ssa_name_only_returned_p in ipa-sra.cc with the exception that it can assume - it must never reach a use in a return statement. */ - -static void -purge_transitive_uses (tree name, hash_set *killed_ssas) -{ - imm_use_iterator imm_iter; - gimple *stmt; - auto_vec worklist; - - worklist.safe_push (name); - while (!worklist.is_empty ()) -{ - tree cur_name = worklist.pop (); - FOR_EACH_IMM_USE_STMT (stmt, imm_iter, cur_name) - { - if (gimple_debug_bind_p (stmt)) - { - /* When runing within tree-inline, we will never end up here but -adding the SSAs to killed_ssas will do the trick in this case -and the respective debug statements will get reset. */ - gimple_debug_bind_reset_value (stmt); - update_stmt (stmt); - continue; - } - - tree lhs = NULL_TREE; - if (is_gimple_assign (stmt)) - lhs = gimple_assign_lhs (stmt); - else if (gimple_code (stmt) == GIMPLE_PHI) - lhs = gimple_phi_result (stmt); - gcc_assert (l
[committed] ipa-modref: Fix dumping
Hi,

function dump_lto_records ought to dump to its parameter OUT but was
dumping expressions to dump_file. This is corrected by this patch and
while at it, I also made the modref_summary::dump member function
const so that it is callable from more contexts.

I have committed this patch as obvious after including it in a
bootstrap and testing on x86_64-linux.

Thanks,

Martin

gcc/ChangeLog:

2023-09-21  Martin Jambor

        * ipa-modref.h (modref_summary::dump): Make const.
        * ipa-modref.cc (modref_summary::dump): Likewise.
        (dump_lto_records): Dump to out instead of dump_file.
---
 gcc/ipa-modref.cc | 6 +++---
 gcc/ipa-modref.h  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index c04f9f44c06..fe55621f007 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -474,7 +474,7 @@ dump_lto_records (modref_records_lto *tt, FILE *out)
   FOR_EACH_VEC_SAFE_ELT (tt->bases, i, n)
     {
       fprintf (out, " Base %i:", (int)i);
-      print_generic_expr (dump_file, n->base);
+      print_generic_expr (out, n->base);
       fprintf (out, " (alias set %i)\n",
                n->base ? get_alias_set (n->base) : 0);
       if (n->every_ref)
@@ -487,7 +487,7 @@ dump_lto_records (modref_records_lto *tt, FILE *out)
       FOR_EACH_VEC_SAFE_ELT (n->refs, j, r)
         {
           fprintf (out, "Ref %i:", (int)j);
-          print_generic_expr (dump_file, r->ref);
+          print_generic_expr (out, r->ref);
           fprintf (out, " (alias set %i)\n",
                    r->ref ? get_alias_set (r->ref) : 0);
           if (r->every_access)
@@ -567,7 +567,7 @@ remove_modref_edge_summaries (cgraph_node *node)
 /* Dump summary.  */
 
 void
-modref_summary::dump (FILE *out)
+modref_summary::dump (FILE *out) const
 {
   if (loads)
     {
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 2a2d31e86db..f7dedace2da 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -66,7 +66,7 @@ struct GTY(()) modref_summary
 
   modref_summary ();
   ~modref_summary ();
-  void dump (FILE *);
+  void dump (FILE *) const;
   bool useful_p (int ecf_flags, bool check_flags = true);
   void finalize (tree);
 };
-- 
2.42.0
Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)
Hello, On Mon, Sep 25 2023, Jan Hubicka wrote: [...] >> >> +static void >> >> +purge_transitive_uses (tree name, hash_set *killed_ssas) >> >> +{ >> >> + imm_use_iterator imm_iter; >> >> + gimple *stmt; >> >> + >> >> + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name) >> >> +{ >> >> + if (gimple_debug_bind_p (stmt)) >> >> + { >> >> + /* When runing within tree-inline, we will never end up here but >> >> + adding the SSAs to killed_ssas will do the trick in this case and >> >> + the respective debug statements will get reset. */ >> >> + >> >> + gimple_debug_bind_reset_value (stmt); >> >> + update_stmt (stmt); >> >> + continue; >> >> + } >> >> + >> >> + tree lhs = NULL_TREE; >> >> + if (is_gimple_assign (stmt)) >> >> + lhs = gimple_assign_lhs (stmt); >> >> + else if (gimple_code (stmt) == GIMPLE_PHI) >> >> + lhs = gimple_phi_result (stmt); >> >> + gcc_assert (lhs >> >> + && (TREE_CODE (lhs) == SSA_NAME) >> >> + && !gimple_vdef (stmt)); >> >> + >> >> + if (!killed_ssas->contains (lhs)) >> >> + { >> >> + killed_ssas->add (lhs); >> >> + purge_transitive_uses (lhs, killed_ssas); > > SSA graph may be deep so this may cause stack overflow, so I think we > should use worklist here (it is also easy to do). > > OK with that change. > Honza I have just committed the following after a bootstrap and testing on x86_64-linux. Thanks, Martin PR 108007 is another manifestation where we rely on DCE to clean-up after IPA-SRA and if the user explicitely switches DCE off, IPA-SRA can leave behind statements which are fed uninitialized values and trap, even though their results are themselves never used. I have already fixed this for unused parameters in callees, this bug shows that almost the same thing can happen for removed returns, on the side of callers. This means that the issue has to be fixed elsewhere, in call redirection. This patch adds a function which looks for (and through, using a work-list) uses of operations fed specific SSA names and removes them all. 
That would have been easy if it wasn't for debug statements during tree-inline (from which call redirection is also invoked). Debug statements are decoupled from the rest at this point and iterating over uses of SSAs does not bring them up. During tree-inline they are handled especially at the end, I assume in order to make sure that relative ordering of UIDs are the same with and without debug info. This means that during tree-inline we need to make a hash of killed SSAs, that we already have in copy_body_data, available to the function making the purging. So the patch duly does also that, making the interface slightly ugly. gcc/ChangeLog: 2023-09-27 Martin Jambor PR ipa/108007 * cgraph.h (cgraph_edge): Add a parameter to redirect_call_stmt_to_callee. * ipa-param-manipulation.h (ipa_param_adjustments): Add a parameter to modify_call. * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New parameter killed_ssas, pass it to padjs->modify_call. * ipa-param-manipulation.cc (purge_transitive_uses): New function. (ipa_param_adjustments::modify_call): New parameter killed_ssas. Instead of substituting uses, invoke purge_transitive_uses. If hash of killed SSAs has not been provided, create a temporary one and release SSAs that have been added to it. * tree-inline.cc (redirect_all_calls): Create id->killed_new_ssa_names earlier, pass it to edge redirection, adjust a comment. (copy_body): Release SSAs in id->killed_new_ssa_names. gcc/testsuite/ChangeLog: 2023-05-11 Martin Jambor PR ipa/108007 * gcc.dg/ipa/pr108007.c: New test. 
--- gcc/cgraph.cc | 10 +++- gcc/cgraph.h| 9 ++- gcc/ipa-param-manipulation.cc | 88 + gcc/ipa-param-manipulation.h| 3 +- gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++ gcc/tree-inline.cc | 28 + 6 files changed, 132 insertions(+), 38 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc index e41e5ad3ae7..b82367ac342 100644 --- a/gcc/cgraph.cc +++ b/gcc/cgraph.cc @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n) speculative indirect call, remove "speculative" of the indirect call and also redirect stmt to it's final direct targ
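The change requested in review, replacing recursion with a worklist, can be shown on a toy def-use graph. This is a sketch only: values are plain integers, the graph is a std::map from a value to the values computed from it, and nothing here touches GIMPLE or debug statements. The function shares its name with the one in the patch purely for readability; its signature is made up for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy def-use graph: maps a value to the values computed from it.
using use_map = std::map<int, std::vector<int>>;

// Add to KILLED every value transitively computed from NAME.  An explicit
// worklist is used instead of recursion so that a deep graph does not
// consume call stack.
void purge_transitive_uses (int name, const use_map &uses,
                            std::set<int> &killed)
{
  std::vector<int> worklist;
  worklist.push_back (name);
  while (!worklist.empty ())
    {
      int cur = worklist.back ();
      worklist.pop_back ();
      auto it = uses.find (cur);
      if (it == uses.end ())
        continue;
      for (int user : it->second)
        // insert reports whether USER is new, so each value is queued
        // and processed at most once.
        if (killed.insert (user).second)
          worklist.push_back (user);
    }
}
```

As in the patch, the set of killed values doubles as the visited set, which is what keeps a value reachable along several paths (such as a PHI result) from being processed twice.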
[PATCH] contrib/mklog.py: Fix issues reported by flake8
Hi,

the testing infrastructure built by Martin Liška includes checking a
few Python scripts in contrib with the flake8 tool. That tool has
recently started complaining that:

  contrib/mklog.py:360:45: E711 comparison to None should be 'if cond is None:'
  contrib/mklog.py:362:1: E305 expected 2 blank lines after class or function definition, found 1

I'd like to silence these with the following, hopefully trivial,
changes. However, I have only tested the changes by running flake8
again and running ./contrib/mklog.py --help.

Is this good for trunk? (Or should I stop using flake8 instead?)

Thanks,

Martin

contrib/ChangeLog:

2023-10-03  Martin Jambor

        * mklog.py (skip_line_in_changelog): Compare to None using is instead
        of ==, add an extra newline after the function.
---
 contrib/mklog.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index effe5aa1ca5..1c2c3216e9e 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -357,7 +357,8 @@ def update_copyright(data):
 
 
 def skip_line_in_changelog(line):
-    return FIRST_LINE_OF_END_RE.match(line) == None
+    return FIRST_LINE_OF_END_RE.match(line) is None
+
 
 if __name__ == '__main__':
     extra_args = os.getenv('GCC_MKLOG_ARGS')
-- 
2.42.0
Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)
Hello, and ping. Thanks, Martin On Fri, Sep 01 2023, Martin Jambor wrote: > Hello > > and ping. > > Thanks, > > Martin > > > On Fri, May 12 2023, Martin Jambor wrote: >> Hi, >> >> PR 108007 is another manifestation where we rely on DCE to clean-up >> after IPA-SRA and if the user explicitely switches DCE off, IPA-SRA >> can leave behind statements which are fed uninitialized values and >> trap, even though their results are themselves never used. >> >> I have already fixed this for unused parameters in callees, this bug >> shows that almost the same thing can happen for removed returns, on >> the side of callers. This means that the issue has to be fixed >> elsewhere, in call redirection. This patch adds a function which >> recursivewly looks for uses of operations fed specific SSA names and >> removes them all. >> >> That would have been easy if it wasn't for debug statements during >> tree-inline (from which call redirection is also invoked). Debug >> statements are decoupled from the rest at this point and iterating >> over uses of SSAs does not bring them up. During tree-inline they are >> handled especially at the end, I assume in order to make sure that >> relative ordering of UIDs are the same with and without debug info. >> >> This means that during tree-inline we need to make a hash of killed >> SSAs, that we already have in copy_body_data, available to the >> function making the purging. So the patch duly does also that, making >> the interface slightly ugly. >> >> Bootstrapped and tested on x86_64-linux. OK for master? (I am not sure >> the problem is grave enough to warrant backporting to release branches >> but can do that as well if people think I should.) >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2023-05-11 Martin Jambor >> >> PR ipa/108007 >> * cgraph.h (cgraph_edge): Add a parameter to >> redirect_call_stmt_to_callee. >> * ipa-param-manipulation.h (ipa_param_adjustments): Added a >> parameter to modify_call. 
>> * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New >> parameter killed_ssas, pass it to padjs->modify_call. >> * ipa-param-manipulation.cc (purge_transitive_uses): New function. >> (ipa_param_adjustments::modify_call): New parameter killed_ssas. >> Instead of substitutin uses, invoke purge_transitive_uses. If >> hash of killed SSAs has not been provided, create a temporary one >> and release SSAs that have been added to it. >> * tree-inline.cc (redirect_all_calls): Create >> id->killed_new_ssa_names earlier, pass it to edge redirection, >> adjust a comment. >> (copy_body): Release SSAs in id->killed_new_ssa_names. >> >> gcc/testsuite/ChangeLog: >> >> 2023-05-11 Martin Jambor >> >> PR ipa/108007 >> * gcc.dg/ipa/pr108007.c: New test. >> --- >> gcc/cgraph.cc | 10 +++- >> gcc/cgraph.h| 9 ++- >> gcc/ipa-param-manipulation.cc | 85 + >> gcc/ipa-param-manipulation.h| 3 +- >> gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++ >> gcc/tree-inline.cc | 28 ++ >> 6 files changed, 129 insertions(+), 38 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c >> >> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc >> index e8f9bec8227..5e923bf0557 100644 >> --- a/gcc/cgraph.cc >> +++ b/gcc/cgraph.cc >> @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n) >> speculative indirect call, remove "speculative" of the indirect call and >> also redirect stmt to it's final direct target. >> >> + When called from within tree-inline, KILLED_SSAs has to contain the >> pointer >> + to killed_new_ssa_names within the copy_body_data structure and SSAs >> + discovered to be useless (if LHS is removed) will be added to it, >> otherwise >> + it needs to be NULL. >> + >> It is up to caller to iteratively transform each "speculative" >> direct call as appropriate. 
*/ >> >> gimple * >> -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e) >> +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e, >> + hash_set *killed_ssas) >> { >>tree decl = gimple_call_fndecl (e->call_stmt); >>gcall *new_stmt; >> @@ -1
Re: [PATCH] ipa-sra: Allow IPA-SRA in presence of returns which will be removed
Hello, and ping. Thanks, Martin On Fri, Sep 01 2023, Martin Jambor wrote: > Hello > > and ping. > > Thanks, > > Martin > > > On Fri, Aug 18 2023, Martin Jambor wrote: >> Hi, >> >> testing on 32bit arm revealed that even the simplest case of PR 110378 >> was still not resolved there because destructors were returning this >> pointer. Needless to say, the return value of those destructors often >> is just not used, which IPA-SRA can already detect in time. Since >> such enhancement seems generally useful, here it is. >> >> The patch simply adds two flag to respective summaries to mark down >> situations when it encounters either a simple direct use of a default >> definition SSA_NAME of a parameter, which means that the parameter may >> still be split when return value is removed, and when any derived use >> of it is returned, allowing for complete removal in that case, instead >> of discarding it as a candidate for removal or splitting like we do >> now. The IPA phase then simply checks that we indeed plan to remove >> the return value before allowing any transformation to be considered >> in such cases. >> >> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux. OK for >> master? >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2023-08-18 Martin Jambor >> >> PR ipa/110378 >> * ipa-param-manipulation.cc >> (ipa_param_body_adjustments::mark_dead_statements): Verify that any >> return uses of PARAM will be removed. >> (ipa_param_body_adjustments::mark_clobbers_dead): Likewise. >> * ipa-sra.cc (isra_param_desc): New fields >> remove_only_when_retval_removed and split_only_when_retval_removed. >> (struct gensum_param_desc): Likewise. Fix comment long line. >> (ipa_sra_function_summaries::duplicate): Copy the new flags. >> (dump_gensum_param_descriptor): Dump the new flags. >> (dump_isra_param_descriptor): Likewise. >> (isra_track_scalar_value_uses): New parameter desc. Set its flag >> remove_only_when_retval_removed when encountering a simple return. 
>> (isra_track_scalar_param_local_uses): Replace parameter call_uses_p >> with desc. Pass it to isra_track_scalar_value_uses and set its >> call_uses. >> (ptr_parm_has_nonarg_uses): Accept parameter descriptor as a >> parameter. If there is a direct return use, mark any.. >> (create_parameter_descriptors): Pass the whole parameter descriptor to >> isra_track_scalar_param_local_uses and ptr_parm_has_nonarg_uses. >> (process_scan_results): Copy the new flags. >> (isra_write_node_summary): Stream the new flags. >> (isra_read_node_info): Likewise. >> (adjust_parameter_descriptions): Check that transformations >> requring return removal only happen when return value is removed. >> Restructure main loop. Adjust dump message. >> >> gcc/testsuite/ChangeLog: >> >> 2023-08-18 Martin Jambor >> >> PR ipa/110378 >> * gcc.dg/ipa/ipa-sra-32.c: New test. >> * gcc.dg/ipa/pr110378-4.c: Likewise. >> * gcc.dg/ipa/ipa-sra-4.c: Use a return value. >> --- >> gcc/ipa-param-manipulation.cc | 7 +- >> gcc/ipa-sra.cc| 247 +- >> gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c | 30 >> gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c | 4 +- >> gcc/testsuite/gcc.dg/ipa/pr110378-4.c | 50 ++ >> 5 files changed, 251 insertions(+), 87 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110378-4.c >> >> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc >> index 4a185ddbdf4..ae52f17b2c9 100644 >> --- a/gcc/ipa-param-manipulation.cc >> +++ b/gcc/ipa-param-manipulation.cc >> @@ -1163,6 +1163,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree >> dead_param, >> stack.safe_push (lhs); >> } >> } >> + else if (gimple_code (stmt) == GIMPLE_RETURN) >> +gcc_assert (m_adjustments && m_adjustments->m_skip_return); >>else >> /* IPA-SRA does not analyze other types of statements. 
*/ >> gcc_unreachable (); >> @@ -1182,7 +1184,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree >> dead_param, >> } >> >> /* Put all clobbers of of dereference of default definition of PARAM into >> - m_dea
[PATCH] math-opts: Add dbgcounter for FMA formation
Hello, This patch is a simple addition of a debug counter to FMA formation in tree-ssa-math-opts.cc. Given that issues with FMAs do occasionally pop up, it seems genuinely useful. I simply added an if right after the initial checks in convert_mult_to_fma. When FMA formation deferring is active (i.e. when targeting Zen CPUs), the counter interacts with it (and at this moment leads to producing all deferred candidates), so when using the debug counter to find a harmful set of FMAs, it is probably best to also set param_avoid_fma_max_bits to zero. I could not find a better place which would not also make the code unnecessarily more complicated. Bootstrapped and tested on x86_64-linux. OK for master? Thanks, Martin gcc/ChangeLog: 2023-09-06 Martin Jambor * dbgcnt.def (form_fma): New. * tree-ssa-math-opts.cc: Include dbgcnt.h. (convert_mult_to_fma): Bail out if the debug counter says so. --- gcc/dbgcnt.def| 1 + gcc/tree-ssa-math-opts.cc | 4 2 files changed, 5 insertions(+) diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def index 9e2f1d857b4..871cbf75d93 100644 --- a/gcc/dbgcnt.def +++ b/gcc/dbgcnt.def @@ -162,6 +162,7 @@ DEBUG_COUNTER (dom_unreachable_edges) DEBUG_COUNTER (dse) DEBUG_COUNTER (dse1) DEBUG_COUNTER (dse2) +DEBUG_COUNTER (form_fma) DEBUG_COUNTER (gcse2_delete) DEBUG_COUNTER (gimple_unroll) DEBUG_COUNTER (global_alloc_at_func) diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index 95c22694368..3db69ad5733 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -116,6 +116,7 @@ along with GCC; see the file COPYING3. 
If not see #include "targhooks.h" #include "domwalk.h" #include "tree-ssa-math-opts.h" +#include "dbgcnt.h" /* This structure represents one basic block that either computes a division, or is a common dominator for basic block that compute a @@ -3366,6 +3367,9 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2, && !has_single_use (mul_result)) return false; + if (!dbg_cnt (form_fma)) +return false; + /* Make sure that the multiplication statement becomes dead after the transformation, thus that all uses are transformed to FMAs. This means we assume that an FMA operation has the same cost -- 2.41.0
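The gating described in this message can be modelled outside GCC. What follows is only a minimal sketch, assuming a simplified counter that permits the first N queries and refuses the rest; toy_dbg_counter, fma_candidate and convert_all are hypothetical names made up for illustration and are not GCC code.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a debug counter (an assumption, not GCC's
// implementation): it permits the first `limit` queries and refuses
// every later one.
struct toy_dbg_counter
{
  unsigned limit;
  unsigned seen;

  explicit toy_dbg_counter (unsigned l) : limit (l), seen (0) {}

  bool allow () { return seen++ < limit; }
};

// Hypothetical candidate: a multiplication that might become an FMA.
struct fma_candidate
{
  bool passes_initial_checks;
};

// Mirrors the shape of the patched code: the cheap checks run first and
// the counter is consulted only for candidates that survive them.
// Returns how many candidates were "converted".
unsigned
convert_all (const std::vector<fma_candidate> &cands, toy_dbg_counter &cnt)
{
  unsigned converted = 0;
  for (const fma_candidate &c : cands)
    {
      if (!c.passes_initial_checks)
        continue;   // rejected before the counter is touched
      if (!cnt.allow ())
        continue;   // counter exhausted, leave the multiplication alone
      ++converted;
    }
  return converted;
}
```

Lowering the limit step by step is what makes it possible to bisect for a harmful conversion. With the real counter the analogous experiment would presumably go through GCC's -fdbg-cnt option using the form_fma name added by the patch.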
Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)
Hello and ping. Thanks, Martin On Fri, May 12 2023, Martin Jambor wrote: > Hi, > > PR 108007 is another manifestation where we rely on DCE to clean-up > after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA > can leave behind statements which are fed uninitialized values and > trap, even though their results are themselves never used. > > I have already fixed this for unused parameters in callees; this bug > shows that almost the same thing can happen for removed returns, on > the side of callers. This means that the issue has to be fixed > elsewhere, in call redirection. This patch adds a function which > recursively looks for uses of operations fed specific SSA names and > removes them all. > > That would have been easy if it wasn't for debug statements during > tree-inline (from which call redirection is also invoked). Debug > statements are decoupled from the rest at this point and iterating > over uses of SSAs does not bring them up. During tree-inline they are > handled specially at the end, I assume in order to make sure that > relative ordering of UIDs is the same with and without debug info. > > This means that during tree-inline we need to make a hash of killed > SSAs, that we already have in copy_body_data, available to the > function making the purging. So the patch duly does also that, making > the interface slightly ugly. > > Bootstrapped and tested on x86_64-linux. OK for master? (I am not sure > the problem is grave enough to warrant backporting to release branches > but can do that as well if people think I should.) > > Thanks, > > Martin > > > gcc/ChangeLog: > > 2023-05-11 Martin Jambor > > PR ipa/108007 > * cgraph.h (cgraph_edge): Add a parameter to > redirect_call_stmt_to_callee. > * ipa-param-manipulation.h (ipa_param_adjustments): Added a > parameter to modify_call. > * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New > parameter killed_ssas, pass it to padjs->modify_call. 
> * ipa-param-manipulation.cc (purge_transitive_uses): New function. > (ipa_param_adjustments::modify_call): New parameter killed_ssas. > Instead of substitutin uses, invoke purge_transitive_uses. If > hash of killed SSAs has not been provided, create a temporary one > and release SSAs that have been added to it. > * tree-inline.cc (redirect_all_calls): Create > id->killed_new_ssa_names earlier, pass it to edge redirection, > adjust a comment. > (copy_body): Release SSAs in id->killed_new_ssa_names. > > gcc/testsuite/ChangeLog: > > 2023-05-11 Martin Jambor > > PR ipa/108007 > * gcc.dg/ipa/pr108007.c: New test. > --- > gcc/cgraph.cc | 10 +++- > gcc/cgraph.h| 9 ++- > gcc/ipa-param-manipulation.cc | 85 + > gcc/ipa-param-manipulation.h| 3 +- > gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++ > gcc/tree-inline.cc | 28 ++ > 6 files changed, 129 insertions(+), 38 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c > > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc > index e8f9bec8227..5e923bf0557 100644 > --- a/gcc/cgraph.cc > +++ b/gcc/cgraph.cc > @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n) > speculative indirect call, remove "speculative" of the indirect call and > also redirect stmt to it's final direct target. > > + When called from within tree-inline, KILLED_SSAs has to contain the > pointer > + to killed_new_ssa_names within the copy_body_data structure and SSAs > + discovered to be useless (if LHS is removed) will be added to it, > otherwise > + it needs to be NULL. > + > It is up to caller to iteratively transform each "speculative" > direct call as appropriate. 
*/ > > gimple * > -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e) > +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e, > +hash_set *killed_ssas) > { >tree decl = gimple_call_fndecl (e->call_stmt); >gcall *new_stmt; > @@ -1527,7 +1533,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge > *e) > remove_stmt_from_eh_lp (e->call_stmt); > >tree old_fntype = gimple_call_fntype (e->call_stmt); > - new_stmt = padjs->modify_call (e, false); > + new_stmt = padjs->modify_call (e, false, killed_ssas); >cgraph_node *origin = e->callee; >while (origin->clone_of) > origin = origin->clone_of; > diff --git a/gcc/cgraph.h
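The idea of purging everything that transitively uses a killed name can be illustrated with a toy worklist. This is a sketch under stated assumptions, not the purge_transitive_uses from the patch: statements are modelled as a defined name plus operand names, toy_stmt and toy_purge_transitive_uses are hypothetical, and the debug-statement and side-effect handling discussed in the message is left out entirely.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy statement: defines one name from zero or more operand names.
struct toy_stmt
{
  std::string lhs;
  std::vector<std::string> operands;
};

// Starting from the names in KILLED, find every statement that uses a
// killed name and add the name it defines to KILLED too, repeating until
// nothing new is found.  Returns the statements that survive, i.e. those
// whose defined name did not end up in KILLED.
std::vector<toy_stmt>
toy_purge_transitive_uses (const std::vector<toy_stmt> &stmts,
                           std::set<std::string> &killed)
{
  std::vector<std::string> worklist (killed.begin (), killed.end ());
  while (!worklist.empty ())
    {
      std::string name = worklist.back ();
      worklist.pop_back ();
      for (const toy_stmt &s : stmts)
        for (const std::string &op : s.operands)
          if (op == name && killed.insert (s.lhs).second)
            worklist.push_back (s.lhs);   // newly killed, follow its uses
    }

  std::vector<toy_stmt> kept;
  for (const toy_stmt &s : stmts)
    if (!killed.count (s.lhs))
      kept.push_back (s);
  return kept;
}
```

Passing the set in by reference loosely mirrors the interface choice described above: a caller that already keeps a set of killed names can hand over its own and see the additions afterwards.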
Re: [PATCH] ipa-sra: Allow IPA-SRA in presence of returns which will be removed
Hello and ping. Thanks, Martin On Fri, Aug 18 2023, Martin Jambor wrote: > Hi, > > testing on 32bit arm revealed that even the simplest case of PR 110378 > was still not resolved there because destructors were returning this > pointer. Needless to say, the return value of those destructors often > is just not used, which IPA-SRA can already detect in time. Since > such enhancement seems generally useful, here it is. > > The patch simply adds two flag to respective summaries to mark down > situations when it encounters either a simple direct use of a default > definition SSA_NAME of a parameter, which means that the parameter may > still be split when return value is removed, and when any derived use > of it is returned, allowing for complete removal in that case, instead > of discarding it as a candidate for removal or splitting like we do > now. The IPA phase then simply checks that we indeed plan to remove > the return value before allowing any transformation to be considered > in such cases. > > Bootstrapped, LTO-bootstrapped and tested on x86_64-linux. OK for > master? > > Thanks, > > Martin > > > gcc/ChangeLog: > > 2023-08-18 Martin Jambor > > PR ipa/110378 > * ipa-param-manipulation.cc > (ipa_param_body_adjustments::mark_dead_statements): Verify that any > return uses of PARAM will be removed. > (ipa_param_body_adjustments::mark_clobbers_dead): Likewise. > * ipa-sra.cc (isra_param_desc): New fields > remove_only_when_retval_removed and split_only_when_retval_removed. > (struct gensum_param_desc): Likewise. Fix comment long line. > (ipa_sra_function_summaries::duplicate): Copy the new flags. > (dump_gensum_param_descriptor): Dump the new flags. > (dump_isra_param_descriptor): Likewise. > (isra_track_scalar_value_uses): New parameter desc. Set its flag > remove_only_when_retval_removed when encountering a simple return. > (isra_track_scalar_param_local_uses): Replace parameter call_uses_p > with desc. 
Pass it to isra_track_scalar_value_uses and set its > call_uses. > (ptr_parm_has_nonarg_uses): Accept parameter descriptor as a > parameter. If there is a direct return use, mark any.. > (create_parameter_descriptors): Pass the whole parameter descriptor to > isra_track_scalar_param_local_uses and ptr_parm_has_nonarg_uses. > (process_scan_results): Copy the new flags. > (isra_write_node_summary): Stream the new flags. > (isra_read_node_info): Likewise. > (adjust_parameter_descriptions): Check that transformations > requring return removal only happen when return value is removed. > Restructure main loop. Adjust dump message. > > gcc/testsuite/ChangeLog: > > 2023-08-18 Martin Jambor > > PR ipa/110378 > * gcc.dg/ipa/ipa-sra-32.c: New test. > * gcc.dg/ipa/pr110378-4.c: Likewise. > * gcc.dg/ipa/ipa-sra-4.c: Use a return value. > --- > gcc/ipa-param-manipulation.cc | 7 +- > gcc/ipa-sra.cc| 247 +- > gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c | 30 > gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c | 4 +- > gcc/testsuite/gcc.dg/ipa/pr110378-4.c | 50 ++ > 5 files changed, 251 insertions(+), 87 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c > create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110378-4.c > > diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc > index 4a185ddbdf4..ae52f17b2c9 100644 > --- a/gcc/ipa-param-manipulation.cc > +++ b/gcc/ipa-param-manipulation.cc > @@ -1163,6 +1163,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree > dead_param, > stack.safe_push (lhs); > } > } > + else if (gimple_code (stmt) == GIMPLE_RETURN) > + gcc_assert (m_adjustments && m_adjustments->m_skip_return); > else > /* IPA-SRA does not analyze other types of statements. */ > gcc_unreachable (); > @@ -1182,7 +1184,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree > dead_param, > } > > /* Put all clobbers of of dereference of default definition of PARAM into > - m_dead_stmts. */ > + m_dead_stmts. 
If there are returns among uses of the default definition > of > + PARAM, verify they will be stripped off the return value. */ > > void > ipa_param_body_adjustments::mark_clobbers_dead (tree param) > @@ -1200,6 +1203,8 @@ ipa_param_body_adjustments::mark_clobbers_dead (tree > param) > gimple *stmt = USE_STMT (use_p); > if (gimple_clobber_p (stmt)) > m_d
[PATCH] ipa-sra: Allow IPA-SRA in presence of returns which will be removed
Hi, testing on 32-bit ARM revealed that even the simplest case of PR 110378 was still not resolved there because destructors were returning the this pointer. Needless to say, the return value of those destructors often is just not used, which IPA-SRA can already detect in time. Since such an enhancement seems generally useful, here it is. The patch adds two flags to the respective summaries. They mark situations in which the analysis encounters either a simple direct use of a default definition SSA_NAME of a parameter in a return, which means that the parameter may still be split when the return value is removed, or a return of any use derived from it, which allows complete removal in that case. Until now, both situations discarded the parameter as a candidate for removal or splitting. The IPA phase then simply checks that we indeed plan to remove the return value before allowing any transformation to be considered in such cases. Bootstrapped, LTO-bootstrapped and tested on x86_64-linux. OK for master? Thanks, Martin gcc/ChangeLog: 2023-08-18 Martin Jambor PR ipa/110378 * ipa-param-manipulation.cc (ipa_param_body_adjustments::mark_dead_statements): Verify that any return uses of PARAM will be removed. (ipa_param_body_adjustments::mark_clobbers_dead): Likewise. * ipa-sra.cc (isra_param_desc): New fields remove_only_when_retval_removed and split_only_when_retval_removed. (struct gensum_param_desc): Likewise. Fix comment long line. (ipa_sra_function_summaries::duplicate): Copy the new flags. (dump_gensum_param_descriptor): Dump the new flags. (dump_isra_param_descriptor): Likewise. (isra_track_scalar_value_uses): New parameter desc. Set its flag remove_only_when_retval_removed when encountering a simple return. (isra_track_scalar_param_local_uses): Replace parameter call_uses_p with desc. Pass it to isra_track_scalar_value_uses and set its call_uses. (ptr_parm_has_nonarg_uses): Accept parameter descriptor as a parameter. If there is a direct return use, mark any.. 
(create_parameter_descriptors): Pass the whole parameter descriptor to isra_track_scalar_param_local_uses and ptr_parm_has_nonarg_uses. (process_scan_results): Copy the new flags. (isra_write_node_summary): Stream the new flags. (isra_read_node_info): Likewise. (adjust_parameter_descriptions): Check that transformations requring return removal only happen when return value is removed. Restructure main loop. Adjust dump message. gcc/testsuite/ChangeLog: 2023-08-18 Martin Jambor PR ipa/110378 * gcc.dg/ipa/ipa-sra-32.c: New test. * gcc.dg/ipa/pr110378-4.c: Likewise. * gcc.dg/ipa/ipa-sra-4.c: Use a return value. --- gcc/ipa-param-manipulation.cc | 7 +- gcc/ipa-sra.cc| 247 +- gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c | 30 gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c | 4 +- gcc/testsuite/gcc.dg/ipa/pr110378-4.c | 50 ++ 5 files changed, 251 insertions(+), 87 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110378-4.c diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc index 4a185ddbdf4..ae52f17b2c9 100644 --- a/gcc/ipa-param-manipulation.cc +++ b/gcc/ipa-param-manipulation.cc @@ -1163,6 +1163,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param, stack.safe_push (lhs); } } + else if (gimple_code (stmt) == GIMPLE_RETURN) + gcc_assert (m_adjustments && m_adjustments->m_skip_return); else /* IPA-SRA does not analyze other types of statements. */ gcc_unreachable (); @@ -1182,7 +1184,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param, } /* Put all clobbers of of dereference of default definition of PARAM into - m_dead_stmts. */ + m_dead_stmts. If there are returns among uses of the default definition of + PARAM, verify they will be stripped off the return value. 
*/ void ipa_param_body_adjustments::mark_clobbers_dead (tree param) @@ -1200,6 +1203,8 @@ ipa_param_body_adjustments::mark_clobbers_dead (tree param) gimple *stmt = USE_STMT (use_p); if (gimple_clobber_p (stmt)) m_dead_stmts.add (stmt); + else if (gimple_code (stmt) == GIMPLE_RETURN) + gcc_assert (m_adjustments && m_adjustments->m_skip_return); } } diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc index edba364f56e..817f29ea62f 100644 --- a/gcc/ipa-sra.cc +++ b/gcc/ipa-sra.cc @@ -185,6 +185,13 @@ struct GTY(()) isra_param_desc unsigned split_candidate : 1; /* Is this a parameter passing stuff by reference? */ unsigned by_ref : 1; + /* If set, this parameter can only be a candidate for removal if the func
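To make the situation in this message concrete, here is a hand-written before/after pair. It is only an illustration of the kind of rewrite being enabled, assuming no caller reads the return value; pair_t, sum_and_return_before and sum_after are hypothetical, and the second function is not actual compiler output.

```cpp
#include <cassert>

struct pair_t { int a; int b; };

// Before: the parameter is passed by reference and also returned, which is
// roughly how a destructor returning `this` looks to the analysis.
pair_t *
sum_and_return_before (pair_t *p, int *out)
{
  *out = p->a + p->b;
  return p;   // a return use of P, which used to disqualify it
}

// After (written by hand): no caller reads the return value, so it is
// dropped, and the pointer is split into the two scalars actually loaded.
void
sum_after (int a, int b, int *out)
{
  *out = a + b;
}
```

The second form is only valid because the return value goes away, which is the condition the IPA phase is described as checking before it allows the transformation.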
Re: [PATCH] Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)
Hello, On Mon, Aug 14 2023, Harald Anlauf via Gcc-patches wrote: > Hi Martin, > > On 14.08.23 at 19:39, Martin Jambor wrote: >> Hello, >> >> this patch addresses an issue uncovered by the undefined behavior >> sanitizer. In function resolve_structure_cons in resolve.cc there is >> a test starting with: >> >>if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl >>&& comp->ts.u.cl->length >>&& comp->ts.u.cl->length->expr_type == EXPR_CONSTANT >> >> and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of >> integer value 1818451807 which is outside of the value range of the expr_t >> enum. If I understand the code correctly, the entire load was >> unwanted because comp->ts.type in those cases is BT_CLASS and not >> BT_CHARACTER. This patch simply adds a check to make sure it is only >> accessed when comp->ts.type is BT_CHARACTER. >> >> I have verified that the UBSAN failure goes away with this patch, it >> also passes bootstrap and testing on x86_64-linux. OK for master? > > this looks good to me. > > Looking at that code block, there is a potential other UB a few lines > below, where (hopefully integer) string lengths are to be passed to > mpz_cmp. > > If the string length is ill-defined (e.g. non-integer), value.integer > is undefined. We've seen this elsewhere, where on BE platforms that > undefined value was interpreted as some large integer and giving > failures on those platforms. One could similarly add the following > checks here (on top of your patch): Thank you very much for the approval and the improvement. I have committed the following (after another round of testing). Martin Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677) This patch addresses an issue uncovered by the undefined behavior sanitizer. 
In function resolve_structure_cons in resolve.cc there is a test starting with: if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl && comp->ts.u.cl->length && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of integer value 1818451807 which is outside of the value range of the expr_t enum. If I understand the code correctly, the entire load was unwanted because comp->ts.type in those cases is BT_CLASS and not BT_CHARACTER. This patch simply adds a check to make sure it is only accessed when comp->ts.type is BT_CHARACTER. During review, Harald Anlauf noticed that length types also need to be checked and so I also added the checks that he suggested to the condition. Co-authored-by: Harald Anlauf gcc/fortran/ChangeLog: 2023-08-14 Martin Jambor PR fortran/110677 * resolve.cc (resolve_structure_cons): Check comp->ts is character type before accessing stuff through comp->ts.u.cl. --- gcc/fortran/resolve.cc | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index e7c8d919bef..f51674f7faa 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -1396,11 +1396,14 @@ resolve_structure_cons (gfc_expr *expr, int init) the one of the structure, ensure this if the lengths are known at compile time and when we are dealing with PARAMETER or structure constructors. */ - if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl - && comp->ts.u.cl->length + if (cons->expr->ts.type == BT_CHARACTER + && comp->ts.type == BT_CHARACTER + && comp->ts.u.cl && comp->ts.u.cl->length && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT + && cons->expr->ts.u.cl->length->ts.type == BT_INTEGER + && comp->ts.u.cl->length->ts.type == BT_INTEGER && mpz_cmp (cons->expr->ts.u.cl->length->value.integer, comp->ts.u.cl->length->value.integer) != 0) { -- 2.41.0
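The underlying pattern is a union whose valid member depends on a type tag, so the tag of each operand has to be tested before that operand's union member is read. The following is a minimal sketch of that pattern, assuming a much simplified layout; toy_typespec, toy_charlen, toy_derived and toy_char_lengths_differ are hypothetical stand-ins and do not reproduce gfortran's data structures.

```cpp
#include <cassert>

enum toy_basic_type { TOY_BT_INTEGER, TOY_BT_CHARACTER, TOY_BT_CLASS };

struct toy_charlen { long length; };
struct toy_derived { int id; };

// Which union member is meaningful depends on the `type` tag.
struct toy_typespec
{
  toy_basic_type type;
  union
  {
    toy_charlen *cl;        // meaningful only for TOY_BT_CHARACTER
    toy_derived *derived;   // meaningful only for TOY_BT_CLASS
  } u;
};

// Returns true when both specs are character types with known, differing
// lengths.  The tag of each side is tested before its u.cl is read, so a
// TOY_BT_CLASS spec is never interpreted as a character length.
bool
toy_char_lengths_differ (const toy_typespec &cons, const toy_typespec &comp)
{
  return cons.type == TOY_BT_CHARACTER
         && comp.type == TOY_BT_CHARACTER
         && cons.u.cl && comp.u.cl
         && cons.u.cl->length != comp.u.cl->length;
}
```

Testing only one side's tag, as the original condition did, would let the other side's pointer be reinterpreted whenever the two types disagree.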
[PATCH] Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)
Hello, this patch addresses an issue uncovered by the undefined behavior sanitizer. In function resolve_structure_cons in resolve.cc there is a test starting with: if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl && comp->ts.u.cl->length && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of integer value 1818451807 which is outside of the value range of the expr_t enum. If I understand the code correctly, the entire load was unwanted because comp->ts.type in those cases is BT_CLASS and not BT_CHARACTER. This patch simply adds a check to make sure it is only accessed when comp->ts.type is BT_CHARACTER. I have verified that the UBSAN failure goes away with this patch, it also passes bootstrap and testing on x86_64-linux. OK for master? Thanks, Martin gcc/fortran/ChangeLog: 2023-08-14 Martin Jambor PR fortran/110677 * resolve.cc (resolve_structure_cons): Check comp->ts is character type before accessing stuff through comp->ts.u.cl. --- gcc/fortran/resolve.cc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index e7c8d919bef..5b4dfc5fcd2 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -1396,8 +1396,9 @@ resolve_structure_cons (gfc_expr *expr, int init) the one of the structure, ensure this if the lengths are known at compile time and when we are dealing with PARAMETER or structure constructors. */ - if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl - && comp->ts.u.cl->length + if (cons->expr->ts.type == BT_CHARACTER + && comp->ts.type == BT_CHARACTER + && comp->ts.u.cl && comp->ts.u.cl->length && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT -- 2.41.0
Re: [PATCH 2/2] ipa-cp: Feed results of IPA-CP into value numbering
Hello Richi, it took me quite some time to get back to this, but that might actually have helped because it forced me to re-read the surrounding code and in turn simplify the patch. On Mon, Jun 12 2023, Richard Biener wrote: > On Fri, 9 Jun 2023, Martin Jambor wrote: > [...] >> @@ -2327,7 +2330,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd, >> with the current VUSE and performs the expression lookup. */ >> >> static void * >> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_) >> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_) >> { >>vn_walk_cb_data *data = (vn_walk_cb_data *)data_; >>vn_reference_t vr = data->vr; >> @@ -2361,6 +2364,38 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, >> tree vuse, void *data_) >>return *slot; >> } >> >> + if (SSA_NAME_IS_DEFAULT_DEF (vuse) > && data->partial_defs.is_empty ()) > > ^^ do this check early The check is actually done right at the beginning of the function already so I simply removed it. > >> +{ >> + HOST_WIDE_INT offset, size; >> + tree v = NULL_TREE; > tree base = ao_ref_base (op); > if ((TREE_CODE (base) == PARM_DECL > || TREE_CODE (base) == MEM_REF) > > handle both kind of bases with ... > >> + && op->offset.is_constant (&offset) >> + && op->size.is_constant (&size) >> + && op->max_size_known_p () >> + && known_eq (op->size, op->max_size)) > > ^^^ this preconditions (would have been missing in the MEM_REF branch > before) I had missed that the call to ao_ref_base fills in these fields, and that in the pointer case they are not filled in without it. I hope the patch below is the simplified version you wanted. The patch passed bootstrap and testing and also LTO bootstrap on x86_64-linux. Thanks, Martin PRs 68930 and 92497 show that when IPA-CP figures out constants in aggregate parameters, or in parameters passed by reference, but the loads happen in an inlined function, the information is lost. 
This happens even when the inlined function itself was known to have - or even cloned to have - such constants in incoming parameters because the transform phase of IPA passes is not run on them. See discussion in the bugs for reasons why. Honza suggested that we can plug the results of IPA-CP analysis into value numbering, so that FRE can figure out that some loads fetch known constants. This is what this patch attempts to do. The patch does not attempt to populate partial_defs with information from IPA-CP, this can be hopefully added as a follow-up. gcc/ChangeLog: 2023-08-11 Martin Jambor PR ipa/68930 PR ipa/92497 * ipa-prop.h (ipcp_get_aggregate_const): Declare. * ipa-prop.cc (ipcp_get_aggregate_const): New function. (ipcp_transform_function): Do not deallocate transformation info. * tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and ipa-prop.h. (vn_reference_lookup_2): When hitting default-def vuse, query IPA-CP transformation info for any known constants. gcc/testsuite/ChangeLog: 2023-06-07 Martin Jambor PR ipa/68930 PR ipa/92497 * gcc.dg/ipa/pr92497-1.c: New test. * gcc.dg/ipa/pr92497-2.c: Likewise. --- gcc/ipa-prop.cc | 33 +++ gcc/ipa-prop.h | 3 +++ gcc/testsuite/gcc.dg/ipa/pr92497-1.c | 26 + gcc/testsuite/gcc.dg/ipa/pr92497-2.c | 26 + gcc/tree-ssa-sccvn.cc| 34 +++- 5 files changed, 116 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-1.c create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-2.c diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc index 4f6ed7b89bd..9efaa5cb848 100644 --- a/gcc/ipa-prop.cc +++ b/gcc/ipa-prop.cc @@ -5760,6 +5760,34 @@ ipcp_modif_dom_walker::before_dom_children (basic_block bb) return NULL; } +/* If IPA-CP discovered a constant in parameter PARM at OFFSET of a given SIZE + - whether passed by reference or not is given by BY_REF - return that + constant. Otherwise return NULL_TREE. 
*/ + +tree +ipcp_get_aggregate_const (struct function *func, tree parm, bool by_ref, + HOST_WIDE_INT bit_offset, HOST_WIDE_INT bit_size) +{ + cgraph_node *node = cgraph_node::get (func->decl); + ipcp_transformation *ts = ipcp_get_transformation_summary (node); + + if (!ts || !ts->m_agg_values) +return NULL_TREE; + + int index = ts->get_param_index (func->decl, parm); + if (index < 0) +return NULL_TREE; + + ipa_argagg_value_list avl (ts); + unsigned unit_offset = bit_offset / BITS_PER_UNIT; + tree
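The kind of code this message is concerned with can be sketched as follows. This is a hypothetical illustration and not one of the pr92497 tests: config, apply, process and run are made-up names, and whether a given GCC version actually folds the loads depends on the options used and on the patch being applied.

```cpp
#include <cassert>

struct config { int scale; int bias; };

// Small helper that is likely to be inlined into its caller.
static inline int
apply (const config *c, int x)
{
  return x * c->scale + c->bias;   // loads from the aggregate
}

// The only caller passes an aggregate with constant fields, which IPA-CP
// can record for parameter C.  Once apply() is inlined here, the loads
// belong to process(), and value numbering needs that record to replace
// them with the constants.  The noinline attribute (a GCC extension) keeps
// process() itself as a separate function.
static int __attribute__ ((noinline))
process (const config *c, int x)
{
  return apply (c, x);
}

int
run (int x)
{
  config c = { 2, 1 };
  return process (&c, x);
}
```

Looking up such recorded constants by parameter, offset and size is the role the message assigns to ipcp_get_aggregate_const when it is queried from the value-numbering code.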
Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting
Hello, On Fri, Aug 11 2023, Christophe Lyon wrote: > Hi Martin, > > > On Fri, 4 Aug 2023 at 18:26, Martin Jambor wrote: > >> Hello, >> >> On Wed, Aug 02 2023, Richard Biener wrote: >> > On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor wrote: >> >> >> >> Hi, >> >> >> >> when IPA-SRA detects whether a parameter passed by reference is >> >> written to, it does not special case CLOBBERs which means it often >> >> bails out unnecessarily, especially when dealing with C++ destructors. >> >> Fixed by the obvious continue in the two relevant loops. >> >> >> >> The (slightly) more complex testcases in the PR need surprisingly more >> >> effort but the simple one can be fixed now easily by this patch and I'll >> >> work on the others incrementally. >> >> >> >> Bootstrapped and currently undergoing testsuite run on x86_64-linux. OK >> >> if it passes too? >> > >> > LGTM, btw - how are the clobbers handled during transform? >> >> it turns out your question is spot on. I assumed that the mini-DCE that >> I implemented into IPA-SRA transform would delete them, but I had a closer >> look and it is not invoked on split parameters, only on removed ones. >> What was actually happening is that the parameter got remapped to a >> default definition of a replacement VAR_DECL and we were thus >> gimple-clobbering a pointer pointing to nowhere. The clobber then got >> DSEd and so I originally did not notice looking at the optimized dump. >> >> Still that is of course not ideal and so I added a simple function >> removing clobbers when splitting. I was considering adding that >> functionality to ipa_param_body_adjustments::mark_dead_statements but >> that would make the function harder to read without much gain. >> >> So thanks again for the remark. The following passes bootstrap and >> testing on x86_64-linux. I am running LTO bootstrap now. OK if it >> passes? 
>> >> Martin >> >> >> >> When IPA-SRA detects whether a parameter passed by reference is >> written to, it does not special case CLOBBERs which means it often >> bails out unnecessarily, especially when dealing with C++ destructors. >> Fixed by the obvious continue in the two relevant loops and by adding >> a simple function that marks the clobbers in the transformation code >> as statements to be removed. >> >> > Not sure if you noticed: I updated bugzilla because the new test fails on > arm, and I attached pr110378-1.C.083i.sra there, to help you debug. > I am aware and actually started looking at the issue a while ago. Sorry, I'm only slowly making my way through my TODO list. The difference on 32-bit ARM is that destructors return the this pointer, which means that IPA-SRA cannot just split the loaded bit, at least not without a follow-up IPA analysis establishing that the return value is unused, which it currently does not take into account. But now that we remove useless returns before splitting it should be doable. Meanwhile, is there a dejagnu target macro for architectures with destructors returning a value so that we could xfail the test there? Thanks for bringing this to my attention. Martin > Thanks, > > Christophe > > gcc/ChangeLog: >> >> 2023-08-04 Martin Jambor >> >> PR ipa/110378 >> * ipa-param-manipulation.h (class ipa_param_body_adjustments): New >> members get_ddef_if_exists_and_is_used and mark_clobbers_dead. >> * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers. >> (ptr_parm_has_nonarg_uses): Likewise. >> * ipa-param-manipulation.cc >> (ipa_param_body_adjustments::get_ddef_if_exists_and_is_used): New. >> (ipa_param_body_adjustments::mark_dead_statements): Move initial >> checks to get_ddef_if_exists_and_is_used. >> (ipa_param_body_adjustments::mark_clobbers_dead): New. >> (ipa_param_body_adjustments::common_initialization): Call >> mark_clobbers_dead when splitting. 
>> >> gcc/testsuite/ChangeLog: >> >> 2023-07-31 Martin Jambor >> >> PR ipa/110378 >> * g++.dg/ipa/pr110378-1.C: New test. >> --- >> gcc/ipa-param-manipulation.cc | 44 +--- >> gcc/ipa-param-manipulation.h | 2 ++ >> gcc/ipa-sra.cc| 6 ++-- >> gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++ >> 4 files changed, 94 insertions(+), 6 deletions(-) >> create mode 100644 gcc/testsuite/
Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting
Hello,

On Wed, Aug 02 2023, Richard Biener wrote:
> On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor wrote:
>>
>> Hi,
>>
>> when IPA-SRA detects whether a parameter passed by reference is
>> written to, it does not special case CLOBBERs which means it often
>> bails out unnecessarily, especially when dealing with C++ destructors.
>> Fixed by the obvious continue in the two relevant loops.
>>
>> The (slightly) more complex testcases in the PR need surprisingly more
>> effort but the simple one can be fixed now easily by this patch and I'll
>> work on the others incrementally.
>>
>> Bootstrapped and currently undergoing testsuite run on x86_64-linux. OK
>> if it passes too?
>
> LGTM, btw - how are the clobbers handled during transform?

it turns out your question is spot on. I assumed that the mini-DCE that
I implemented into IPA-SRA transform would delete them but I had a closer
look and it is not invoked on split parameters, only on removed ones.
What was actually happening is that the parameter got remapped to a
default definition of a replacement VAR_DECL and we were thus
gimple-clobbering a pointer pointing to nowhere. The clobber then got
DSEd and so I originally did not notice looking at the optimized dump.

Still that is of course not ideal and so I added a simple function
removing clobbers when splitting. I was considering adding that
functionality to ipa_param_body_adjustments::mark_dead_statements but
that would make the function harder to read without much gain.

So thanks again for the remark. The following passes bootstrap and
testing on x86_64-linux. I am running LTO bootstrap now. OK if it
passes?

Martin


When IPA-SRA detects whether a parameter passed by reference is
written to, it does not special case CLOBBERs which means it often
bails out unnecessarily, especially when dealing with C++ destructors.
Fixed by the obvious continue in the two relevant loops and by adding a simple function that marks the clobbers in the transformation code as statements to be removed. gcc/ChangeLog: 2023-08-04 Martin Jambor PR ipa/110378 * ipa-param-manipulation.h (class ipa_param_body_adjustments): New members get_ddef_if_exists_and_is_used and mark_clobbers_dead. * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers. (ptr_parm_has_nonarg_uses): Likewise. * ipa-param-manipulation.cc (ipa_param_body_adjustments::get_ddef_if_exists_and_is_used): New. (ipa_param_body_adjustments::mark_dead_statements): Move initial checks to get_ddef_if_exists_and_is_used. (ipa_param_body_adjustments::mark_clobbers_dead): New. (ipa_param_body_adjustments::common_initialization): Call mark_clobbers_dead when splitting. gcc/testsuite/ChangeLog: 2023-07-31 Martin Jambor PR ipa/110378 * g++.dg/ipa/pr110378-1.C: New test. --- gcc/ipa-param-manipulation.cc | 44 +--- gcc/ipa-param-manipulation.h | 2 ++ gcc/ipa-sra.cc| 6 ++-- gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++ 4 files changed, 94 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc index a286af7f5d9..4a185ddbdf4 100644 --- a/gcc/ipa-param-manipulation.cc +++ b/gcc/ipa-param-manipulation.cc @@ -1072,6 +1072,20 @@ ipa_param_body_adjustments::carry_over_param (tree t) return new_parm; } +/* If DECL is a gimple register that has a default definition SSA name and that + has some uses, return the default definition, otherwise return NULL_TREE. */ + +tree +ipa_param_body_adjustments::get_ddef_if_exists_and_is_used (tree decl) +{ + if (!is_gimple_reg (decl)) +return NULL_TREE; + tree ddef = ssa_default_def (m_id->src_cfun, decl); + if (!ddef || has_zero_uses (ddef)) +return NULL_TREE; + return ddef; +} + /* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without any replacement or splitting. 
REPL is the replacement VAR_SECL to base any remaining uses of a removed parameter on. Push all removed SSA names that @@ -1084,10 +1098,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param, /* Current IPA analyses which remove unused parameters never remove a non-gimple register ones which have any use except as parameters in other calls, so we can safely leve them as they are. */ - if (!is_gimple_reg (dead_param)) -return; - tree parm_ddef = ssa_default_def (m_id->src_cfun, dead_param); - if (!parm_ddef || has_zero_uses (parm_ddef)) + tree parm_ddef = get_ddef_if_exists_and_is_used (dead_param); + if (!parm_ddef) return; auto_vec stack; @@ -1169,6 +1181,28 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param, m_dead_ssa_debug_equiv.pu
[PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting
Hi,

when IPA-SRA detects whether a parameter passed by reference is
written to, it does not special case CLOBBERs which means it often
bails out unnecessarily, especially when dealing with C++ destructors.
Fixed by the obvious continue in the two relevant loops.

The (slightly) more complex testcases in the PR need surprisingly more
effort but the simple one can be fixed now easily by this patch and I'll
work on the others incrementally.

Bootstrapped and currently undergoing testsuite run on x86_64-linux. OK
if it passes too?

Thanks,

Martin


gcc/ChangeLog:

2023-07-31 Martin Jambor

        PR ipa/110378
        * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
        (ptr_parm_has_nonarg_uses): Likewise.

gcc/testsuite/ChangeLog:

2023-07-31 Martin Jambor

        PR ipa/110378
        * g++.dg/ipa/pr110378-1.C: New test.
---
 gcc/ipa-sra.cc | 6 ++--
 gcc/testsuite/g++.dg/ipa/pr110378-1.C | 47 +++
 2 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C

diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index c35e03b7abd..edba364f56e 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -898,7 +898,8 @@ isra_track_scalar_value_uses (function *fun, cgraph_node *node, tree name,
 
   FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
     {
-      if (is_gimple_debug (stmt))
+      if (is_gimple_debug (stmt)
+          || gimple_clobber_p (stmt))
         continue;
 
       /* TODO: We could handle at least const builtin functions like arithmetic
@@ -1056,7 +1057,8 @@ ptr_parm_has_nonarg_uses (cgraph_node *node, function *fun, tree parm,
       unsigned uses_ok = 0;
       use_operand_p use_p;
 
-      if (is_gimple_debug (stmt))
+      if (is_gimple_debug (stmt)
+          || gimple_clobber_p (stmt))
         continue;
 
       if (gimple_assign_single_p (stmt))
diff --git a/gcc/testsuite/g++.dg/ipa/pr110378-1.C b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
new file mode 100644
index 000..aabe326b8b2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-sra -fdump-tree-optimized-slim" } */
+
+/* Test that even though destructors end with clobbering all of *this, it
+   should not prevent IPA-SRA. */
+
+namespace {
+
+  class foo
+  {
+  public:
+    int *a;
+    foo(int c)
+    {
+      a = new int[c];
+      a[0] = 4;
+    }
+    __attribute__((noinline)) ~foo();
+    int f ()
+    {
+      return a[0] + 1;
+    }
+  };
+
+  volatile int v1 = 4;
+
+  __attribute__((noinline)) foo::~foo()
+  {
+    delete[] a;
+    return;
+  }
+
+
+}
+
+volatile int v2 = 20;
+
+int test (void)
+{
+  foo shouldnotexist(v2);
+  v2 = shouldnotexist.f();
+  return 0;
+}
+
+
+/* { dg-final { scan-ipa-dump "Will split parameter 0" "sra" } } */
+/* { dg-final { scan-tree-dump-not "shouldnotexist" "optimized" } } */
--
2.41.0
Re: [PATCH] Read global value/mask in IPA.
Hello, On Tue, Jul 18 2023, Aldy Hernandez wrote: > On 7/17/23 15:14, Aldy Hernandez wrote: >> Instead of reading the known zero bits in IPA, read the value/mask >> pair which is available. >> >> There is a slight change of behavior here. I have removed the check >> for SSA_NAME, as the ranger can calculate the range and value/mask for >> INTEGER_CST. This simplifies the code a bit, since there's no special >> casing when setting the jfunc bits. The default range for VR is >> undefined, so I think it's safe just to check for undefined_p(). > > Final round of tests revealed a regression for which I've adjusted the > testcase. > > It turns out g++.dg/ipa/pure-const-3.C fails because IPA can now pick up > value/mask from any pass that has an integrated ranger. The test was > previously disabling evrp and CCP, but now VRP[12], jump threading, and > DOM can make value/mask adjustments visible to IPA so they must be > disabled as well. So can this be then converted into a new testcase that would test that we can now derive something we could not in the past? The patch is OK (but the testcase above is highly desirable). Thanks for keeping looking at IPA-VR. Martin > > We've run into these scenarios multiple times in the past-- any > improvements to the ranger pipeline causes everyone to get smarter, > making changes visible earlier in the pipeline. > > Aldy > From e1dfd4d6b3d3bf09d55b6ea3ac732462c7030802 Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez > Date: Fri, 14 Jul 2023 12:38:16 +0200 > Subject: [PATCH] Read global value/mask in IPA. > > Instead of reading the known zero bits in IPA, read the value/mask > pair which is available. > > There is a slight change of behavior here. I have removed the check > for SSA_NAME, as the ranger can calculate the range and value/mask for > INTEGER_CST. This simplifies the code a bit, since there's no special > casing when setting the jfunc bits. 
The default range for VR is > undefined, so I think it's safe just to check for undefined_p(). > > gcc/ChangeLog: > > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Read global > value/mask. > > gcc/testsuite/ChangeLog: > > * g++.dg/ipa/pure-const-3.C: Adjust for smarter value/mask being > read by ranger earlier than expected by test. > --- > gcc/ipa-prop.cc | 18 -- > gcc/testsuite/g++.dg/ipa/pure-const-3.C | 2 +- > 2 files changed, 9 insertions(+), 11 deletions(-) > > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc > index 5d790ff1265..4f6ed7b89bd 100644 > --- a/gcc/ipa-prop.cc > +++ b/gcc/ipa-prop.cc > @@ -2402,8 +2402,7 @@ ipa_compute_jump_functions_for_edge (struct > ipa_func_body_info *fbi, > } >else > { > - if (TREE_CODE (arg) == SSA_NAME > - && param_type > + if (param_type > && Value_Range::supports_type_p (TREE_TYPE (arg)) > && Value_Range::supports_type_p (param_type) > && irange::supports_p (TREE_TYPE (arg)) > @@ -2422,15 +2421,14 @@ ipa_compute_jump_functions_for_edge (struct > ipa_func_body_info *fbi, > gcc_assert (!jfunc->m_vr); > } > > - if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) > - && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST)) > + if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && !vr.undefined_p ()) > { > - if (TREE_CODE (arg) == SSA_NAME) > - ipa_set_jfunc_bits (jfunc, 0, > - widest_int::from (get_nonzero_bits (arg), > - TYPE_SIGN (TREE_TYPE (arg; > - else > - ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0); > + irange &r = as_a (vr); > + irange_bitmask bm = r.get_bitmask (); > + signop sign = TYPE_SIGN (TREE_TYPE (arg)); > + ipa_set_jfunc_bits (jfunc, > + widest_int::from (bm.value (), sign), > + widest_int::from (bm.mask (), sign)); > } >else if (POINTER_TYPE_P (TREE_TYPE (arg))) > { > diff --git a/gcc/testsuite/g++.dg/ipa/pure-const-3.C > b/gcc/testsuite/g++.dg/ipa/pure-const-3.C > index b4a4673e86e..e43cf09af27 100644 > --- a/gcc/testsuite/g++.dg/ipa/pure-const-3.C > +++ b/gcc/testsuite/g++.dg/ipa/pure-const-3.C > @@ -1,5 +1,5 
@@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp > -fdisable-tree-evrp" } */ > +/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp > -fdisable-tree-evrp -fdisable-tree-vrp1 -fdisable-tree-vrp2 -fno-thread-jumps > -fno-tree-dominator-opts" } */ > int *ptr; > static int barvar; > static int b(int a); > -- > 2.40.1
Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file
Hello Lehua,

On Fri, Jul 21 2023, Lehua Ding wrote:
> Hi Martin,
>
>
> By the way, is there a standard format required for these Python files?

Generally, our Python coding conventions are at
https://gcc.gnu.org/codingconventions.html#python

> I see that other Python files have similar format error when checked
> using flake8.

For historic reasons (i.e. Martin Liška set it up that way), we
currently use flake8 to check python formatting of
contrib/gcc-changelog, contrib/mklog.py and
maintainer-scripts/branch_changer.py and use pytest to check
contrib/gcc-changelog and contrib/test_mklog.py. That is how I found
out.

I guess many of the files predate the coding conventions and so don't
adhere to them. Patches to fix them are welcome (I guess) but at least
we should not regress (I guess).

> If so, it feels necessary to configure a git hook on git server to do
> this check.

Performing more thorough checks on pushed commits is a much larger topic
than this thread. FWIW, I would not be opposed to checking python
scripts that are known to be OK.

Martin
Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file
Hello Lehua,

On Fri, Jul 21 2023, Lehua Ding wrote:
> Hi Martin,
>
>
> > this patch caused flake8 to complain about contrib/mklog.py:
> >
> > $ flake8 contrib/mklog.py
> > contrib/mklog.py:377:80: E501 line too long (85 > 79 characters)
> > contrib/mklog.py:388:26: E127 continuation line over-indented for
> visual indent
> > contrib/mklog.py:388:36: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:40: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:44: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:47: W605 invalid escape sequence '\|'
> > contrib/mklog.py:388:49: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:51: W605 invalid escape sequence '\d'
> > contrib/mklog.py:388:54: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:58: W605 invalid escape sequence '\-'
> >
> > Can you please have a look and ideally fix the issues?
>
>
> Thank you for pointing out this.
> I will fix these format errors in another fix patch[1].

Thanks!

> I tried to fix the following format error but couldn't
> find a way, do you know how to fix this error?
>
> > contrib/mklog.py:388:26: E127 continuation line over-indented for visual
> indent

I am no python expert but the following seems to work:

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 26230b9b4f2..2563d19bc99 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -384,8 +384,8 @@ if __name__ == '__main__':
                 for line in f:
                     if maybe_diff_log == 1 and line == "---\n":
                         maybe_diff_log = 2
-                    elif maybe_diff_log == 2 and \
-                         re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
+                    elif (maybe_diff_log == 2 and
+                          re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line)):
                         lines += [output, "---\n", line]
                         maybe_diff_log = 3
                     else:

Martin
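
A side note on the W605 warnings quoted above: the parenthesized form addresses E127 only, and the invalid escape sequence warnings are normally silenced by writing the pattern as a raw string. The following is a minimal sketch, assuming the pattern is meant to match a diffstat line such as " contrib/mklog.py | 27 ++-". The helper name is_diffstat_line is hypothetical and is not part of mklog.py.

```python
import re

# Raw string: the backslashes reach the regex engine untouched, so flake8
# has no invalid escape sequences (W605) to report.  Inside [...] a '+'
# and a trailing '-' are literal, so the class needs no escaping.
DIFFSTAT_RE = re.compile(r"\s[^\s]+\s+\|\s\d+\s[+-]+\n")


def is_diffstat_line(maybe_diff_log, line):
    """Hypothetical helper mirroring the condition of the elif branch."""
    # Parentheses give an implicit line continuation, which avoids the
    # backslash continuation that triggered E127.
    return (maybe_diff_log == 2 and
            DIFFSTAT_RE.match(line) is not None)


print(is_diffstat_line(2, " contrib/mklog.py | 27 ++-\n"))
print(is_diffstat_line(1, " contrib/mklog.py | 27 ++-\n"))
print(is_diffstat_line(2, "not a diffstat line\n"))
```

Only the first call reports a match: the second is rejected by the state check and the third by the pattern.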
Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file
Hello Lehua,

On Wed, Jul 12 2023, Lehua Ding wrote:
> Hi,
>
> This tiny patch add --append option to mklog.py that support add generated
> ChangeLog to the corresponding patch file. With this option there is no need
> to manually copy the generated ChangeLog to the patch file. e.g.:
>
> Run `mklog.py -a /path/to/this/patch` will add the generated ChangeLog
>
> ```
> contrib/ChangeLog:
>
> * mklog.py:
> ```

this patch caused flake8 to complain about contrib/mklog.py:

$ flake8 contrib/mklog.py
contrib/mklog.py:377:80: E501 line too long (85 > 79 characters)
contrib/mklog.py:388:26: E127 continuation line over-indented for visual indent
contrib/mklog.py:388:36: W605 invalid escape sequence '\s'
contrib/mklog.py:388:40: W605 invalid escape sequence '\s'
contrib/mklog.py:388:44: W605 invalid escape sequence '\s'
contrib/mklog.py:388:47: W605 invalid escape sequence '\|'
contrib/mklog.py:388:49: W605 invalid escape sequence '\s'
contrib/mklog.py:388:51: W605 invalid escape sequence '\d'
contrib/mklog.py:388:54: W605 invalid escape sequence '\s'
contrib/mklog.py:388:58: W605 invalid escape sequence '\-'

Can you please have a look and ideally fix the issues?

Thanks,

Martin

>
> to the right place of the /path/to/this/patch file.
>
> Best,
> Lehua
>
> contrib/ChangeLog:
>
> * mklog.py: Add --append option.
>
> ---
> contrib/mklog.py | 27 ++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/mklog.py b/contrib/mklog.py
> index 777212c98d7..26230b9b4f2 100755
> --- a/contrib/mklog.py
> +++ b/contrib/mklog.py
> @@ -358,6 +358,8 @@ if __name__ == '__main__':
>                               'file')
>      parser.add_argument('--update-copyright', action='store_true',
>                          help='Update copyright in ChangeLog files')
> +    parser.add_argument('-a', '--append', action='store_true',
> +                        help='Append the generate ChangeLog to the patch
> file')
>      args = parser.parse_args()
>      if args.input == '-':
>          args.input = None
> @@ -370,7 +372,30 @@ if __name__ == '__main__':
>      else:
>          output = generate_changelog(data, args.no_functions,
>                                      args.fill_up_bug_titles, args.pr_numbers)
> -        if args.changelog:
> +        if args.append:
> +            if (not args.input):
> +                raise Exception("`-a or --append` option not support
> standard input")
> +            lines = []
> +            with open(args.input, 'r', newline='\n') as f:
> +                # 1 -> not find the possible start of diff log
> +                # 2 -> find the possible start of diff log
> +                # 3 -> finish add ChangeLog to the patch file
> +                maybe_diff_log = 1
> +                for line in f:
> +                    if maybe_diff_log == 1 and line == "---\n":
> +                        maybe_diff_log = 2
> +                    elif maybe_diff_log == 2 and \
> +                         re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
> +                        lines += [output, "---\n", line]
> +                        maybe_diff_log = 3
> +                    else:
> +                        # the possible start is not the true start.
> +                        if maybe_diff_log == 2:
> +                            maybe_diff_log = 1
> +                        lines.append(line)
> +            with open(args.input, "w") as f:
> +                f.writelines(lines)
> +        elif args.changelog:
>              lines = open(args.changelog).read().split('\n')
>              start = list(takewhile(skip_line_in_changelog, lines))
>              end = lines[len(start):]
> --
> 2.36.1
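
For readers who want to try the three-state scan from the quoted hunk in isolation, here is a minimal stand-alone sketch. It is an illustration under stated assumptions, not the code that was committed: the function name insert_changelog is hypothetical, it works on a list of lines instead of a file, the pattern is written as a raw string, and it puts a held-back "---" line back into the output when the line after it turns out not to be a diffstat line. That last point is a choice made for this sketch.

```python
import re

DIFFSTAT_RE = re.compile(r"\s[^\s]+\s+\|\s\d+\s[+-]+\n")


def insert_changelog(patch_lines, changelog):
    """Return a copy of patch_lines with changelog placed before the diffstat.

    Hypothetical stand-alone version of the loop, with the same states:
      1 -> no candidate "---" separator seen yet
      2 -> the previous line was "---", a diffstat line may follow
      3 -> the ChangeLog has been inserted, copy the rest verbatim
    """
    result = []
    state = 1
    for line in patch_lines:
        if state == 1 and line == "---\n":
            # Hold the separator back until the next line confirms it.
            state = 2
        elif state == 2 and DIFFSTAT_RE.match(line):
            result += [changelog, "---\n", line]
            state = 3
        else:
            if state == 2:
                # Not the diffstat separator after all (assumption of this
                # sketch: emit the held-back "---" so no line is lost).
                result.append("---\n")
                state = 1
            result.append(line)
    return result


patch = ["Subject: example\n", "\n", "---\n",
         " contrib/mklog.py | 27 ++-\n", " 1 file changed\n"]
print("".join(insert_changelog(patch, "contrib/ChangeLog:\n\n")))
```

The usage at the end prints the example patch with the ChangeLog text placed directly above the "---" separator.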
[committed] Document new analyzer parameters
Hi,

This patch documents the analyzer parameters introduced in
r14-2029-g0e466e978c7286 also in gcc/doc/invoke.texi.

Committed as obvious after testing with make pdf and make info and
eyeballing the result.

Thanks,

Martin


2023-07-20 Martin Jambor

        * doc/invoke.texi (analyzer-text-art-string-ellipsis-threshold): New.
        (analyzer-text-art-ideal-canvas-width): Likewise.
        (analyzer-text-art-string-ellipsis-head-len): Likewise.
        (analyzer-text-art-string-ellipsis-tail-len): Likewise.
---
 gcc/doc/invoke.texi | 12
 1 file changed, 12 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d3c821e208a..5628c08214d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16324,6 +16324,18 @@ The parameter is used only in GIMPLE FE.
 The maximum number of 'after supernode' exploded nodes within the analyzer
 per supernode, before terminating analysis.
 
+@item analyzer-text-art-string-ellipsis-threshold
+The number of bytes at which to ellipsize string literals in analyzer text art diagrams.
+
+@item analyzer-text-art-ideal-canvas-width
+The ideal width in characters of text art diagrams generated by the analyzer.
+
+@item analyzer-text-art-string-ellipsis-head-len
+The number of literal bytes to show at the head of a string literal in text art when ellipsizing it.
+
+@item analyzer-text-art-string-ellipsis-tail-len
+The number of literal bytes to show at the tail of a string literal in text art when ellipsizing it.
+
 @item ranger-logical-depth
 Maximum depth of logical expression evaluation ranger will look through
 when evaluating outgoing edge ranges.
--
2.41.0
[committed] Restore bootstrap by removing unused variable in tree-ssa-loop-ivcanon.cc
Hi,

This restores bootstrap by removing the variable causing:

/home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc: In function ‘bool try_peel_loop(loop*, edge, tree, bool, long int)’:
/home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc:1170:17: error: variable ‘entry_count’ set but not used [-Werror=unused-but-set-variable]
 1170 |   profile_count entry_count = profile_count::zero ();
      |                 ^~~
cc1plus: all warnings being treated as errors

ACKed by Honza in a chat, passed a bootstrap on x86_64-linux,
committed.

Thanks,

Martin


gcc/ChangeLog:

2023-07-17 Martin Jambor

        * tree-ssa-loop-ivcanon.cc (try_peel_loop): Remove unused variable
        entry_count.
---
 gcc/tree-ssa-loop-ivcanon.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bdb738af7a8..a895e8e65be 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1167,7 +1167,6 @@ try_peel_loop (class loop *loop,
                loop->num, (int) npeel);
     }
   adjust_loop_info_after_peeling (loop, npeel, true);
-  profile_count entry_count = profile_count::zero ();
 
   bitmap_set_bit (peeled_loops, loop->num);
   return true;
--
2.41.0
Re: [PATCH] Export value/mask known bits from IPA.
Hi Aldy,

On Mon, Jul 17 2023, Aldy Hernandez wrote:
> Currently IPA throws away the known 1 bits because VRP and irange have
> traditionally only had a way of tracking known 0s (set_nonzero_bits).
> With the ability to keep all the known bits in the irange, we can now
> save this between passes.
>
> OK?
>
> gcc/ChangeLog:
>
>         * ipa-prop.cc (ipcp_update_bits): Export value/mask known bits.

OK, thanks.

Martin

> ---
>  gcc/ipa-prop.cc | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index d2b998f8af5..5d790ff1265 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -5853,10 +5853,9 @@ ipcp_update_bits (struct cgraph_node *node,
> ipcp_transformation *ts)
>          {
>            unsigned prec = TYPE_PRECISION (TREE_TYPE (ddef));
>            signop sgn = TYPE_SIGN (TREE_TYPE (ddef));
> -
> -          wide_int nonzero_bits = wide_int::from (bits[i]->mask, prec, UNSIGNED)
> -                                  | wide_int::from (bits[i]->value, prec, sgn);
> -          set_nonzero_bits (ddef, nonzero_bits);
> +          wide_int mask = wide_int::from (bits[i]->mask, prec, UNSIGNED);
> +          wide_int value = wide_int::from (bits[i]->value, prec, sgn);
> +          set_bitmask (ddef, value, mask);
>          }
>        else
>          {
> --
> 2.40.1
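
To make the effect of the change concrete, here is a small sketch of the value/mask encoding in plain Python integers. It assumes the convention that a set bit in the mask means "unknown" and that the value carries the known bits everywhere else, which is what the removed `mask | value` computation suggests. The helper names are made up for illustration and are not GCC functions.

```python
def may_be_nonzero(value, mask):
    """Bits that could be 1: the known ones plus everything unknown.

    This corresponds to what the removed code passed to set_nonzero_bits.
    """
    return value | mask


def known_ones(value, mask):
    """Bits known to be 1, the part that a nonzero-bits mask cannot express."""
    return value & ~mask


def known_zeros(value, mask, prec):
    """Bits known to be 0 within a prec-bit type."""
    all_bits = (1 << prec) - 1
    return all_bits & ~(value | mask)


# Example: an 8-bit quantity whose bit 6 is known to be set and whose two
# lowest bits are unknown; every other bit is known to be clear.
value, mask, prec = 0b01000000, 0b00000011, 8
print(bin(may_be_nonzero(value, mask)))
print(bin(known_ones(value, mask)))
print(bin(known_zeros(value, mask, prec)))
```

With only the nonzero-bits mask, a later pass can still tell which bits are clear but not that bit 6 is set; keeping the value/mask pair preserves both facts.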
Re: [PATCH 3/3] analyzer: add text-art visualizations of out-of-bounds accesses [PR106626]
Hi David,

On Wed, May 31 2023, David Malcolm via Gcc-patches wrote:
> This patch extends -Wanalyzer-out-of-bounds so that, where possible, it
> will emit a text art diagram visualizing the spatial relationship between

[...]

>
> gcc/ChangeLog:
>         PR analyzer/106626
>         * Makefile.in (ANALYZER_OBJS): Add analyzer/access-diagram.o.
>         * doc/invoke.texi (Wanalyzer-out-of-bounds): Add description of
>         text art.
>         (fanalyzer-debug-text-art): New.
>
> gcc/analyzer/ChangeLog:
>         PR analyzer/106626
>         * access-diagram.cc: New file.
>         * access-diagram.h: New file.
>         * analyzer.h (class region_offset): Add default ctor.
>         (region_offset::make_byte_offset): New decl.
>         (region_offset::concrete_p): New.
>         (region_offset::get_concrete_byte_offset): New.
>         (region_offset::calc_symbolic_bit_offset): New decl.
>         (region_offset::calc_symbolic_byte_offset): New decl.
>         (region_offset::dump_to_pp): New decl.
>         (region_offset::dump): New decl.
>         (operator<, operator<=, operator>, operator>=): New decls for
>         region_offset.
>         * analyzer.opt
>         (-param=analyzer-text-art-string-ellipsis-threshold=): New.
>         (-param=analyzer-text-art-string-ellipsis-head-len=): New.
>         (-param=analyzer-text-art-string-ellipsis-tail-len=): New.
>         (-param=analyzer-text-art-ideal-canvas-width=): New.

contrib/check-params-in-docs.py now complains that:

$ ./gcc/xgcc -Bgcc --help=param &>/tmp/params.txt
$ ../src/contrib/check-params-in-docs.py ../src/gcc/doc/invoke.texi /tmp/params.txt
Missing:
@item analyzer-text-art-string-ellipsis-threshold
The number of bytes at which to ellipsize string literals in

@item analyzer-text-art-string-ellipsis-head-len
The number of literal bytes to show at the head of a string

@item analyzer-text-art-string-ellipsis-tail-len
The number of literal bytes to show at the tail of a string

@item analyzer-text-art-ideal-canvas-width
The ideal width in characters of text art diagrams generated by the

Can you please add the respective documentation entries?

Thanks!

Martin
[committed] Regenerate lto-plugin/Makefile.in
Hi,

On Thu, Jun 29 2023, Marek Polacek wrote:
> On Thu, Jun 29, 2023 at 05:58:22PM +0200, Martin Jambor wrote:

[...]

>>
>> Unfortunately I won't have time to actually look at this in the next 2-3
>> weeks, so I am inclined to just trust the verification script (which
>> essentially runs autoconf/automake everywhere and then expects no diff)
>> and commit the one-line change. What do you think, does that make sense
>> (even without looking at why other Makefile.in files did not change)?
>
> Yes please, go ahead with the one line change meanwhile. Thanks!
>
> I've opened PR110467 for the build problem.
>
> Marek

Commit the regenerated lto-plugin/Makefile.in in order to reflect the
changes from the introduction of --enable-host-pie.

lto-plugin/ChangeLog:

2023-06-30 Martin Jambor

        * Makefile.in: Regenerate.
---
 lto-plugin/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index cb568e1e09f..f6f5b020ff5 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -298,6 +298,7 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_bind_now = @enable_host_bind_now@
 exec_prefix = @exec_prefix@
 gcc_build_dir = @gcc_build_dir@
 get_gcc_base_ver = @get_gcc_base_ver@
--
2.41.0
Re: [PATCH] configure: Implement --enable-host-bind-now
Hi,

On Tue, Jun 27 2023, Marek Polacek wrote:
> On Tue, Jun 27, 2023 at 01:39:16PM +0200, Martin Jambor wrote:
>> Hello,
>>
>> On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
>> > As promised in the --enable-host-pie patch, this patch adds another
>> > configure option, --enable-host-bind-now, which adds -z now when linking
>> > the compiler executables in order to extend hardening. BIND_NOW with RELRO
>> > allows the GOT to be marked RO; this prevents GOT modification attacks.
>> >
>> > This option does not affect linking of target libraries; you can use
>> > LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
>> >
>> > With this patch:
>> > $ readelf -Wd cc1{,plus} | grep FLAGS
>> > 0x001e (FLAGS) BIND_NOW
>> > 0x6ffb (FLAGS_1)Flags: NOW PIE
>> > 0x001e (FLAGS) BIND_NOW
>> > 0x6ffb (FLAGS_1)Flags: NOW PIE
>> >
>> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>> >
>> > c++tools/ChangeLog:
>> >
>> >        * configure.ac (--enable-host-bind-now): New check.
>> >        * configure: Regenerate.
>> >
>> > gcc/ChangeLog:
>> >
>> >        * configure.ac (--enable-host-bind-now): New check. Add
>> >        -Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
>> >        * configure: Regenerate.
>> >        * doc/install.texi: Document --enable-host-bind-now.
>> >
>> > lto-plugin/ChangeLog:
>> >
>> >        * configure.ac (--enable-host-bind-now): New check. Link with
>> >        -z,now.
>> >        * configure: Regenerate.
>>
>> Our reconfiguration checking script complains about a missing hunk in
>> lto-plugin/Makefile.in:
>>
>> diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
>> index cb568e1e09f..f6f5b020ff5 100644
>> --- a/lto-plugin/Makefile.in
>> +++ b/lto-plugin/Makefile.in
>> @@ -298,6 +298,7 @@ datadir = @datadir@
>>  datarootdir = @datarootdir@
>>  docdir = @docdir@
>>  dvidir = @dvidir@
>> +enable_host_bind_now = @enable_host_bind_now@
>>  exec_prefix = @exec_prefix@
>>  gcc_build_dir = @gcc_build_dir@
>>  get_gcc_base_ver = @get_gcc_base_ver@
>>
>>
>> I am somewhat puzzled why the line is not missing in any of the other
>> Makefile.in files. Can you please check whether that is the only thing
>> that is missing (assuming it is actually missing)?
>
> Arg, once again, I'm sorry. I don't know how this happened. It would
> be trivial to fix it but since
>
> commit 4a48a38fa99f067b8f3a3d1a5dc7a1e602db351f
> Author: Eric Botcazou
> Date: Wed Jun 21 18:19:36 2023 +0200
>
>     ada: Fix build of GNAT tools
>
> the build with Ada included fails with --enable-host-pie. So that needs
> to be fixed first.
>
> Eric, I'm not asking you to fix that, but I'm curious, what did the
> commit above fix? The patch looks correct; I'm just puzzled why I
> hadn't seen any build failures.
>
> The --enable-host-pie patch has been a nightmare :(.
>

No worries, I can see how these things can easily get difficult.

Unfortunately I won't have time to actually look at this in the next 2-3
weeks, so I am inclined to just trust the verification script (which
essentially runs autoconf/automake everywhere and then expects no diff)
and commit the one-line change. What do you think, does that make sense
(even without looking at why other Makefile.in files did not change)?

Thanks,

Martin
Re: Enable ranger for ipa-prop
On Tue, Jun 27 2023, Jan Hubicka wrote: > Hi, > as shown in the testcase (which would eventually be useful for > optimizing std::vector's push_back), ipa-prop can use context dependent ranger > queries for better value range info. > > Bootstrapped/regtested x86_64-linux, OK? > > Honza > > gcc/ChangeLog: > > PR middle-end/110377 > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Add ranger > parameter; use ranger instance for rnage queries. > (ipa_compute_jump_functions_for_bb): Pass around ranger. > (analysis_dom_walker::before_dom_children): Enable ranger. Looks good to me (with or without passing a ranger parameter around). Martin > > gcc/testsuite/ChangeLog: > > PR middle-end/110377 > * gcc.dg/tree-ssa/pr110377.c: New test. > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c > new file mode 100644 > index 000..cbe3441caea > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile */ > +/* { dg-options "-O2 -fdump-ipa-fnsummary" } */ > +int test3(int); > +__attribute__ ((noinline)) > +void test2(int a) > +{ > + test3(a); > +} > +void > +test(int n) > +{ > +if (n > 5) > + __builtin_unreachable (); > +test2(n); > +} > +/* { dg-final { scan-tree-dump "-INF, 5-INF" "fnsummary" } } */ > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc > index 41c812194ca..693d4805d93 100644 > --- a/gcc/ipa-prop.cc > +++ b/gcc/ipa-prop.cc > @@ -2341,7 +2341,8 @@ ipa_set_jfunc_vr (ipa_jump_func *jf, const ipa_vr &vr) > > static void > ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi, > - struct cgraph_edge *cs) > + struct cgraph_edge *cs, > + gimple_ranger *ranger) > { >ipa_node_params *info = ipa_node_params_sum->get (cs->caller); >ipa_edge_args *args = ipa_edge_args_sum->get_create (cs); > @@ -2386,7 +2387,7 @@ ipa_compute_jump_functions_for_edge (struct > ipa_func_body_info *fbi, > > if (TREE_CODE (arg) == SSA_NAME > && param_type > - && get_range_query 
(cfun)->range_of_expr (vr, arg) > + && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt) > && vr.nonzero_p ()) > addr_nonzero = true; > else if (tree_single_nonzero_warnv_p (arg, &strict_overflow)) > @@ -2408,7 +2409,7 @@ ipa_compute_jump_functions_for_edge (struct > ipa_func_body_info *fbi, > && Value_Range::supports_type_p (param_type) > && irange::supports_p (TREE_TYPE (arg)) > && irange::supports_p (param_type) > - && get_range_query (cfun)->range_of_expr (vr, arg) > + && ranger->range_of_expr (vr, arg, cs->call_stmt) > && !vr.undefined_p ()) > { > Value_Range resvr (vr); > @@ -2517,7 +2518,8 @@ ipa_compute_jump_functions_for_edge (struct > ipa_func_body_info *fbi, > from BB. */ > > static void > -ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, > basic_block bb) > +ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, > basic_block bb, > +gimple_ranger *ranger) > { >struct ipa_bb_info *bi = ipa_get_bb_info (fbi, bb); >int i; > @@ -2536,7 +2538,7 @@ ipa_compute_jump_functions_for_bb (struct > ipa_func_body_info *fbi, basic_block b > && !gimple_call_fnspec (cs->call_stmt).known_p ()) > continue; > } > - ipa_compute_jump_functions_for_edge (fbi, cs); > + ipa_compute_jump_functions_for_edge (fbi, cs, ranger); > } > } > > @@ -3110,19 +3112,27 @@ class analysis_dom_walker : public dom_walker > { > public: >analysis_dom_walker (struct ipa_func_body_info *fbi) > -: dom_walker (CDI_DOMINATORS), m_fbi (fbi) {} > +: dom_walker (CDI_DOMINATORS), m_fbi (fbi) > + { > +m_ranger = enable_ranger (cfun, false); > + } > + ~analysis_dom_walker () > + { > +disable_ranger (cfun); > + } > >edge before_dom_children (basic_block) final override; > > private: >struct ipa_func_body_info *m_fbi; > + gimple_ranger *m_ranger; > }; > > edge > analysis_dom_walker::before_dom_children (basic_block bb) > { >ipa_analyze_params_uses_in_bb (m_fbi, bb); > - ipa_compute_jump_functions_for_bb (m_fbi, bb); > + ipa_compute_jump_functions_for_bb (m_fbi, bb, 
m_ranger); >return NULL; > } >
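The testcase in the quoted patch depends on the range of `n` at the call site being narrower than what is known about `n` in general. The following is only a toy model of that narrowing and does not use GCC's ranger API; the `Interval` type and the `varying` and `refine_not_greater` helpers are hypothetical names made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical closed interval [lo, hi] over int; a stand-in for a value
// range, not GCC's irange or Value_Range.
struct Interval
{
  int lo;
  int hi;
};

// The range an int has when nothing is known about it.
inline Interval
varying ()
{
  return Interval{INT_MIN, INT_MAX};
}

// Range of a value on the path where "value > bound" is known to be false,
// e.g. after "if (n > bound) __builtin_unreachable ();".  If the result has
// lo > hi, the path cannot be reached at all.
inline Interval
refine_not_greater (Interval r, int bound)
{
  assert (r.lo <= r.hi);
  return Interval{r.lo, std::min (r.hi, bound)};
}
```

Under these assumptions, `refine_not_greater (varying (), 5)` models what a query at the `test2 (n)` call statement can see, whereas a query without statement context would only see `varying ()`.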
Re: [PATCH] configure: Implement --enable-host-bind-now
Hello,

On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
> As promised in the --enable-host-pie patch, this patch adds another
> configure option, --enable-host-bind-now, which adds -z now when linking
> the compiler executables in order to extend hardening. BIND_NOW with RELRO
> allows the GOT to be marked RO; this prevents GOT modification attacks.
>
> This option does not affect linking of target libraries; you can use
> LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
>
> With this patch:
> $ readelf -Wd cc1{,plus} | grep FLAGS
> 0x001e (FLAGS) BIND_NOW
> 0x6ffb (FLAGS_1) Flags: NOW PIE
> 0x001e (FLAGS) BIND_NOW
> 0x6ffb (FLAGS_1) Flags: NOW PIE
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>
> c++tools/ChangeLog:
>
> * configure.ac (--enable-host-bind-now): New check.
> * configure: Regenerate.
>
> gcc/ChangeLog:
>
> * configure.ac (--enable-host-bind-now): New check. Add
> -Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
> * configure: Regenerate.
> * doc/install.texi: Document --enable-host-bind-now.
>
> lto-plugin/ChangeLog:
>
> * configure.ac (--enable-host-bind-now): New check. Link with
> -z,now.
> * configure: Regenerate.

Our reconfiguration checking script complains about a missing hunk in
lto-plugin/Makefile.in:

diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index cb568e1e09f..f6f5b020ff5 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -298,6 +298,7 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_bind_now = @enable_host_bind_now@
 exec_prefix = @exec_prefix@
 gcc_build_dir = @gcc_build_dir@
 get_gcc_base_ver = @get_gcc_base_ver@

I am somewhat puzzled why the line is not missing in any of the other
Makefile.in files. Can you please check whether that is the only thing
that is missing (assuming it is actually missing)?

Thanks,

Martin
Re: [PATCH] Convert remaining uses of value_range in ipa-*.cc to Value_Range.
Hi, On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote: > Minor cleanups to get rid of value_range in IPA. There's only one left, > but it's in the switch code which is integer specific. > > OK? With the same request that... > > gcc/ChangeLog: > > * ipa-cp.cc (decide_whether_version_node): Adjust comment. > * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust > for Value_Range. > (set_switch_stmt_execution_predicate): Same. > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same. > --- > gcc/ipa-cp.cc| 3 +-- > gcc/ipa-fnsummary.cc | 22 ++ > gcc/ipa-prop.cc | 9 +++-- > 3 files changed, 18 insertions(+), 16 deletions(-) > > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc > index 03273666ea2..2e64415096e 100644 > --- a/gcc/ipa-cp.cc > +++ b/gcc/ipa-cp.cc > @@ -6287,8 +6287,7 @@ decide_whether_version_node (struct cgraph_node *node) > { > /* If some values generated for self-recursive calls with >arithmetic jump functions fall outside of the known > - value_range for the parameter, we can skip them. VR interface > - supports this only for integers now. */ > + range for the parameter, we can skip them. 
*/ > if (TREE_CODE (val->value) == INTEGER_CST > && !plats->m_value_range.bottom_p () > && !ipa_range_contains_p (plats->m_value_range.m_vr, > diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc > index 0474af8991e..1ce8501fe85 100644 > --- a/gcc/ipa-fnsummary.cc > +++ b/gcc/ipa-fnsummary.cc > @@ -488,19 +488,20 @@ evaluate_conditions_for_known_args (struct cgraph_node > *node, > if (vr.varying_p () || vr.undefined_p ()) > break; > > - value_range res; > + Value_Range res (op->type); > if (!op->val[0]) > { > + Value_Range varying (op->type); > + varying.set_varying (op->type); > range_op_handler handler (op->code, op->type); > if (!handler > || !res.supports_type_p (op->type) > - || !handler.fold_range (res, op->type, vr, > - value_range (op->type))) > + || !handler.fold_range (res, op->type, vr, varying)) > res.set_varying (op->type); > } > else if (!op->val[1]) > { > - value_range op0; > + Value_Range op0 (op->type); > range_op_handler handler (op->code, op->type); > > ipa_range_set_and_normalize (op0, op->val[0]); > @@ -518,14 +519,14 @@ evaluate_conditions_for_known_args (struct cgraph_node > *node, > } > if (!vr.varying_p () && !vr.undefined_p ()) > { > - value_range res; > - value_range val_vr; > + int_range<2> res; > + Value_Range val_vr (TREE_TYPE (c->val)); > range_op_handler handler (c->code, boolean_type_node); > > ipa_range_set_and_normalize (val_vr, c->val); > > if (!handler > - || !res.supports_type_p (boolean_type_node) > + || !val_vr.supports_type_p (TREE_TYPE (c->val)) > || !handler.fold_range (res, boolean_type_node, vr, > val_vr)) > res.set_varying (boolean_type_node); > > @@ -1687,12 +1688,17 @@ set_switch_stmt_execution_predicate (struct > ipa_func_body_info *fbi, >int bound_limit = opt_for_fn (fbi->node->decl, > param_ipa_max_switch_predicate_bounds); >int bound_count = 0; > - value_range vr; > + // This can safely be an integer range, as switches can only hold > + // integers. 
> + int_range<2> vr; > >get_range_query (cfun)->range_of_expr (vr, op); >if (vr.undefined_p ()) > vr.set_varying (TREE_TYPE (op)); >tree vr_min, vr_max; > + // ?? This entire function could use a rewrite to use the irange > + // API, instead of trying to recreate its intersection/union logic. > + // Any use of get_legacy_range() is a serious code smell. you replace "??" with TODO, because that is presumably what you mean. OK with that change. Thanks, Martin >value_range_kind vr_type = get_legacy_range (vr, vr_min, vr_max); >wide_int vr_wmin = wi::to_wide (vr_min); >wide_int vr_wmax = wi::to_wide (vr_max); > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc > index 6383bc11e0a..5f9e6dbbff2 100644 > --- a/gcc/ipa-prop.cc > +++ b/gcc/ipa-prop.cc > @@ -2348,7 +2348,6 @@ ipa_compute_jump_functions_for_edge (struct > ipa_func_body_info *fbi, >gcall *call = cs->call_stmt; >int n, arg_num = gimple_call_num_args (call); >bool useful_context = false; > - value_ra
Re: [PATCH] Implement ipa_vr hashing.
Hi, On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote: > Implement hashing for ipa_vr. When all is said and done, all these > patches incurr a 7.64% slowdown for ipa-cp, with is entirely covered by > the similar 7% increase in this area last week. So we get type agnostic > ranges with "infinite" range precision close to free. > > There is no change in overall compilation. > > OK? > One small request > gcc/ChangeLog: > > * ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Adjust for use with > ipa_vr instead of value_range. > (gt_pch_nx): Same. > (gt_ggc_mx): Same. > (ipa_get_value_range): Same. > * value-range.cc (gt_pch_nx): Move to ipa-prop.cc and adjust for > ipa_vr. > (gt_ggc_mx): Same. > --- > gcc/ipa-prop.cc| 76 +++--- > gcc/value-range.cc | 15 - > 2 files changed, 45 insertions(+), 46 deletions(-) > > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc > index c46a89f1b49..6383bc11e0a 100644 > --- a/gcc/ipa-prop.cc > +++ b/gcc/ipa-prop.cc > @@ -109,53 +109,53 @@ struct ipa_bit_ggc_hash_traits : public > ggc_cache_remove > /* Hash table for avoid repeated allocations of equal ipa_bits. */ > static GTY ((cache)) hash_table > *ipa_bits_hash_table; > > -/* Traits for a hash table for reusing value_ranges used for IPA. Note that > - the equiv bitmap is not hashed and is expected to be NULL. */ > +/* Traits for a hash table for reusing ranges. 
*/ > > -struct ipa_vr_ggc_hash_traits : public ggc_cache_remove > +struct ipa_vr_ggc_hash_traits : public ggc_cache_remove > { > - typedef value_range *value_type; > - typedef value_range *compare_type; > + typedef ipa_vr *value_type; > + typedef const vrange *compare_type; >static hashval_t > - hash (const value_range *p) > + hash (const ipa_vr *p) > { > - tree min, max; > - value_range_kind kind = get_legacy_range (*p, min, max); > - inchash::hash hstate (kind); > - inchash::add_expr (min, hstate); > - inchash::add_expr (max, hstate); > + // This never get called, except in the verification code, as > + // ipa_get_value_range() calculates the hash itself. This > + // function is mostly here for completness' sake. > + Value_Range vr; > + p->get_vrange (vr); > + inchash::hash hstate; > + add_vrange (vr, hstate); >return hstate.end (); > } >static bool > - equal (const value_range *a, const value_range *b) > + equal (const ipa_vr *a, const vrange *b) > { > - return (types_compatible_p (a->type (), b->type ()) > - && *a == *b); > + return a->equal_p (*b); > } >static const bool empty_zero_p = true; >static void > - mark_empty (value_range *&p) > + mark_empty (ipa_vr *&p) > { >p = NULL; > } >static bool > - is_empty (const value_range *p) > + is_empty (const ipa_vr *p) > { >return p == NULL; > } >static bool > - is_deleted (const value_range *p) > + is_deleted (const ipa_vr *p) > { > - return p == reinterpret_cast (1); > + return p == reinterpret_cast (1); > } >static void > - mark_deleted (value_range *&p) > + mark_deleted (ipa_vr *&p) > { > - p = reinterpret_cast (1); > + p = reinterpret_cast (1); > } > }; > > -/* Hash table for avoid repeated allocations of equal value_ranges. */ > +/* Hash table for avoid repeated allocations of equal ranges. */ > static GTY ((cache)) hash_table *ipa_vr_hash_table; > > /* Holders of ipa cgraph hooks: */ > @@ -265,6 +265,22 @@ ipa_vr::dump (FILE *out) const > fprintf (out, "NO RANGE"); > } > > +// ?? 
These stubs are because we use an ipa_vr in a hash_traits and > +// hash-traits.h defines an extern of gt_ggc_mx (T &) instead of > +// picking up the gt_ggc_mx (T *) version. If you mean FIXME or TODO, please replace the "??" string with one of those. Otherwise please just remove it or specify what you mean in some clearer way. OK with that change. Thanks, Martin > +void > +gt_pch_nx (ipa_vr *&x) > +{ > + return gt_pch_nx ((ipa_vr *) x); > +} > + > +void > +gt_ggc_mx (ipa_vr *&x) > +{ > + return gt_ggc_mx ((ipa_vr *) x); > +} > + > + [...]
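The hash table in the quoted patch exists so that equal ranges are allocated only once and then shared. A minimal sketch of that interning idea follows, written against the C++ standard library rather than GCC's `hash_table` and garbage-collection machinery; `Range`, `RangeHash`, `RangePool` and `intern` are hypothetical names and none of them is part of GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for a stored range; not GCC's ipa_vr.
struct Range
{
  long lo;
  long hi;

  bool operator== (const Range &other) const
  {
    return lo == other.lo && hi == other.hi;
  }
};

// Combine the hashes of both bounds so that equal ranges hash equally.
struct RangeHash
{
  std::size_t operator() (const Range &r) const
  {
    std::size_t h = std::hash<long> () (r.lo);
    return h ^ (std::hash<long> () (r.hi) + 0x9e3779b9u + (h << 6) + (h >> 2));
  }
};

// Pool that owns one copy of every distinct range seen so far.
class RangePool
{
public:
  // Return a pointer to the pooled copy of R, inserting it on first use.
  // Pointers to unordered_set elements remain valid across later insertions.
  const Range *intern (const Range &r)
  {
    assert (r.lo <= r.hi);
    return &*m_ranges.insert (r).first;
  }

  std::size_t size () const { return m_ranges.size (); }

private:
  std::unordered_set<Range, RangeHash> m_ranges;
};
```

In this sketch, two calls to `intern` with equal bounds return the same pointer, which is the property that lets many jump functions share one stored range.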
Re: [PATCH] Convert ipa_jump_func to use ipa_vr instead of a value_range.
Hi,

On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> This patch converts the ipa_jump_func code to use the type agnostic
> ipa_vr suitable for GC instead of value_range which is integer specific.
>
> I've disabled the range cacheing to simplify the patch for review, but
> it is handled in the next patch in the series.
>
> OK?
>
> gcc/ChangeLog:
>
> * ipa-cp.cc (ipa_vr_operation_and_type_effects): New.
> * ipa-prop.cc (ipa_get_value_range): Adjust for ipa_vr.
> (ipa_set_jfunc_vr): Take a range.
> (ipa_compute_jump_functions_for_edge): Pass range to
> ipa_set_jfunc_vr.
> (ipa_write_jump_function): Call streamer write helper.
> (ipa_read_jump_function): Call streamer read helper.
> * ipa-prop.h (class ipa_vr): Change m_vr to an ipa_vr.

OK, thanks, and sorry for the wait; I was unexpectedly traveling last
week.

Martin

> ---
> gcc/ipa-cp.cc | 15 +++
> gcc/ipa-prop.cc | 70 ++---
> gcc/ipa-prop.h | 5 +++-
> 3 files changed, 44 insertions(+), 46 deletions(-)
>
[...]
[PATCH] ipa-sra: Disable candidates with no known callers (PR 110276)
Hi, In IPA-SRA we use can_be_local_p () predicate rather than just plain local call graph flag in order to figure out whether the node is a part of an external API that we cannot change. Although there are cases where this can allow more transformations, it also means we can analyze functions which have no callers at all, which is pointless. Moreover, it makes an assert of hint propagation trigger, which checks that we have looked at callers before processing hints that come from them. This has been reported as PR 110276. This patch simply adds a check that a node has at least one caller into the early checks and makes the node a non-candidate for any transformation if it does not. Bootstrapped and tested on x86_64-linux, LTO bootstrap is still underway. OK if it passes too? Thanks, Martin gcc/ChangeLog: 2023-06-16 Martin Jambor PR ipa/110276 * ipa-sra.cc (struct caller_issues): New field there_is_one. (check_for_caller_issues): Set it. (check_all_callers_for_issues): Check it. gcc/testsuite/ChangeLog: 2023-06-16 Martin Jambor PR ipa/110276 * gcc.dg/ipa/pr110276.c: New test. --- gcc/ipa-sra.cc | 11 +++ gcc/testsuite/gcc.dg/ipa/pr110276.c | 15 +++ 2 files changed, 26 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110276.c diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc index 3fee8fb22ce..21d281a9756 100644 --- a/gcc/ipa-sra.cc +++ b/gcc/ipa-sra.cc @@ -3074,6 +3074,8 @@ struct caller_issues cgraph_node *candidate; /* There is a thunk among callers. */ bool thunk; + /* Set if there is at least one caller that is OK. */ + bool there_is_one; /* Call site with no available information. */ bool unknown_callsite; /* Call from outside the candidate's comdat group. 
*/ @@ -3116,6 +3118,8 @@ check_for_caller_issues (struct cgraph_node *node, void *data) if (csum->m_bit_aligned_arg) issues->bit_aligned_aggregate_argument = true; + + issues->there_is_one = true; } return false; } @@ -3170,6 +3174,13 @@ check_all_callers_for_issues (cgraph_node *node) for (unsigned i = 0; i < param_count; i++) (*ifs->m_parameters)[i].split_candidate = false; } + if (!issues.there_is_one) +{ + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "There is no call to %s that we can modify. " +"Disabling all modifications.\n", node->dump_name ()); + return true; +} return false; } diff --git a/gcc/testsuite/gcc.dg/ipa/pr110276.c b/gcc/testsuite/gcc.dg/ipa/pr110276.c new file mode 100644 index 000..5a1e2f3fb1c --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr110276.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +typedef long (*EFI_PCI_IO_PROTOCOL_CONFIG)(); +typedef struct { + EFI_PCI_IO_PROTOCOL_CONFIG Read; +} EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS; +typedef struct { + EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS Pci; +} EFI_PCI_IO_PROTOCOL; +int init_regs_0; +static void __attribute__((constructor)) init(EFI_PCI_IO_PROTOCOL *pci_io) { + if (init_regs_0) +pci_io->Pci.Read(); +} -- 2.40.1
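The shape of the fix above, accumulating a "saw at least one usable caller" flag while walking the callers and refusing to transform when it stays unset, can be sketched outside of GCC as follows. This is a simplified model under stated assumptions: `CallSite`, `CallerIssues` and `callers_prevent_changes` are hypothetical names, and the real `check_for_caller_issues` looks at several more conditions than the single thunk check modelled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one call site; not GCC's cgraph_edge.
struct CallSite
{
  bool is_thunk;
};

// Issues accumulated over all callers, loosely mirroring the patch's
// caller_issues structure.
struct CallerIssues
{
  bool thunk = false;
  bool there_is_one = false;
};

// Return true if the candidate must be left alone: either some caller is
// problematic, or no usable caller was seen at all.
inline bool
callers_prevent_changes (const std::vector<CallSite> &callers)
{
  CallerIssues issues;
  for (const CallSite &cs : callers)
    {
      if (cs.is_thunk)
        {
          issues.thunk = true;
          continue;
        }
      // Only callers that passed every check count as usable.
      issues.there_is_one = true;
    }
  return issues.thunk || !issues.there_is_one;
}
```

With an empty caller list the function returns true, which corresponds to the constructor-only `init` function in the new testcase being made a non-candidate.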
[PATCH] Regenerate some autotools generated files (Was: Re: [PATCH v3] configure: Implement --enable-host-pie)
On Fri, Jun 16 2023, Marek Polacek wrote: > On Fri, Jun 16, 2023 at 12:26:23PM +0200, Martin Jambor wrote: >> Hello, >> >> On Thu, Jun 15 2023, Marek Polacek via Gcc-patches wrote: >> > On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote: >> >> >> >> >> >> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote: >> >> > Ping. Anyone have any further comments? >> >> Given this was approved before, but got reverted due to issues (which have >> >> since been addressed) -- I think you might as well go forward and sooner >> >> rather than later so that we can catch fallout earlier. >> > >> > Thanks, pushed now, after rebasing, adjusting the patch for >> > r14-1385, and testing with and without --enable-host-pie on >> > both Debian and Fedora. >> > >> > If something comes up and I can't fix it quickly enough, I'll >> > have to revert the patch. We'll see. >> > >> >> The script that regularly checks that the checked-in autotools-generated >> files are in sync now complain about the following diff. Unless someone >> stops me because I overlooked something or for some other reason, I will >> commit it later on as obvious. > > Please, go ahead. > >> I wonder where the "line" differences come from, perhaps you added a >> comment after running autoconf/automake/...? The zlib/Makefile.in hunks > > Arg, I think I must've messed up the #lines when rebasing though I don't > know what went wrong with zlib/Makefile.in. But I don't think the latter > will actually make any difference. > >> like something we should have, though, even if I did not check whether >> it makes any difference in practice. And I want the checking script to >> shut up too ;-) > > Thanks and sorry. > No worries, I have committed the following. Thanks and have a nice weekend, Martin As discussed in https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621976.html this should put the autotools generated files in sync to what they were generated from (and make an automated checker happy). 
Tested by bootstrapping on top of only a few revisions ago. zlib/ChangeLog: 2023-06-16 Martin Jambor * Makefile.in: Regenerate. * configure: Likewise. gcc/ChangeLog: 2023-06-16 Martin Jambor * configure: Regenerate. --- gcc/configure| 4 ++-- zlib/Makefile.in | 2 ++ zlib/configure | 4 ++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/configure b/gcc/configure index a4563a9cade..f7b4b283ca2 100755 --- a/gcc/configure +++ b/gcc/configure @@ -19847,7 +19847,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 19848 "configure" +#line 19850 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -19953,7 +19953,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 19954 "configure" +#line 19956 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/zlib/Makefile.in b/zlib/Makefile.in index 3f5102d1b87..80fe3b69116 100644 --- a/zlib/Makefile.in +++ b/zlib/Makefile.in @@ -353,6 +353,8 @@ datadir = @datadir@ datarootdir = @datarootdir@ docdir = @docdir@ dvidir = @dvidir@ +enable_host_pie = @enable_host_pie@ +enable_host_shared = @enable_host_shared@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ diff --git a/zlib/configure b/zlib/configure index 77be6c284e3..9308866a636 100755 --- a/zlib/configure +++ b/zlib/configure @@ -10763,7 +10763,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 10778 "configure" +#line 10766 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -10869,7 +10869,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 10884 "configure" +#line 10872 "configure" #include "confdefs.h" #if HAVE_DLFCN_H -- 2.40.1
Re: [PATCH v3] configure: Implement --enable-host-pie
Hello,

On Thu, Jun 15 2023, Marek Polacek via Gcc-patches wrote:
> On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote:
>>
>>
>> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:
>> > Ping. Anyone have any further comments?
>> Given this was approved before, but got reverted due to issues (which have
>> since been addressed) -- I think you might as well go forward and sooner
>> rather than later so that we can catch fallout earlier.
>
> Thanks, pushed now, after rebasing, adjusting the patch for
> r14-1385, and testing with and without --enable-host-pie on
> both Debian and Fedora.
>
> If something comes up and I can't fix it quickly enough, I'll
> have to revert the patch. We'll see.
>

The script that regularly checks that the checked-in autotools-generated
files are in sync now complains about the following diff. Unless someone
stops me because I overlooked something or for some other reason, I will
commit it later on as obvious.

I wonder where the "line" differences come from; perhaps you added a
comment after running autoconf/automake/...? The zlib/Makefile.in hunks
look like something we should have, though, even if I did not check
whether it makes any difference in practice.
And I want the checking script to shut up too ;-) Thanks, Martin diff --git a/gcc/configure b/gcc/configure index a4563a9cade..f7b4b283ca2 100755 --- a/gcc/configure +++ b/gcc/configure @@ -19847,7 +19847,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 19848 "configure" +#line 19850 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -19953,7 +19953,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 19954 "configure" +#line 19956 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/zlib/Makefile.in b/zlib/Makefile.in index 3f5102d1b87..80fe3b69116 100644 --- a/zlib/Makefile.in +++ b/zlib/Makefile.in @@ -353,6 +353,8 @@ datadir = @datadir@ datarootdir = @datarootdir@ docdir = @docdir@ dvidir = @dvidir@ +enable_host_pie = @enable_host_pie@ +enable_host_shared = @enable_host_shared@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ diff --git a/zlib/configure b/zlib/configure index 77be6c284e3..9308866a636 100755 --- a/zlib/configure +++ b/zlib/configure @@ -10763,7 +10763,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 10778 "configure" +#line 10766 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -10869,7 +10869,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 10884 "configure" +#line 10872 "configure" #include "confdefs.h" #if HAVE_DLFCN_H
Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)
Ping. Thanks, Martin On Fri, May 12 2023, Martin Jambor wrote: > Hi, > > PR 108007 is another manifestation where we rely on DCE to clean-up > after IPA-SRA and if the user explicitely switches DCE off, IPA-SRA > can leave behind statements which are fed uninitialized values and > trap, even though their results are themselves never used. > > I have already fixed this for unused parameters in callees, this bug > shows that almost the same thing can happen for removed returns, on > the side of callers. This means that the issue has to be fixed > elsewhere, in call redirection. This patch adds a function which > recursivewly looks for uses of operations fed specific SSA names and > removes them all. > > That would have been easy if it wasn't for debug statements during > tree-inline (from which call redirection is also invoked). Debug > statements are decoupled from the rest at this point and iterating > over uses of SSAs does not bring them up. During tree-inline they are > handled especially at the end, I assume in order to make sure that > relative ordering of UIDs are the same with and without debug info. > > This means that during tree-inline we need to make a hash of killed > SSAs, that we already have in copy_body_data, available to the > function making the purging. So the patch duly does also that, making > the interface slightly ugly. > > Bootstrapped and tested on x86_64-linux. OK for master? (I am not sure > the problem is grave enough to warrant backporting to release branches > but can do that as well if people think I should.) > > Thanks, > > Martin > > > gcc/ChangeLog: > > 2023-05-11 Martin Jambor > > PR ipa/108007 > * cgraph.h (cgraph_edge): Add a parameter to > redirect_call_stmt_to_callee. > * ipa-param-manipulation.h (ipa_param_adjustments): Added a > parameter to modify_call. > * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New > parameter killed_ssas, pass it to padjs->modify_call. 
> * ipa-param-manipulation.cc (purge_transitive_uses): New function. > (ipa_param_adjustments::modify_call): New parameter killed_ssas. > Instead of substitutin uses, invoke purge_transitive_uses. If > hash of killed SSAs has not been provided, create a temporary one > and release SSAs that have been added to it. > * tree-inline.cc (redirect_all_calls): Create > id->killed_new_ssa_names earlier, pass it to edge redirection, > adjust a comment. > (copy_body): Release SSAs in id->killed_new_ssa_names. > > gcc/testsuite/ChangeLog: > > 2023-05-11 Martin Jambor > > PR ipa/108007 > * gcc.dg/ipa/pr108007.c: New test. > --- > gcc/cgraph.cc | 10 +++- > gcc/cgraph.h| 9 ++- > gcc/ipa-param-manipulation.cc | 85 + > gcc/ipa-param-manipulation.h| 3 +- > gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++ > gcc/tree-inline.cc | 28 ++ > 6 files changed, 129 insertions(+), 38 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c > > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc > index e8f9bec8227..5e923bf0557 100644 > --- a/gcc/cgraph.cc > +++ b/gcc/cgraph.cc > @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n) > speculative indirect call, remove "speculative" of the indirect call and > also redirect stmt to it's final direct target. > > + When called from within tree-inline, KILLED_SSAs has to contain the > pointer > + to killed_new_ssa_names within the copy_body_data structure and SSAs > + discovered to be useless (if LHS is removed) will be added to it, > otherwise > + it needs to be NULL. > + > It is up to caller to iteratively transform each "speculative" > direct call as appropriate. 
*/ > > gimple * > -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e) > +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e, > +hash_set *killed_ssas) > { >tree decl = gimple_call_fndecl (e->call_stmt); >gcall *new_stmt; > @@ -1527,7 +1533,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge > *e) > remove_stmt_from_eh_lp (e->call_stmt); > >tree old_fntype = gimple_call_fntype (e->call_stmt); > - new_stmt = padjs->modify_call (e, false); > + new_stmt = padjs->modify_call (e, false, killed_ssas); >cgraph_node *origin = e->callee; >while (origin->clone_of) > origin = origin->clone_of; > diff --git a/gcc/cgraph.h b/gcc/cgra
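The description quoted above says the patch adds a function that recursively finds statements fed by specific SSA names and removes them all. A simplified sketch of that transitive purge on a toy statement list is given below. It is not the code from the patch: `Stmt` and `purge_transitive_uses_sketch` are hypothetical, and the sketch ignores what the real implementation has to handle, such as debug statements and statements that cannot simply be deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical statement: defines one name from a list of operand names.
struct Stmt
{
  int def;
  std::vector<int> uses;
};

// Starting from the names in KILLED, repeatedly mark as killed the result of
// any statement that reads a killed name, until nothing changes.  Returns the
// indices of the statements that would be removed; KILLED is extended in
// place with the names those statements define.
inline std::set<std::size_t>
purge_transitive_uses_sketch (const std::vector<Stmt> &stmts,
                              std::set<int> &killed)
{
  std::set<std::size_t> dead;
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (std::size_t i = 0; i < stmts.size (); ++i)
        {
          if (dead.count (i))
            continue;
          for (int use : stmts[i].uses)
            if (killed.count (use))
              {
                dead.insert (i);
                killed.insert (stmts[i].def);
                changed = true;
                break;
              }
        }
    }
  return dead;
}
```

Passing the killed set by reference mirrors the interface choice described in the message: the caller may either own a longer-lived set (as tree-inline does with its killed SSA names) or create a temporary one just for the call.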
Re: [PATCH] Convert ipcp_vr_lattice to type agnostic framework.
Hi, thanks for dealing with my requests. On Wed, Jun 07 2023, Aldy Hernandez wrote: > On 5/26/23 18:17, Martin Jambor wrote: >> Hello, >> >> On Mon, May 22 2023, Aldy Hernandez wrote: >>> I've adjusted the patch with some minor cleanups that came up when I >>> implemented the rest of the IPA revamp. >>> >>> Rested. OK? >>> >>> On Wed, May 17, 2023 at 4:31 PM Aldy Hernandez wrote: >>>> >>>> This converts the lattice to store ranges in Value_Range instead of >>>> value_range (*) to make it type agnostic, and adjust all users >>>> accordingly. >>>> >>>> I think it is a good example on converting from static ranges to more >>>> general, type agnostic ones. >>>> >>>> I've been careful to make sure Value_Range never ends up on GC, since >>>> it contains an int_range_max and can expand on-demand onto the heap. >>>> Longer term storage for ranges should be done with vrange_storage, as >>>> per the previous patch ("Provide an API for ipa_vr"). >>>> >>>> (*) I do know the Value_Range naming versus value_range is quite >>>> annoying, but it was a judgement call last release for the eventual >>>> migration to having "value_range" be a type agnostic range object. We >>>> will ultimately rename Value_Range to value_range. [...] >>>> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc >>>> index d4b9d4ac27e..bd5b1da17b2 100644 >>>> --- a/gcc/ipa-cp.cc >>>> +++ b/gcc/ipa-cp.cc >>>> @@ -343,20 +343,29 @@ private: >>>> class ipcp_vr_lattice >>>> { >>>> public: >>>> - value_range m_vr; >>>> + Value_Range m_vr; >>>> >>>> inline bool bottom_p () const; >>>> inline bool top_p () const; >>>> - inline bool set_to_bottom (); >>>> - bool meet_with (const value_range *p_vr); >>>> + inline bool set_to_bottom (tree type); >> >> Requiring a type when setting a lattice to bottom makes for a weird >> interface, can't we set the underlying Value_Range to whatever... 
> >>>> + bool meet_with (const vrange &p_vr); >>>> bool meet_with (const ipcp_vr_lattice &other); >>>> - void init () { gcc_assert (m_vr.undefined_p ()); } >>>> + void init (tree type); >>>> void print (FILE * f); >>>> >>>> private: >>>> - bool meet_with_1 (const value_range *other_vr); >>>> + bool meet_with_1 (const vrange &other_vr); >>>> }; >>>> >>>> +inline void >>>> +ipcp_vr_lattice::init (tree type) >>>> +{ >>>> + if (type) >>>> +m_vr.set_type (type); >>>> + >>>> + // Otherwise m_vr will default to unsupported_range. >> >> ...this does? >> >> All users of the lattice check it for not being bottom first, so it >> should be safe. >> >> If it is not possible for some reason, then I guess we should add a bool >> flag to ipcp_vr_lattice instead, rather than looking up types of >> unusable lattices. ipcp_vr_lattices don't live for long. > > The type was my least favorite part of this work. And yes, your > suggestion would work. I have tweaked the patch to force a VARYING for > an unsupported range which seems to do the trick. It looks much > cleaner. Thanks. This version is much better indeed. [...] >>>> @@ -1912,29 +1917,33 @@ ipa_vr_operation_and_type_effects (value_range >>>> *dst_vr, >>>> return false; >>>> >>>> range_op_handler handler (operation, dst_type); >>>> - return (handler >>>> - && handler.fold_range (*dst_vr, dst_type, >>>> -*src_vr, value_range (dst_type)) >>>> - && !dst_vr->varying_p () >>>> - && !dst_vr->undefined_p ()); >>>> + if (!handler) >>>> +return false; >>>> + >>>> + Value_Range varying (dst_type); >>>> + varying.set_varying (dst_type); >>>> + >>>> + return (handler.fold_range (dst_vr, dst_type, src_vr, varying) >>>> + && !dst_vr.varying_p () >>>> + && !dst_vr.undefined_p ()); >>>> } >>>> >>>> /* Determine value_range of JFUNC given that INFO describes the caller