Re: [PATCH] alias: Optimise call_may_clobber_ref_p
> > Yeah, the fast summary array lookup itself seems fine. What slowed > this down for me was instead: > > /* A single function body may be represented by multiple symbols with > different visibility. For example, if FUNC is an interposable alias, > we don't want to return anything, even if we have summary for the target > function. */ > enum availability avail; > func = func->function_or_virtual_thunk_symbol >(, current_function_decl ? > cgraph_node::get (current_function_decl) : NULL); > if (avail <= AVAIL_INTERPOSABLE) > return NULL; > > which goes through get_availability (via ultimate_alias_target). > The most common path for the optabs.ii testcase is: > > /* Inline functions are safe to be analyzed even if their symbol can > be overwritten at runtime. It is not meaningful to enforce any sane > behavior on replacing inline function by different body. */ > else if (DECL_DECLARED_INLINE_P (decl)) > avail = AVAIL_AVAILABLE; > > But to get there we had to go through: > > else if (ifunc_resolver > || lookup_attribute ("noipa", DECL_ATTRIBUTES (decl))) > avail = AVAIL_INTERPOSABLE; Hmm, that is kind of hack (I do not remember doing this). The visibility checks are frequent in pretty much all IPA propagation code, so I was always meaning to keep it quite cheap. Pehraps simply adding noipa flag into cgraph_node itself? I was also thinking about simply storing visibility to cgraph node. The catch is that it is not completely invaraint (i.e. if you are seeing access from a comdat group to other symbol in same comdat group it is not interposable, while code outside of the comdat group may see it is as interposable). I am not even sure how important is to handle that case except that I did, when I originally implemented the logic. > > Last night I tried caching the noipa attribute, like we do for ifunc. > That did improve compile time by 0.5% (relative to with the patch > applied rather than without it). It also made the cost of the lookup > part, up to this line: > > > 2988 : 240234208 : if (summary) > > negligible even with the original order of the checks. So I think > with that change, the lookup becomes cheap, like you say. Thanks for noticing the attribut lookup. I guess I should be more careful about having these in hot parts of IPA code (such as in inliner). I will do some coveraging to see how bad it is in practice. > > I don't think adding the caching is safe at this point in stage 3 > though. Attributes are added by direct manipulation of the tree data > structure and there are multiple places that in principle could > (directly or indirectly) add new noipa attributes after the decl > has been entered into the symbol table. No existing tests seemed > to expose that, but it was possible to write a new test that did. Indeed one needs to be bit careful and not set it while frontend is still on its (somewhat messy job) of changing trees. However we only need it for functions with are definitions and analysyed and we could easily set it as part of the analyse_function path. I will cook up a patch. Also I disucssed the PTA issues with Richi and I think we are in agreement that we want to simply fixup the ??? comment and make the PTA right. It is also on my todo list. > > > 2989 :: { > > 2990 : 19732392 : if > > (!modref_may_conflict (call, summary->stores, ref, tbaa_p) > > > > And we get here in 8% of invocations > > Once the lookup is neglible as above, so that the cost is purely on this > test, I still see a ~2% slowdown relative to the “PT first” ordering. > I can look into why in more detail, if that's unexpected, It seemed > from a quick look to be inner ao_ref stuff. > > Like you said at the top of the quote above, the number of loop > iterations per query was low. By far the most common in the optabs.ii > testcase was: > > 1 outer > 2 middle > 2 inner Yep. Most common case where we end up with bigger load/store trees are consturctors that store various fields of the structure. So reversing PTA and modref query still makes sense to me. Honza > > Thanks, > Richard
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
Thanks for the info. Jan Hubicka writes: >> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener >> wrote: >> > >> > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford >> > wrote: >> > > >> > > Richard Biener writes: >> > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches >> > > > wrote: >> > > >> >> > > >> When compiling an optabs.ii at -O2 with a release-checking build, >> > > >> we spent a lot of time in call_may_clobber_ref_p. One of the >> > > >> first things the function tries is the get_modref_function_summary >> > > >> approach, but that also seems to be the most expensive check. > > In general it should not be that expensive since the trees are generally > small (despite the 3 nested loops making them look dangerously big) and > most of time we resort to simple pointer/array lookups. > > From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html > (which are quite useful when trying to understand typical behaviour of > the alias oracle) > > In call_may_clobber_ref_p we get > > 2980 : 256141790 : callee = gimple_call_fndecl (call); > 2981 :: > 2982 : 256141790 : if (callee != NULL_TREE && > !ref->volatile_p) > 2983 :: { > 2984 : 240446150 : struct cgraph_node *node = > cgraph_node::get (callee); > this is direct pointer from node > 2985 : 240446150 : if (node) > 2986 :: { > 2987 : 240234208 : modref_summary *summary = > get_modref_function_summary (node); > This is fast summary so it is array lookup Yeah, the fast summary array lookup itself seems fine. What slowed this down for me was instead: /* A single function body may be represented by multiple symbols with different visibility. For example, if FUNC is an interposable alias, we don't want to return anything, even if we have summary for the target function. */ enum availability avail; func = func->function_or_virtual_thunk_symbol (, current_function_decl ? cgraph_node::get (current_function_decl) : NULL); if (avail <= AVAIL_INTERPOSABLE) return NULL; which goes through get_availability (via ultimate_alias_target). The most common path for the optabs.ii testcase is: /* Inline functions are safe to be analyzed even if their symbol can be overwritten at runtime. It is not meaningful to enforce any sane behavior on replacing inline function by different body. */ else if (DECL_DECLARED_INLINE_P (decl)) avail = AVAIL_AVAILABLE; But to get there we had to go through: else if (ifunc_resolver || lookup_attribute ("noipa", DECL_ATTRIBUTES (decl))) avail = AVAIL_INTERPOSABLE; and most functions in this test have nonnull attributes. (I can look into why if that's unexpected.) Last night I tried caching the noipa attribute, like we do for ifunc. That did improve compile time by 0.5% (relative to with the patch applied rather than without it). It also made the cost of the lookup part, up to this line: > 2988 : 240234208 : if (summary) negligible even with the original order of the checks. So I think with that change, the lookup becomes cheap, like you say. I don't think adding the caching is safe at this point in stage 3 though. Attributes are added by direct manipulation of the tree data structure and there are multiple places that in principle could (directly or indirectly) add new noipa attributes after the decl has been entered into the symbol table. No existing tests seemed to expose that, but it was possible to write a new test that did. > 2989 :: { > 2990 : 19732392 : if > (!modref_may_conflict (call, summary->stores, ref, tbaa_p) > > And we get here in 8% of invocations Once the lookup is neglible as above, so that the cost is purely on this test, I still see a ~2% slowdown relative to the “PT first” ordering. I can look into why in more detail, if that's unexpected, It seemed from a quick look to be inner ao_ref stuff. Like you said at the top of the quote above, the number of loop iterations per query was low. By far the most common in the optabs.ii testcase was: 1 outer 2 middle 2 inner Thanks, Richard
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
> > Notice the ??? comment. The code does not set clobbers here because it > > assumes that tree-ssa-alias will do the right thing. > > So one may make builtins handling first, PTA next and only if both say > > "may alias" continue. Other option is to extend the code here to add > > propert clobber/use PTA to make things more regular. I would be happy > > to help with that. > > Or avoid setting the gimple_call_clobber/use sets for builtins handled in > the above way. As said, the info there is currently incorrect. Note that > originally the call use/clobber vi were only added in IPA mode. I suppose > you now add them unconditionally (I forgot about all the changes). I think even before my changes we did non-trivial call clobbers in some cases. I just generalized existing clode that was generating non-trivial constraints, for example, for readonly non-escape args. If you search in gcc10 tree, call clobber vi is set in handle_rhs_call and some builtins handling and later copied to the gimple_call_clobber in compute_points_to_sets which is the non-ipa path. Similarly there is logic for call use for pure/const etc. At some point of fnspec development I disabled the special case path in tree-ssa-alias.c and it also broke on some builtins testcases due to wrong PTA and this happened before the PTA changes. That is how I know about the ??? comment. With my changes I think it only started to hit more cases mostly because I added the global memory handling which makes the clobber/use sets more likely to disambiguate. I agree that we want to get rid of wrong uses/clobbers from the IL - it is slopy to rely that other oracle will prevail PTA's wrong answers. If we want to just keep clobber/uses set to "all" I guess we need some special way to communicate that given call is specially handled builtin (listing all builtins for third time there seems like to too cool option) or we want to initialize call_use_vi/call_clobber_vi in find_func_aliases_for_builtin_call and in that case maybe it is not that hard to initialize them to the right thing. Do we know what makes ref_clobbered_by_call slow on optabs? (We have snow so I would like to spend some time on cross country ski today and tomorrow, but can definitly profile that at thursday). Honza > > Richard. > > > > > Honza > > > > > > > > It looks like Honza added computation of these sets at some point, also > > > > maybe > > > > forgetting to set some bits here. > > > > > > > > I prefer to investigate / fix this before refactoring the uses in this > > > > way. Can > > > > you file a bugreport please? I can reproduce the memcpy FAIL with > > > > just moving check_fnspec down. > > > > > > So the issue seems that for > > > # .MEM_22 = VDEF <.MEM_19> > > > memcpy (, , 1024); > > > > > > determine_global_memory_access determines the stmt doesn't write > > > to global memory (well, yes!), but it fails to add the actual arguments > > > to the set of clobbered vars. > > > > > > Honza, can you please look into this? Seems to be caused > > > by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8 > > > > > > Richard. > > > > > > > Thanks, > > > > Richard. > > > > > > > > > Setting ASSERTS to 0 gives: > > > > > > > > > > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 execution test > > > > > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto > > > > > -fno-use-linker-plugin -flto-partition=none execution test > > > > > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto > > > > > -fuse-linker-plugin -fno-fat-lto-objects execution test > > > > > FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -fomit-frame-pointer > > > > > -funroll-loops -fpeel-loops -ftracer -finline-functions execution > > > > > test > > > > > FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -g execution test > > > > > FAIL: gcc.c-torture/execute/memcpy-1.c -Os execution test > > > > > FAIL: gcc.c-torture/execute/memset-3.c -O3 -fomit-frame-pointer > > > > > -funroll-loops -fpeel-loops -ftracer -finline-functions execution > > > > > test > > > > > FAIL: gcc.c-torture/execute/memset-3.c -O3 -g execution test > > > > > > > > > > # of expected passes23138 > > > > > # of unexpected failures8 > > > > > # of unsupported tests 24 > > > > > > > > > > (same for -m32). > > > > > > > > > > Originally I saw it as a bootstrap failure building stage 2 > > > > > build-side libcpp, due to a bogus uninitialised warning. > > > > > > > > > > But alongside the memset and free examples, there's: > > > > > > > > > > /* All the following functions do not return pointers, do not > > > > > modify the points-to sets of memory reachable from their > > > > > arguments and do not add to the ESCAPED solution. */ > > > > > case BUILT_IN_SINCOS: > > > > > case BUILT_IN_SINCOSF: > > > > > ... > > > > > return true; > > > > > > > > > > (always an empty set), whereas check_fnspec takes errno into account. > > > > > > > > > > > I wonder if, at this
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
On Mon, Dec 6, 2021 at 4:45 PM Jan Hubicka wrote: > > > On Mon, Dec 6, 2021 at 4:03 PM Richard Biener > > wrote: > > > > > > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford > > > wrote: > > > > > > > > Richard Biener writes: > > > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches > > > > > wrote: > > > > >> > > > > >> When compiling an optabs.ii at -O2 with a release-checking build, > > > > >> we spent a lot of time in call_may_clobber_ref_p. One of the > > > > >> first things the function tries is the get_modref_function_summary > > > > >> approach, but that also seems to be the most expensive check. > > In general it should not be that expensive since the trees are generally > small (despite the 3 nested loops making them look dangerously big) and > most of time we resort to simple pointer/array lookups. > > From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html > (which are quite useful when trying to understand typical behaviour of > the alias oracle) > > In call_may_clobber_ref_p we get > > 2980 : 256141790 : callee = gimple_call_fndecl (call); > 2981 :: > 2982 : 256141790 : if (callee != NULL_TREE && > !ref->volatile_p) > 2983 :: { > 2984 : 240446150 : struct cgraph_node *node = > cgraph_node::get (callee); > this is direct pointer from node > 2985 : 240446150 : if (node) > 2986 :: { > 2987 : 240234208 : modref_summary *summary = > get_modref_function_summary (node); > This is fast summary so it is array lookup > 2988 : 240234208 : if (summary) > 2989 :: { > 2990 : 19732392 : if > (!modref_may_conflict (call, summary->stores, ref, tbaa_p) > > And we get here in 8% of invocations > > And in modref_may_conflict we have: > > >21147836 : modref_may_conflict (const gcall *stmt, > 2552 :: modref_tree > *tt, ao_ref *ref, bool tbaa_p) > 2553 :: { > 2554 : 21147836 : alias_set_type base_set, ref_set; > 2555 :: > 2556 : 21147836 : if (tt->every_base) > 2557 :: return true; > 2558 :: > 2559 :3407876 : if (!dbg_cnt (ipa_mod_ref)) > 2560 :: return true; > 2561 :: > 2562 :3407876 : base_set = ao_ref_base_alias_set > (ref); > 2563 :: > 2564 :3407876 : ref_set = ao_ref_alias_set (ref); > All direct lookups in most cases > 2565 :: > 2566 :3407876 : int num_tests = 0, max_tests = > param_modref_max_tests; > 2567 : 14479077 : for (auto base_node : tt->bases) > 2568 :: { > 2569 :6171739 : if (tbaa_p && > flag_strict_aliasing) > At average there are 2 bases travelled by the loop. > 2570 :: { > 2571 :5253690 : if (num_tests >= max_tests) > 2572 :: return true; > 2573 :5253690 : alias_stats.modref_tests++; > 2574 :5253690 : if (!alias_sets_conflict_p > (base_set, base_node->base)) > alias set checks cheecks are also reasonably cheap and 50% of time avoids > walking rest of the tree > 2575 :2238398 : continue; > 2576 :3015292 : num_tests++; > 2577 :: } > 2578 :: > 2579 :3933341 : if (base_node->every_ref) > > We hit the loop only 0.15 times per invocation of the function > > 2580 :: return true; > 2581 :: > 2582 : 14943624 : for (auto ref_node : > base_node->refs) > 2583 :: { > 2584 :: /* Do not repeat same test > as before. */ > 2585 :4381325 : if ((ref_set != base_set || > base_node->base != ref_node->ref) > > and usually visit only bit more htan one refs > 2586 :2548127 : && tbaa_p && > flag_strict_aliasing) > 2587 :: { > 2588 :1840866 : if (num_tests >= > max_tests) > 2589
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener > wrote: > > > > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford > > wrote: > > > > > > Richard Biener writes: > > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches > > > > wrote: > > > >> > > > >> When compiling an optabs.ii at -O2 with a release-checking build, > > > >> we spent a lot of time in call_may_clobber_ref_p. One of the > > > >> first things the function tries is the get_modref_function_summary > > > >> approach, but that also seems to be the most expensive check. In general it should not be that expensive since the trees are generally small (despite the 3 nested loops making them look dangerously big) and most of time we resort to simple pointer/array lookups. >From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html (which are quite useful when trying to understand typical behaviour of the alias oracle) In call_may_clobber_ref_p we get 2980 : 256141790 : callee = gimple_call_fndecl (call); 2981 :: 2982 : 256141790 : if (callee != NULL_TREE && !ref->volatile_p) 2983 :: { 2984 : 240446150 : struct cgraph_node *node = cgraph_node::get (callee); this is direct pointer from node 2985 : 240446150 : if (node) 2986 :: { 2987 : 240234208 : modref_summary *summary = get_modref_function_summary (node); This is fast summary so it is array lookup 2988 : 240234208 : if (summary) 2989 :: { 2990 : 19732392 : if (!modref_may_conflict (call, summary->stores, ref, tbaa_p) And we get here in 8% of invocations And in modref_may_conflict we have: 21147836 : modref_may_conflict (const gcall *stmt, 2552 :: modref_tree *tt, ao_ref *ref, bool tbaa_p) 2553 :: { 2554 : 21147836 : alias_set_type base_set, ref_set; 2555 :: 2556 : 21147836 : if (tt->every_base) 2557 :: return true; 2558 :: 2559 :3407876 : if (!dbg_cnt (ipa_mod_ref)) 2560 :: return true; 2561 :: 2562 :3407876 : base_set = ao_ref_base_alias_set (ref); 2563 :: 2564 :3407876 : ref_set = ao_ref_alias_set (ref); All direct lookups in most cases 2565 :: 2566 :3407876 : int num_tests = 0, max_tests = param_modref_max_tests; 2567 : 14479077 : for (auto base_node : tt->bases) 2568 :: { 2569 :6171739 : if (tbaa_p && flag_strict_aliasing) At average there are 2 bases travelled by the loop. 2570 :: { 2571 :5253690 : if (num_tests >= max_tests) 2572 :: return true; 2573 :5253690 : alias_stats.modref_tests++; 2574 :5253690 : if (!alias_sets_conflict_p (base_set, base_node->base)) alias set checks cheecks are also reasonably cheap and 50% of time avoids walking rest of the tree 2575 :2238398 : continue; 2576 :3015292 : num_tests++; 2577 :: } 2578 :: 2579 :3933341 : if (base_node->every_ref) We hit the loop only 0.15 times per invocation of the function 2580 :: return true; 2581 :: 2582 : 14943624 : for (auto ref_node : base_node->refs) 2583 :: { 2584 :: /* Do not repeat same test as before. */ 2585 :4381325 : if ((ref_set != base_set || base_node->base != ref_node->ref) and usually visit only bit more htan one refs 2586 :2548127 : && tbaa_p && flag_strict_aliasing) 2587 :: { 2588 :1840866 : if (num_tests >= max_tests) 2589 :: return true; 2590 :1809594 : alias_stats.modref_tests++; 2591 :1809594 : if (!alias_sets_conflict_p (ref_set, ref_node->ref))
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
Richard Biener writes: > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches >> > wrote: >> >> >> >> When compiling an optabs.ii at -O2 with a release-checking build, >> >> we spent a lot of time in call_may_clobber_ref_p. One of the >> >> first things the function tries is the get_modref_function_summary >> >> approach, but that also seems to be the most expensive check. >> >> At least for the optabs.ii test, most of the things g_m_f_s could >> >> handle could also be handled by points-to analysis, which is much >> >> cheaper. >> >> >> >> This patch therefore rearranges the tests in approximate order >> >> of cheapness. One wart is that points-to analysis can't be >> >> used on its own for this purpose; it has to be used in >> >> conjunction with check_fnspec. This is because the argument >> >> information in the fnspec is not reflected in the points-to >> >> information, which instead contains overly optimistic (rather >> >> than conservatively pessimistic) information. >> >> >> >> Specifically, find_func_aliases_for_builtin_call short-circuits >> >> the handle_rhs_call/handle_call_arg logic and doesn't itself add >> >> any compensating information about the arguments. This means that: >> >> >> >> free (foo) >> >> >> >> does not have an points-to information for foo (it has an empty >> >> set instead). >> > >> > Huh? The gimple_call_use/clobber_sets should be conservative >> > and stand on their own. Can you share a testcase that breaks when >> > returning early if independent_pt? Does this problem also exist >> > on the branch? >> >> With the attached patch to disable get_modref_function_summary, >> do the PT query ahead of the fnspec query, and assert if the fnspec >> query would have found an alias, I get the following results for >> execute.exp on x86_64-linux-gnu (full log attached): >> >> # of expected passes44554 >> # of unexpected failures1710 >> # of unresolved testcases 855 >> # of unsupported tests 62 >> >> E.g. for execute/2703-1.c, the assertion trips for: >> >> 3786 if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p)) >> (gdb) call debug (def_stmt) >> __builtin_memset (p_3(D), 0, 28); >> (gdb) call debug (ref.base) >> *p_3(D) >> (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, >> ref.base) >> $3 = false > > huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for > decls. > In fact it should eventually ICE even ;) Oops, yes. That's what comes of trying to be fancy and doing things in a gdb session rather than printfs. :-) It is of course the: else if ((TREE_CODE (base) == MEM_REF || TREE_CODE (base) == TARGET_MEM_REF) && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME) { struct ptr_info_def *pi = SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)); if (pi && !pt_solutions_intersect (gimple_call_clobber_set (call), >pt)) { path that fails the assert. > That said, nothing should > really iniitalize > the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time > or IPA PTA when enabled. pt_solutuon_reset sets ->anything to true so the > function should return true ... Yeah, it works OK in the reset state. > It looks like Honza added computation of these sets at some point, also maybe > forgetting to set some bits here. > > I prefer to investigate / fix this before refactoring the uses in this way. > Can > you file a bugreport please? I can reproduce the memcpy FAIL with > just moving check_fnspec down. OK, filed as PR103584 (for the list). Thanks, Richard > > Thanks, > Richard. > >> Setting ASSERTS to 0 gives: >> >> FAIL: gcc.c-torture/execute/memcpy-1.c -O2 execution test >> FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto -fno-use-linker-plugin >> -flto-partition=none execution test >> FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto -fuse-linker-plugin >> -fno-fat-lto-objects execution test >> FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution test >> FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -g execution test >> FAIL: gcc.c-torture/execute/memcpy-1.c -Os execution test >> FAIL: gcc.c-torture/execute/memset-3.c -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution test >> FAIL: gcc.c-torture/execute/memset-3.c -O3 -g execution test >> >> # of expected passes23138 >> # of unexpected failures8 >> # of unsupported tests 24 >> >> (same for -m32). >> >> Originally I saw it as a bootstrap failure building stage 2 >> build-side libcpp, due to a bogus uninitialised warning. >> >> But alongside the memset and free examples, there's: >> >> /* All the following functions do not return pointers, do not >> modify the
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
On Mon, Dec 6, 2021 at 4:03 PM Richard Biener wrote: > > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford > wrote: > > > > Richard Biener writes: > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches > > > wrote: > > >> > > >> When compiling an optabs.ii at -O2 with a release-checking build, > > >> we spent a lot of time in call_may_clobber_ref_p. One of the > > >> first things the function tries is the get_modref_function_summary > > >> approach, but that also seems to be the most expensive check. > > >> At least for the optabs.ii test, most of the things g_m_f_s could > > >> handle could also be handled by points-to analysis, which is much > > >> cheaper. > > >> > > >> This patch therefore rearranges the tests in approximate order > > >> of cheapness. One wart is that points-to analysis can't be > > >> used on its own for this purpose; it has to be used in > > >> conjunction with check_fnspec. This is because the argument > > >> information in the fnspec is not reflected in the points-to > > >> information, which instead contains overly optimistic (rather > > >> than conservatively pessimistic) information. > > >> > > >> Specifically, find_func_aliases_for_builtin_call short-circuits > > >> the handle_rhs_call/handle_call_arg logic and doesn't itself add > > >> any compensating information about the arguments. This means that: > > >> > > >> free (foo) > > >> > > >> does not have an points-to information for foo (it has an empty > > >> set instead). > > > > > > Huh? The gimple_call_use/clobber_sets should be conservative > > > and stand on their own. Can you share a testcase that breaks when > > > returning early if independent_pt? Does this problem also exist > > > on the branch? > > > > With the attached patch to disable get_modref_function_summary, > > do the PT query ahead of the fnspec query, and assert if the fnspec > > query would have found an alias, I get the following results for > > execute.exp on x86_64-linux-gnu (full log attached): > > > > # of expected passes44554 > > # of unexpected failures1710 > > # of unresolved testcases 855 > > # of unsupported tests 62 > > > > E.g. for execute/2703-1.c, the assertion trips for: > > > > 3786 if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p)) > > (gdb) call debug (def_stmt) > > __builtin_memset (p_3(D), 0, 28); > > (gdb) call debug (ref.base) > > *p_3(D) > > (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, > > ref.base) > > $3 = false > > huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for > decls. > In fact it should eventually ICE even ;) That said, nothing should > really iniitalize > the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time > or IPA PTA when enabled. pt_solutuon_reset sets ->anything to true so the > function should return true ... > > It looks like Honza added computation of these sets at some point, also maybe > forgetting to set some bits here. > > I prefer to investigate / fix this before refactoring the uses in this way. > Can > you file a bugreport please? I can reproduce the memcpy FAIL with > just moving check_fnspec down. So the issue seems that for # .MEM_22 = VDEF <.MEM_19> memcpy (, , 1024); determine_global_memory_access determines the stmt doesn't write to global memory (well, yes!), but it fails to add the actual arguments to the set of clobbered vars. Honza, can you please look into this? Seems to be caused by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8 Richard. > Thanks, > Richard. > > > Setting ASSERTS to 0 gives: > > > > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 execution test > > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto -fno-use-linker-plugin > > -flto-partition=none execution test > > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto -fuse-linker-plugin > > -fno-fat-lto-objects execution test > > FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -fomit-frame-pointer > > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > > FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -g execution test > > FAIL: gcc.c-torture/execute/memcpy-1.c -Os execution test > > FAIL: gcc.c-torture/execute/memset-3.c -O3 -fomit-frame-pointer > > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > > FAIL: gcc.c-torture/execute/memset-3.c -O3 -g execution test > > > > # of expected passes23138 > > # of unexpected failures8 > > # of unsupported tests 24 > > > > (same for -m32). > > > > Originally I saw it as a bootstrap failure building stage 2 > > build-side libcpp, due to a bogus uninitialised warning. > > > > But alongside the memset and free examples, there's: > > > > /* All the following functions do not return pointers, do not > > modify the points-to sets of memory reachable from their > > arguments and do not add to the ESCAPED solution. */ > >
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford wrote: > > Richard Biener writes: > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches > > wrote: > >> > >> When compiling an optabs.ii at -O2 with a release-checking build, > >> we spent a lot of time in call_may_clobber_ref_p. One of the > >> first things the function tries is the get_modref_function_summary > >> approach, but that also seems to be the most expensive check. > >> At least for the optabs.ii test, most of the things g_m_f_s could > >> handle could also be handled by points-to analysis, which is much > >> cheaper. > >> > >> This patch therefore rearranges the tests in approximate order > >> of cheapness. One wart is that points-to analysis can't be > >> used on its own for this purpose; it has to be used in > >> conjunction with check_fnspec. This is because the argument > >> information in the fnspec is not reflected in the points-to > >> information, which instead contains overly optimistic (rather > >> than conservatively pessimistic) information. > >> > >> Specifically, find_func_aliases_for_builtin_call short-circuits > >> the handle_rhs_call/handle_call_arg logic and doesn't itself add > >> any compensating information about the arguments. This means that: > >> > >> free (foo) > >> > >> does not have an points-to information for foo (it has an empty > >> set instead). > > > > Huh? The gimple_call_use/clobber_sets should be conservative > > and stand on their own. Can you share a testcase that breaks when > > returning early if independent_pt? Does this problem also exist > > on the branch? > > With the attached patch to disable get_modref_function_summary, > do the PT query ahead of the fnspec query, and assert if the fnspec > query would have found an alias, I get the following results for > execute.exp on x86_64-linux-gnu (full log attached): > > # of expected passes44554 > # of unexpected failures1710 > # of unresolved testcases 855 > # of unsupported tests 62 > > E.g. for execute/2703-1.c, the assertion trips for: > > 3786 if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p)) > (gdb) call debug (def_stmt) > __builtin_memset (p_3(D), 0, 28); > (gdb) call debug (ref.base) > *p_3(D) > (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, > ref.base) > $3 = false huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for decls. In fact it should eventually ICE even ;) That said, nothing should really iniitalize the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time or IPA PTA when enabled. pt_solutuon_reset sets ->anything to true so the function should return true ... It looks like Honza added computation of these sets at some point, also maybe forgetting to set some bits here. I prefer to investigate / fix this before refactoring the uses in this way. Can you file a bugreport please? I can reproduce the memcpy FAIL with just moving check_fnspec down. Thanks, Richard. > Setting ASSERTS to 0 gives: > > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 execution test > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none execution test > FAIL: gcc.c-torture/execute/memcpy-1.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects execution test > FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > FAIL: gcc.c-torture/execute/memcpy-1.c -O3 -g execution test > FAIL: gcc.c-torture/execute/memcpy-1.c -Os execution test > FAIL: gcc.c-torture/execute/memset-3.c -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > FAIL: gcc.c-torture/execute/memset-3.c -O3 -g execution test > > # of expected passes23138 > # of unexpected failures8 > # of unsupported tests 24 > > (same for -m32). > > Originally I saw it as a bootstrap failure building stage 2 > build-side libcpp, due to a bogus uninitialised warning. > > But alongside the memset and free examples, there's: > > /* All the following functions do not return pointers, do not > modify the points-to sets of memory reachable from their > arguments and do not add to the ESCAPED solution. */ > case BUILT_IN_SINCOS: > case BUILT_IN_SINCOSF: > ... > return true; > > (always an empty set), whereas check_fnspec takes errno into account. > > > I wonder if, at this point, we should split the patch up to do the > > trivial move of the ao_ref_base dependent and volatile checks > > and leave this more complicated detail for a second patch? > > Yeah, that'd work, but in practice it was the PT part that was the > high-value part. IIRC the early base checks caught less than 1% of > cases (I can rerun tonight to get exact numbers). > > Thanks, > Richard >
Re: [PATCH] alias: Optimise call_may_clobber_ref_p
On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches wrote: > > When compiling an optabs.ii at -O2 with a release-checking build, > we spent a lot of time in call_may_clobber_ref_p. One of the > first things the function tries is the get_modref_function_summary > approach, but that also seems to be the most expensive check. > At least for the optabs.ii test, most of the things g_m_f_s could > handle could also be handled by points-to analysis, which is much > cheaper. > > This patch therefore rearranges the tests in approximate order > of cheapness. One wart is that points-to analysis can't be > used on its own for this purpose; it has to be used in > conjunction with check_fnspec. This is because the argument > information in the fnspec is not reflected in the points-to > information, which instead contains overly optimistic (rather > than conservatively pessimistic) information. > > Specifically, find_func_aliases_for_builtin_call short-circuits > the handle_rhs_call/handle_call_arg logic and doesn't itself add > any compensating information about the arguments. This means that: > > free (foo) > > does not have an points-to information for foo (it has an empty > set instead). Huh? The gimple_call_use/clobber_sets should be conservative and stand on their own. Can you share a testcase that breaks when returning early if independent_pt? Does this problem also exist on the branch? I wonder if, at this point, we should split the patch up to do the trivial move of the ao_ref_base dependent and volatile checks and leave this more complicated detail for a second patch? The first part is pre-approved, I'd like to understand the second part before considering it. Thanks, Richard. > > It would be good to fix that for compile-time reasons if nothing else. > Doing points-to analysis without check_fnspec gave a measureable > speed-up, since PTA resolved the vast majority of queries, and > (as expected) the vast majority of calls had no fnspec. > > This gave about a ~2% compile-time improvement for the test above. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Richard > > > gcc/ > * tree-ssa-alias.c (call_may_clobber_ref_p_1): Reorder checks > to the cheaper ones come first. > --- > gcc/tree-ssa-alias.c | 145 +-- > 1 file changed, 70 insertions(+), 75 deletions(-) > > diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c > index 88fd7821c5e..573766278bc 100644 > --- a/gcc/tree-ssa-alias.c > +++ b/gcc/tree-ssa-alias.c > @@ -2952,9 +2952,6 @@ ref_maybe_used_by_stmt_p (gimple *stmt, tree ref, bool > tbaa_p) > bool > call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p) > { > - tree base; > - tree callee; > - >/* If the call is pure or const it cannot clobber anything. */ >if (gimple_call_flags (call) >& (ECF_PURE|ECF_CONST|ECF_LOOPING_CONST_OR_PURE|ECF_NOVOPS)) > @@ -2977,9 +2974,64 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, > bool tbaa_p) > break; >} > > - callee = gimple_call_fndecl (call); > + tree base = ao_ref_base (ref); > + if (base > + && (TREE_CODE (base) == SSA_NAME > + || CONSTANT_CLASS_P (base))) > +return false; > + > + /* A call that is not without side-effects might involve volatile > + accesses and thus conflicts with all other volatile accesses. */ > + if (ref->volatile_p) > +return true; > + > + bool independent_pt = false; > + if (base && DECL_P (base)) > +{ > + /* If the reference is based on a decl that is not aliased the call > +cannot possibly clobber it. */ > + if (!may_be_aliased (base) > + /* But local non-readonly statics can be modified through > +recursion or the call may implement a threading barrier which > +we must treat as may-def. */ > + && (TREE_READONLY (base) > + || !is_global_var (base))) > + return false; > > - if (callee != NULL_TREE && !ref->volatile_p) > + auto clobbers = gimple_call_clobber_set (call); > + independent_pt = !pt_solution_includes (clobbers, base); > +} > + else if (base > + && (TREE_CODE (base) == MEM_REF > + || TREE_CODE (base) == TARGET_MEM_REF) > + && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME) > +{ > + tree addr = TREE_OPERAND (base, 0); > + > + /* If the reference is based on a pointer that points to memory that > +may not be written to then the call cannot possibly clobber it. */ > + if (SSA_NAME_POINTS_TO_READONLY_MEMORY (addr)) > + return false; > + > + /* Check if the base variable is call-clobbered. */ > + if (auto *pi = SSA_NAME_PTR_INFO (addr)) > + { > + auto clobbers = gimple_call_clobber_set (call); > + independent_pt = !pt_solutions_intersect (clobbers, >pt); > + } > +} > + > + /* Points-to information needs to be tested in
[PATCH] alias: Optimise call_may_clobber_ref_p
When compiling an optabs.ii at -O2 with a release-checking build, we spent a lot of time in call_may_clobber_ref_p. One of the first things the function tries is the get_modref_function_summary approach, but that also seems to be the most expensive check. At least for the optabs.ii test, most of the things g_m_f_s could handle could also be handled by points-to analysis, which is much cheaper. This patch therefore rearranges the tests in approximate order of cheapness. One wart is that points-to analysis can't be used on its own for this purpose; it has to be used in conjunction with check_fnspec. This is because the argument information in the fnspec is not reflected in the points-to information, which instead contains overly optimistic (rather than conservatively pessimistic) information. Specifically, find_func_aliases_for_builtin_call short-circuits the handle_rhs_call/handle_call_arg logic and doesn't itself add any compensating information about the arguments. This means that: free (foo) does not have an points-to information for foo (it has an empty set instead). It would be good to fix that for compile-time reasons if nothing else. Doing points-to analysis without check_fnspec gave a measureable speed-up, since PTA resolved the vast majority of queries, and (as expected) the vast majority of calls had no fnspec. This gave about a ~2% compile-time improvement for the test above. Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Richard gcc/ * tree-ssa-alias.c (call_may_clobber_ref_p_1): Reorder checks to the cheaper ones come first. --- gcc/tree-ssa-alias.c | 145 +-- 1 file changed, 70 insertions(+), 75 deletions(-) diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c index 88fd7821c5e..573766278bc 100644 --- a/gcc/tree-ssa-alias.c +++ b/gcc/tree-ssa-alias.c @@ -2952,9 +2952,6 @@ ref_maybe_used_by_stmt_p (gimple *stmt, tree ref, bool tbaa_p) bool call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p) { - tree base; - tree callee; - /* If the call is pure or const it cannot clobber anything. */ if (gimple_call_flags (call) & (ECF_PURE|ECF_CONST|ECF_LOOPING_CONST_OR_PURE|ECF_NOVOPS)) @@ -2977,9 +2974,64 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p) break; } - callee = gimple_call_fndecl (call); + tree base = ao_ref_base (ref); + if (base + && (TREE_CODE (base) == SSA_NAME + || CONSTANT_CLASS_P (base))) +return false; + + /* A call that is not without side-effects might involve volatile + accesses and thus conflicts with all other volatile accesses. */ + if (ref->volatile_p) +return true; + + bool independent_pt = false; + if (base && DECL_P (base)) +{ + /* If the reference is based on a decl that is not aliased the call +cannot possibly clobber it. */ + if (!may_be_aliased (base) + /* But local non-readonly statics can be modified through +recursion or the call may implement a threading barrier which +we must treat as may-def. */ + && (TREE_READONLY (base) + || !is_global_var (base))) + return false; - if (callee != NULL_TREE && !ref->volatile_p) + auto clobbers = gimple_call_clobber_set (call); + independent_pt = !pt_solution_includes (clobbers, base); +} + else if (base + && (TREE_CODE (base) == MEM_REF + || TREE_CODE (base) == TARGET_MEM_REF) + && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME) +{ + tree addr = TREE_OPERAND (base, 0); + + /* If the reference is based on a pointer that points to memory that +may not be written to then the call cannot possibly clobber it. */ + if (SSA_NAME_POINTS_TO_READONLY_MEMORY (addr)) + return false; + + /* Check if the base variable is call-clobbered. */ + if (auto *pi = SSA_NAME_PTR_INFO (addr)) + { + auto clobbers = gimple_call_clobber_set (call); + independent_pt = !pt_solutions_intersect (clobbers, >pt); + } +} + + /* Points-to information needs to be tested in conjunction with + check_fnspec. The fnspec argument information for built-in + functions is excluded from the PT sets. */ + int res = check_fnspec (call, ref, true); + if (res >= 0) +return res; + + if (independent_pt) +return false; + + if (tree callee = gimple_call_fndecl (call)) { struct cgraph_node *node = cgraph_node::get (callee); if (node) @@ -3017,77 +3069,20 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p) return false; } alias_stats.modref_clobber_may_alias++; - } - } -} - - base = ao_ref_base (ref); - if (!base) -return true; - - if (TREE_CODE (base) == SSA_NAME - || CONSTANT_CLASS_P (base)) -return false; - - /* A call that is not without