Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-07 Thread Jan Hubicka via Gcc-patches
> 
> Yeah, the fast summary array lookup itself seems fine.  What slowed
> this down for me was instead:
> 
>   /* A single function body may be represented by multiple symbols with
>  different visibility.  For example, if FUNC is an interposable alias,
>  we don't want to return anything, even if we have summary for the target
>  function.  */
>   enum availability avail;
>   func = func->function_or_virtual_thunk_symbol
>(, current_function_decl ?
> cgraph_node::get (current_function_decl) : NULL);
>   if (avail <= AVAIL_INTERPOSABLE)
> return NULL;
> 
> which goes through get_availability (via ultimate_alias_target).
> The most common path for the optabs.ii testcase is:
> 
>   /* Inline functions are safe to be analyzed even if their symbol can
>  be overwritten at runtime.  It is not meaningful to enforce any sane
>  behavior on replacing inline function by different body.  */
>   else if (DECL_DECLARED_INLINE_P (decl))
> avail = AVAIL_AVAILABLE;
> 
> But to get there we had to go through:
> 
>   else if (ifunc_resolver
>  || lookup_attribute ("noipa", DECL_ATTRIBUTES (decl)))
> avail = AVAIL_INTERPOSABLE;

Hmm, that is kind of hack (I do not remember doing this).
The visibility checks are frequent in pretty much all IPA propagation
code, so I was always meaning to keep it quite cheap.
Pehraps simply adding noipa flag into cgraph_node itself?

I was also thinking about simply storing visibility to cgraph node. The
catch is that it is not completely invaraint (i.e. if you are seeing
access from a comdat group to other symbol in same comdat group it is
not interposable, while code outside of the comdat group may see it is
as interposable).  I am not even sure how important is to handle that
case except that I did, when I originally implemented the logic.
> 
> Last night I tried caching the noipa attribute, like we do for ifunc.
> That did improve compile time by 0.5% (relative to with the patch
> applied rather than without it).  It also made the cost of the lookup
> part, up to this line:
> 
> > 2988 :  240234208 :   if (summary)
> 
> negligible even with the original order of the checks.  So I think
> with that change, the lookup becomes cheap, like you say.

Thanks for noticing the attribut lookup.  I guess I should be more
careful about having these in hot parts of IPA code (such as in
inliner).  I will do some coveraging to see how bad it is in practice.
> 
> I don't think adding the caching is safe at this point in stage 3
> though.  Attributes are added by direct manipulation of the tree data
> structure and there are multiple places that in principle could
> (directly or indirectly) add new noipa attributes after the decl
> has been entered into the symbol table.  No existing tests seemed
> to expose that, but it was possible to write a new test that did.

Indeed one needs to be bit careful and not set it while frontend is
still on its (somewhat messy job) of changing trees.  However we only
need it for functions with are definitions and analysyed and we could
easily set it as part of the analyse_function path.  I will cook up a
patch.

Also I disucssed the PTA issues with Richi and I think we are in
agreement that we want to simply fixup the ??? comment and make the PTA
right.  It is also on my todo list.
> 
> > 2989 :: {
> > 2990 :   19732392 :   if 
> > (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
> >
> > And we get here in 8% of invocations
> 
> Once the lookup is neglible as above, so that the cost is purely on this
> test, I still see a ~2% slowdown relative to the “PT first” ordering.
> I can look into why in more detail, if that's unexpected,  It seemed
> from a quick look to be inner ao_ref stuff.
> 
> Like you said at the top of the quote above, the number of loop
> iterations per query was low.  By far the most common in the optabs.ii
> testcase was:
> 
>   1 outer
>   2 middle
>   2 inner

Yep.  Most common case where we end up with bigger load/store trees are
consturctors that store various fields of the structure.  So reversing
PTA and modref query still makes sense to me.

Honza
> 
> Thanks,
> Richard


Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-07 Thread Richard Sandiford via Gcc-patches
Thanks for the info.

Jan Hubicka  writes:
>> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
>>  wrote:
>> >
>> > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
>> >  wrote:
>> > >
>> > > Richard Biener  writes:
>> > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
>> > > >  wrote:
>> > > >>
>> > > >> When compiling an optabs.ii at -O2 with a release-checking build,
>> > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
>> > > >> first things the function tries is the get_modref_function_summary
>> > > >> approach, but that also seems to be the most expensive check.
>
> In general it should not be that expensive since the trees are generally
> small (despite the 3 nested loops making them look dangerously big) and
> most of time we resort to simple pointer/array lookups.
>
> From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
> (which are quite useful when trying to understand typical behaviour of
> the alias oracle)
>
> In call_may_clobber_ref_p we get
>
> 2980 :  256141790 :   callee = gimple_call_fndecl (call);
> 2981 :: 
> 2982 :  256141790 :   if (callee != NULL_TREE && 
> !ref->volatile_p)
> 2983 :: {
> 2984 :  240446150 :   struct cgraph_node *node = 
> cgraph_node::get (callee);
> this is direct pointer from node
> 2985 :  240446150 :   if (node)
> 2986 :: {
> 2987 :  240234208 :   modref_summary *summary = 
> get_modref_function_summary (node);
> This is fast summary so it is array lookup

Yeah, the fast summary array lookup itself seems fine.  What slowed
this down for me was instead:

  /* A single function body may be represented by multiple symbols with
 different visibility.  For example, if FUNC is an interposable alias,
 we don't want to return anything, even if we have summary for the target
 function.  */
  enum availability avail;
  func = func->function_or_virtual_thunk_symbol
 (, current_function_decl ?
  cgraph_node::get (current_function_decl) : NULL);
  if (avail <= AVAIL_INTERPOSABLE)
return NULL;

which goes through get_availability (via ultimate_alias_target).
The most common path for the optabs.ii testcase is:

  /* Inline functions are safe to be analyzed even if their symbol can
 be overwritten at runtime.  It is not meaningful to enforce any sane
 behavior on replacing inline function by different body.  */
  else if (DECL_DECLARED_INLINE_P (decl))
avail = AVAIL_AVAILABLE;

But to get there we had to go through:

  else if (ifunc_resolver
   || lookup_attribute ("noipa", DECL_ATTRIBUTES (decl)))
avail = AVAIL_INTERPOSABLE;

and most functions in this test have nonnull attributes.  (I can look
into why if that's unexpected.)

Last night I tried caching the noipa attribute, like we do for ifunc.
That did improve compile time by 0.5% (relative to with the patch
applied rather than without it).  It also made the cost of the lookup
part, up to this line:

> 2988 :  240234208 :   if (summary)

negligible even with the original order of the checks.  So I think
with that change, the lookup becomes cheap, like you say.

I don't think adding the caching is safe at this point in stage 3
though.  Attributes are added by direct manipulation of the tree data
structure and there are multiple places that in principle could
(directly or indirectly) add new noipa attributes after the decl
has been entered into the symbol table.  No existing tests seemed
to expose that, but it was possible to write a new test that did.

> 2989 :: {
> 2990 :   19732392 :   if 
> (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
>
> And we get here in 8% of invocations

Once the lookup is neglible as above, so that the cost is purely on this
test, I still see a ~2% slowdown relative to the “PT first” ordering.
I can look into why in more detail, if that's unexpected,  It seemed
from a quick look to be inner ao_ref stuff.

Like you said at the top of the quote above, the number of loop
iterations per query was low.  By far the most common in the optabs.ii
testcase was:

  1 outer
  2 middle
  2 inner

Thanks,
Richard


Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-07 Thread Jan Hubicka via Gcc-patches
> > Notice the ??? comment. The code does not set clobbers here because it
> > assumes that tree-ssa-alias will do the right thing.
> > So one may make builtins handling first, PTA next and only if both say
> > "may alias" continue.  Other option is to extend the code here to add
> > propert clobber/use PTA to make things more regular.  I would be happy
> > to help with that.
> 
> Or avoid setting the gimple_call_clobber/use sets for builtins handled in
> the above way.  As said, the info there is currently incorrect.  Note that
> originally the call use/clobber vi were only added in IPA mode.  I suppose
> you now add them unconditionally (I forgot about all the changes).

I think even before my changes we did non-trivial call clobbers in some
cases.  I just generalized existing clode that was generating
non-trivial constraints, for example, for readonly non-escape args. If
you search in gcc10 tree, call clobber vi is set in handle_rhs_call
and some builtins handling and later copied to the gimple_call_clobber
in compute_points_to_sets which is the non-ipa path.  Similarly there is
logic for call use for pure/const etc.

At some point of fnspec development I disabled the special case path in
tree-ssa-alias.c and it also broke on some builtins testcases due to
wrong PTA and this happened before the PTA changes.  That is how I know
about the ??? comment.

With my changes I think it only started to hit more cases mostly because
I added the global memory handling which makes the clobber/use sets more
likely to disambiguate.

I agree that we want to get rid of wrong uses/clobbers from the IL - it
is slopy to rely that other oracle will prevail PTA's wrong answers.
If we want to just keep clobber/uses set to "all" I guess we need some
special way to communicate that given call is specially handled builtin
(listing all builtins for third time there seems like to too cool
option) or we want to initialize call_use_vi/call_clobber_vi
in find_func_aliases_for_builtin_call and in that case maybe it is not
that hard to initialize them to the right thing.

Do we know what makes ref_clobbered_by_call slow on optabs?  (We have
snow so I would like to spend some time on cross country ski today and
tomorrow, but can definitly profile that at thursday).

Honza
> 
> Richard.
> 
> >
> > Honza
> > > >
> > > > It looks like Honza added computation of these sets at some point, also 
> > > > maybe
> > > > forgetting to set some bits here.
> > > >
> > > > I prefer to investigate / fix this before refactoring the uses in this 
> > > > way.  Can
> > > > you file a bugreport please?  I can reproduce the memcpy FAIL with
> > > > just moving check_fnspec down.
> > >
> > > So the issue seems that for
> > > # .MEM_22 = VDEF <.MEM_19>
> > > memcpy (, , 1024);
> > >
> > > determine_global_memory_access determines the stmt doesn't write
> > > to global memory (well, yes!), but it fails to add the actual arguments
> > > to the set of clobbered vars.
> > >
> > > Honza, can you please look into this?  Seems to be caused
> > > by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > Setting ASSERTS to 0 gives:
> > > > >
> > > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> > > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto 
> > > > > -fno-use-linker-plugin -flto-partition=none  execution test
> > > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto 
> > > > > -fuse-linker-plugin -fno-fat-lto-objects  execution test
> > > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
> > > > > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution 
> > > > > test
> > > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> > > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> > > > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
> > > > > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution 
> > > > > test
> > > > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
> > > > >
> > > > > # of expected passes23138
> > > > > # of unexpected failures8
> > > > > # of unsupported tests  24
> > > > >
> > > > > (same for -m32).
> > > > >
> > > > > Originally I saw it as a bootstrap failure building stage 2
> > > > > build-side libcpp, due to a bogus uninitialised warning.
> > > > >
> > > > > But alongside the memset and free examples, there's:
> > > > >
> > > > >   /* All the following functions do not return pointers, do not
> > > > >  modify the points-to sets of memory reachable from their
> > > > >  arguments and do not add to the ESCAPED solution.  */
> > > > >   case BUILT_IN_SINCOS:
> > > > >   case BUILT_IN_SINCOSF:
> > > > >   ...
> > > > > return true;
> > > > >
> > > > > (always an empty set), whereas check_fnspec takes errno into account.
> > > > >
> > > > > > I wonder if, at this 

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 4:45 PM Jan Hubicka  wrote:
>
> > On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
> >  wrote:
> > >
> > > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
> > >  wrote:
> > > >
> > > > Richard Biener  writes:
> > > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > > > >  wrote:
> > > > >>
> > > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > > >> first things the function tries is the get_modref_function_summary
> > > > >> approach, but that also seems to be the most expensive check.
>
> In general it should not be that expensive since the trees are generally
> small (despite the 3 nested loops making them look dangerously big) and
> most of time we resort to simple pointer/array lookups.
>
> From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
> (which are quite useful when trying to understand typical behaviour of
> the alias oracle)
>
> In call_may_clobber_ref_p we get
>
> 2980 :  256141790 :   callee = gimple_call_fndecl (call);
> 2981 ::
> 2982 :  256141790 :   if (callee != NULL_TREE && 
> !ref->volatile_p)
> 2983 :: {
> 2984 :  240446150 :   struct cgraph_node *node = 
> cgraph_node::get (callee);
> this is direct pointer from node
> 2985 :  240446150 :   if (node)
> 2986 :: {
> 2987 :  240234208 :   modref_summary *summary = 
> get_modref_function_summary (node);
> This is fast summary so it is array lookup
> 2988 :  240234208 :   if (summary)
> 2989 :: {
> 2990 :   19732392 :   if 
> (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
>
> And we get here in 8% of invocations
>
> And in modref_may_conflict we have:
>
>
>21147836 : modref_may_conflict (const gcall *stmt,
> 2552 ::  modref_tree 
>  *tt, ao_ref *ref, bool tbaa_p)
> 2553 :: {
> 2554 :   21147836 :   alias_set_type base_set, ref_set;
> 2555 ::
> 2556 :   21147836 :   if (tt->every_base)
> 2557 :: return true;
> 2558 ::
> 2559 :3407876 :   if (!dbg_cnt (ipa_mod_ref))
> 2560 :: return true;
> 2561 ::
> 2562 :3407876 :   base_set = ao_ref_base_alias_set 
> (ref);
> 2563 ::
> 2564 :3407876 :   ref_set = ao_ref_alias_set (ref);
> All direct lookups in most cases
> 2565 ::
> 2566 :3407876 :   int num_tests = 0, max_tests = 
> param_modref_max_tests;
> 2567 :   14479077 :   for (auto base_node : tt->bases)
> 2568 :: {
> 2569 :6171739 :   if (tbaa_p && 
> flag_strict_aliasing)
> At average there are 2 bases travelled by the loop.
> 2570 :: {
> 2571 :5253690 :   if (num_tests >= max_tests)
> 2572 :: return true;
> 2573 :5253690 :   alias_stats.modref_tests++;
> 2574 :5253690 :   if (!alias_sets_conflict_p 
> (base_set, base_node->base))
> alias set checks cheecks are also reasonably cheap and 50% of time avoids 
> walking rest of the tree
> 2575 :2238398 : continue;
> 2576 :3015292 :   num_tests++;
> 2577 :: }
> 2578 ::
> 2579 :3933341 :   if (base_node->every_ref)
>
> We hit the loop only 0.15 times per invocation of the function
>
> 2580 :: return true;
> 2581 ::
> 2582 :   14943624 :   for (auto ref_node : 
> base_node->refs)
> 2583 :: {
> 2584 ::   /* Do not repeat same test 
> as before.  */
> 2585 :4381325 :   if ((ref_set != base_set || 
> base_node->base != ref_node->ref)
>
> and usually visit only bit more htan one refs
> 2586 :2548127 :   && tbaa_p && 
> flag_strict_aliasing)
> 2587 :: {
> 2588 :1840866 :   if (num_tests >= 
> max_tests)
> 2589   

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Jan Hubicka via Gcc-patches
> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
>  wrote:
> >
> > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener  writes:
> > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > > >  wrote:
> > > >>
> > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > >> first things the function tries is the get_modref_function_summary
> > > >> approach, but that also seems to be the most expensive check.

In general it should not be that expensive since the trees are generally
small (despite the 3 nested loops making them look dangerously big) and
most of time we resort to simple pointer/array lookups.

>From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
(which are quite useful when trying to understand typical behaviour of
the alias oracle)

In call_may_clobber_ref_p we get

2980 :  256141790 :   callee = gimple_call_fndecl (call);
2981 :: 
2982 :  256141790 :   if (callee != NULL_TREE && 
!ref->volatile_p)
2983 :: {
2984 :  240446150 :   struct cgraph_node *node = 
cgraph_node::get (callee);
this is direct pointer from node
2985 :  240446150 :   if (node)
2986 :: {
2987 :  240234208 :   modref_summary *summary = 
get_modref_function_summary (node);
This is fast summary so it is array lookup
2988 :  240234208 :   if (summary)
2989 :: {
2990 :   19732392 :   if (!modref_may_conflict 
(call, summary->stores, ref, tbaa_p)

And we get here in 8% of invocations

And in modref_may_conflict we have:


   21147836 : modref_may_conflict (const gcall *stmt,
2552 ::  modref_tree 
 *tt, ao_ref *ref, bool tbaa_p)
2553 :: {
2554 :   21147836 :   alias_set_type base_set, ref_set;
2555 :: 
2556 :   21147836 :   if (tt->every_base)
2557 :: return true;
2558 :: 
2559 :3407876 :   if (!dbg_cnt (ipa_mod_ref))
2560 :: return true;
2561 :: 
2562 :3407876 :   base_set = ao_ref_base_alias_set 
(ref);
2563 :: 
2564 :3407876 :   ref_set = ao_ref_alias_set (ref);
All direct lookups in most cases
2565 :: 
2566 :3407876 :   int num_tests = 0, max_tests = 
param_modref_max_tests;
2567 :   14479077 :   for (auto base_node : tt->bases)
2568 :: {
2569 :6171739 :   if (tbaa_p && 
flag_strict_aliasing)
At average there are 2 bases travelled by the loop.
2570 :: {
2571 :5253690 :   if (num_tests >= max_tests)
2572 :: return true;
2573 :5253690 :   alias_stats.modref_tests++;
2574 :5253690 :   if (!alias_sets_conflict_p 
(base_set, base_node->base))
alias set checks cheecks are also reasonably cheap and 50% of time avoids 
walking rest of the tree
2575 :2238398 : continue;
2576 :3015292 :   num_tests++;
2577 :: }
2578 :: 
2579 :3933341 :   if (base_node->every_ref)

We hit the loop only 0.15 times per invocation of the function

2580 :: return true;
2581 :: 
2582 :   14943624 :   for (auto ref_node : 
base_node->refs)
2583 :: {
2584 ::   /* Do not repeat same test as 
before.  */
2585 :4381325 :   if ((ref_set != base_set || 
base_node->base != ref_node->ref)

and usually visit only bit more htan one refs
2586 :2548127 :   && tbaa_p && 
flag_strict_aliasing)
2587 :: {
2588 :1840866 :   if (num_tests >= 
max_tests)
2589 :: return true;
2590 :1809594 :   
alias_stats.modref_tests++;
2591 :1809594 :   if 
(!alias_sets_conflict_p (ref_set, ref_node->ref))

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
>> >  wrote:
>> >>
>> >> When compiling an optabs.ii at -O2 with a release-checking build,
>> >> we spent a lot of time in call_may_clobber_ref_p.  One of the
>> >> first things the function tries is the get_modref_function_summary
>> >> approach, but that also seems to be the most expensive check.
>> >> At least for the optabs.ii test, most of the things g_m_f_s could
>> >> handle could also be handled by points-to analysis, which is much
>> >> cheaper.
>> >>
>> >> This patch therefore rearranges the tests in approximate order
>> >> of cheapness.  One wart is that points-to analysis can't be
>> >> used on its own for this purpose; it has to be used in
>> >> conjunction with check_fnspec.  This is because the argument
>> >> information in the fnspec is not reflected in the points-to
>> >> information, which instead contains overly optimistic (rather
>> >> than conservatively pessimistic) information.
>> >>
>> >> Specifically, find_func_aliases_for_builtin_call short-circuits
>> >> the handle_rhs_call/handle_call_arg logic and doesn't itself add
>> >> any compensating information about the arguments.  This means that:
>> >>
>> >>   free (foo)
>> >>
>> >> does not have an points-to information for foo (it has an empty
>> >> set instead).
>> >
>> > Huh?  The gimple_call_use/clobber_sets should be conservative
>> > and stand on their own.  Can you share a testcase that breaks when
>> > returning early if independent_pt?  Does this  problem also exist
>> > on the branch?
>>
>> With the attached patch to disable get_modref_function_summary,
>> do the PT query ahead of the fnspec query, and assert if the fnspec
>> query would have found an alias, I get the following results for
>> execute.exp on x86_64-linux-gnu (full log attached):
>>
>> # of expected passes44554
>> # of unexpected failures1710
>> # of unresolved testcases   855
>> # of unsupported tests  62
>>
>> E.g. for execute/2703-1.c, the assertion trips for:
>>
>> 3786  if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
>> (gdb) call debug (def_stmt)
>> __builtin_memset (p_3(D), 0, 28);
>> (gdb) call debug (ref.base)
>> *p_3(D)
>> (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
>> ref.base)
>> $3 = false
>
> huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for 
> decls.
> In fact it should eventually ICE even ;)

Oops, yes.  That's what comes of trying to be fancy and doing things
in a gdb session rather than printfs. :-)  It is of course the:

  else if ((TREE_CODE (base) == MEM_REF
|| TREE_CODE (base) == TARGET_MEM_REF)
   && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
{
  struct ptr_info_def *pi = SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0));
  if (pi
  && !pt_solutions_intersect (gimple_call_clobber_set (call), >pt))
{

path that fails the assert.

> That said, nothing should
> really iniitalize
> the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time
> or IPA PTA when enabled.  pt_solutuon_reset sets ->anything to true so the
> function should return true ...

Yeah, it works OK in the reset state.

> It looks like Honza added computation of these sets at some point, also maybe
> forgetting to set some bits here.
>
> I prefer to investigate / fix this before refactoring the uses in this way.  
> Can
> you file a bugreport please?  I can reproduce the memcpy FAIL with
> just moving check_fnspec down.

OK, filed as PR103584 (for the list).

Thanks,
Richard

>
> Thanks,
> Richard.
>
>> Setting ASSERTS to 0 gives:
>>
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin 
>> -flto-partition=none  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
>> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
>>
>> # of expected passes23138
>> # of unexpected failures8
>> # of unsupported tests  24
>>
>> (same for -m32).
>>
>> Originally I saw it as a bootstrap failure building stage 2
>> build-side libcpp, due to a bogus uninitialised warning.
>>
>> But alongside the memset and free examples, there's:
>>
>>   /* All the following functions do not return pointers, do not
>>  modify the 

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
 wrote:
>
> On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > >  wrote:
> > >>
> > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > >> first things the function tries is the get_modref_function_summary
> > >> approach, but that also seems to be the most expensive check.
> > >> At least for the optabs.ii test, most of the things g_m_f_s could
> > >> handle could also be handled by points-to analysis, which is much
> > >> cheaper.
> > >>
> > >> This patch therefore rearranges the tests in approximate order
> > >> of cheapness.  One wart is that points-to analysis can't be
> > >> used on its own for this purpose; it has to be used in
> > >> conjunction with check_fnspec.  This is because the argument
> > >> information in the fnspec is not reflected in the points-to
> > >> information, which instead contains overly optimistic (rather
> > >> than conservatively pessimistic) information.
> > >>
> > >> Specifically, find_func_aliases_for_builtin_call short-circuits
> > >> the handle_rhs_call/handle_call_arg logic and doesn't itself add
> > >> any compensating information about the arguments.  This means that:
> > >>
> > >>   free (foo)
> > >>
> > >> does not have an points-to information for foo (it has an empty
> > >> set instead).
> > >
> > > Huh?  The gimple_call_use/clobber_sets should be conservative
> > > and stand on their own.  Can you share a testcase that breaks when
> > > returning early if independent_pt?  Does this  problem also exist
> > > on the branch?
> >
> > With the attached patch to disable get_modref_function_summary,
> > do the PT query ahead of the fnspec query, and assert if the fnspec
> > query would have found an alias, I get the following results for
> > execute.exp on x86_64-linux-gnu (full log attached):
> >
> > # of expected passes44554
> > # of unexpected failures1710
> > # of unresolved testcases   855
> > # of unsupported tests  62
> >
> > E.g. for execute/2703-1.c, the assertion trips for:
> >
> > 3786  if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
> > (gdb) call debug (def_stmt)
> > __builtin_memset (p_3(D), 0, 28);
> > (gdb) call debug (ref.base)
> > *p_3(D)
> > (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
> > ref.base)
> > $3 = false
>
> huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for 
> decls.
> In fact it should eventually ICE even ;)  That said, nothing should
> really iniitalize
> the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time
> or IPA PTA when enabled.  pt_solutuon_reset sets ->anything to true so the
> function should return true ...
>
> It looks like Honza added computation of these sets at some point, also maybe
> forgetting to set some bits here.
>
> I prefer to investigate / fix this before refactoring the uses in this way.  
> Can
> you file a bugreport please?  I can reproduce the memcpy FAIL with
> just moving check_fnspec down.

So the issue seems that for
# .MEM_22 = VDEF <.MEM_19>
memcpy (, , 1024);

determine_global_memory_access determines the stmt doesn't write
to global memory (well, yes!), but it fails to add the actual arguments
to the set of clobbered vars.

Honza, can you please look into this?  Seems to be caused
by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8

Richard.

> Thanks,
> Richard.
>
> > Setting ASSERTS to 0 gives:
> >
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
> > -fno-fat-lto-objects  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
> > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
> > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
> >
> > # of expected passes23138
> > # of unexpected failures8
> > # of unsupported tests  24
> >
> > (same for -m32).
> >
> > Originally I saw it as a bootstrap failure building stage 2
> > build-side libcpp, due to a bogus uninitialised warning.
> >
> > But alongside the memset and free examples, there's:
> >
> >   /* All the following functions do not return pointers, do not
> >  modify the points-to sets of memory reachable from their
> >  arguments and do not add to the ESCAPED solution.  */
> >   

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> >  wrote:
> >>
> >> When compiling an optabs.ii at -O2 with a release-checking build,
> >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> >> first things the function tries is the get_modref_function_summary
> >> approach, but that also seems to be the most expensive check.
> >> At least for the optabs.ii test, most of the things g_m_f_s could
> >> handle could also be handled by points-to analysis, which is much
> >> cheaper.
> >>
> >> This patch therefore rearranges the tests in approximate order
> >> of cheapness.  One wart is that points-to analysis can't be
> >> used on its own for this purpose; it has to be used in
> >> conjunction with check_fnspec.  This is because the argument
> >> information in the fnspec is not reflected in the points-to
> >> information, which instead contains overly optimistic (rather
> >> than conservatively pessimistic) information.
> >>
> >> Specifically, find_func_aliases_for_builtin_call short-circuits
> >> the handle_rhs_call/handle_call_arg logic and doesn't itself add
> >> any compensating information about the arguments.  This means that:
> >>
> >>   free (foo)
> >>
> >> does not have an points-to information for foo (it has an empty
> >> set instead).
> >
> > Huh?  The gimple_call_use/clobber_sets should be conservative
> > and stand on their own.  Can you share a testcase that breaks when
> > returning early if independent_pt?  Does this  problem also exist
> > on the branch?
>
> With the attached patch to disable get_modref_function_summary,
> do the PT query ahead of the fnspec query, and assert if the fnspec
> query would have found an alias, I get the following results for
> execute.exp on x86_64-linux-gnu (full log attached):
>
> # of expected passes44554
> # of unexpected failures1710
> # of unresolved testcases   855
> # of unsupported tests  62
>
> E.g. for execute/2703-1.c, the assertion trips for:
>
> 3786  if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
> (gdb) call debug (def_stmt)
> __builtin_memset (p_3(D), 0, 28);
> (gdb) call debug (ref.base)
> *p_3(D)
> (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
> ref.base)
> $3 = false

huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for decls.
In fact it should eventually ICE even ;)  That said, nothing should
really iniitalize
the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time
or IPA PTA when enabled.  pt_solutuon_reset sets ->anything to true so the
function should return true ...

It looks like Honza added computation of these sets at some point, also maybe
forgetting to set some bits here.

I prefer to investigate / fix this before refactoring the uses in this way.  Can
you file a bugreport please?  I can reproduce the memcpy FAIL with
just moving check_fnspec down.

Thanks,
Richard.

> Setting ASSERTS to 0 gives:
>
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
>
> # of expected passes23138
> # of unexpected failures8
> # of unsupported tests  24
>
> (same for -m32).
>
> Originally I saw it as a bootstrap failure building stage 2
> build-side libcpp, due to a bogus uninitialised warning.
>
> But alongside the memset and free examples, there's:
>
>   /* All the following functions do not return pointers, do not
>  modify the points-to sets of memory reachable from their
>  arguments and do not add to the ESCAPED solution.  */
>   case BUILT_IN_SINCOS:
>   case BUILT_IN_SINCOSF:
>   ...
> return true;
>
> (always an empty set), whereas check_fnspec takes errno into account.
>
> > I wonder if, at this point, we should split the patch up to do the
> > trivial move of the ao_ref_base dependent and volatile checks
> > and leave this more complicated detail for a second patch?
>
> Yeah, that'd work, but in practice it was the PT part that was the
> high-value part.  IIRC the early base checks caught less than 1% of
> cases (I can rerun tonight to get exact numbers).
>
> Thanks,
> Richard
>


Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-05 Thread Richard Biener via Gcc-patches
On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
 wrote:
>
> When compiling an optabs.ii at -O2 with a release-checking build,
> we spent a lot of time in call_may_clobber_ref_p.  One of the
> first things the function tries is the get_modref_function_summary
> approach, but that also seems to be the most expensive check.
> At least for the optabs.ii test, most of the things g_m_f_s could
> handle could also be handled by points-to analysis, which is much
> cheaper.
>
> This patch therefore rearranges the tests in approximate order
> of cheapness.  One wart is that points-to analysis can't be
> used on its own for this purpose; it has to be used in
> conjunction with check_fnspec.  This is because the argument
> information in the fnspec is not reflected in the points-to
> information, which instead contains overly optimistic (rather
> than conservatively pessimistic) information.
>
> Specifically, find_func_aliases_for_builtin_call short-circuits
> the handle_rhs_call/handle_call_arg logic and doesn't itself add
> any compensating information about the arguments.  This means that:
>
>   free (foo)
>
> does not have an points-to information for foo (it has an empty
> set instead).

Huh?  The gimple_call_use/clobber_sets should be conservative
and stand on their own.  Can you share a testcase that breaks when
returning early if independent_pt?  Does this  problem also exist
on the branch?

I wonder if, at this point, we should split the patch up to do the
trivial move of the ao_ref_base dependent and volatile checks
and leave this more complicated detail for a second patch?

The first part is pre-approved, I'd like to understand the second part
before considering it.

Thanks,
Richard.

>
> It would be good to fix that for compile-time reasons if nothing else.
> Doing points-to analysis without check_fnspec gave a measureable
> speed-up, since PTA resolved the vast majority of queries, and
> (as expected) the vast majority of calls had no fnspec.
>
> This gave about a ~2% compile-time improvement for the test above.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * tree-ssa-alias.c (call_may_clobber_ref_p_1): Reorder checks
> to the cheaper ones come first.
> ---
>  gcc/tree-ssa-alias.c | 145 +--
>  1 file changed, 70 insertions(+), 75 deletions(-)
>
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index 88fd7821c5e..573766278bc 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -2952,9 +2952,6 @@ ref_maybe_used_by_stmt_p (gimple *stmt, tree ref, bool 
> tbaa_p)
>  bool
>  call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
>  {
> -  tree base;
> -  tree callee;
> -
>/* If the call is pure or const it cannot clobber anything.  */
>if (gimple_call_flags (call)
>& (ECF_PURE|ECF_CONST|ECF_LOOPING_CONST_OR_PURE|ECF_NOVOPS))
> @@ -2977,9 +2974,64 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
> bool tbaa_p)
> break;
>}
>
> -  callee = gimple_call_fndecl (call);
> +  tree base = ao_ref_base (ref);
> +  if (base
> +  && (TREE_CODE (base) == SSA_NAME
> + || CONSTANT_CLASS_P (base)))
> +return false;
> +
> +  /* A call that is not without side-effects might involve volatile
> + accesses and thus conflicts with all other volatile accesses.  */
> +  if (ref->volatile_p)
> +return true;
> +
> +  bool independent_pt = false;
> +  if (base && DECL_P (base))
> +{
> +  /* If the reference is based on a decl that is not aliased the call
> +cannot possibly clobber it.  */
> +  if (!may_be_aliased (base)
> + /* But local non-readonly statics can be modified through
> +recursion or the call may implement a threading barrier which
> +we must treat as may-def.  */
> + && (TREE_READONLY (base)
> + || !is_global_var (base)))
> +   return false;
>
> -  if (callee != NULL_TREE && !ref->volatile_p)
> +  auto clobbers = gimple_call_clobber_set (call);
> +  independent_pt = !pt_solution_includes (clobbers, base);
> +}
> +  else if (base
> +  && (TREE_CODE (base) == MEM_REF
> +  || TREE_CODE (base) == TARGET_MEM_REF)
> +  && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
> +{
> +  tree addr = TREE_OPERAND (base, 0);
> +
> +  /* If the reference is based on a pointer that points to memory that
> +may not be written to then the call cannot possibly clobber it.  */
> +  if (SSA_NAME_POINTS_TO_READONLY_MEMORY (addr))
> +   return false;
> +
> +  /* Check if the base variable is call-clobbered.  */
> +  if (auto *pi = SSA_NAME_PTR_INFO (addr))
> +   {
> + auto clobbers = gimple_call_clobber_set (call);
> + independent_pt = !pt_solutions_intersect (clobbers, >pt);
> +   }
> +}
> +
> +  /* Points-to information needs to be tested in 

[PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-05 Thread Richard Sandiford via Gcc-patches
When compiling an optabs.ii at -O2 with a release-checking build,
we spent a lot of time in call_may_clobber_ref_p.  One of the
first things the function tries is the get_modref_function_summary
approach, but that also seems to be the most expensive check.
At least for the optabs.ii test, most of the things g_m_f_s could
handle could also be handled by points-to analysis, which is much
cheaper.

This patch therefore rearranges the tests in approximate order
of cheapness.  One wart is that points-to analysis can't be
used on its own for this purpose; it has to be used in
conjunction with check_fnspec.  This is because the argument
information in the fnspec is not reflected in the points-to
information, which instead contains overly optimistic (rather
than conservatively pessimistic) information.

Specifically, find_func_aliases_for_builtin_call short-circuits
the handle_rhs_call/handle_call_arg logic and doesn't itself add
any compensating information about the arguments.  This means that:

  free (foo)

does not have an points-to information for foo (it has an empty
set instead).

It would be good to fix that for compile-time reasons if nothing else.
Doing points-to analysis without check_fnspec gave a measureable
speed-up, since PTA resolved the vast majority of queries, and
(as expected) the vast majority of calls had no fnspec.

This gave about a ~2% compile-time improvement for the test above.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-ssa-alias.c (call_may_clobber_ref_p_1): Reorder checks
to the cheaper ones come first.
---
 gcc/tree-ssa-alias.c | 145 +--
 1 file changed, 70 insertions(+), 75 deletions(-)

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 88fd7821c5e..573766278bc 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2952,9 +2952,6 @@ ref_maybe_used_by_stmt_p (gimple *stmt, tree ref, bool 
tbaa_p)
 bool
 call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
 {
-  tree base;
-  tree callee;
-
   /* If the call is pure or const it cannot clobber anything.  */
   if (gimple_call_flags (call)
   & (ECF_PURE|ECF_CONST|ECF_LOOPING_CONST_OR_PURE|ECF_NOVOPS))
@@ -2977,9 +2974,64 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool 
tbaa_p)
break;
   }
 
-  callee = gimple_call_fndecl (call);
+  tree base = ao_ref_base (ref);
+  if (base
+  && (TREE_CODE (base) == SSA_NAME
+ || CONSTANT_CLASS_P (base)))
+return false;
+
+  /* A call that is not without side-effects might involve volatile
+ accesses and thus conflicts with all other volatile accesses.  */
+  if (ref->volatile_p)
+return true;
+
+  bool independent_pt = false;
+  if (base && DECL_P (base))
+{
+  /* If the reference is based on a decl that is not aliased the call
+cannot possibly clobber it.  */
+  if (!may_be_aliased (base)
+ /* But local non-readonly statics can be modified through
+recursion or the call may implement a threading barrier which
+we must treat as may-def.  */
+ && (TREE_READONLY (base)
+ || !is_global_var (base)))
+   return false;
 
-  if (callee != NULL_TREE && !ref->volatile_p)
+  auto clobbers = gimple_call_clobber_set (call);
+  independent_pt = !pt_solution_includes (clobbers, base);
+}
+  else if (base
+  && (TREE_CODE (base) == MEM_REF
+  || TREE_CODE (base) == TARGET_MEM_REF)
+  && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
+{
+  tree addr = TREE_OPERAND (base, 0);
+
+  /* If the reference is based on a pointer that points to memory that
+may not be written to then the call cannot possibly clobber it.  */
+  if (SSA_NAME_POINTS_TO_READONLY_MEMORY (addr))
+   return false;
+
+  /* Check if the base variable is call-clobbered.  */
+  if (auto *pi = SSA_NAME_PTR_INFO (addr))
+   {
+ auto clobbers = gimple_call_clobber_set (call);
+ independent_pt = !pt_solutions_intersect (clobbers, >pt);
+   }
+}
+
+  /* Points-to information needs to be tested in conjunction with
+ check_fnspec.  The fnspec argument information for built-in
+ functions is excluded from the PT sets.  */
+  int res = check_fnspec (call, ref, true);
+  if (res >= 0)
+return res;
+
+  if (independent_pt)
+return false;
+
+  if (tree callee = gimple_call_fndecl (call))
 {
   struct cgraph_node *node = cgraph_node::get (callee);
   if (node)
@@ -3017,77 +3069,20 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
bool tbaa_p)
  return false;
}
  alias_stats.modref_clobber_may_alias++;
- }
-   }
-}
-
-  base = ao_ref_base (ref);
-  if (!base)
-return true;
-
-  if (TREE_CODE (base) == SSA_NAME
-  || CONSTANT_CLASS_P (base))
-return false;
-
-  /* A call that is not without