Thanks for the info.

Jan Hubicka <hubi...@kam.mff.cuni.cz> writes:
>> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
>> <richard.guent...@gmail.com> wrote:
>> >
>> > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> > >
>> > > Richard Biener <richard.guent...@gmail.com> writes:
>> > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
>> > > > <gcc-patches@gcc.gnu.org> wrote:
>> > > >>
>> > > >> When compiling an optabs.ii at -O2 with a release-checking build,
>> > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
>> > > >> first things the function tries is the get_modref_function_summary
>> > > >> approach, but that also seems to be the most expensive check.
>
> In general it should not be that expensive since the trees are generally
> small (despite the 3 nested loops making them look dangerously big) and
> most of time we resort to simple pointer/array lookups.
>
> From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
> (which are quite useful when trying to understand typical behaviour of
> the alias oracle)
>
> In call_may_clobber_ref_p we get
>
>     2980                 :  256141790 :   callee = gimple_call_fndecl (call);
>     2981                 :            : 
>     2982                 :  256141790 :   if (callee != NULL_TREE && !ref->volatile_p)
>     2983                 :            :     {
>     2984                 :  240446150 :       struct cgraph_node *node = cgraph_node::get (callee);
> this is direct pointer from node
>     2985                 :  240446150 :       if (node)
>     2986                 :            :         {
>     2987                 :  240234208 :           modref_summary *summary = get_modref_function_summary (node);
> This is fast summary so it is array lookup

Yeah, the fast summary array lookup itself seems fine.  What slowed
this down for me was instead:

  /* A single function body may be represented by multiple symbols with
     different visibility.  For example, if FUNC is an interposable alias,
     we don't want to return anything, even if we have summary for the target
     function.  */
  enum availability avail;
  func = func->function_or_virtual_thunk_symbol
                 (&avail, current_function_decl ?
                          cgraph_node::get (current_function_decl) : NULL);
  if (avail <= AVAIL_INTERPOSABLE)
    return NULL;

which goes through get_availability (via ultimate_alias_target).
The most common path for the optabs.ii testcase is:

  /* Inline functions are safe to be analyzed even if their symbol can
     be overwritten at runtime.  It is not meaningful to enforce any sane
     behavior on replacing inline function by different body.  */
  else if (DECL_DECLARED_INLINE_P (decl))
    avail = AVAIL_AVAILABLE;

But to get there we had to go through:

  else if (ifunc_resolver
           || lookup_attribute ("noipa", DECL_ATTRIBUTES (decl)))
    avail = AVAIL_INTERPOSABLE;

and most functions in this test have nonnull attributes.  (I can look
into why if that's unexpected.)
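
For reference, and only as a simplified sketch rather than the exact
attribs.h/attribs.c code: lookup_attribute ends up scanning the whole
DECL_ATTRIBUTES chain, so every "noipa" query pays for all the other
attributes on the decl, including the nonnull ones.  Something like:

  /* Sketch of the scan; the name scan_attribute_list is mine, the real
     lookup lives in attribs.h.  Walk the TREE_LIST of attributes and
     compare each name against the one requested.  */
  static tree
  scan_attribute_list (const char *attr_name, tree list)
  {
    for (tree l = list; l; l = TREE_CHAIN (l))
      if (is_attribute_p (attr_name, get_attribute_name (l)))
        return l;
    return NULL_TREE;
  }

so the cost scales with the number of attributes on the decl rather
than being a simple flag test.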

Last night I tried caching the noipa attribute, like we do for ifunc.
That did improve compile time by 0.5% (measured relative to a baseline
with the patch applied, rather than one without it).  It also made the
cost of the lookup
part, up to this line:

>     2988                 :  240234208 :           if (summary)

negligible even with the original order of the checks.  So I think
with that change, the lookup becomes cheap, like you say.
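
For concreteness, the experiment was along these lines (hypothetical
sketch, not an actual posted patch): keep a cached bit on the symtab
node next to ifunc_resolver, set it when the decl goes into the symbol
table, and test the bit in get_availability instead of calling
lookup_attribute:

  /* Hypothetical cached flag on symtab_node, set alongside the other
     decl-derived flags when the node is created.  */
  unsigned noipa_p : 1;

  /* When the node is created (hypothetical):  */
  node->noipa_p
    = lookup_attribute ("noipa", DECL_ATTRIBUTES (decl)) != NULL_TREE;

  /* And in get_availability, roughly:  */
  else if (ifunc_resolver || noipa_p)
    avail = AVAIL_INTERPOSABLE;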

I don't think adding the caching is safe at this point in stage 3
though.  Attributes are added by direct manipulation of the tree data
structure and there are multiple places that in principle could
(directly or indirectly) add new noipa attributes after the decl
has been entered into the symbol table.  No existing tests seemed
to expose that, but it was possible to write a new test that did.
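
For example, nothing stops code elsewhere from attaching the attribute
by building the TREE_LIST directly after the node exists, at which
point a cached bit would be stale.  Illustrative only:

  /* Direct manipulation of the attribute list, bypassing anything that
     could update a cached flag on the symtab node.  */
  DECL_ATTRIBUTES (decl)
    = tree_cons (get_identifier ("noipa"), NULL_TREE,
                 DECL_ATTRIBUTES (decl));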

>     2989                 :            :             {
>     2990                 :   19732392 :               if (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
>
> And we get here in 8% of invocations

Once the lookup is negligible as above, so that the cost is purely in
this test, I still see a ~2% slowdown relative to the “PT first”
ordering.  I can look into why in more detail, if that's unexpected.
It seemed from a quick look to be the inner ao_ref stuff.
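
By "the inner ao_ref stuff" I mean the lazily computed parts of the
ao_ref that the modref walk forces.  Roughly, from memory, and only as
a sketch: modref_may_conflict needs the ref's base and alias sets, and
those are filled in on demand, e.g. something like:

  /* Sketch of the lazy base computation an ao_ref does on first use;
     the real code is in tree-ssa-alias.c.  If the PT check runs first
     and already disambiguates the call, this work is skipped.  */
  tree
  ao_ref_base (ao_ref *ref)
  {
    bool reverse;

    if (ref->base)
      return ref->base;
    ref->base = get_ref_base_and_extent (ref->ref, &ref->offset, &ref->size,
                                         &ref->max_size, &reverse);
    return ref->base;
  }

so doing the modref check first pays that cost even for calls that the
PT check alone would have handled.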

Like you said at the top of the quote above, the number of loop
iterations per query was low.  By far the most common combination in
the optabs.ii testcase was:

  1 outer
  2 middle
  2 inner
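
For context, those three levels correspond to the shape of the modref
tree; as far as I understand ipa-modref-tree.h, the walk in
modref_may_conflict is roughly (shape only, field names from memory):

  /* Rough shape of the three nested loops: bases, then refs within a
     base, then accesses within a ref, with the TBAA and offset checks
     against REF done at each level.  With the counts above that is
     1 x 2 x 2 iterations per query.  */
  for (auto *base_node : summary->stores->bases)      /* outer  */
    for (auto *ref_node : base_node->refs)            /* middle */
      for (auto &access_node : ref_node->accesses)    /* inner  */
        /* Check access_node against REF here.  */;

so each query only does a handful of iterations.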

Thanks,
Richard
