RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

Martin Jambor Tue, 08 Sep 2020 07:01:23 -0700

Hi,

On Fri, Aug 21 2020, Tamar Christina wrote:
>> 
>> Honza's changes have been motivated to big extent as an enabler for IPA-CP
>> heuristics changes to actually speed up 548.exchange2_r.
>> 
>> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds two
>> weeks ago, this week it is 403, but with my WIP (and so far untested) patch
>> below it is just 276 seconds - faster than one built with GCC 8 which needs
>> 283 seconds.
>> 
>> I'll be interested in knowing if it also works this well on other 
>> architectures.
>>


I posted the new version of the patch series to the mailing list
yesterday and have also pushed the branch to the FSF repo as
refs/users/jamborm/heads/ipa-context_and_exchange-200907

>
> Many thanks for working on this!
>
> I tried this on an AArch64 Neoverse-N1 machine and didn't see any difference.
> Do I need any flags for it to work? The patch was applied on top of 
> 656218ab982cc22b826227045826c92743143af1
>

I only have access to a fairly old AMD (Seattle) Opteron 1100, which might
not support some interesting AArch64 ISA extensions, but I can measure a
significant speedup on it (everything with just -Ofast -march=native
-mtune=native, no non-default parameters, without LTO, without any
inlining options):

  GCC 10 branch:              915 seconds
  Master (rev. 995bb851ffe):  989 seconds
  My branch:                  827 seconds

(All numbers are 548.exchange2_r reference run times.)
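
To put the three numbers in relative terms, here is a minimal sketch of the
arithmetic in Python.  The dictionary labels and the helper name
relative_change are only illustrative; the run times are the ones listed
above.

```python
# Reference run times in seconds, copied from the list above.
times = {"GCC 10 branch": 915, "Master": 989, "My branch": 827}


def relative_change(new, old):
    """Percentage change of run time `new` relative to `old` (negative = faster)."""
    return (new - old) / old * 100.0


# Compare each build against the baseline it is paired with.
pairs = (
    ("Master", "GCC 10 branch"),
    ("My branch", "GCC 10 branch"),
    ("My branch", "Master"),
)
for name, baseline in pairs:
    change = relative_change(times[name], times[baseline])
    print(f"{name} vs {baseline}: {change:+.1f}%")
# Master vs GCC 10 branch: +8.1%
# My branch vs GCC 10 branch: -9.6%
# My branch vs Master: -16.4%
```

So on this machine master is roughly 8% slower than the GCC 10 branch, while
the branch is roughly 10% faster than GCC 10 and 16% faster than master.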

> And I tried 3 runs
> 1) -mcpu=native -Ofast -fomit-frame-pointer -flto --param 
> ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 
> -fno-inline-functions-called-once

This is the first time I have seen -fno-inline-functions-called-once used in
this context.  This seems to indicate we are looking at another problem
that I, at least, did not know about yet.  Can you please upload the
inlining WPA dumps with and without the option somewhere?

Similarly, I do not need LTO for the speedup on x86_64.

The patches in the series should also remove the need for --param
ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80.  If you still need
them on my branch, could you please again provide me with (WPA, if with
LTO) ipa-cp dumps with and without them?


> 2) -mcpu=native -Ofast -fomit-frame-pointer -flto 
> -fno-inline-functions-called-once
> 3) -mcpu=native -Ofast -fomit-frame-pointer -flto
>
> First one used to give us the best result, with this patch there's no 
> difference between 1 and 2 (11% regression) and the 3rd one is about 15% on 
> top of that.

OK, so the patch did help (but above you wrote it did not?) but not
enough to be as fast as some previous revision and on top of that
-fno-inline-functions-called-once further helps but again not enough?

If that is correct, it looks like we need to examine what goes wrong
specifically in the case of Neoverse-N1.

Thanks,

Martin
