> Hi All,
>
> James and I have been investigating this regression and have tracked it down
> to register allocation.
>
> I have created a new PR with our findings
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 but unfortunately
> we don't know how to proceed.
>
> This does seem like a genuine bug in RA. It looks like some magic threshold
> has been crossed, but we're having trouble determining what this magic
> number is.

Thank you for the analysis - it was on my TODO list for a very long time, but
the function is large. I will read it carefully and let's see if we can come
up with something useful.

Honza
>
> Any help is appreciated.
>
> Thanks,
> Tamar
>
> > -----Original Message-----
> > From: Xionghu Luo <luo...@linux.ibm.com>
> > Sent: Friday, October 16, 2020 9:47 AM
> > To: Tamar Christina <tamar.christ...@arm.com>; Martin Jambor
> > <mjam...@suse.cz>; Richard Sandiford <richard.sandif...@arm.com>;
> > luoxhu via Gcc-patches <gcc-patches@gcc.gnu.org>
> > Cc: seg...@kernel.crashing.org; wschm...@linux.ibm.com;
> > li...@gcc.gnu.org; Jan Hubicka <hubi...@ucw.cz>; dje....@gmail.com
> > Subject: Re: [PATCH] ipa-inline: Improve growth accumulation for recursive
> > calls
> >
> >
> >
> > On 2020/9/12 01:36, Tamar Christina wrote:
> > > Hi Martin,
> > >
> > >>
> > >> can you please confirm that the difference between these two is all
> > >> due to the last option -fno-inline-functions-called-once ? Is LTO
> > >> necessary? I.e., can you run the benchmark also built with the
> > >> branch compiler and -mcpu=native -Ofast -fomit-frame-pointer
> > >> -fno-inline-functions-called-once ?
> > >>
> > >
> > > Done, see below.
> > >
> > >>> +----------+------------------------------------------------+------+
> > >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
> > >>> +----------+------------------------------------------------+------+
> > >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |
> > >>> +----------+------------------------------------------------+------+
> > >>
> > >>>
> > >>> (Hopefully the table shows up correct)
> > >>
> > >> it does show OK for me, thanks.
> > >>
> > >>>
> > >>> It looks like your patch definitely does improve the basic cases. So
> > >>> there's not much difference between lto and non-lto anymore and it's
> > >>> much better than GCC 10. However it still contains the regression
> > >>> introduced by Honza's changes.
> > >>
> > >> I assume these are rates, not times, so negative means bad. But do I
> > >> understand it correctly that you're comparing against GCC 10 with the
> > >> two parameters set to rather special values? Because your table
> > >> seems to indicate that even for you, the branch is faster than GCC 10
> > >> with just -mcpu=native -Ofast -fomit-frame-pointer.
> > >
> > > Yes these are indeed rates, and indeed I am comparing against the same
> > > options we used to get the fastest rates on before which is the two
> > > parameters and the inline flag.
> > >
> > >>
> > >> So is the problem that the best obtainable run-time, even with
> > >> obscure options, from the branch is slower than the best obtainable
> > >> run-time from GCC 10?
> > >>
> > >
> > > Yeah that's the problem, when we compare the two we're still behind.
> > >
> > > I've done the additional two runs
> > >
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Compiler | Flags                                                                                                                                          | diff GCC 10 |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once |             |
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -44%        |
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -36%        |
> > > | GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80                                   | -22%        |
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -24%        |
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -26%        |
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto -fno-inline-functions-called-once                                                               | -12%        |
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once                                                                     | -11%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > >
> > > And this confirms that indeed LTO isn't needed and that the branch
> > > without any options is indeed much better than it was on GCC 10 without
> > > any options.
> > >
> > > It also confirms that the only remaining difference is in the
> > > -fno-inline-functions-called-once
> >
> > If -fno-inline-functions-called-once is added, the recursive function
> > digits_2 won't be inlined, as each digits_2 is specialized into clone nodes
> > that are called only once, so getting the performance back is expected. I
> > guess it is somewhat similar to -fno-inline for this case.
> >
> > @Jambor @Honza Any progress on this (--param controlling maximal
> > recursion depth) and on the other regression about
> > LOOP_GUARD_WITH_PREDICTION in
> > PR96825 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825), please? :) I
> > tested the current FSF master code, and the regression still exists...
> >
> >
> > Thanks,
> > Xionghu
> >
>