Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

Jan Hubicka Thu, 21 Jan 2021 07:11:27 -0800

> Hi All,
> 
> James and I have been investigating this regression and have tracked it down 
> to register allocation.
> 
> I have created a new PR with our findings
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 but unfortunately
> we don't know how to proceed.
> 
> This does seem like a genuine bug in RA.  It looks like some magic threshold 
> has been crossed, but we're having
> trouble determining what this magic number is.


Thank you for the analysis - it has been on my TODO list for a very long
time, but the function is large.  I will read it carefully and let's see
if we can come up with something useful.

Honza
> 
> Any help is appreciated.
> 
> Thanks,
> Tamar
> 
> > -----Original Message-----
> > From: Xionghu Luo <luo...@linux.ibm.com>
> > Sent: Friday, October 16, 2020 9:47 AM
> > To: Tamar Christina <tamar.christ...@arm.com>; Martin Jambor
> > <mjam...@suse.cz>; Richard Sandiford <richard.sandif...@arm.com>;
> > luoxhu via Gcc-patches <gcc-patches@gcc.gnu.org>
> > Cc: seg...@kernel.crashing.org; wschm...@linux.ibm.com;
> > li...@gcc.gnu.org; Jan Hubicka <hubi...@ucw.cz>; dje....@gmail.com
> > Subject: Re: [PATCH] ipa-inline: Improve growth accumulation for recursive
> > calls
> > 
> > 
> > 
> > On 2020/9/12 01:36, Tamar Christina wrote:
> > > Hi Martin,
> > >
> > >>
> > >> can you please confirm that the difference between these two is all
> > >> due to the last option -fno-inline-functions-called-once?  Is LTO
> > >> necessary?  I.e., can you run the benchmark also built with the
> > >> branch compiler and -mcpu=native -Ofast -fomit-frame-pointer
> > >> -fno-inline-functions-called-once?
> > >>
> > >
> > > Done, see below.
> > >
> > >>> +--------+------------------------------------------------+------+
> > >>> | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
> > >>> +--------+------------------------------------------------+------+
> > >>> | Branch | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |
> > >>> +--------+------------------------------------------------+------+
> > >>
> > >>>
> > >>> (Hopefully the table shows up correct)
> > >>
> > >> it does show OK for me, thanks.
> > >>
> > >>>
> > >>> It looks like your patch definitely does improve the basic cases. So
> > >>> there's not much difference between LTO and non-LTO anymore, and it's
> > >>> much better than GCC 10. However, it still contains the regression
> > >>> introduced by Honza's changes.
> > >>
> > >> I assume these are rates, not times, so negative means bad.  But do I
> > >> understand it correctly that you're comparing against GCC 10 with the
> > >> two parameters set to rather special values?  Because your table
> > >> seems to indicate that even for you, the branch is faster than GCC 10
> > >> with just -mcpu=native -Ofast -fomit-frame-pointer.
> > >
> > > Yes, these are indeed rates, and I am comparing against the same
> > > options that previously gave us the fastest rates, namely the two
> > > parameters and the inline flag.
> > >
> > >>
> > >> So is the problem that the best obtainable run-time, even with
> > >> obscure options, from the branch is slower than the best obtainable
> > >> run-time from GCC 10?
> > >>
> > >
> > > Yeah that's the problem, when we compare the two we're still behind.
> > >
> > > I've done the additional two runs
> > >
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Compiler | Flags                                                                                                                                          | diff GCC 10 |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once |             |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -44%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -36%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80                                   | -22%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -24%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -26%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto -fno-inline-functions-called-once                                                               | -12%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once                                                                     | -11%        |
> > > +----------+------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
> > >
> > > And this confirms that indeed LTO isn't needed and that the branch
> > > without any options is indeed much better than it was on GCC 10 without
> > > any options.
> > >
> > > It also confirms that the only remaining difference is in the
> > > -fno-inline-functions-called-once
> > 
> > If -fno-inline-functions-called-once is added, the recursive function
> > digits_2 won't be inlined, as each digits_2 is specialized into clone
> > nodes that are called only once, so getting the performance back is
> > expected.  I guess it is somewhat similar to -fno-inline for this case.
> > 
> > @Jambor @Honza Any progress on this (a --param controlling the maximal
> > recursion depth) and on the other regression, about
> > LOOP_GUARD_WITH_PREDICTION, in PR96825
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825), please? :) I
> > tested the current FSF master code, and the regression still exists...
> > 
> > 
> > Thanks,
> > Xionghu
> > 
>
