https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598

--- Comment #11 from Jiangning Liu <jiangning.liu at amperecomputing dot com> 
---
(In reply to rguent...@suse.de from comment #8)
> On Sat, 9 Jan 2021, jiangning.liu at amperecomputing dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
> > 
> > --- Comment #7 from Jiangning Liu <jiangning.liu at amperecomputing dot 
> > com> ---
> > (In reply to rguent...@suse.de from comment #6)
> > > On January 9, 2021 4:17:17 AM GMT+01:00, "jiangning.liu at amperecomputing
> > > dot com" <gcc-bugzi...@gcc.gnu.org> wrote:
> > > >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
> > > >
> > > >--- Comment #5 from Jiangning Liu <jiangning.liu at amperecomputing dot
> > > >com> ---
> > > >> It has to be done with care of course, cost modeling is difficult
> > > >> (we need to have a good estimate of n and m or need to version
> > > >> the whole nest).  That said, usually we attempt the reverse
> > > >transform.
> > > >
> > > >Before tuning the cost model good enough, we may implement this
> > > >optimization by
> > > >adding a new optimization command line option. This won't hurt gcc,
> > > >right?
> > > 
> > > New options not enabled by default tend to bitrot, be broken from the 
> > > start
> > > and won't be used by the lazy user. So I see no point in doing that. 
> > > 
> > 
> > Understand. I mean we can enable it by default eventually, but we need to
> > implement and tune it step by step. It is unrealistic to work out the best 
> > cost
> > model at the very beginning.
> 
> Sure.  The "easiest" thing is to rely on a profile from PGO, we did
> have some transforms only enabled by -fprofile-use by default.  That is,
> the cost model needs to be conservative, esp. if you introduce dynamic
> allocation for this.  In the end I guess only a variant that versions
> the nest on the size of the temporary will be good enough to not trigger
> OOM or excessive overhead for small sizes anyway.

People usually don't use PGO unless they can't find any better static compiler
switches. This optimization should always benefit performance if we can tune
the cost model good enough. It is true that the temp memory size needs to be
checked to avoid OOM, which is one of the runtime overheads.

Reply via email to