Hi Martin,

> 
> can you please confirm that the difference between these two is all due to
> the last option -fno-inline-functions-called-once ?  Is LTO necessary?  I.e.,
> can you run the benchmark also built with the branch compiler and
> -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once ?
> 

Done, see below.

> > +----------+------------------------------------------------+--------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24%         |
> > +----------+------------------------------------------------+--------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26%         |
> > +----------+------------------------------------------------+--------------+
> 
> >
> > (Hopefully the table shows up correct)
> 
> it does show OK for me, thanks.
> 
> >
> > It looks like your patch definitely does improve the basic cases. So
> > there's not much difference between LTO and non-LTO anymore and it's
> > much better than GCC 10. However it still contains the regression
> > introduced by Honza's changes.
> 
> I assume these are rates, not times, so negative means bad.  But do I
> understand it correctly that you're comparing against GCC 10 with the two
> parameters set to rather special values?  Because your table seems to
> indicate that even for you, the branch is faster than GCC 10 with just
> -mcpu=native -Ofast -fomit-frame-pointer.

Yes, these are indeed rates, and yes, I am comparing against the same options
we previously used to get the fastest rates, namely the two parameters and
the inline flag.
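
For anyone reading the numbers later, here is a minimal sketch of how a
percentage like the ones in the table can be derived from two rates. It
assumes the usual definition (rate relative to the baseline, minus one); the
function name and the example rates are made up for illustration and are not
measurements from these runs.

```python
def rate_diff_percent(rate: float, baseline_rate: float) -> float:
    """Percentage difference of a benchmark rate against a baseline rate.

    With rates (higher is better), a negative result means the run is
    slower than the baseline.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (rate / baseline_rate - 1.0) * 100.0


# Hypothetical rates, NOT taken from the benchmark runs in this thread.
baseline = 10.0   # stands in for the GCC 10 run with all tuning options
candidate = 8.8   # stands in for a run with a lower rate

print(f"{rate_diff_percent(candidate, baseline):+.0f}%")
```

Since these are rates rather than times, the sign convention is the opposite
of a run-time comparison: a lower rate gives a negative difference.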

> 
> So is the problem that the best obtainable run-time, even with obscure
> options, from the branch is slower than the best obtainable run-time from
> GCC 10?
> 

Yeah, that's the problem: when we compare the two, we're still behind.

I've done the additional two runs:

+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| Compiler | Flags                                                                                                                                          | diff GCC 10  |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once |              |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -44%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -36%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80                                   | -22%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                                 | -24%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                       | -26%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto -fno-inline-functions-called-once                                                               | -12%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once                                                                     | -11%         |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------+

This confirms that LTO isn't needed, and that the branch without any of the
extra options is much better than GCC 10 was without them.

It also confirms that the only remaining difference comes from
-fno-inline-functions-called-once.

> >
> >> > And I tried 3 runs
> >> > 1) -mcpu=native -Ofast -fomit-frame-pointer -flto --param
> >> > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80
> >> > -fno-inline-functions-called-once
> >>
> >> This is the first time I saw -fno-inline-functions-called-once used
> >> in this context.  This seems to indicate we are looking at another
> >> problem that at least I have not known about yet.  Can you please
> >> upload somewhere the inlining WPA dumps with and without the option?
> >
> > We used it to cover up for the register allocation issue where
> > inlining some large functions would cause massive spilling.  Looks like
> > it still has an effect now but even with it we're still seeing the 12%
> > regression.
> >
> > Which option is this? -fdump-ipa-cgraph?
> 
> -fdump-ipa-inline-details and -fdump-ipa-cp-details.

I've kicked off the CI runs and will get you the dumps on Monday.

Cheers,
Tamar

> 
> It would be nice if the slowdown was all due to the inliner.  But the
> predictor changes might of course have quite an impact also on other
> optimizations.
> 
> Martin
