On Fri 05 Aug 2011 07:49:49 PM CEST, Xinliang David Li <davi...@google.com> wrote:

On Fri, Aug 5, 2011 at 12:32 AM, Richard Guenther
<richard.guent...@gmail.com> wrote:
On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka <j...@suse.de> wrote:
Did you try using FDO with -Os? FDO should make hot code parts
optimized similarly to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposite: cold portions optimized
for size while the rest is optimized for speed.

FDO with -Os still optimizes for size, even in hot parts.

I don't think so. Or at least that would be a bug. Shouldn't 'hot'
BBs/functions be optimized for speed even at -Os? Hm, I see predict.c
indeed always returns false when optimize_size is set :(

That is a function-level query. At the BB/EDGE level, the condition is refined:

Well, we summarize the function profile into:
 1) hot
 2) normal
 3) executed once
 4) unlikely

We summarize the BB profile into:
 1) maybe_hot
 2) probably_cold (equivalent to !maybe_hot)
 3) probably_never_executed

Except for "executed once", which is a special category for functions fed by discovery of main() and static ctors/dtors, there is a 1-1 correspondence between the BB and function predicates. With profile feedback, a function is hot if it contains a BB that is maybe_hot (with feedback such a BB is also probably hot), normal if it contains a BB that is !probably_never_executed, and unlikely if all its BBs are probably_never_executed. So with profile feedback the function profile summaries are no more refined than the BB ones.
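
To make this concrete, here is a minimal self-contained C model of how the function summary follows from the BB predicates. The names loosely follow the predicates above, but the struct, the count threshold, and the code itself are invented for illustration; this is not the actual predict.c code:

#include <stdbool.h>

/* Function-level summary: hot / normal / executed once / unlikely.  */
enum func_summary { FUNC_HOT, FUNC_NORMAL, FUNC_EXECUTED_ONCE, FUNC_UNLIKELY };

/* Stand-in for a basic block with a profile count.  */
struct bb { long long count; };

static bool
maybe_hot_bb_p (const struct bb *bb)
{
  return bb->count > 1000;  /* invented threshold */
}

static bool
probably_never_executed_bb_p (const struct bb *bb)
{
  return bb->count == 0;
}

/* With profile feedback the function summary is just a fold over the
   BB summaries: hot if any BB is maybe_hot, unlikely if all BBs are
   probably_never_executed, normal otherwise.  */
static enum func_summary
summarize_function (const struct bb *bbs, int n)
{
  bool any_hot = false, all_never = true;
  for (int i = 0; i < n; i++)
    {
      if (maybe_hot_bb_p (&bbs[i]))
        any_hot = true;
      if (!probably_never_executed_bb_p (&bbs[i]))
        all_never = false;
    }
  return any_hot ? FUNC_HOT : all_never ? FUNC_UNLIKELY : FUNC_NORMAL;
}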

Without profile feedback things are messier, and the names of the BB settings were more or less invented based on what a static profile estimate can tell you. Lacking a function-level profile estimate, we generally consider functions "normal" unless told otherwise in a few special cases. We also never autodetect probably_never_executed, even though it would make a lot of sense to do so for EH/paths to exit. As I mentioned, I think we should start doing so.

Finally, optimize_size comes into the game; it is independent of the summaries above, and it is why I added the optimize_XXX_for_size/speed predicates. By default, -Os implies optimizing everything for size, while -O123 optimizes for speed everything that is maybe_hot (i.e. everything not quite reliably proven cold).
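
A minimal sketch of that composition, again with stand-in types and an invented threshold (the real predicates live in GCC's predict.c, so treat this only as a model):

#include <stdbool.h>

struct bb { long long count; };

static bool optimize_size;  /* stand-in for -Os being in effect */

static bool
maybe_hot_bb_p (const struct bb *bb)
{
  return bb->count > 1000;  /* invented threshold */
}

/* -Os sizes everything; -O123 sizes only what is not maybe_hot.  */
static bool
optimize_bb_for_size_p (const struct bb *bb)
{
  return optimize_size || !maybe_hot_bb_p (bb);
}

static bool
optimize_bb_for_speed_p (const struct bb *bb)
{
  return !optimize_bb_for_size_p (bb);
}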

In a way I like the current scheme, since it is simple; extending it should IMO have some good reason. We could refine the -Os behaviour, without changing the current predicates, to optimize for speed in:
 a) functions declared "hot" by the user, and BBs in them that are not proved cold;
 b) BBs selected by profile feedback - i.e. we could have two thresholds: BBs with very large counts would be probably hot, BBs in between would be maybe hot/normal, and BBs with low counts would be cold. This would probably motivate introducing a probably_hot predicate that summarizes the above, as sketched below.
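
A sketch of what such a two-threshold scheme might look like - both thresholds and the probably_hot_bb_p predicate are hypothetical, not existing GCC code:

#include <stdbool.h>

struct bb { long long count; };

#define HOT_THRESHOLD  10000  /* hypothetical */
#define COLD_THRESHOLD   100  /* hypothetical */

/* Very large counts: optimize for speed even at -Os.  */
static bool
probably_hot_bb_p (const struct bb *bb)
{
  return bb->count >= HOT_THRESHOLD;
}

/* The band in between stays maybe hot/normal.  */
static bool
maybe_hot_bb_p (const struct bb *bb)
{
  return bb->count >= COLD_THRESHOLD;
}

/* Low counts: optimize for size even at -O123.  */
static bool
probably_cold_bb_p (const struct bb *bb)
{
  return !maybe_hot_bb_p (bb);
}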

If we want to refine things, we could also reconsider how to treat BBs with 0 coverage, i.e. whether we want to:
 a) consider them "normal" and let the presence of -Os/-O123 decide whether they are optimized for size or speed,
 b) consider them "cold", since they are not executed at all,
 c) consider them "cold" in functions that are otherwise covered by the test run and "normal" when the function is not covered at all (i.e. training an X server on a particular set of hardware may not convince GCC to optimize for size all the other drivers not covered by the training run).

We currently implement b), and it sort of works well, since users usually train for what matters to them and are happy to see smaller binaries.

What I don't like about a) and c) is a bit of an inconsistency with small counts: a count of 1 implies optimizing for size, but a roundoff error down to 0 would cause the BB to be optimized for speed, which is weird. Of course, flipping the default here would also cause significant growth of FDO binaries, and users are already unhappy that FDO binaries are too large.
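
A small model of the three policies makes the inconsistency visible: under a), a count of 1 classifies as cold while a count rounded off to 0 classifies as normal. The policy names and the threshold are invented for the sketch:

#include <stdbool.h>

enum zero_count_policy { POLICY_A, POLICY_B, POLICY_C };
enum bb_class { BB_COLD, BB_NORMAL };

/* Classify a BB from its count and whether its containing function
   received any coverage at all in the training run.  */
static enum bb_class
classify_bb (long long count, bool function_covered,
             enum zero_count_policy policy)
{
  if (count > 0)
    return count < 100 /* invented cold threshold */ ? BB_COLD : BB_NORMAL;

  switch (policy)
    {
    case POLICY_A:  /* zero-count BBs stay "normal" */
      return BB_NORMAL;
    case POLICY_B:  /* current behaviour: zero means cold */
      return BB_COLD;
    case POLICY_C:  /* cold only inside otherwise-covered functions */
      return function_covered ? BB_COLD : BB_NORMAL;
    }
  return BB_NORMAL;
}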

Honza
