On Fri 05 Aug 2011 07:49:49 PM CEST, Xinliang David Li <davi...@google.com> wrote:

On Fri, Aug 5, 2011 at 12:32 AM, Richard Guenther
<richard.guent...@gmail.com> wrote:
On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka <j...@suse.de> wrote:
Did you try using FDO with -Os? FDO should make hot code parts
optimized similarly to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposite: cold portions optimized
for size while the rest is optimized for speed.

FDO with -Os still optimizes for size, even in hot parts.

I don't think so. Or at least that would be a bug. Shouldn't 'hot'
BBs/functions be optimized for speed even at -Os? Hm, I see predict.c
indeed always returns false when optimize_size is set :(

That is a function-level query. At the BB/EDGE level, the condition is refined:

Well, we summarize the function profile into:
 1) hot
 2) normal
 3) executed once
 4) unlikely

We summarize the BB profile into:
 1) maybe_hot
 2) probably_cold (equivalent to !maybe_hot)
 3) probably_never_executed

Except for "executed once", which is a special category for functions fed by discovery of main() and static ctors/dtors, there is a 1-1 correspondence between the BB and function predicates. With profile feedback, a function is hot if it contains a BB that is maybe_hot (with feedback such a BB is also probably hot), normal if it contains a BB that is !probably_never_executed, and unlikely if all its BBs are probably_never_executed. So with profile feedback the function profile summaries are no more refined than the BB ones.
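
To make this concrete, here is a minimal self-contained C model of how the function summary follows from the BB predicates. The names loosely follow the predicates above, but the struct, the count threshold, and the code itself are invented for illustration; this is not the actual predict.c code:

#include <stdbool.h>

/* Function-level summary: hot / normal / executed once / unlikely.  */
enum func_summary { FUNC_HOT, FUNC_NORMAL, FUNC_EXECUTED_ONCE, FUNC_UNLIKELY };

/* Stand-in for a basic block with a profile count.  */
struct bb { long long count; };

static bool
maybe_hot_bb_p (const struct bb *bb)
{
  return bb->count > 1000;  /* invented threshold */
}

static bool
probably_never_executed_bb_p (const struct bb *bb)
{
  return bb->count == 0;
}

/* With profile feedback the function summary is just a fold over the
   BB summaries: hot if any BB is maybe_hot, unlikely if all BBs are
   probably_never_executed, normal otherwise.  */
static enum func_summary
summarize_function (const struct bb *bbs, int n)
{
  bool any_hot = false, all_never = true;
  for (int i = 0; i < n; i++)
    {
      if (maybe_hot_bb_p (&bbs[i]))
        any_hot = true;
      if (!probably_never_executed_bb_p (&bbs[i]))
        all_never = false;
    }
  return any_hot ? FUNC_HOT : all_never ? FUNC_UNLIKELY : FUNC_NORMAL;
}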

Without profile feedback things are messier, and the names of the BB settings were more or less invented based on what a static profile estimate can tell you. Lacking a function-level profile estimate, we generally consider functions "normal" unless told otherwise in a few special cases. We also never autodetect probably_never_executed, even though it would make a lot of sense to do so for EH/paths to exit. As I mentioned, I think we should start doing so.

Finally, optimize_size comes into the game; it is independent of the summaries above, and it is why I added the optimize_XXX_for_size/speed predicates. By default, -Os implies optimizing everything for size, while -O123 optimizes for speed everything that is maybe_hot (i.e. everything not quite reliably proven cold).
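
A minimal sketch of that composition, again with stand-in types and an invented threshold (the real predicates live in GCC's predict.c, so treat this only as a model):

#include <stdbool.h>

struct bb { long long count; };

static bool optimize_size;  /* stand-in for -Os being in effect */

static bool
maybe_hot_bb_p (const struct bb *bb)
{
  return bb->count > 1000;  /* invented threshold */
}

/* -Os sizes everything; -O123 sizes only what is not maybe_hot.  */
static bool
optimize_bb_for_size_p (const struct bb *bb)
{
  return optimize_size || !maybe_hot_bb_p (bb);
}

static bool
optimize_bb_for_speed_p (const struct bb *bb)
{
  return !optimize_bb_for_size_p (bb);
}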

In a way I like the current scheme, since it is simple; extending it should IMO have some good reason. We could refine the -Os behaviour, without changing the current predicates, to optimize for speed in:
 a) functions declared "hot" by the user, and BBs in them that are not proved cold;
 b) BBs selected by profile feedback - i.e. we could have two thresholds: BBs with very large counts would be probably hot, BBs in between would be maybe hot/normal, and BBs with low counts would be cold. This would probably motivate introducing a probably_hot predicate that summarizes the above, as sketched below.
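
A sketch of what such a two-threshold scheme might look like - both thresholds and the probably_hot_bb_p predicate are hypothetical, not existing GCC code:

#include <stdbool.h>

struct bb { long long count; };

#define HOT_THRESHOLD  10000  /* hypothetical */
#define COLD_THRESHOLD   100  /* hypothetical */

/* Very large counts: optimize for speed even at -Os.  */
static bool
probably_hot_bb_p (const struct bb *bb)
{
  return bb->count >= HOT_THRESHOLD;
}

/* The band in between stays maybe hot/normal.  */
static bool
maybe_hot_bb_p (const struct bb *bb)
{
  return bb->count >= COLD_THRESHOLD;
}

/* Low counts: optimize for size even at -O123.  */
static bool
probably_cold_bb_p (const struct bb *bb)
{
  return !maybe_hot_bb_p (bb);
}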

If we want to refine things, we could also reconsider how to treat BBs with 0 coverage, i.e. whether we want to:
 a) consider them "normal" and let the presence of -Os/-O123 decide whether they are optimized for size or speed,
 b) consider them "cold", since they are not executed at all,
 c) consider them "cold" in functions that are otherwise covered by the test run and "normal" when the function is not covered at all (i.e. training an X server on a particular set of hardware may not convince GCC to optimize for size all the other drivers not covered by the training run).

We currently implement b), and it sort of works well, since users usually train for what matters to them and are happy to see smaller binaries.

What I don't like about a) and c) is a bit of an inconsistency with small counts: a count of 1 implies optimizing for size, but a roundoff error down to 0 would cause the BB to be optimized for speed, which is weird. Of course, flipping the default here would also cause significant growth of FDO binaries, and users are already unhappy that FDO binaries are too large.
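
A small model of the three policies makes the inconsistency visible: under a), a count of 1 classifies as cold while a count rounded off to 0 classifies as normal. The policy names and the threshold are invented for the sketch:

#include <stdbool.h>

enum zero_count_policy { POLICY_A, POLICY_B, POLICY_C };
enum bb_class { BB_COLD, BB_NORMAL };

/* Classify a BB from its count and whether its containing function
   received any coverage at all in the training run.  */
static enum bb_class
classify_bb (long long count, bool function_covered,
             enum zero_count_policy policy)
{
  if (count > 0)
    return count < 100 /* invented cold threshold */ ? BB_COLD : BB_NORMAL;

  switch (policy)
    {
    case POLICY_A:  /* zero-count BBs stay "normal" */
      return BB_NORMAL;
    case POLICY_B:  /* current behaviour: zero means cold */
      return BB_COLD;
    case POLICY_C:  /* cold only inside otherwise-covered functions */
      return function_covered ? BB_COLD : BB_NORMAL;
    }
  return BB_NORMAL;
}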

Honza
