On Fri 05 Aug 2011 07:49:49 PM CEST, Xinliang David Li
<davi...@google.com> wrote:
On Fri, Aug 5, 2011 at 12:32 AM, Richard Guenther
<richard.guent...@gmail.com> wrote:
On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka <j...@suse.de> wrote:
Did you try using FDO with -Os? FDO should make hot code parts
optimized similar to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposite, cold portions optimized
for size while the rest is optimized for speed.
FDO with -Os still optimizes for size, even in hot parts.
I don't think so. Or at least that would be a bug. Shouldn't 'hot'
BBs/functions be optimized for speed even at -Os? Hm, I see predict.c
indeed always returns false for optimize_size :(
That is a function-level query. At the BB/EDGE level, the condition is refined:
Well, we summarize the function profile as one of:
1) hot
2) normal
3) executed once
4) unlikely
We summarize the BB profile as one of:
1) maybe_hot
2) probably_cold (equivalent to !maybe_hot)
3) probably_never_executed
Except for "executed once", which is a special thing for functions, fed by
discovery of main() and static ctors/dtors, there is a 1-1 correspondence
between the BB and function predicates. With profile feedback a function
is hot if it contains a BB that is maybe_hot (with feedback it is also
probably hot), it is normal if it contains a BB that is
!probably_never_executed, and unlikely if all its BBs are
probably_never_executed. So with profile feedback the function profile
summaries are no more refined than the BB ones.
Without profile feedback things are messier, and the names of the BB
settings were more or less invented based on what a static profile
estimate can tell you. Lacking a function-level profile estimate, we
generally consider functions "normal" unless told otherwise in a few
special cases. We also never autodetect probably_never_executed, even
though it would make a lot of sense to do so for EH/paths to exit. As I
mentioned, I think we should start doing so.
Finally, optimize_size comes into play; it is independent of the
summaries above, and it is why I added the optimize_XXX_for_size/speed
predicates. By default -Os implies optimizing everything for size, while
-O123 optimizes for speed everything that is maybe_hot (i.e. not quite
reliably proven otherwise).
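As a rough sketch, the default policy just described reduces to a pair of
boolean predicates (the signatures here are simplified stand-ins; the real
GCC predicates take a basic_block and consult the profile):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch of the default -Os / -O123 policy, not GCC's exact
   code.  -Os: every BB is optimized for size.  -O123: only BBs that are
   not maybe_hot (i.e. probably_cold) are optimized for size.  */
static bool
optimize_bb_for_size_p (bool opt_size /* -Os in effect */,
                        bool bb_maybe_hot)
{
  return opt_size || !bb_maybe_hot;
}

static bool
optimize_bb_for_speed_p (bool opt_size, bool bb_maybe_hot)
{
  /* Speed and size optimization are complementary per BB.  */
  return !optimize_bb_for_size_p (opt_size, bb_maybe_hot);
}
```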
In a way I like the current scheme since it is simple, and extending it
should IMO have some good reason. We could refine the -Os behaviour,
without changing the current predicates, to optimize for speed in
a) functions declared "hot" by the user, and BBs in them that are not
proved cold;
b) based on profile feedback, i.e. we could have two thresholds: BBs
with very large counts will be probably hot, BBs in between will be
maybe hot/normal, and BBs with low counts will be cold.
This would probably motivate the introduction of a probably_hot predicate
that summarizes the above.
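The two-threshold idea in b) could be sketched like this (everything here,
including the threshold values in the test, is made up for illustration;
no such classifier exists in GCC today):

```c
#include <assert.h>

/* Hypothetical sketch of proposal b): partition BBs by execution count
   using two thresholds, motivating a probably_hot predicate.  */
enum bb_heat { BB_PROBABLY_HOT, BB_MAYBE_HOT, BB_COLD };

static enum bb_heat
classify_bb (long count, long hot_threshold, long cold_threshold)
{
  if (count >= hot_threshold)
    return BB_PROBABLY_HOT;  /* very large count: optimize for speed even at -Os */
  if (count > cold_threshold)
    return BB_MAYBE_HOT;     /* in between: maybe hot / normal */
  return BB_COLD;            /* low count: optimize for size */
}
```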
If we want to refine things, we could also reconsider how to treat BBs
with 0 coverage. I.e. whether to
a) consider them "normal" and let the presence of -Os/-O123 decide
whether they are size/speed optimized,
b) consider them "cold" since they are not executed at all, or
c) consider them "cold" in functions that are otherwise covered by
the test run and "normal" when the function is not covered at all
(i.e. training an X server on a particular set of hardware may not
convince GCC to optimize for size all the other drivers not covered by
the train run).
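Option c) boils down to one extra input per zero-count BB: whether the
enclosing function saw any coverage in the train run. A minimal sketch,
purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of option c): a zero-count BB counts as "cold" only when the
   enclosing function got at least some coverage; a wholly uncovered
   function (e.g. a driver never exercised by the train run) stays
   "normal".  Names are invented for this example.  */
enum zero_cov { ZC_COLD, ZC_NORMAL };

static enum zero_cov
classify_zero_count_bb (bool fn_has_coverage)
{
  return fn_has_coverage ? ZC_COLD : ZC_NORMAL;
}
```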
We currently implement b), and it sort of works well, since users usually
train for what matters to them and are happy to see smaller binaries.
What I don't like about a) and c) is a bit of inconsistency with small
counts: a count of 1 will imply optimizing for size, but a roundoff
error down to 0 will cause the BB to be optimized for speed, which is weird.
Of course, flipping the default here would also cause significant growth
of FDO binaries, and users are already unhappy that FDO binaries are
too large.
Honza