On Fri, Apr 18, 2014 at 2:16 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> >> What I've observed on POWER is that LTO alone reduces performance and
>> >> LTO+FDO is not significantly different from FDO alone.
>> > On SPEC2k6?
>> >
>> > This is quite surprising; for our (well, SUSE's) SPEC testers (AMD64), LTO
>> > seems an off-noise win on SPEC2k6:
>> > http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
>> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
>> >
>> > I do not see why PPC should be significantly more constrained by register
>> > pressure.
>> >
>> > I do not have a head-to-head comparison of FDO and FDO+LTO for SPEC;
>> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
>> > shows a noticeable drop in calculix and gamess.
>> > Martin profiled calculix and tracked the drop down to a loop that is not
>> > exercised in the training run but is hot in the reference run, so it gets
>> > optimized for size.
>> >
>> > http://dromaeo.com/?id=219677,219672,219965,219877
>> > compares Firefox's dromaeo runs with a default build, LTO, FDO, and LTO+FDO.
>> > Here the benefits of LTO and FDO seem to add up nicely.
>> >>
>> >> I agree that an exact estimate of the register pressure would be a
>> >> difficult problem. I'm hoping that something that approximates potential
>> >> register pressure downstream will be sufficient to help inlining
>> >> decisions.
>> >
>> > Yep, register pressure and I-cache overhead estimates are used for inline
>> > decisions by some compilers.
>> >
>> > I am mostly concerned about the metric suffering from the GIGO principle
>> > if we mix together too many estimates that are somewhat wrong by their
>> > nature.  This is why I have mostly tried to focus on size/time estimates
>> > and not add too many other metrics.  But perhaps it is time to experiment
>> > with these, since we have obviously pushed the current infrastructure
>> > mostly to its limits.
>> >
>>
>> I like the word GIGO here.  Getting inline signals right requires deep
>> analysis (including interprocedural analysis).  Different signals/hints
>> may also come with different quality and thus deserve different weights.
>>
>> Another challenge is how to quantify cycle savings/overhead more
>> precisely.  With that, we can abandon the threshold-based scheme -- any
>> callsite with a net saving will be considered.
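
A net-saving test could reduce to comparing count-weighted cycle savings
against the one-time size/I-cache cost of inlining the body.  A minimal
sketch of that idea in C++ -- the struct, field names, and cost model
below are invented for illustration, not GCC's inliner API:

    #include <cstdint>

    /* Estimated effect of inlining one call site; all values are
       rough compiler estimates, not measurements.  */
    struct inline_estimate
    {
      uint64_t count;            /* profile execution count of the call */
      int64_t saved_per_call;    /* cycles saved per call when inlined  */
      int64_t size_growth_cost;  /* one-time code-size/I-cache penalty  */
    };

    /* Threshold-free decision: inline whenever the estimated net
       saving is positive.  (Very large counts would need wider
       arithmetic; see below in the thread.)  */
    static bool
    inline_has_net_saving_p (const inline_estimate &e)
    {
      return (int64_t) e.count * e.saved_per_call > e.size_growth_cost;
    }
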
>
> Inline hints are intended to do this - at the moment we bump the limits up
> when we estimate big speedups from the inlining, and with today's patch and
> FDO we bypass the thresholds when we know from FDO that the call matters.
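
Schematically, the hint mechanism amounts to scaling (or waiving) the size
threshold a call site must pass.  A hypothetical sketch -- the names and
the bump factor are made up, not GCC's real parameters:

    #include <climits>

    static int
    effective_inline_limit (int base_limit, bool big_speedup_hint,
                            bool fdo_known_hot)
    {
      /* With FDO evidence that the call matters, bypass the
         threshold entirely.  */
      if (fdo_known_hot)
        return INT_MAX;
      /* An estimated big speedup bumps the limit up.  */
      if (big_speedup_hint)
        return base_limit * 2;
      return base_limit;
    }
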
>
> Concerning your other email, indeed we should consider heavy callees (in
> Open64 terminology) that consume a lot of time, and not skip those call
> sites.  An easy way would be to replace the maybe_hot_edge predicate by a
> maybe_hot_call that simply multiplies the count and the estimated time.
> (We probably ought to get rid of the time capping and use wider arithmetic
> too.)
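
Such a maybe_hot_call predicate might look like the sketch below --
hypothetical names on a hypothetical call-site record, using a 128-bit
product so the count * time computation needs no capping:

    #include <cstdint>

    /* Hypothetical call-site record, not GCC's cgraph_edge.  */
    struct callsite
    {
      uint64_t count;  /* profile execution count         */
      uint64_t time;   /* estimated cycles per invocation */
    };

    /* A call is "hot" when the product of its count and per-call
       time crosses the threshold.  Unlike a count-only test, this
       also catches heavy callees: calls executed rarely but
       expensive per invocation.  */
    static bool
    maybe_hot_call_p (const callsite &cs, uint64_t hot_threshold)
    {
      /* Wide multiply (GCC extension type) avoids the overflow
         that forces capping in 64-bit arithmetic.  */
      unsigned __int128 total = (unsigned __int128) cs.count * cs.time;
      return total >= hot_threshold;
    }
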

That's what we did in the Google branches.  We had two heuristics -- hot
caller and hot callee.

1) For the hot caller heuristic, additional simple analyses are checked:
a) global working set size; b) a callsite argument check -- a very simple
check to guess whether inlining this callsite would sharpen later analysis.

2) We have not tuned the hot callee heuristic by doing more analysis --
simply turning it on based on hotness does not make a noticeable difference.
Other hints are needed.
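
The shape of those two heuristics, as a sketch -- every name, field, and
the working-set limit below is invented for illustration and is not the
actual branch code:

    #include <cstdint>

    struct edge_info
    {
      bool caller_is_hot;           /* caller judged hot by FDO         */
      bool callee_is_hot;           /* callee judged hot by FDO         */
      uint64_t global_working_set;  /* estimated program working set    */
      bool args_sharpen_analysis;   /* e.g. constant args enable later
                                       constant propagation             */
    };

    /* Invented limit; guards against I-cache blowup when the
       program's working set is already large.  */
    static const uint64_t WORKING_SET_LIMIT = 1 << 20;

    /* 1) Hot caller: hotness plus the two cheap checks.  */
    static bool
    want_inline_hot_caller_p (const edge_info &e)
    {
      if (!e.caller_is_hot)
        return false;
      if (e.global_working_set > WORKING_SET_LIMIT)  /* check a) */
        return false;
      return e.args_sharpen_analysis;                /* check b) */
    }

    /* 2) Hot callee: hotness alone proved insufficient; further
       hints would be needed before acting on it.  */
    static bool
    want_inline_hot_callee_p (const edge_info &e)
    {
      return e.callee_is_hot;  /* not a useful signal by itself */
    }
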

David



>
> I wonder if that is not too local, and if we should instead try to estimate
> the cumulative time of the function and get more aggressive about inlining
> over the whole path leading to hot code.
>
> Honza
