> Honza,
> > Main motivation for this was profiling programs that contain specific
> > code paths for different CPUs (such as the graphics library in Firefox
> > or the Linux kernel). In the situation where the training machine
> > differs from the machine the program later runs on, we end up
> > optimizing for size all code paths except the ones exercised on the
> > training CPU.  This patch essentially tells GCC to consider every
> > non-trained function as built without profile feedback.
> Makes sense.
> > 
> > For Firefox it had a significant impact on graphics rendering tests
> > back then, since the build machine had AVX while the benchmarking
> > machine did not.  Some benchmarks improved several times over, which is
> > not a surprise if you compare a tight graphics rendering loop optimized
> > for size with a vectorized one.
> 
> That’s a lot of improvement. So, without -fprofile-partial-training, PGO
> hurt the performance in those cases?

Yes, to get code size improvements we assume that the non-trained part
of the code is cold, and with -Os we are very aggressive about
optimizing for size.  We now have a two-level optimize_for_size, so I
think we could make this more fine-grained this stage1.

Honza
> 
> > The patch has a bad effect on code size, which in turn impacts
> > performance too, so I think it makes sense to use
> > -fprofile-partial-training with a bit of care (i.e. only on code where
> > such scenarios are likely).
> 
> Right. 
> > 
> > As for backporting, I do not have a checkout of GCC 8 right now.  The
> > patch depends on profile infrastructure that was added in 2017 (so
> > stage1 of GCC 8), so it may backport quite easily.  I am not 100% sure
> > what shape the infrastructure was in in the first version, but I am
> > quite convinced it had the necessary bits - it was able to tell the
> > difference between a zero profile count and missing profile feedback.
> 
> This is good to know. I will try to backport it to GCC 8 and have it
> tested to see whether it has a good impact.
> 
> Qing
> > 
> > Honza
> >> 
> 
