> Honza,
>
> > Main motivation for this was profiling programs that contain specific
> > code paths for different CPUs (such as the graphics library in Firefox or the
> > Linux kernel). In the situation where the training machine differs from the
> > machine the program is run on later, we end up optimizing for size all code
> > paths except the ones taken by that specific CPU. This patch essentially
> > tells GCC to consider every non-trained function as built without profile
> > feedback.
>
> Makes sense.
>
> > For Firefox it had an important impact on graphics rendering tests back
> > then, since the build machine had AVX while the benchmarking machine did
> > not. Some benchmarks improved several times over, which is not a surprise
> > if you consider a tight graphics rendering loop optimized for size versus
> > a vectorized one.
>
> That's a lot of improvement. So, without -fprofile-partial-training, the PGO
> hurt the performance for those cases?
Yes, to get code size improvements we assume that the non-trained part of the
code is cold, and with -Os we are very aggressive about optimizing for size.

We now have a two-level optimize_for_size, so I think we could make this more
fine-grained this stage1.

Honza

> > The patch has a bad effect on code size, which in turn
> > impacts performance too, so I think it makes sense to use
> > -fprofile-partial-training with a bit of care (i.e. only on code where
> > such scenarios are likely).
>
> Right.
>
> > As for backporting, I do not have a checkout of GCC 8 right now. It
> > depends on profile infrastructure that was added in 2017 (so stage1 of
> > GCC 8), so the patch may backport quite easily. I am not 100% sure
> > what shape the infrastructure was in in the first version, but I am quite
> > convinced it had the necessary bits - it was able to tell the difference
> > between a 0 profile count and missing profile feedback.
>
> This is good to know. I will try to backport it to GCC 8 and let them test
> to see if there is any good impact.
>
> Qing
>
> > Honza