Re: Improve static and AFDO profile combination

Jan Hubicka Sun, 22 Jun 2025 01:01:23 -0700

> In addition to working with you on the issues of profile being lost with
> LTO, cloning and other cases, my plan is to
>  1) finish the VPT reorganization
>  2) make the AFDO reader scale up the profile, since at least in data
>  from SPEC or profiledbootstrap the counters are quite small integers,
>  which makes further scaling produce 0s that break various heuristics.
>  3) implement local profiles with global AFDO 0 count so we get
>  hot/cold functions identified correctly again
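
To illustrate the problem in point 2, here is a minimal sketch in Python.
It is not GCC code: PRESCALE, scale() and scale_profile() are names made
up for the example, and the factor 1 << 10 is an arbitrary assumption.
It only shows why small integer counters collapse to 0 under later
scaling and how a one-time scale-up at read time avoids that.

```python
# Illustrative only, not GCC code. PRESCALE and the function names are
# hypothetical and chosen just for this sketch.

PRESCALE = 1 << 10  # assumed factor, applied once when the profile is read


def scale(count, num, den):
    """Scale an integer counter by num/den, rounding to nearest."""
    return (count * num + den // 2) // den


def scale_profile(counts, num, den, prescale=1):
    """Apply an optional up-front prescale, then scale each counter by num/den."""
    return [scale(c * prescale, num, den) for c in counts]


raw = [1, 2, 3, 40]  # small sample counts, similar in size to AFDO data

# Some later transformation scales the profile by 1/8.
print(scale_profile(raw, 1, 8))            # [0, 0, 0, 5]
print(scale_profile(raw, 1, 8, PRESCALE))  # [128, 256, 384, 5120]
```

Without the prescale, three of the four counters become 0 and can no
longer be told apart from never-executed code. With it, the relative
ratios survive the later scaling.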


I pushed these changes and debugged the reason why inlining does not
happen early: it is caused by the inliner losing discriminator info, for
which I sent a patch. Once it gets into mainline I will drop the
second early inliner from the afdo pass, since it will no longer do any
additional inlining.

So I think most of the basic plumbing to bring auto-fdo to the current
profiling infrastructure is in place, but we will need to debug
performance regressions.  I still do not get better SPEC scores with
autofdo than without, but at least they are now fairly close.

Honza
>  4) see how much the afdo propagation can be improved.  There are quite
>  obvious limitations in the current code.  It is also slow since it
>  iterates over everything repeatedly instead of using a worklist
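
The difference between repeated iteration and a worklist can be sketched
as follows. This is a toy in Python, not the afdo propagation code: the
inference rule in try_infer() (a block whose only predecessor has it as
its only successor runs equally often) is a deliberately simplified
assumption, and all function names are hypothetical.

```python
# Toy comparison of fixpoint-by-sweeps vs. worklist propagation.
# Not GCC code; the rule and all names are assumptions for illustration.
from collections import deque


def preds_of(succs):
    """Build the predecessor map from a successor map."""
    preds = {b: [] for b in succs}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    return preds


def try_infer(b, succs, preds, count):
    """Give b its predecessor's count if b is that predecessor's only successor."""
    if b in count or len(preds[b]) != 1:
        return False
    p = preds[b][0]
    if len(succs[p]) != 1 or p not in count:
        return False
    count[b] = count[p]
    return True


def propagate_by_sweeps(succs, known, order):
    """Sweep over all blocks until a full sweep changes nothing."""
    preds, count, visits = preds_of(succs), dict(known), 0
    changed = True
    while changed:
        changed = False
        for b in order:
            visits += 1
            if try_infer(b, succs, preds, count):
                changed = True
    return count, visits


def propagate_by_worklist(succs, known):
    """Revisit only the successors of blocks whose count just became known."""
    preds, count, visits = preds_of(succs), dict(known), 0
    work = deque(known)
    while work:
        p = work.popleft()
        for s in succs[p]:
            visits += 1
            if try_infer(s, succs, preds, count):
                work.append(s)
    return count, visits


chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": []}
print(propagate_by_sweeps(chain, {"a": 100}, ["e", "d", "c", "b", "a"])[1])  # 25
print(propagate_by_worklist(chain, {"a": 100})[1])                           # 4
```

On this five-block chain with an unfavourable sweep order, the sweeping
version needs 25 block visits while the worklist needs 4, and both end
up with the same counts.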
> 
> Hopefully after this stage afdo will more or less work and we can look
> into performance issues...
> 
> https://lnt.opensuse.org/db_default/v4/SPEC/67738?compare_to=67761
> compares afdo -Ofast -flto to -Ofast -flto with no feedback
> https://lnt.opensuse.org/db_default/v4/SPEC/67738?compare_to=67753
> compares afdo -Ofast -flto to real FDO -Ofast -flto
> 
> So the last runs are closer to having no feedback.  Many regressions are
> gone, but there are still some serious ones to look at:
> 
> SPEC/SPEC2017/INT/520.omnetpp_r     20.62%
> SPEC/SPEC2017/FP/549.fotonik3d_r    19.16%
> SPEC/SPEC2017/FP/527.cam4_r         14.31%
> SPEC/SPEC2017/FP/510.parest_r       14.19%
> SPEC/SPEC2017/INT/500.perlbench_r   13.01%
> SPEC/SPEC2017/FP/511.povray_r       12.68%
> SPEC/SPEC2017/FP/503.bwaves_r        7.81%
> SPEC/SPEC2017/INT/505.mcf_r          7.29%
> SPEC/SPEC2017/FP/507.cactuBSSN_r     6.69%
> SPEC/SPEC2017/INT/502.gcc_r          6.15%
> 
> I think we will want to improve the profiling setup by running
> train tasks multiple times, since we gather too little data.  In my
> benchmarks I simply use ref runs as train runs, which solves some of the
> regressions seen above (omnetpp and perlbench work well for me).
> 
> Honza
