> In addition to working with you on the issues of profile being lost with
> LTO, cloning and other cases, my plan is to
>  1) finish the VPT reorganization
>  2) make the AFDO reader scale up the profile, since at least in data from
>     SPEC or profiledbootstrap the counters are quite small integers, which
>     makes further scaling produce 0s that break various heuristics.
>  3) implement local profiles with global AFDO 0 count so we get
>     hot/cold functions identified correctly again

I pushed these changes and debugged the reason why inlining does not
happen early: it is caused by the inliner losing discriminator info, for
which I sent a patch. Once it gets into mainline I will drop the second
early inliner from the afdo pass, since it doesn't do any additional
inlining then.

So I think most of the basic plumbing to bring auto-fdo to the current
profiling infrastructure is in place, but we will need to debug
performance regressions. I still do not get better SPEC scores with
autofdo than without, but at least they are now fairly close.

Honza

>  4) see how much the afdo propagation can be improved. There are quite
>     obvious limitations in the current code. It is also slow, since
>     instead of a worklist it does iteration
>
> Hopefully after this stage the afdo will more or less work and we can
> look into performance issues...
>
> https://lnt.opensuse.org/db_default/v4/SPEC/67738?compare_to=67761
> compares afdo -Ofast -flto to -Ofast -flto with no feedback
> https://lnt.opensuse.org/db_default/v4/SPEC/67738?compare_to=67753
> compares afdo -Ofast -flto to real FDO -Ofast -flto
>
> So the last runs are closer to having no feedback. Many regressions are
> gone, but there are still some serious ones to look at:
>
> SPEC/SPEC2017/INT/520.omnetpp_r     20.62%
> SPEC/SPEC2017/FP/549.fotonik3d_r    19.16%
> SPEC/SPEC2017/FP/527.cam4_r         14.31%
> SPEC/SPEC2017/FP/510.parest_r       14.19%
> SPEC/SPEC2017/INT/500.perlbench_r   13.01%
> SPEC/SPEC2017/FP/511.povray_r       12.68%
> SPEC/SPEC2017/FP/503.bwaves_r        7.81%
> SPEC/SPEC2017/INT/505.mcf_r          7.29%
> SPEC/SPEC2017/FP/507.cactuBSSN_r     6.69%
> SPEC/SPEC2017/INT/502.gcc_r          6.15%
>
> I think we will want to improve the profiling setup by running
> train tasks multiple times, since we gather too little data. In my
> benchmarks I simply use ref runs as train runs, which solves some of the
> regressions seen above (omnetpp and perlbench work well for me).
>
> Honza
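
For readers not familiar with why small AFDO counters are a problem (item 2
of the quoted plan), here is a minimal sketch of the effect. It is only an
illustration and not GCC's actual profile_count arithmetic: the helper
`scale`, the sample counts, the 1/8 ratio and the `PRESCALE` factor are all
assumptions made up for the example.

```python
# Hypothetical illustration of integer counter scaling; this is NOT the
# arithmetic GCC uses internally, just the general effect.

def scale(counts, num, den):
    """Scale integer counters by num/den, rounding to nearest."""
    return [(c * num + den // 2) // den for c in counts]

# Small sample counts, of the kind a short sampling run might produce.
raw = [3, 1, 0, 2]

# Scaling down directly (e.g. a clone that gets 1/8 of the profile)
# collapses every counter to zero, so hot and cold look the same.
direct = scale(raw, 1, 8)

# Assumed pre-scaling factor applied once when the profile is read.
PRESCALE = 1 << 10
scaled_up = scale(raw, PRESCALE, 1)

# The same 1/8 scaling now keeps the counters nonzero and in proportion.
via_prescale = scale(scaled_up, 1, 8)

print(direct)        # [0, 0, 0, 0]
print(via_prescale)  # [384, 128, 0, 256]
```

The block that was never sampled stays at 0 in both cases; only the
sampled counters gain headroom, which is the property the heuristics
rely on.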