https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90364
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---

   6.22%  80774  wrf_r_peak.pgo  __module_mp_wsm5_MOD_nislfv_rain_plm
   5.50%  71494  wrf_r_peak.pgo  __module_mp_wsm5_MOD_wsm52d

vs.

   4.04%  49253  wrf_r_peak.std  __module_mp_wsm5_MOD_wsm52d
   3.93%  47888  wrf_r_peak.std  __module_mp_wsm5_MOD_nislfv_rain_plm

shows the biggest differences. The reason must still lie with how GCC
considers loops hot or cold. I wonder whether if-conversion loop versioning
properly handles profile or whether we consider loops cold afterwards. I
notice the predicate degrades to !optimize_bb_for_size_p (loop->header).

I guess dumping the result of optimize_loop[_nest]_for_speed_p in IL dumps
along loop headers might show the differences.
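
To illustrate the concern, here is a toy model in C. It is not GCC code: the names toy_bb, toy_loop, toy_version_loop and toy_dump_header, as well as the cold threshold of zero, are hypothetical. The only thing it takes from the comment is that the loop-level predicate looks at the header block alone. Under those assumptions it shows how a versioned copy whose header ends up with a zero count would be classified as cold, and what a per-header dump line might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins, NOT GCC's real data structures. */
struct toy_bb { long count; };              /* profile count of a block */
struct toy_loop { struct toy_bb header; };  /* only the header matters here */

/* Hypothetical rule: a block with a count of zero or less is "cold". */
bool toy_bb_for_size_p(const struct toy_bb *bb)
{
    return bb->count <= 0;
}

/* Same shape as the predicate quoted in the comment: the answer for the
   whole loop depends only on its header block. */
bool toy_loop_for_speed_p(const struct toy_loop *loop)
{
    return !toy_bb_for_size_p(&loop->header);
}

/* Split ORIG into two versions. Copy A receives num/den of the header
   count, copy B receives the remainder. */
void toy_version_loop(const struct toy_loop *orig, long num, long den,
                      struct toy_loop *a, struct toy_loop *b)
{
    assert(den > 0 && num >= 0 && num <= den);
    a->header.count = orig->header.count * num / den;
    b->header.count = orig->header.count - a->header.count;
}

/* Hypothetical dump line of the kind suggested in the comment. */
void toy_dump_header(const char *name, const struct toy_loop *loop)
{
    printf(";; loop %s: header count %ld, for_speed_p %d\n",
           name, loop->header.count, (int) toy_loop_for_speed_p(loop));
}
```

If the split assigns the whole count to one copy (num == den), the other copy's header count is zero and toy_loop_for_speed_p returns false for it, no matter how hot the original loop was. Dumping both headers side by side would make such a case visible.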