https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473
Bug ID: 102473
Summary: 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: crazylht at gmail dot com
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux

All three x86_64 LNT machines have detected a 4.5-5.2% performance
regression of the SPEC FPrate 2017 benchmark 521.wrf_r when compiled with
-Ofast and the default (generic) march and mtune.

Zen2-based machine regressed by 5%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=294.548.0

Zen1-based machine regressed by 5.2%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=35.548.0

Kabylake-based machine regressed by 4.5%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=34.548.0

On an AMD Zen2-based machine I have bisected the regression to commit
r12-3426-g8f323c712ea76c:

8f323c712ea76cc4506b03895e9b991e4e4b2baf is the first bad commit
commit 8f323c712ea76cc4506b03895e9b991e4e4b2baf
Author: liuhongt <hongtao....@intel.com>
Date:   Tue Sep 7 12:39:04 2021 +0800

    Optimize v4sf reduction.

    gcc/ChangeLog:

            PR target/101059
            * config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
            (reduc_plus_scal_v4sf): .. this, New define_expand.
            (reduc_plus_scal_v2df): .. and this, New define_expand.

I have confirmed that the commit causes a similar regression on another
Intel Skylake server.
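
For readers unfamiliar with the pattern names in the ChangeLog above: reduc_plus_scal_v4sf is the expander GCC uses to sum the four float lanes of a vector into a scalar. The following is only a minimal sketch of the kind of loop that could plausibly exercise it. The function sum_floats is hypothetical and is not taken from 521.wrf_r, and whether the expander is actually used depends on the compiler revision and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced example, NOT code from 521.wrf_r.
 * At -Ofast, floating-point reassociation is permitted, so GCC may
 * vectorize this loop with a 4 x float (v4sf) accumulator and then
 * reduce that accumulator to a single scalar after the loop. That
 * final horizontal add is what a reduc_plus_scal_v4sf expander
 * would implement. */
float sum_floats(const float *a, size_t n)
{
    /* Precondition: a null pointer is only acceptable for n == 0. */
    assert(a != NULL || n == 0);

    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Compiling a file containing this function with `gcc -Ofast -S` at the two revisions and comparing the loop epilogue in the generated assembly is one possible way to see how the reduction sequence changed. This is a suggestion, not a step performed in this report.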

On the Zen2 machine, this is the difference in samples collected by perf
for different symbols (before is commit 60eec23b5ed, after commit
8f323c712ea):

| Symbol                                      | sys lib | Before | After | diff  | %     |
|---------------------------------------------+---------+--------+-------+-------+-------|
| __logf_fma                                  | yes     |  68882 | 68940 |   +58 | +0.08 |
| __atanf                                     | yes     |  66664 | 66196 |  -468 | -0.70 |
| __module_advect_em_MOD_advect_scalar_pd     | no      |  62286 | 62348 |   +62 | +0.10 |
| __powf_fma                                  | yes     |  56213 | 56127 |   -86 | -0.15 |
| __module_mp_wsm5_MOD_nislfv_rain_plm        | no      |  46990 | 48340 | +1350 | +2.87 |
| __module_mp_wsm5_MOD_wsm52d                 | no      |  41031 | 40968 |   -63 | -0.15 |
| __module_small_step_em_MOD_advance_uv       | no      |  30908 | 30909 |    +1 | +0.00 |
| __module_small_step_em_MOD_advance_w        | no      |  28738 | 28600 |  -138 | -0.48 |
| __module_advect_em_MOD_advect_scalar        | no      |  28400 | 28429 |   +29 | +0.10 |
| __expf_fma                                  | yes     |  26702 | 26516 |  -186 | -0.70 |
| __module_big_step_utilities_em_MOD_phy_prep | no      |  25878 | 25816 |   -62 | -0.24 |
| psim_unstable_                              | no      |  24994 | 25106 |  +112 | +0.45 |
| __module_bl_ysu_MOD_ysu2d                   | no      |  24799 | 25251 |  +452 | +1.82 |
| psih_unstable_                              | no      |  22600 | 23139 |  +539 | +2.38 |
| __module_small_step_em_MOD_advance_mu_t     | no      |  22250 | 22232 |   -18 | -0.08 |
| __memset_avx2_unaligned_erms                | yes     |  21748 | 21613 |  -135 | -0.62 |
| _ZGVbN4vv_powf_sse4                         | yes     |  21206 | 21355 |  +149 | +0.70 |

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)