https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
            Bug ID: 94364
           Summary: 505.mcf_r is 8% faster when compiled with
                    -mprefer-vector-width=128
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

SPEC 2017 INTrate benchmark 505.mcf_r, when compiled with options
-Ofast -march=native -mtune=native, is 8% slower than when we also use
option -mprefer-vector-width=128. I have observed it on both AMD Zen2
and Intel Cascade Lake Server CPUs (using master revision 26b3e568a60).

Better vector width selection would therefore bring about noticeable
speed-up.

Symbol profiles (collected on AMD Rome):

-Ofast -march=native -mtune=native:

  Overhead       Samples  Shared Object    Symbol
  ........  ............  ...............  ................................

    28.64%        462302  mcf_r_peak.mine  spec_qsort
    21.58%        348703  mcf_r_peak.mine  cost_compare
    15.81%        255029  mcf_r_peak.mine  primal_bea_mpp
    15.58%        251176  mcf_r_peak.mine  replace_weaker_arc
     7.37%        118646  mcf_r_peak.mine  arc_compare
     6.53%        105337  mcf_r_peak.mine  price_out_impl
     1.38%         22276  mcf_r_peak.mine  update_tree

-Ofast -march=native -mtune=native -mprefer-vector-width=128:

  Overhead       Samples  Shared Object    Symbol
  ........  ............  ...............  ................................

    23.57%        354536  mcf_r_peak.mine  spec_qsort
    23.51%        353767  mcf_r_peak.mine  cost_compare
    16.98%        255104  mcf_r_peak.mine  primal_bea_mpp
    16.65%        249891  mcf_r_peak.mine  replace_weaker_arc
     7.29%        109267  mcf_r_peak.mine  arc_compare
     7.09%        106380  mcf_r_peak.mine  price_out_impl
     1.53%         22968  mcf_r_peak.mine  update_tree


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
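
The two symbol profiles in the report can be compared per symbol to see where the samples go. The sketch below is not part of the original report. It copies the sample counts from the profiles and assumes both runs used the same sampling event and period, so that raw counts are comparable. The names `default_width`, `width_128` and `sample_deltas` are hypothetical. Under that assumption, almost all of the difference sits in spec_qsort, which drops by 107766 samples (about 23%) with -mprefer-vector-width=128, while the other symbols change far less.

```python
# Sample counts copied from the two profiles in this report.
# Assumption: both profiles were collected with the same sampling
# event and period, so raw sample counts can be compared directly.
default_width = {  # -Ofast -march=native -mtune=native
    "spec_qsort": 462302,
    "cost_compare": 348703,
    "primal_bea_mpp": 255029,
    "replace_weaker_arc": 251176,
    "arc_compare": 118646,
    "price_out_impl": 105337,
    "update_tree": 22276,
}
width_128 = {  # same options plus -mprefer-vector-width=128
    "spec_qsort": 354536,
    "cost_compare": 353767,
    "primal_bea_mpp": 255104,
    "replace_weaker_arc": 249891,
    "arc_compare": 109267,
    "price_out_impl": 106380,
    "update_tree": 22968,
}


def sample_deltas(base, other):
    """Return {symbol: other - base}, largest absolute change first."""
    deltas = {sym: other[sym] - base[sym] for sym in base}
    return dict(sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True))


deltas = sample_deltas(default_width, width_128)
for sym, delta in deltas.items():
    # Relative change is computed against the default-width run.
    print(f"{sym:20s} {delta:+8d} ({delta / default_width[sym]:+.1%})")
```

A negative delta means fewer samples in the -mprefer-vector-width=128 build. This only localizes the difference to a function; finding the responsible loop would still require looking at the generated code for spec_qsort.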