https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90204
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #2)
> It seems such code generation is r254855's intention.
>
>   /* Use 256-bit AVX instructions instead of 512-bit AVX instructions
>      in the auto-vectorizer.  */
>   if (ix86_tune_features[X86_TUNE_AVX256_OPTIMAL]
>       && !(opts_set->x_ix86_target_flags & OPTION_MASK_PREFER_AVX256))
>     opts->x_ix86_target_flags |= OPTION_MASK_PREFER_AVX256;
>
> I know there is a frequency reduction issue when many zmm registers are
> used, but I don't know what exact situation did r254855 deal with?

But it should generate assembly like what g++ did, which also uses ymm instead
of zmm.
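
For illustration only, here is a toy model of the check quoted above. It is a sketch of the "apply a tuning default unless the user set the option explicitly" pattern, not GCC's actual code: MASK_PREFER_AVX256 and apply_avx256_default are hypothetical names, and the mask value is made up (the real OPTION_MASK_PREFER_AVX256 is generated by GCC's option machinery).

```c
#include <stdbool.h>

/* Hypothetical stand-in for OPTION_MASK_PREFER_AVX256.  */
#define MASK_PREFER_AVX256 (1u << 0)

/* Toy model of the quoted check.
   FLAGS               models opts->x_ix86_target_flags (current bits).
   FLAGS_SET           models opts_set->x_ix86_target_flags (bits the
                       user specified explicitly, either way).
   TUNE_AVX256_OPTIMAL models ix86_tune_features[X86_TUNE_AVX256_OPTIMAL].
   Returns the flags after the default has been applied.  */
static unsigned
apply_avx256_default (unsigned flags, unsigned flags_set,
                      bool tune_avx256_optimal)
{
  /* Only turn the preference on when the tuning asks for it and the
     user has not made an explicit choice for this bit.  */
  if (tune_avx256_optimal && !(flags_set & MASK_PREFER_AVX256))
    flags |= MASK_PREFER_AVX256;
  return flags;
}
```

Under these assumptions, apply_avx256_default (0, 0, true) turns the bit on, while apply_avx256_default (0, MASK_PREFER_AVX256, true) leaves it off, because the user explicitly asked for it to be off. With the tuning flag false, the flags are returned unchanged.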