https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048
--- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Matthias Kretz (Vir) from comment #15)
> So it seems that if at least one of the vector builtins involved in the
> expression is 512 bits GCC needs to locally increase prefer-vector-width to
> 512? Or, more generally:
>
> prefer-vector-width = max(prefer-vector-width, 8 * sizeof(operands)..., 8 * sizeof(return-value))
>
> The reason to default to 256 bits is to avoid zmm register usage altogether
> (clock-down). But if the surrounding code already uses zmm registers that
> motivation is moot.
>
> Also, I think this shouldn't be considered auto-vectorization but rather
> pattern recognition (recognizing a __builtin_convertvector).

The related question is whether GCC should set prefer-vector-width=512 when 512-bit intrinsics are used. There may be situations where users don't want the compiler to generate zmm code except for the 512-bit intrinsics in their program. For example, the hot loop is written with 512-bit intrinsics for performance, but everywhere else they would rather avoid zmm usage.