https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048

--- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Matthias Kretz (Vir) from comment #15)
> So it seems that if at least one of the vector builtins involved in the
> expression is 512 bits GCC needs to locally increase prefer-vector-width to
> 512? Or, more generally:
> 
> prefer-vector-width = max(prefer-vector-width, 8 * sizeof(operands)..., 8 *
> sizeof(return-value))
> 
> The reason to default to 256 bits is to avoid zmm register usage altogether
> (clock-down). But if the surrounding code already uses zmm registers that
> motivation is moot.
> 
> Also, I think this shouldn't be considered auto-vectorization but rather
> pattern recognition (recognizing a __builtin_convertvector).

The related question is whether GCC should set prefer-vector-width=512 when
512-bit intrinsics are used. There may be situations where users don't want the
compiler to generate zmm code except for the 512-bit intrinsics in their
program, i.e., the hot loop is written with 512-bit intrinsics for performance,
but everywhere else zmm usage is better avoided.
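
To make the two cases concrete, here is a rough sketch, not a proposed patch. It
uses GCC's generic vector extension instead of real AVX-512 intrinsics so that it
builds for any target. The names effective_width_bits, to_double and scale are
made up for illustration; effective_width_bits only spells out the max() rule
from comment #15 and is not something GCC implements.

```cpp
#include <cstddef>
#include <cstdint>

// Generic vector types: 8 lanes x 64 bits = 512 bits each.
typedef std::int64_t v8di __attribute__((vector_size(64)));
typedef double       v8df __attribute__((vector_size(64)));

constexpr std::size_t max2(std::size_t a, std::size_t b) {
    return a > b ? a : b;
}

// Hypothetical helper (not a GCC API): the rule suggested in comment #15,
// max(prefer-vector-width, 8 * sizeof(operand), 8 * sizeof(return-value)).
constexpr std::size_t effective_width_bits(std::size_t preferred_bits,
                                           std::size_t operand_bytes,
                                           std::size_t result_bytes) {
    return max2(preferred_bits, max2(8 * operand_bytes, 8 * result_bytes));
}

// Case 1: the user explicitly asks for a 512-bit conversion. Pointers are
// used so that no 64-byte vector crosses the function ABI boundary.
void to_double(const v8di *in, v8df *out) {
    *out = __builtin_convertvector(*in, v8df);
}

// Case 2: an ordinary scalar loop elsewhere in the same program. This is the
// code the user may still want vectorized with at most 256-bit registers.
void scale(float *a, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i)
        a[i] *= k;
}
```

Under the rule from comment #15, to_double would be treated as 512 bits wide
even with -mprefer-vector-width=256, since both its operand and its result are
64 bytes. The open question is whether that raised width should apply only to
such expressions or also leak into loops like scale.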
