As is already the case for the AVX/AVX2 form, VMOVDDUP, which operates on double-precision floating-point values, is the more appropriate insn to use here. It can also result in shorter encodings when the source is memory or %xmm0...%xmm7 and no masking is applied, since a 2-byte VEX prefix can then be used instead of a 3-byte one.
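
For illustration only: the substitution is safe for V2DF because both insns copy the low 64 bits of the source into both 64-bit lanes of the 128-bit destination. The sketch below is a portable C model of that behaviour, not GCC code; the type and helper names (v128, model_vmovddup, model_vpbroadcastq, same_result) are hypothetical, and it assumes an 8-byte double.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit register as two 64-bit lanes.  */
typedef struct { uint64_t lane[2]; } v128;

/* Model of vmovddup with a 128-bit destination: duplicate the low
   double into both lanes.  Assumes sizeof (double) == 8.  */
static v128 model_vmovddup(double src)
{
    v128 r;
    memcpy(&r.lane[0], &src, sizeof src);
    r.lane[1] = r.lane[0];
    return r;
}

/* Model of vpbroadcastq with a 128-bit destination: broadcast the
   low quadword into both lanes.  */
static v128 model_vpbroadcastq(uint64_t src)
{
    v128 r = { { src, src } };
    return r;
}

/* Return 1 if both models yield bit-identical results for x.  */
static int same_result(double x)
{
    uint64_t bits;
    v128 a, b;

    memcpy(&bits, &x, sizeof bits);
    a = model_vmovddup(x);
    b = model_vpbroadcastq(bits);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Since the two models agree bit for bit, the choice between the insns comes down to the execution domain and encoding size, as described above.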

gcc/

	* config/i386/sse.md (<avx512>_vec_dup<mode><mask_name>): Use vmovddup.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -25724,9 +25724,9 @@
   "TARGET_AVX512F"
 {
   /* There is no DF broadcast (in AVX-512*) to 128b register.
-     Mimic it with integer variant. */
+     Mimic it with vmovddup, just like vec_dupv2df<mask_name> does. */
   if (<MODE>mode == V2DFmode)
-    return "vpbroadcastq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}";
+    return "vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}";
 
   return "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %<iptr>1}";
 }