Like is already the case for the AVX/AVX2 form, VMOVDDUP - acting on
double precision floating values - is more appropriate to use here, and
it can also result in shorter insn encodings when source is memory or
%xmm0...%xmm7, and no masking is applied (in allowing a 2-byte VEX
prefix then instead of a 3-byte one).

gcc/

        * config/i386/sse.md (<avx512>_vec_dup<mode><mask_name>): Use
        vmovddup.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -25724,9 +25724,9 @@
   "TARGET_AVX512F"
 {
   /*  There is no DF broadcast (in AVX-512*) to 128b register.
-      Mimic it with integer variant.  */
+      Mimic it with vmovddup, just like vec_dupv2df<mask_name> does.  */
   if (<MODE>mode == V2DFmode)
-    return "vpbroadcastq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}";
+    return "vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}";
 
   return "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, 
%0<mask_operand2>|%0<mask_operand2>, %<iptr>1}";
 }

Reply via email to