[Bug rtl-optimization/125876] [13/14/15/16/17 Regression] x86: register-source vmovddup spilled to the stack instead of using the register form

Ashwin.Godbole at amd dot com via Gcc-bugs Thu, 18 Jun 2026 04:32:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125876


--- Comment #6 from Ashwin Godbole <Ashwin.Godbole at amd dot com> ---
Comment on attachment 64770
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64770
Fix (patchfile)

>From 0dffb4fe83e6dc4b9b17690aea35fc8134328cfc Mon Sep 17 00:00:00 2001
>From: Ashwin Godbole <[email protected]>
>Date: Mon, 8 Jun 2026 06:31:32 -0700
>Subject: [PATCH] i386: Allow register source for 256/512-bit vmovddup
>
>The avx512f_movddup512 and avx_movddup256 patterns only offered a memory
>source alternative for operand 1.  As a result, a value already in a
>vector register was spilled to the stack and reloaded through the memory
>form of vmovddup, instead of using the existing register form
>"vmovddup %zmm, %zmm" / "vmovddup %ymm, %ymm".  This added a gratuitous
>store/reload and stack realignment for the common _mm512_movedup_pd /
>_mm256_movedup_pd and unpacklo(x, x) idioms.
>
>Relax the operand 1 constraint to "vm", mirroring the sibling
>avx512f_unpcklpd512 / avx_unpcklpd256 patterns, so the register form is
>used when the source is already in a register.  The 128-bit case
>(vec_dupv2df) already provides register alternatives and is unchanged.
>
>gcc/ChangeLog:
>
>       * config/i386/sse.md (avx512f_movddup512<mask_name>): Allow a
>       register source for operand 1 by using the "vm" constraint.
>       (avx_movddup256<mask_name>): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>       * gcc.target/i386/avx512f-vmovddup-3.c: New test.
>
>Co-authored-by: Sarvesh Chandra <[email protected]>
>Signed-off-by: Ashwin Godbole <[email protected]>
>---
> gcc/config/i386/sse.md                        |  4 ++--
> .../gcc.target/i386/avx512f-vmovddup-3.c      | 19 +++++++++++++++++++
> 2 files changed, 21 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c
>
>diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>index 6ca3e34b2c7..1f698642826 100644
>--- a/gcc/config/i386/sse.md
>+++ b/gcc/config/i386/sse.md
>@@ -13813,7 +13813,7 @@
>   [(set (match_operand:V8DF 0 "register_operand" "=v")
>       (vec_select:V8DF
>         (vec_concat:V16DF
>-          (match_operand:V8DF 1 "nonimmediate_operand" "m")
>+          (match_operand:V8DF 1 "nonimmediate_operand" "vm")
>           (match_dup 1))
>         (parallel [(const_int 0) (const_int 8)
>                    (const_int 2) (const_int 10)
>@@ -13846,7 +13846,7 @@
>   [(set (match_operand:V4DF 0 "register_operand" "=v")
>       (vec_select:V4DF
>         (vec_concat:V8DF
>-          (match_operand:V4DF 1 "nonimmediate_operand" "m")
>+          (match_operand:V4DF 1 "nonimmediate_operand" "vm")
>           (match_dup 1))
>         (parallel [(const_int 0) (const_int 4)
>                    (const_int 2) (const_int 6)])))]
>diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c
>new file mode 100644
>index 00000000000..5d7aceab5d6
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c
>@@ -0,0 +1,19 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -mavx512f -mavx512vl" } */
>+/* { dg-final { scan-assembler "vmovddup\[ \\t\]+%zmm\[0-9\]+, %zmm\[0-9\]+" } } */
>+/* { dg-final { scan-assembler "vmovddup\[ \\t\]+%ymm\[0-9\]+, %ymm\[0-9\]+" } } */
>+/* { dg-final { scan-assembler-not "vmovddup\[^\\n\]*\\(%\[er\]sp\\)" } } */
>+
>+#include <immintrin.h>
>+
>+__m512d
>+f512 (__m512d x)
>+{
>+  return _mm512_movedup_pd (x);
>+}
>+
>+__m256d
>+f256 (__m256d x)
>+{
>+  return _mm256_movedup_pd (x);
>+}
>-- 
>2.43.5
>
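
For readers unfamiliar with the RTL in the 256-bit hunk, here is a rough
scalar sketch of what the pattern selects, and why unpacklo(x, x) is
the same operation.  This is not code from the patch.  The function
names (model_movddup256, model_unpacklo256, self_check) are made up for
illustration, and the unpacklo model assumes the usual per-128-bit-lane
behaviour of the instruction.  Only the selector { 0, 4, 2, 6 } is
taken from the pattern itself.

```c
#include <assert.h>
#include <string.h>

#define N 4  /* number of double elements in a V4DF value */

/* Hypothetical scalar model of
   (vec_select:V4DF (vec_concat:V8DF x x) (parallel [0 4 2 6])).  */
static void
model_movddup256 (const double x[N], double out[N])
{
  static const int sel[N] = { 0, 4, 2, 6 };  /* indices from the pattern */
  double cat[2 * N];

  /* vec_concat: x followed by a second copy of x (match_dup 1).  */
  memcpy (cat, x, sizeof (double) * N);
  memcpy (cat + N, x, sizeof (double) * N);

  /* vec_select: pick the listed elements of the concatenation.  */
  for (int i = 0; i < N; i++)
    out[i] = cat[sel[i]];
}

/* Hypothetical scalar model of a 256-bit unpacklo: for each 128-bit
   lane (two doubles), take the low element of a, then the low element
   of b.  */
static void
model_unpacklo256 (const double a[N], const double b[N], double out[N])
{
  for (int lane = 0; lane < N; lane += 2)
    {
      out[lane] = a[lane];
      out[lane + 1] = b[lane];
    }
}

/* Check that both models agree when unpacklo is given the same input
   twice.  */
static void
self_check (void)
{
  const double x[N] = { 1.0, 2.0, 3.0, 4.0 };
  double dup[N], unpck[N];

  model_movddup256 (x, dup);
  model_unpacklo256 (x, x, unpck);
  assert (memcmp (dup, unpck, sizeof dup) == 0);
}
```

With the input { 1.0, 2.0, 3.0, 4.0 } both models produce
{ 1.0, 1.0, 3.0, 3.0 }: the even-indexed element of each 128-bit lane
is duplicated, which is what vmovddup computes whether its source is a
register or memory.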

[Bug rtl-optimization/125876] [13/14/15/16/17 Regression] x86: register-source vmovddup spilled to the stack instead of using the register form
