https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125876
--- Comment #6 from Ashwin Godbole <Ashwin.Godbole at amd dot com> ---
Comment on attachment 64770
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64770
Fix (patchfile)

>From 0dffb4fe83e6dc4b9b17690aea35fc8134328cfc Mon Sep 17 00:00:00 2001
>From: Ashwin Godbole <[email protected]>
>Date: Mon, 8 Jun 2026 06:31:32 -0700
>Subject: [PATCH] i386: Allow register source for 256/512-bit vmovddup
>
>The avx512f_movddup512 and avx_movddup256 patterns only offered a memory
>source alternative for operand 1.  As a result, a value already in a
>vector register was spilled to the stack and reloaded through the memory
>form of vmovddup, instead of using the existing register form
>"vmovddup %zmm, %zmm" / "vmovddup %ymm, %ymm".  This added a gratuitous
>store/reload and stack realignment for the common _mm512_movedup_pd /
>_mm256_movedup_pd and unpacklo(x, x) idioms.
>
>Relax the operand 1 constraint to "vm", mirroring the sibling
>avx512f_unpcklpd512 / avx_unpcklpd256 patterns, so the register form is
>used when the source is already in a register.  The 128-bit case
>(vec_dupv2df) already provides register alternatives and is unchanged.
>
>gcc/ChangeLog:
>
>	* config/i386/sse.md (avx512f_movddup512<mask_name>): Allow a
>	register source for operand 1 by using the "vm" constraint.
>	(avx_movddup256<mask_name>): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>	* gcc.target/i386/avx512f-vmovddup-3.c: New test.
>
>Co-authored-by: Sarvesh Chandra <[email protected]>
>Signed-off-by: Ashwin Godbole <[email protected]>
>---
> gcc/config/i386/sse.md                        |  4 ++--
> .../gcc.target/i386/avx512f-vmovddup-3.c      | 19 +++++++++++++++++++
> 2 files changed, 21 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c
>
>diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>index 6ca3e34b2c7..1f698642826 100644
>--- a/gcc/config/i386/sse.md
>+++ b/gcc/config/i386/sse.md
>@@ -13813,7 +13813,7 @@
>   [(set (match_operand:V8DF 0 "register_operand" "=v")
>         (vec_select:V8DF
>           (vec_concat:V16DF
>-            (match_operand:V8DF 1 "nonimmediate_operand" "m")
>+            (match_operand:V8DF 1 "nonimmediate_operand" "vm")
>             (match_dup 1))
>           (parallel [(const_int 0) (const_int 8)
>                      (const_int 2) (const_int 10)
>@@ -13846,7 +13846,7 @@
>   [(set (match_operand:V4DF 0 "register_operand" "=v")
>         (vec_select:V4DF
>           (vec_concat:V8DF
>-            (match_operand:V4DF 1 "nonimmediate_operand" "m")
>+            (match_operand:V4DF 1 "nonimmediate_operand" "vm")
>             (match_dup 1))
>           (parallel [(const_int 0) (const_int 4)
>                      (const_int 2) (const_int 6)])))]
>diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c
>new file mode 100644
>index 00000000000..5d7aceab5d6
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovddup-3.c
>@@ -0,0 +1,19 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -mavx512f -mavx512vl" } */
>+/* { dg-final { scan-assembler "vmovddup\[ \\t\]+%zmm\[0-9\]+, %zmm\[0-9\]+" } } */
>+/* { dg-final { scan-assembler "vmovddup\[ \\t\]+%ymm\[0-9\]+, %ymm\[0-9\]+" } } */
>+/* { dg-final { scan-assembler-not "vmovddup\[^\\n\]*\\(%\[er\]sp\\)" } } */
>+
>+#include <immintrin.h>
>+
>+__m512d
>+f512 (__m512d x)
>+{
>+  return _mm512_movedup_pd (x);
>+}
>+
>+__m256d
>+f256 (__m256d x)
>+{
>+  return _mm256_movedup_pd (x);
>+}
>-- 
>2.43.5
>
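For readers less familiar with the RTL, the lane selection that the 256-bit pattern describes, vec_select over vec_concat (x, x) with indices 0, 4, 2, 6, can be modeled in scalar C. This is only an illustrative sketch and not part of the patch: the names movddup256_model and movddup256_selfcheck are hypothetical, and plain double arrays stand in for the V4DF and V8DF modes.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of the avx_movddup256 RTL shown in the diff:
   (vec_select:V4DF (vec_concat:V8DF x x) [0 4 2 6]).
   Hypothetical helper for illustration only.  */
void
movddup256_model (const double x[4], double out[4])
{
  /* Selector indices copied from the pattern's (parallel [...]).  */
  static const size_t sel[4] = { 0, 4, 2, 6 };
  double cat[8];

  /* vec_concat (x, x): lanes 0..3 and 4..7 both hold x.  */
  for (size_t i = 0; i < 4; i++)
    {
      cat[i] = x[i];
      cat[i + 4] = x[i];
    }

  /* vec_select: pick the listed lanes from the concatenation.  */
  for (size_t i = 0; i < 4; i++)
    out[i] = cat[sel[i]];
}

/* Checks that each even-indexed element is duplicated into the
   odd-indexed lane that follows it.  */
void
movddup256_selfcheck (void)
{
  const double in[4] = { 1.0, 2.0, 3.0, 4.0 };
  double out[4];

  movddup256_model (in, out);
  assert (out[0] == 1.0 && out[1] == 1.0);
  assert (out[2] == 3.0 && out[3] == 3.0);
}
```

Because both halves of the concatenation are the same operand (match_dup 1), index 4 selects the same value as index 0 and index 6 the same value as index 2. That is why unpacklo(x, x) and the movedup intrinsics can share a single instruction.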
