[PATCH v2] x86: Convert MMX integer loads from constant vector pool

H.J. Lu Wed, 16 Jul 2025 08:10:21 -0700

On Mon, Jul 14, 2025 at 11:03 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Tue, Jul 15, 2025 at 3:43 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > For MMX 16-bit, 32-bit and 64-bit constant vector loads from constant
> > vector pool:
> >
> > (insn 6 2 7 2 (set (reg:V1SI 5 di)
> >         (mem/u/c:V1SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32])) 
> > "pr1
> > 21062-2.c":10:3 2036 {*movv1si_internal}
> >      (expr_list:REG_EQUAL (const_vector:V1SI [
> >                 (const_int -1 [0xffffffffffffffff])
> >             ])
> >         (nil)))
> >
> > we can convert it to
> >
> > (insn 12 2 7 2 (set (reg:SI 5 di)
> >         (const_int -1 [0xffffffffffffffff])) "pr121062-2.c":10:3 100 
> > {*movsi_int
> > ernal}
> >      (nil))
> >
> > Co-Developed-by: H.J. Lu <hjl.to...@gmail.com>
> >
> > gcc/
> >
> > PR target/121062
> > * config/i386/i386.cc (ix86_convert_const_vector_to_integer):
> > Handle E_V1SImode and E_V1DImode.
> > * config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI.
> > (mmxinsnmode): Add V1DI and V1SI.
> > Add V_16_32_64 splitter for constant vector loads from constant
> > vector pool.
> > (V_16_32_64:*mov<mode>_imm): Replace lowpart_subreg with
> > adjust_address.
> >
> > gcc/testsuite/
> >
> > PR target/121062
> > * gcc.target/i386/pr121062-1.c: New test.
> > * gcc.target/i386/pr121062-2.c: Likewise.
> > * gcc.target/i386/pr121062-3a.c: Likewise.
> > * gcc.target/i386/pr121062-3b.c: Likewise.
> > * gcc.target/i386/pr121062-3c.c: Likewise.
> > * gcc.target/i386/pr121062-4.c: Likewise.
> > * gcc.target/i386/pr121062-5.c: Likewise.
> > * gcc.target/i386/pr121062-6.c: Likewise.
> > * gcc.target/i386/pr121062-7.c: Likewise.
> >
> > OK for master?
>
> OK, with some code movements, as mentioned below.
>
> Thanks,
> Uros.
>
> +(define_split
> +  [(set (match_operand:V_16_32_64 0 "general_reg_operand")
> +    (match_operand:V_16_32_64 1 "memory_operand"))]
> +  "reload_completed
> +   && SYMBOL_REF_P (XEXP (operands[1], 0))
> +   && CONSTANT_POOL_ADDRESS_P (XEXP (operands[1], 0))"
> +  [(set (match_dup 0) (match_dup 1))]
> ...
>
> Please put this new pattern after *movv2qi_internal as it also applies
> to V2QImode and ...


Fixed.

> @@ -417,10 +438,11 @@ (define_insn_and_split "*mov<mode>_imm"
>    "&& reload_completed"
>    [(set (match_dup 0) (match_dup 1))]
>
> ... put *mov<mode>_imm" just after the new splitter, to prevent
> shadowing of *movv2qi_internal.

Fixed.

> +  operands[0] = adjust_address (operands[0], <mmxinsnmode>mode, 0);
>    operands[1] = GEN_INT (val);
> -  operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode);
>
> FYI, subregs of memory operands should be avoided, we have plenty of
> helpers to change address mode or adjust address in other ways.

Thanks for the pointer.

Here is the fixed v2 patch.  I will check it in after testing.

Thanks.

-- 
H.J.

From a63a4d6444ee7761b57a824627dde01c42bd688c Mon Sep 17 00:00:00 2001
From: Uros Bizjak <ubiz...@gmail.com>
Date: Tue, 15 Jul 2025 05:05:10 +0800
Subject: [PATCH v2] x86: Convert MMX integer loads from constant vector pool

For MMX 16-bit, 32-bit and 64-bit constant vector loads from constant
vector pool:

(insn 6 2 7 2 (set (reg:V1SI 5 di)
        (mem/u/c:V1SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32])) "pr121062-2.c":10:3 2036 {*movv1si_internal}
     (expr_list:REG_EQUAL (const_vector:V1SI [
                (const_int -1 [0xffffffffffffffff])
            ])
        (nil)))

we can convert it to

(insn 12 2 7 2 (set (reg:SI 5 di)
        (const_int -1 [0xffffffffffffffff])) "pr121062-2.c":10:3 100 {*movsi_internal}
     (nil))

Co-Developed-by: H.J. Lu <hjl.to...@gmail.com>

gcc/

	PR target/121062
	* config/i386/i386.cc (ix86_convert_const_vector_to_integer):
	Handle E_V1SImode and E_V1DImode.
	* config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI.
	(mmxinsnmode): Add V1DI and V1SI.
	Add V_16_32_64 splitter for constant vector loads from constant
	vector pool.
	(V_16_32_64:*mov<mode>_imm): Moved after V_16_32_64 splitter.
	Replace lowpart_subreg with adjust_address.

gcc/testsuite/

	PR target/121062
	* gcc.target/i386/pr121062-1.c: New test.
	* gcc.target/i386/pr121062-2.c: Likewise.
	* gcc.target/i386/pr121062-3a.c: Likewise.
	* gcc.target/i386/pr121062-3b.c: Likewise.
	* gcc.target/i386/pr121062-3c.c: Likewise.
	* gcc.target/i386/pr121062-4.c: Likewise.
	* gcc.target/i386/pr121062-5.c: Likewise.
	* gcc.target/i386/pr121062-6.c: Likewise.
	* gcc.target/i386/pr121062-7.c: Likewise.
---
 gcc/config/i386/i386.cc                     |  4 ++
 gcc/config/i386/mmx.md                      | 60 ++++++++++++++-------
 gcc/testsuite/gcc.target/i386/pr121062-1.c  | 34 ++++++++++++
 gcc/testsuite/gcc.target/i386/pr121062-2.c  | 14 +++++
 gcc/testsuite/gcc.target/i386/pr121062-3a.c | 23 ++++++++
 gcc/testsuite/gcc.target/i386/pr121062-3b.c |  6 +++
 gcc/testsuite/gcc.target/i386/pr121062-3c.c |  6 +++
 gcc/testsuite/gcc.target/i386/pr121062-4.c  | 14 +++++
 gcc/testsuite/gcc.target/i386/pr121062-5.c  | 13 +++++
 gcc/testsuite/gcc.target/i386/pr121062-6.c  | 13 +++++
 gcc/testsuite/gcc.target/i386/pr121062-7.c  | 13 +++++
 11 files changed, 181 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-7.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 313522b88e3..37db8a1d118 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16704,6 +16704,10 @@ ix86_convert_const_vector_to_integer (rtx op, machine_mode mode)
 	  val = wi::insert (val, wv, innermode_bits * i, innermode_bits);
 	}
       break;
+    case E_V1SImode:
+    case E_V1DImode:
+      op = CONST_VECTOR_ELT (op, 0);
+      return INTVAL (op);
     case E_V2HFmode:
     case E_V2BFmode:
     case E_V4HFmode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 29a8cb599a7..1f9799344b6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -81,12 +81,13 @@ (define_mode_iterator VI_16_32 [V4QI V2QI V2HI])
 ;; 4-byte and 2-byte QImode vector modes
 (define_mode_iterator VI1_16_32 [V4QI V2QI])
 
-;; All 2-byte, 4-byte and 8-byte vector modes with more than 1 element
+;; All 2-byte, 4-byte and 8-byte vector modes.
 (define_mode_iterator V_16_32_64
-   [V2QI V4QI V2HI V2HF
+   [V2QI V4QI V2HI V1SI V2HF V2BF
     (V8QI "TARGET_64BIT") (V4HI "TARGET_64BIT")
     (V4HF "TARGET_64BIT") (V4BF "TARGET_64BIT")
-    (V2SI "TARGET_64BIT") (V2SF "TARGET_64BIT")])
+    (V2SI "TARGET_64BIT") (V2SF "TARGET_64BIT")
+    (V1DI "TARGET_64BIT")])
 
 ;; V2S* modes
 (define_mode_iterator V2FI [V2SF V2SI])
@@ -107,6 +108,7 @@ (define_mode_attr mmxinsnmode
   [(V8QI "DI") (V4QI "SI") (V2QI "HI")
    (V4HI "DI") (V2HI "SI")
    (V2SI "DI")
+   (V1DI "DI") (V1SI "SI")
    (V4HF "DI") (V2HF "SI")
    (V4BF "DI") (V2BF "SI")
    (V2SF "DI")])
@@ -407,22 +409,6 @@ (define_insn "*mov<mode>_internal"
 	   ]
 	   (symbol_ref "true")))])
 
-;; 16-bit, 32-bit and 64-bit constant vector stores.  After reload,
-;; convert them to immediate integer stores.
-(define_insn_and_split "*mov<mode>_imm"
-  [(set (match_operand:V_16_32_64 0 "memory_operand" "=m")
-	(match_operand:V_16_32_64 1 "x86_64_const_vector_operand" "i"))]
-  ""
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0) (match_dup 1))]
-{
-  HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1],
-							    <MODE>mode);
-  operands[1] = GEN_INT (val);
-  operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode);
-})
-
 ;; For TARGET_64BIT we always round up to 8 bytes.
 (define_insn "*push<mode>2_rex64"
   [(set (match_operand:V_32 0 "push_operand" "=X,X")
@@ -588,6 +574,42 @@ (define_insn "*movv2qi_internal"
 	   ]
 	   (symbol_ref "true")))])
 
+(define_split
+  [(set (match_operand:V_16_32_64 0 "general_reg_operand")
+	(match_operand:V_16_32_64 1 "memory_operand"))]
+  "reload_completed
+   && SYMBOL_REF_P (XEXP (operands[1], 0))
+   && CONSTANT_POOL_ADDRESS_P (XEXP (operands[1], 0))"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  rtx op1 = avoid_constant_pool_reference (operands[1]);
+
+  if (!CONST_VECTOR_P (op1))
+    FAIL;
+
+  HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (op1, <MODE>mode);
+
+  operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode);
+  operands[1] = GEN_INT (val);
+})
+
+;; 16-bit, 32-bit and 64-bit constant vector stores.  After reload,
+;; convert them to immediate integer stores.
+(define_insn_and_split "*mov<mode>_imm"
+  [(set (match_operand:V_16_32_64 0 "memory_operand" "=m")
+	(match_operand:V_16_32_64 1 "x86_64_const_vector_operand" "i"))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  rtx op1 = operands[1];
+  HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (op1, <MODE>mode);
+
+  operands[0] = adjust_address (operands[0], <mmxinsnmode>mode, 0);
+  operands[1] = GEN_INT (val);
+})
+
 ;; We always round up to UNITS_PER_WORD bytes.
 (define_insn "*pushv2qi2"
   [(set (match_operand:V2QI 0 "push_operand" "=X,X")
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-1.c b/gcc/testsuite/gcc.target/i386/pr121062-1.c
new file mode 100644
index 00000000000..799f8562c9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v3" } */
+
+extern union {
+  int i;
+  float f;
+} int_as_float_u;
+
+extern int render_result_from_bake_w;
+extern int render_result_from_bake_h_seed_pass;
+extern float *render_result_from_bake_h_primitive;
+extern float *render_result_from_bake_h_seed;
+
+float
+int_as_float(int i)
+{
+  int_as_float_u.i = i;
+  return int_as_float_u.f;
+}
+
+void
+render_result_from_bake_h(int tx)
+{
+  while (render_result_from_bake_w) {
+    for (; tx < render_result_from_bake_w; tx++)
+      render_result_from_bake_h_primitive[1] =
+          render_result_from_bake_h_primitive[2] = int_as_float(-1);
+    if (render_result_from_bake_h_seed_pass) {
+      *render_result_from_bake_h_seed = 0;
+    }
+  }
+}
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1, %r\[a-z0-9\]+" 2 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-2.c b/gcc/testsuite/gcc.target/i386/pr121062-2.c
new file mode 100644
index 00000000000..723d68a4003
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-dce -mtune=generic" } */
+
+typedef int __attribute__((__vector_size__ (4))) S;
+extern void bar (S);
+
+void
+foo ()
+{
+  bar ((S){-1});
+}
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1, \\(%esp\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$-1, %edi" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-3a.c b/gcc/testsuite/gcc.target/i386/pr121062-3a.c
new file mode 100644
index 00000000000..effd4ff5367
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-3a.c
@@ -0,0 +1,23 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -march=x86-64 -fpic" } */
+
+typedef struct {
+  struct {
+    unsigned short lo4;
+    unsigned short lo3;
+    unsigned short lo2;
+    unsigned short lo1;
+  } i;
+} BID_BINARY80LDOUBLE;
+extern BID_BINARY80LDOUBLE __bid64_to_binary80_x_out;
+void
+__bid64_to_binary80 (void)
+{
+  __bid64_to_binary80_x_out.i.lo4
+    = __bid64_to_binary80_x_out.i.lo3
+    = __bid64_to_binary80_x_out.i.lo2
+    = __bid64_to_binary80_x_out.i.lo1 = 65535;
+}
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+%xmm\[0-9\]+, " 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1, \\(%(e|r)\[a-z0-9\]+\\)" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-3b.c b/gcc/testsuite/gcc.target/i386/pr121062-3b.c
new file mode 100644
index 00000000000..eb89b5da091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-3b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fno-pic -mcmodel=large" } */
+
+#include "pr121062-3a.c"
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1, \\(%r\[a-z0-9\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-3c.c b/gcc/testsuite/gcc.target/i386/pr121062-3c.c
new file mode 100644
index 00000000000..4c07029c4f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-3c.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { fpic && lp64 } } } */
+/* { dg-options "-O2 -march=x86-64 -fpic -mcmodel=large" } */
+
+#include "pr121062-3a.c"
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+\\\$-1, \\(%r\[a-z0-9\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-4.c b/gcc/testsuite/gcc.target/i386/pr121062-4.c
new file mode 100644
index 00000000000..77a0c2e90bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+typedef long long int __attribute__((__vector_size__ (8))) S;
+
+void
+foo (S *c)
+{
+  *c = (S){0x12345678badbeefULL};
+}
+
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+%xmm\[0-9\]+, " 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movabsq\[ \\t\]+\\\$81985529250168559, %r\[a-z0-9\]+" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-5.c b/gcc/testsuite/gcc.target/i386/pr121062-5.c
new file mode 100644
index 00000000000..22c09a6bfec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+typedef int __attribute__((__vector_size__ (4))) S;
+
+void
+foo (S *c)
+{
+  *c = (S){0x12345678};
+}
+
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$305419896, \\(%(e|r)\[a-z0-9\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-6.c b/gcc/testsuite/gcc.target/i386/pr121062-6.c
new file mode 100644
index 00000000000..780b496b504
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-6.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-dce -mtune=generic" } */
+
+typedef int __attribute__((__vector_size__ (8))) S;
+
+void
+foo (S *c)
+{
+  *c = (S){0x12345678,0xbadbeefULL};
+}
+
+/* { dg-final { scan-assembler-times "movq\[ \\t\]+%xmm\[0-9\]+, " 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movabsq\[ \\t\]+\\\$841538639400031864, %r\[a-z0-9\]+" 1 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr121062-7.c b/gcc/testsuite/gcc.target/i386/pr121062-7.c
new file mode 100644
index 00000000000..f1834f8e173
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121062-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+typedef __bf16 __attribute__((__vector_size__ (4))) S;
+
+void
+foo (S *c)
+{
+  *c = (S){-0.1, 2.1};
+}
+
+
+/* { dg-final { scan-assembler-times "movl\[ \\t\]+\\\$1074183629, \\(%(e|r)\[a-z0-9\]+\\)" 1 } } */
-- 
2.50.1

[PATCH v2] x86: Convert MMX integer loads from constant vector pool

Reply via email to