Re: [PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550]

2024-01-25 Thread Richard Sandiford
Andrew Pinski  writes:
> The split for movv8di is not ready to handle the case where the setting
> register overlaps with the address of the memory that is being load.
> Fixing the split than just making the output constraint as an early clobber
> for this alternative. The split would first need to figure out which register
> is overlapping with the address and then only emit that move last.

I was curious how strained that detection would be in practice, and in
the end it didn't seem too bad.  I pushed the following variant after
testing on aarch64-linux-gnu.

Thanks,
Richard


The LS64 movv8di pattern didn't handle loads that overlapped with
the address register (unless the overlap happened to be in the
last subload).

gcc/
PR target/113550
* config/aarch64/aarch64-simd.md: In the movv8di splitter, check
whether each split instruction is a load that clobbers the source
address.  Emit that instruction last if so.

gcc/testsuite/
PR target/113550
* gcc.target/aarch64/pr113550.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md  | 18 ++--
 gcc/testsuite/gcc.target/aarch64/pr113550.c | 48 +
 2 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113550.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 48f0741e7d0..f036f6ce997 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -8221,11 +8221,21 @@ (define_split
   || (memory_operand (operands[0], V8DImode)
   && register_operand (operands[1], V8DImode)))
 {
+  std::pair last_pair = {};
   for (int offset = 0; offset < 64; offset += 16)
-   emit_move_insn (simplify_gen_subreg (TImode, operands[0],
-V8DImode, offset),
-   simplify_gen_subreg (TImode, operands[1],
-V8DImode, offset));
+{
+ std::pair pair = {
+   simplify_gen_subreg (TImode, operands[0], V8DImode, offset),
+   simplify_gen_subreg (TImode, operands[1], V8DImode, offset)
+ };
+ if (register_operand (pair.first, TImode)
+ && reg_overlap_mentioned_p (pair.first, pair.second))
+   last_pair = pair;
+ else
+   emit_move_insn (pair.first, pair.second);
+}
+  if (last_pair.first)
+   emit_move_insn (last_pair.first, last_pair.second);
   DONE;
 }
   else
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113550.c 
b/gcc/testsuite/gcc.target/aarch64/pr113550.c
new file mode 100644
index 000..0ff3c7b5c00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113550.c
@@ -0,0 +1,48 @@
+/* { dg-options "-O" } */
+/* { dg-do run } */
+
+#pragma GCC push_options
+#pragma GCC target "+ls64"
+#pragma GCC aarch64 "arm_acle.h"
+#pragma GCC pop_options
+
+#define DEF_FUNCTION(NAME, ARGS)   \
+  __attribute__((noipa))   \
+  __arm_data512_t  \
+  NAME ARGS\
+  {\
+return *ptr;   \
+  }
+
+DEF_FUNCTION (f0, (__arm_data512_t *ptr))
+DEF_FUNCTION (f1, (int x0, __arm_data512_t *ptr))
+DEF_FUNCTION (f2, (int x0, int x1, __arm_data512_t *ptr))
+DEF_FUNCTION (f3, (int x0, int x1, int x2, __arm_data512_t *ptr))
+DEF_FUNCTION (f4, (int x0, int x1, int x2, int x3, __arm_data512_t *ptr))
+DEF_FUNCTION (f5, (int x0, int x1, int x2, int x3, int x4,
+  __arm_data512_t *ptr))
+DEF_FUNCTION (f6, (int x0, int x1, int x2, int x3, int x4, int x5,
+  __arm_data512_t *ptr))
+DEF_FUNCTION (f7, (int x0, int x1, int x2, int x3, int x4, int x5, int x6,
+  __arm_data512_t *ptr))
+
+int
+main (void)
+{
+  __arm_data512_t x = { 0, 10, 20, 30, 40, 50, 60, 70 };
+  __arm_data512_t res[8] =
+  {
+f0 (),
+f1 (0, ),
+f2 (0, 1, ),
+f3 (0, 1, 2, ),
+f4 (0, 1, 2, 3, ),
+f5 (0, 1, 2, 3, 4, ),
+f6 (0, 1, 2, 3, 4, 5, ),
+f7 (0, 1, 2, 3, 4, 5, 6, )
+  };
+  for (int i = 0; i < 8; ++i)
+if (__builtin_memcmp (, [i], sizeof (x)) != 0)
+  __builtin_abort ();
+  return 0;
+}
-- 
2.25.1



[PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550]

2024-01-23 Thread Andrew Pinski
The split for movv8di is not ready to handle the case where the setting
register overlaps with the address of the memory that is being load.
Fixing the split than just making the output constraint as an early clobber
for this alternative. The split would first need to figure out which register
is overlapping with the address and then only emit that move last.

Build and tested for aarch64-linux-gnu with no regressions

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_movv8di): Mark the last
alternative's output constraint as an early clobber.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-simd.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 662ef696630..ba079298b84 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7985,7 +7985,7 @@ (define_insn "*aarch64_mov"
 )
 
 (define_insn "*aarch64_movv8di"
-  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,r")
+  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,")
(match_operand:V8DI 1 "general_operand" " r,r,m"))]
   "(register_operand (operands[0], V8DImode)
 || register_operand (operands[1], V8DImode))"
-- 
2.39.3