On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold memcpy.
> > The default is
> >
> > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> >
> > For x86, it is MOVE_MAX to restore the old behavior before
> 
> I know we've discussed this to death in the PR, I just want to repeat here
> that the GIMPLE folding expects to generate a single load and a single
> store (that is what it does on the GIMPLE level) which is why MOVE_MAX
> was chosen originally (it's documented to what a "single instruction" does).
> In practice MOVE_MAX does not seem to cover vector register sizes
> so Richard pulled MOVE_RATIO which is really intended to cover
> the case of using multiple instructions for moving memory (but then I
> don't remember whether for the ARM case the single load/store GIMPLE
> will be expanded to multiple load/store instructions).
> 
> TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> being very specific for memcpy folding (we also fold memmove btw).
> 
> There is also MOVE_MAX_PIECES which _might_ be more appropriate
> than MOVE_MAX here and still honor the idea of single instructions.
> Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
> not MOVE_MAX * MOVE_RATIO.
> 
> So if we need a new hook then that hook should at least get the
> 'speed' argument of MOVE_RATIO and it should get a better name.
> 
> I still think that it should be possible to improve the insn check to
> avoid use of "disabled" modes, maybe that's also a point to add
> a new hook like .move_with_mode_p or so?  To quote, we do

Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.

> 
>               scalar_int_mode mode;
>               if (int_mode_for_size (ilen * 8, 0).exists (&mode)
>                   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
>                   && have_insn_for (SET, mode)
>                   /* If the destination pointer is not aligned we must be able
>                      to emit an unaligned store.  */
>                   && (dest_align >= GET_MODE_ALIGNMENT (mode)
>                       || !targetm.slow_unaligned_access (mode, dest_align)
>                       || (optab_handler (movmisalign_optab, mode)
>                           != CODE_FOR_nothing)))
> 
> where I understand the ISA is enabled and if the user explicitely
> uses it that's OK but -mprefer-avx128 should tell GCC to never
> generate AVX256 code where the user was not explicitely using it
> (still for example glibc might happily use AVX256 code to implement
> the memcpy we are folding!)
> 
> Note the BB vectorizer also might end up with using AVX256 because
> in places it also relies on optab queries and the vector_mode_supported_p
> check (but the memcpy folding uses the fake integer modes).  So
> x86 might need to implement the related_mode hook to avoid "auto"-using
> a larger vector mode which the default implementation would happily do.
> 
> Richard.

OK for master?

Thanks.

H.J.
---
Add TARGET_MOVE_WITH_MODE_P to return true if move with mode can be
generated implicitly.  The default definition returns true.  The x86
version returns true if the mode size <= MOVE_MAX, which is the max
number of bytes we can move in one reasonably fast instruction.

gcc/

        PR target/103393
        * gimple-fold.cc (gimple_fold_builtin_memory_op): Call
        targetm.move_with_mode_p to check if move with mode can be
        generated implicitly.
        * target.def: Add move_with_mode_p.
        * targhooks.cc (default_move_with_mode_p): New.
        * targhooks.h (default_move_with_mode_p): Likewise.
        * config/i386/i386.cc (ix86_move_with_mode_p): New.
        (TARGET_MOVE_WITH_MODE_P): Likewise.
        * doc/tm.texi.in: Add TARGET_MOVE_WITH_MODE_P.
        * doc/tm.texi: Regenerate.

gcc/testsuite/

        PR target/103393
        * gcc.target/i386/pr103393-1.c: New test.
        * gcc.target/i386/pr103393-2.c: Likewise.
        * gcc.target/i386/pr103393-3.c: Likewise.
        * gcc.target/i386/pr103393-4.c: Likewise.
        * gcc.target/i386/pr103393-5.c: Likewise.
---
 gcc/config/i386/i386.cc                    | 12 ++++++++++++
 gcc/doc/tm.texi                            |  5 +++++
 gcc/doc/tm.texi.in                         |  2 ++
 gcc/gimple-fold.cc                         |  1 +
 gcc/target.def                             |  7 +++++++
 gcc/testsuite/gcc.target/i386/pr103393-1.c | 16 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr103393-2.c | 16 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr103393-3.c | 16 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr103393-4.c | 16 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr103393-5.c | 16 ++++++++++++++++
 10 files changed, 107 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-5.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b2bf90576d5..68a2c59220c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23918,6 +23918,15 @@ ix86_push_rounding (poly_int64 bytes)
   return ROUND_UP (bytes, UNITS_PER_WORD);
 }
 
+/* Implement the TARGET_MOVE_WITH_MODE_P hook.  Return true if move
+   with MODE can be generated implicitly.  */
+
+static bool
+ix86_move_with_mode_p (machine_mode mode)
+{
+  return GET_MODE_SIZE (mode) <= MOVE_MAX;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -24735,6 +24744,9 @@ static bool ix86_libc_has_fast_function (int fcode 
ATTRIBUTE_UNUSED)
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
 #endif /* #if CHECKING_P */
 
+#undef TARGET_MOVE_WITH_MODE_P
+#define TARGET_MOVE_WITH_MODE_P ix86_move_with_mode_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-i386.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 49864dd79f8..9d5058610e1 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11924,6 +11924,11 @@ statement holding the function call.  Returns true if 
any change
 was made to the GIMPLE stream.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_MOVE_WITH_MODE_P (machine_mode @var{mode})
+This target hook returns true if move with mode @var{mode} can be
+generated implicitly.  The default definition returns true.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree 
@var{decl1}, tree @var{decl2})
 This hook is used to compare the target attributes in two functions to
 determine which function's features get higher priority.  This is used
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 95e5e341f07..e9158ab90c6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7924,6 +7924,8 @@ to by @var{ce_info}.
 
 @hook TARGET_GIMPLE_FOLD_BUILTIN
 
+@hook TARGET_MOVE_WITH_MODE_P
+
 @hook TARGET_COMPARE_VERSION_PRIORITY
 
 @hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c9179abb27e..93267eeabb2 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -1007,6 +1007,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
              if (int_mode_for_size (ilen * 8, 0).exists (&mode)
                  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
                  && have_insn_for (SET, mode)
+                 && targetm.move_with_mode_p (mode)
                  /* If the destination pointer is not aligned we must be able
                     to emit an unaligned store.  */
                  && (dest_align >= GET_MODE_ALIGNMENT (mode)
diff --git a/gcc/target.def b/gcc/target.def
index 72c2e1ef756..041d944cb38 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2489,6 +2489,13 @@ was made to the GIMPLE stream.",
  bool, (gimple_stmt_iterator *gsi),
  hook_bool_gsiptr_false)
 
+DEFHOOK
+(move_with_mode_p,
+ "This target hook returns true if move with mode @var{mode} can be\n\
+generated implicitly.  The default definition returns true.",
+ bool, (machine_mode mode),
+ hook_bool_mode_true)
+
 /* Target hook is used to compare the target attributes in two functions to
    determine which function's features get higher priority.  This is used
    during function multi-versioning to figure out the order in which two
diff --git a/gcc/testsuite/gcc.target/i386/pr103393-1.c 
b/gcc/testsuite/gcc.target/i386/pr103393-1.c
new file mode 100644
index 00000000000..2091d33c45f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103393-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx -mprefer-avx128 
-mprefer-vector-width=128" } */
+
+struct TestData {
+  float arr[8];
+};
+
+void
+cpy (struct TestData *s1, struct TestData *s2 )
+{
+  for(int i=0; i<8; ++i)
+    s1->arr[i] = s2->arr[i];
+}
+
+/* { dg-final { scan-assembler "jmp\[\\t \]+_?memmove" { target { ! ia32 } } } 
} */
+/* { dg-final { scan-assembler "call\[\\t \]+_?memmove" { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103393-2.c 
b/gcc/testsuite/gcc.target/i386/pr103393-2.c
new file mode 100644
index 00000000000..4ad8c8bf379
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103393-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+struct TestData {
+  float arr[8];
+};
+
+void
+cpy (struct TestData *s1, struct TestData *s2 )
+{
+  for(int i=0; i<16; ++i)
+    s1->arr[i] = s2->arr[i];
+}
+
+/* { dg-final { scan-assembler "jmp\[\\t \]+_?memmove" { target { ! ia32 } } } 
} */
+/* { dg-final { scan-assembler "call\[\\t \]+_?memmove" { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103393-3.c 
b/gcc/testsuite/gcc.target/i386/pr103393-3.c
new file mode 100644
index 00000000000..7a03160e512
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103393-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=sapphirerapids" } */
+
+struct TestData {
+  float arr[8];
+};
+
+void
+cpy (struct TestData *s1, struct TestData *s2 )
+{
+  for(int i=0; i<16; ++i)
+    s1->arr[i] = s2->arr[i];
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */
+/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103393-4.c 
b/gcc/testsuite/gcc.target/i386/pr103393-4.c
new file mode 100644
index 00000000000..ae2599f6411
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103393-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512 -mstore-max=512" } */
+
+struct TestData {
+  float arr[8];
+};
+
+void
+cpy (struct TestData *s1, struct TestData *s2 )
+{
+  for(int i=0; i<16; ++i)
+    s1->arr[i] = s2->arr[i];
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */
+/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103393-5.c 
b/gcc/testsuite/gcc.target/i386/pr103393-5.c
new file mode 100644
index 00000000000..f9173684212
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103393-5.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512 -mmove-max=512" } */
+
+struct TestData {
+  float arr[8];
+};
+
+void
+cpy (struct TestData *s1, struct TestData *s2 )
+{
+  for(int i=0; i<16; ++i)
+    s1->arr[i] = s2->arr[i];
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */
+/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */
-- 
2.35.1

Reply via email to