Steve Ellcey <sell...@marvell.com> writes:
> Richard,
>
> I don't necessarily disagree with anything in your comments and long
> term I think that is the right direction, but I wonder if that level of
> change is appropriate for GCC Stage 4 which is where we are now.  Your
> changes would require fixes in shared code, whereas setting
> REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.

I guess it's weighing invasiveness in terms of lines of code/location of
code vs. invasiveness in terms of the effect the code has.  Changing
REG_ALLOC_ORDER affects all functinos that use significant numbers of
SIMD registers, which could have unexpected knock-on effects like
performance regressions or exposing latent bugs.  It would also make
the RA try V24 before V16 for callers of vector PCS functions,
even though they should prefer V16.

Keeping the change specific to callees that use the new vector PCS
feature seems more conservative from that point of view.

> I am not sure how long it would take me to implement something along
> the lines of what you are suggesting.

OK, in that case, I thought I'd give a go.

This patch adds a new target hook that says which registers a function
can use without saving them in the prologue and restoring them in the
epilogue.  By default this is call_used_reg_set.

The patch only makes IRA use the hook, and in that context it's just
an (important) optimisation.  But I think we need the same thing for
passes like regrename, where it'll be a correctness issue.  I'll follow
on with patches there if this one is OK.

Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
OK to install?

Richard


2019-03-12  Richard Sandiford  <richard.sandif...@arm.com>

gcc/
        PR target/89628
        * target.def (function_scratch_regs): New hook.
        * doc/tm.texi.in (TARGET_FUNCTION_SCRATCH_REGS): New placeholder.
        * doc/tm.texi: Regenerate.
        * targhooks.h (default_function_scratch_regs): Declare.
        * targhooks.c (default_function_scratch_regs): New function.
        * emit-rtl.h (rtl_data::scratch_regs): New member variable.
        * function.c (init_dummy_function_start): Initialize it.
        (init_function_start): Likewise.
        * ira-color.c (calculate_saved_nregs): Use it instead of
        call_used_reg_set.
        * config/aarch64/aarch64.c (aarch64_function_scratch_regs): New
        function.
        (TARGET_FUNCTION_SCRATCH_REGS): Redefine.

gcc/testsuite/
        PR target/89628
        * gcc.target/aarch64/pr89628.c: New test.

Index: gcc/target.def
===================================================================
--- gcc/target.def      2019-03-08 18:14:26.113008680 +0000
+++ gcc/target.def      2019-03-12 09:34:25.447913041 +0000
@@ -5763,6 +5763,18 @@ The default version of this hook always
  default_hard_regno_scratch_ok)
 
 DEFHOOK
+(function_scratch_regs,
+ "This hook sets @var{regs} to the set of registers that function @var{fn}\n\
+can use without having to save them in the prologue and restore them in the\n\
+epilogue.\n\
+\n\
+By default this set is the same as @code{call_used_reg_set}.  Targets only\n\
+need to override the hook if some functions implement alternative ABIs that\n\
+save more registers or fewer registers than normal.",
+ void, (struct function *fn, HARD_REG_SET *regs),
+ default_function_scratch_regs)
+
+DEFHOOK
 (hard_regno_call_part_clobbered,
  "This hook should return true if @var{regno} is partly call-saved and\n\
 partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in  2019-03-08 18:14:25.577010718 +0000
+++ gcc/doc/tm.texi.in  2019-03-12 09:34:25.443913054 +0000
@@ -1705,6 +1705,11 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-used register
 @cindex call-clobbered register
 @cindex call-saved register
+@hook TARGET_FUNCTION_SCRATCH_REGS
+
+@cindex call-used register
+@cindex call-clobbered register
+@cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
 @hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi     2019-03-08 18:14:25.573010734 +0000
+++ gcc/doc/tm.texi     2019-03-12 09:34:25.443913054 +0000
@@ -1894,6 +1894,19 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-used register
 @cindex call-clobbered register
 @cindex call-saved register
+@deftypefn {Target Hook} void TARGET_FUNCTION_SCRATCH_REGS (struct function 
*@var{fn}, HARD_REG_SET *@var{regs})
+This hook sets @var{regs} to the set of registers that function @var{fn}
+can use without having to save them in the prologue and restore them in the
+epilogue.
+
+By default this set is the same as @code{call_used_reg_set}.  Targets only
+need to override the hook if some functions implement alternative ABIs that
+save more registers or fewer registers than normal.
+@end deftypefn
+
+@cindex call-used register
+@cindex call-clobbered register
+@cindex call-saved register
 @deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED (rtx_insn 
*@var{insn}, unsigned int @var{regno}, machine_mode @var{mode})
 This hook should return true if @var{regno} is partly call-saved and
 partly call-clobbered, and if a value of mode @var{mode} would be partly
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h     2019-03-08 18:14:25.849009683 +0000
+++ gcc/targhooks.h     2019-03-12 09:34:25.447913041 +0000
@@ -286,5 +286,6 @@ extern bool speculation_safe_value_not_n
 extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
 extern void default_remove_extra_call_preserved_regs (rtx_insn *,
                                                      HARD_REG_SET *);
+extern void default_function_scratch_regs (struct function *, HARD_REG_SET *);
 
 #endif /* GCC_TARGHOOKS_H */
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c     2019-03-08 18:14:25.845009699 +0000
+++ gcc/targhooks.c     2019-03-12 09:34:25.447913041 +0000
@@ -2379,4 +2379,10 @@ default_remove_extra_call_preserved_regs
 {
 }
 
+void
+default_function_scratch_regs (struct function *, HARD_REG_SET *regs)
+{
+  COPY_HARD_REG_SET (*regs, call_used_reg_set);
+}
+
 #include "gt-targhooks.h"
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h      2019-03-08 18:15:33.700751752 +0000
+++ gcc/emit-rtl.h      2019-03-12 09:34:25.443913054 +0000
@@ -292,6 +292,9 @@ struct GTY(()) rtl_data {
      sets them.  */
   HARD_REG_SET asm_clobbers;
 
+  /* Registers that the function can use without having to save them first.  */
+  HARD_REG_SET scratch_regs;
+
   /* The highest address seen during shorten_branches.  */
   int max_insn_address;
 };
Index: gcc/function.c
===================================================================
--- gcc/function.c      2019-03-08 18:15:33.660751905 +0000
+++ gcc/function.c      2019-03-12 09:34:25.447913041 +0000
@@ -4875,6 +4875,7 @@ init_dummy_function_start (void)
 {
   push_dummy_function (false);
   prepare_function_start ();
+  COPY_HARD_REG_SET (crtl->scratch_regs, call_used_reg_set);
 }
 
 /* Generate RTL for the start of the function SUBR (a FUNCTION_DECL tree node)
@@ -4888,6 +4889,8 @@ init_function_start (tree subr)
   initialize_rtl ();
 
   prepare_function_start ();
+  targetm.function_scratch_regs (DECL_STRUCT_FUNCTION (subr),
+                                &crtl->scratch_regs);
   decide_function_section (subr);
 
   /* Warn if this value is an aggregate type,
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c     2019-03-08 18:15:26.824777889 +0000
+++ gcc/ira-color.c     2019-03-12 09:34:25.447913041 +0000
@@ -1660,7 +1660,7 @@ calculate_saved_nregs (int hard_regno, m
   ira_assert (hard_regno >= 0);
   for (i = hard_regno_nregs (hard_regno, mode) - 1; i >= 0; i--)
     if (!allocated_hardreg_p[hard_regno + i]
-       && !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i)
+       && !TEST_HARD_REG_BIT (crtl->scratch_regs, hard_regno + i)
        && !LOCAL_REGNO (hard_regno + i))
       nregs++;
   return nregs;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c        2019-03-08 18:15:38.228734542 +0000
+++ gcc/config/aarch64/aarch64.c        2019-03-12 09:34:25.427913117 +0000
@@ -1706,6 +1706,19 @@ aarch64_simd_call_p (rtx_insn *insn)
   return aarch64_simd_decl_p (fndecl);
 }
 
+/* Implement TARGET_FUNCTION_SCRATCH_REGS.  Vector PCS functions have
+   fewer available scratch registers than normal functions do.  */
+
+static void
+aarch64_function_scratch_regs (function *fn, HARD_REG_SET *scratch_regs)
+{
+  default_function_scratch_regs (fn, scratch_regs);
+  if (aarch64_simd_decl_p (fn->decl))
+    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+      if (FP_SIMD_SAVED_REGNUM_P (regno))
+       CLEAR_HARD_REG_BIT (*scratch_regs, regno);
+}
+
 /* Implement TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS.  If INSN calls
    a function that uses the SIMD ABI, take advantage of the extra
    call-preserved registers that the ABI provides.  */
@@ -19208,6 +19221,9 @@ #define TARGET_MODES_TIEABLE_P aarch64_m
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_FUNCTION_SCRATCH_REGS
+#define TARGET_FUNCTION_SCRATCH_REGS aarch64_function_scratch_regs
+
 #undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
 #define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \
   aarch64_remove_extra_call_preserved_regs
Index: gcc/testsuite/gcc.target/aarch64/pr89628.c
===================================================================
--- /dev/null   2019-03-08 11:40:14.606883727 +0000
+++ gcc/testsuite/gcc.target/aarch64/pr89628.c  2019-03-12 09:34:25.447913041 
+0000
@@ -0,0 +1,25 @@
+/* { dg-options "-O2" } */
+
+typedef __Float32x4_t vec;
+
+__attribute__((aarch64_vector_pcs))
+vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7)
+{
+  vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3;
+  t0 = a0 - a7;
+  t1 = a1 - a6;
+  t2 = a2 - a5;
+  t3 = a3 - a4;
+  t4 = a4 - a3;
+  t5 = a5 - a2;
+  t6 = a6 - a1;
+  t7 = a7 - a0;
+  s0 = t0 * t1;
+  s1 = t2 * t3;
+  s2 = t4 * t5;
+  s3 = t6 * t7;
+  return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7;
+}
+
+/* { dg-final { scan-assembler-not '\tldr\t' } } */
+/* { dg-final { scan-assembler-not '\tstr\t' } } */

Reply via email to