Re: [PATCH] Improve i?86/x86_64 prologue_and_epilogue for leaf functions (PR target/59501)

2017-07-07 Thread H.J. Lu
On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
> Hi!
>
> Honza recently changed the i?86 backend, so that it often doesn't
> do -maccumulate-outgoing-args by default on x86_64.
> Unfortunately, on some of the here included testcases this regressed
> quite a bit the generated code.  As AVX vectors are used, the dynamic
> realignment code needs to assume e.g. that some of them will need to be
> spilled, and for -mno-accumulate-outgoing-args the code needs to set
> need_drap early as well.  But in when emitting the prologue/epilogue,
> if need_drap is set, we don't perform the optimization for leaf functions
> which have zero size stack frame, thus we end up with uselessly doing
> dynamic stack realignment, setting up DRAP that nothing uses and later on
> restore everything back.
>
> This patch improves it, if the DRAP register isn't live at the start of
> entry bb successor and we aren't going to realign the stack, we don't
> need DRAP at all, and even if we need DRAP register, that can't be the sole
> reason for doing stack realignment, the prologue code is able to set up DRAP
> even without dynamic stack realignment.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-12-20  Jakub Jelinek  
>
> PR target/59501
> * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
> if !crtl->stack_realign_needed.
> (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
> and stack_realign_needed will be false, clear drap_reg and need_drap.
> Optimize leaf functions that don't need stack frame even if
> crtl->need_drap.
>
> * gcc.target/i386/pr59501-1.c: New test.
> * gcc.target/i386/pr59501-1a.c: New test.
> * gcc.target/i386/pr59501-2.c: New test.
> * gcc.target/i386/pr59501-2a.c: New test.
> * gcc.target/i386/pr59501-3.c: New test.
> * gcc.target/i386/pr59501-3a.c: New test.
> * gcc.target/i386/pr59501-4.c: New test.
> * gcc.target/i386/pr59501-4a.c: New test.
> * gcc.target/i386/pr59501-5.c: New test.
> * gcc.target/i386/pr59501-6.c: New test.
>
>
> --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 
> 12:19:20.603212859 +0100
> +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 12:23:33.647881672 
> +0100
> @@ -0,0 +1,8 @@
> +/* PR target/59501 */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
> +
> +#include "pr59501-3a.c"
> +
> +/* Verify no dynamic realignment is performed.  */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
>

Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
xfailed due to stack frame access via frame pointer instead of DARP.
This patch finds the maximum stack alignment from the stack frame access
instructions and avoids stack realignment if stack alignment needed is
less than incoming stack boundary.

I am testing this patch.  OK for trunk if there is no regression?


-- 
H.J.
From d9844cba6ce20498aab42a32927230cb7b56475d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 7 Jul 2017 06:01:16 -0700
Subject: [PATCH] i386: Avoid stack realignment if possible

Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
xfailed due to stack frame access via frame pointer instead of DARP.
This patch finds the maximum stack alignment from the stack frame access
instructions and avoids stack realignment if stack alignment needed is
less than incoming stack boundary.

gcc/

	PR target/59501
	* config/i386/i386.c (ix86_finalize_stack_realign_flags): Don't
	realign stack if stack alignment needed is less than incoming
	stack boundary.

gcc/testsuite/

	PR target/59501
	* gcc.target/i386/pr59501-4a.c: Remove xfail.
---
 gcc/config/i386/i386.c | 83 +++---
 gcc/testsuite/gcc.target/i386/pr59501-4a.c |  2 +-
 2 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b041524..0c61998 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14161,6 +14161,10 @@ ix86_finalize_stack_realign_flags (void)
   add_to_hard_reg_set (&set_up_by_prologue, Pmode, ARG_POINTER_REGNUM);
   add_to_hard_reg_set (&set_up_by_prologue, Pmode,
 			   HARD_FRAME_POINTER_REGNUM);
+
+  unsigned int stack_alignment = 0;
+  bool require_stack_frame = false;
+
   FOR_EACH_BB_FN (bb, cfun)
 {
   rtx_insn *insn;
@@ -14169,43 +14173,64 @@ ix86_finalize_stack_realign_flags (void)
 		&& requires_stack_frame_p (insn, prologue_used,
 	   set_up_by_prologue))
 	  {
-		if (crtl->stack_realign_needed != stack_realign)
-		  recompute_frame_layout_p = true;
-		crtl->stack_realign_needed = stack_realign;
-		crtl->stack_realign_finalized = true;
-		if (recompute_frame_layout_p)
-		  ix86_compute_frame_layout ();
-		return;
+		require_stack_f

Re: [PATCH] Improve i?86/x86_64 prologue_and_epilogue for leaf functions (PR target/59501)

2013-12-22 Thread Jan Hubicka
> Hi!
> 
> Honza recently changed the i?86 backend, so that it often doesn't
> do -maccumulate-outgoing-args by default on x86_64.
> Unfortunately, on some of the here included testcases this regressed
> quite a bit the generated code.  As AVX vectors are used, the dynamic

Yep, the accidental change to disable register accumulation was apparently
on too long.
I still plan to switch accmulate-outoging-args on for generic - I benchmarked
it on AMD targets and it seems to be good, I sitll need to analyze the code
size implication though. At some testers we seems to have regressed in 32bit
that may be related to fact that frame pointer enabled codegen is more compact
and Vladimir patch, while restoring performance, led to code size issues.
> realignment code needs to assume e.g. that some of them will need to be
> spilled, and for -mno-accumulate-outgoing-args the code needs to set
> need_drap early as well.  But in when emitting the prologue/epilogue,
> if need_drap is set, we don't perform the optimization for leaf functions
> which have zero size stack frame, thus we end up with uselessly doing
> dynamic stack realignment, setting up DRAP that nothing uses and later on
> restore everything back.
> 
> This patch improves it, if the DRAP register isn't live at the start of
> entry bb successor and we aren't going to realign the stack, we don't
> need DRAP at all, and even if we need DRAP register, that can't be the sole
> reason for doing stack realignment, the prologue code is able to set up DRAP
> even without dynamic stack realignment.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2013-12-20  Jakub Jelinek  
> 
>   PR target/59501
>   * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
>   if !crtl->stack_realign_needed.
>   (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
>   and stack_realign_needed will be false, clear drap_reg and need_drap.
>   Optimize leaf functions that don't need stack frame even if
>   crtl->need_drap.
> 
>   * gcc.target/i386/pr59501-1.c: New test.
>   * gcc.target/i386/pr59501-1a.c: New test.
>   * gcc.target/i386/pr59501-2.c: New test.
>   * gcc.target/i386/pr59501-2a.c: New test.
>   * gcc.target/i386/pr59501-3.c: New test.
>   * gcc.target/i386/pr59501-3a.c: New test.
>   * gcc.target/i386/pr59501-4.c: New test.
>   * gcc.target/i386/pr59501-4a.c: New test.
>   * gcc.target/i386/pr59501-5.c: New test.
>   * gcc.target/i386/pr59501-6.c: New test.

This seems OK, thanks!
Honza


Re: [PATCH] Improve i?86/x86_64 prologue_and_epilogue for leaf functions (PR target/59501)

2013-12-20 Thread H.J. Lu
On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
> Hi!
>
> Honza recently changed the i?86 backend, so that it often doesn't
> do -maccumulate-outgoing-args by default on x86_64.
> Unfortunately, on some of the here included testcases this regressed
> quite a bit the generated code.  As AVX vectors are used, the dynamic
> realignment code needs to assume e.g. that some of them will need to be
> spilled, and for -mno-accumulate-outgoing-args the code needs to set
> need_drap early as well.  But in when emitting the prologue/epilogue,
> if need_drap is set, we don't perform the optimization for leaf functions
> which have zero size stack frame, thus we end up with uselessly doing
> dynamic stack realignment, setting up DRAP that nothing uses and later on
> restore everything back.
>
> This patch improves it, if the DRAP register isn't live at the start of
> entry bb successor and we aren't going to realign the stack, we don't
> need DRAP at all, and even if we need DRAP register, that can't be the sole
> reason for doing stack realignment, the prologue code is able to set up DRAP
> even without dynamic stack realignment.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-12-20  Jakub Jelinek  
>
> PR target/59501
> * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
> if !crtl->stack_realign_needed.
> (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
> and stack_realign_needed will be false, clear drap_reg and need_drap.
> Optimize leaf functions that don't need stack frame even if
> crtl->need_drap.
>
> * gcc.target/i386/pr59501-1.c: New test.
> * gcc.target/i386/pr59501-1a.c: New test.
> * gcc.target/i386/pr59501-2.c: New test.
> * gcc.target/i386/pr59501-2a.c: New test.
> * gcc.target/i386/pr59501-3.c: New test.
> * gcc.target/i386/pr59501-3a.c: New test.
> * gcc.target/i386/pr59501-4.c: New test.
> * gcc.target/i386/pr59501-4a.c: New test.
> * gcc.target/i386/pr59501-5.c: New test.
> * gcc.target/i386/pr59501-6.c: New test.
>
> --- gcc/config/i386/i386.c.jj   2013-12-19 13:35:23.0 +0100
> +++ gcc/config/i386/i386.c  2013-12-20 11:44:14.389310804 +0100
> @@ -9235,7 +9235,9 @@ ix86_save_reg (unsigned int regno, bool
> }
>  }
>
> -  if (crtl->drap_reg && regno == REGNO (crtl->drap_reg))
> +  if (crtl->drap_reg
> +  && regno == REGNO (crtl->drap_reg)
> +  && crtl->stack_realign_needed)
>  return true;
>
>return (df_regs_ever_live_p (regno)
> @@ -10473,12 +10475,23 @@ ix86_finalize_stack_realign_flags (void)
>return;
>  }
>
> +  /* If drap has been set, but it actually isn't live at the start
> + of the function and !stack_realign, there is no reason to set it up.  */
> +  if (crtl->drap_reg && !stack_realign)
> +{
> +  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> +  if (! REGNO_REG_SET_P (DF_LR_IN (bb), REGNO (crtl->drap_reg)))
> +   {
> + crtl->drap_reg = NULL_RTX;
> + crtl->need_drap = false;
> +   }
> +}
> +
>/* If the only reason for frame_pointer_needed is that we conservatively
>   assumed stack realignment might be needed, but in the end nothing that
>   needed the stack alignment had been spilled, clear frame_pointer_needed
>   and say we don't need stack realignment.  */
>if (stack_realign
> -  && !crtl->need_drap
>&& frame_pointer_needed
>&& crtl->is_leaf
>&& flag_omit_frame_pointer
> @@ -10516,6 +10529,18 @@ ix86_finalize_stack_realign_flags (void)
>   }
> }
>
> +  /* If drap has been set, but it actually isn't live at the start
> +of the function, there is no reason to set it up.  */
> +  if (crtl->drap_reg)
> +   {
> + basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> + if (! REGNO_REG_SET_P (DF_LR_IN (bb), REGNO (crtl->drap_reg)))
> +   {
> + crtl->drap_reg = NULL_RTX;
> + crtl->need_drap = false;
> +   }
> +   }
> +
>frame_pointer_needed = false;
>stack_realign = false;
>crtl->max_used_stack_slot_alignment = incoming_stack_boundary;

Looks good to me.  But I can't approve it.

Thanks.

-- 
H.J.


[PATCH] Improve i?86/x86_64 prologue_and_epilogue for leaf functions (PR target/59501)

2013-12-20 Thread Jakub Jelinek
Hi!

Honza recently changed the i?86 backend, so that it often doesn't
do -maccumulate-outgoing-args by default on x86_64.
Unfortunately, on some of the here included testcases this regressed
quite a bit the generated code.  As AVX vectors are used, the dynamic
realignment code needs to assume e.g. that some of them will need to be
spilled, and for -mno-accumulate-outgoing-args the code needs to set
need_drap early as well.  But in when emitting the prologue/epilogue,
if need_drap is set, we don't perform the optimization for leaf functions
which have zero size stack frame, thus we end up with uselessly doing
dynamic stack realignment, setting up DRAP that nothing uses and later on
restore everything back.

This patch improves it, if the DRAP register isn't live at the start of
entry bb successor and we aren't going to realign the stack, we don't
need DRAP at all, and even if we need DRAP register, that can't be the sole
reason for doing stack realignment, the prologue code is able to set up DRAP
even without dynamic stack realignment.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-12-20  Jakub Jelinek  

PR target/59501
* config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
if !crtl->stack_realign_needed.
(ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
and stack_realign_needed will be false, clear drap_reg and need_drap.
Optimize leaf functions that don't need stack frame even if
crtl->need_drap.

* gcc.target/i386/pr59501-1.c: New test.
* gcc.target/i386/pr59501-1a.c: New test.
* gcc.target/i386/pr59501-2.c: New test.
* gcc.target/i386/pr59501-2a.c: New test.
* gcc.target/i386/pr59501-3.c: New test.
* gcc.target/i386/pr59501-3a.c: New test.
* gcc.target/i386/pr59501-4.c: New test.
* gcc.target/i386/pr59501-4a.c: New test.
* gcc.target/i386/pr59501-5.c: New test.
* gcc.target/i386/pr59501-6.c: New test.

--- gcc/config/i386/i386.c.jj   2013-12-19 13:35:23.0 +0100
+++ gcc/config/i386/i386.c  2013-12-20 11:44:14.389310804 +0100
@@ -9235,7 +9235,9 @@ ix86_save_reg (unsigned int regno, bool
}
 }
 
-  if (crtl->drap_reg && regno == REGNO (crtl->drap_reg))
+  if (crtl->drap_reg
+  && regno == REGNO (crtl->drap_reg)
+  && crtl->stack_realign_needed)
 return true;
 
   return (df_regs_ever_live_p (regno)
@@ -10473,12 +10475,23 @@ ix86_finalize_stack_realign_flags (void)
   return;
 }
 
+  /* If drap has been set, but it actually isn't live at the start
+ of the function and !stack_realign, there is no reason to set it up.  */
+  if (crtl->drap_reg && !stack_realign)
+{
+  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+  if (! REGNO_REG_SET_P (DF_LR_IN (bb), REGNO (crtl->drap_reg)))
+   {
+ crtl->drap_reg = NULL_RTX;
+ crtl->need_drap = false;
+   }
+}
+
   /* If the only reason for frame_pointer_needed is that we conservatively
  assumed stack realignment might be needed, but in the end nothing that
  needed the stack alignment had been spilled, clear frame_pointer_needed
  and say we don't need stack realignment.  */
   if (stack_realign
-  && !crtl->need_drap
   && frame_pointer_needed
   && crtl->is_leaf
   && flag_omit_frame_pointer
@@ -10516,6 +10529,18 @@ ix86_finalize_stack_realign_flags (void)
  }
}
 
+  /* If drap has been set, but it actually isn't live at the start
+of the function, there is no reason to set it up.  */
+  if (crtl->drap_reg)
+   {
+ basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+ if (! REGNO_REG_SET_P (DF_LR_IN (bb), REGNO (crtl->drap_reg)))
+   {
+ crtl->drap_reg = NULL_RTX;
+ crtl->need_drap = false;
+   }
+   }
+
   frame_pointer_needed = false;
   stack_realign = false;
   crtl->max_used_stack_slot_alignment = incoming_stack_boundary;
--- gcc/testsuite/gcc.target/i386/pr59501-2.c.jj2013-12-20 
12:02:08.754662741 +0100
+++ gcc/testsuite/gcc.target/i386/pr59501-2.c   2013-12-20 12:02:04.665668734 
+0100
@@ -0,0 +1,5 @@
+/* PR target/59501 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
+
+#include "pr59501-1.c"
--- gcc/testsuite/gcc.target/i386/pr59501-1.c.jj2013-12-20 
12:01:44.253781613 +0100
+++ gcc/testsuite/gcc.target/i386/pr59501-1.c   2013-12-20 12:12:26.715391613 
+0100
@@ -0,0 +1,30 @@
+/* PR target/59501 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -mno-accumulate-outgoing-args" } */
+
+#define CHECK_H "avx-check.h"
+#define TEST avx_test
+
+#include CHECK_H
+
+typedef double V __attribute__ ((vector_size (32)));
+
+__attribute__((noinline, noclone)) V
+foo (double *x, unsigned *y)
+{
+  V r = { x[y[0]], x[y[1]], x[y[2]], x[y[3]] };
+  return r;
+}
+
+static void
+TEST (voi