Re: [AArch64/GCC][14/N] Optimize epilogue when there is frame pointer

2014-07-24 Thread Marcus Shawcroft
On 22 July 2014 15:52, Jiong Wang jiong.w...@arm.com wrote:

 gcc/
   * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
   outgoing area size when restore stack_pointer_rtx.

 gcc/testsuite/
   * gcc.target/aarch64/test_frame_12.c: Match optimized instruction
 sequences.

OK and committed.

/Marcus


Re: [AArch64/GCC][14/N] Optimize epilogue when there is frame pointer

2014-07-24 Thread Richard Earnshaw
On 22/07/14 15:52, Jiong Wang wrote:
 currently we are generating sub-optimal epilogue when there
 is frame pointer and there is outgoing area.
  
 take gcc.target/aarch64/test_frame_12.c for example:
  
 the epilogue for test_12 is:
  
.L12:
sub sp, x29, #16
ldp x29, x30, [sp, 16]
add sp, sp, 432
ret
  
 while the optimized version should be:
  
.L12:
add sp, x29, 0
ldp x29, x30, [sp], 416
ret

Even better would be

ldp x29, x30, [x29]
add sp, sp, #432

since now the two instructions can dual-issue.

R.

  
 when there is frame pointer, it is set up to point to base address
 of our reg save area in prologue, so in epilogue we could utilize
 this feature, and skip outgoing if there is, thus we could always utilize
 load write-back for stack adjustment when there is frame pointer.
 
 ok to install?
 
 thanks.
 
 gcc/
* config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
outgoing area size when restore stack_pointer_rtx.
 
 gcc/testsuite/
* gcc.target/aarch64/test_frame_12.c: Match optimized instruction 
 sequences.
 
 
 0014-AArch64-GCC-15-20-Optimize-epilogue-when-there-is-fr.patch
 
 
From 9d8cbfa071df773ef5edfed499c0dc90be8eebfa Mon Sep 17 00:00:00 2001
 From: Jiong Wang jiong.w...@arm.com
 Date: Tue, 17 Jun 2014 22:19:33 +0100
 Subject: [PATCH 14/19] [AArch64/GCC][15/20] Optimize epilogue when there is
  frame pointer
 
 currently we are generating sub-optimal epilogue when there
 is frame pointer and there is outgoing area.
 
 take gcc.target/aarch64/test_frame_12.c for example:
 
 the epilogue for test_12 is:
 
 .L12:
 sub sp, x29, #16
 ldp x29, x30, [sp, 16]
 add sp, sp, 432
 ret
 
 while the optimized version should be:
 
 .L12:
 add sp, x29, 0
 ldp x29, x30, [sp], 416
 ret
 
 when there is frame pointer, it is set up to point to base address of our
 reg save area in prologue, so in epilogue we could utilize this feature,
 and skip outgoing if there is, thus we could always utilize load write-back
 for stack adjustment when there is frame pointer.
 
 2014-06-16  Jiong Wang jiong.w...@arm.com
   Marcus Shawcroft  marcus.shawcr...@arm.com
 gcc/
   * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
   outgoing area size when restore stack_pointer_rtx.
 
 gcc/testsuite/
   * gcc.target/aarch64/test_frame_12.c: Match optimized instruction sequences.
 ---
  gcc/config/aarch64/aarch64.c |   24 
 +++---
  gcc/testsuite/gcc.target/aarch64/test_frame_12.c |4 
  2 files changed, 11 insertions(+), 17 deletions(-)
 
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 425c865..65a84e8 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -2360,7 +2360,8 @@ aarch64_expand_epilogue (bool for_sibcall)
  {
insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
  hard_frame_pointer_rtx,
 -GEN_INT (- fp_offset)));
 +GEN_INT (0)));
 +  offset = offset - fp_offset;
RTX_FRAME_RELATED_P (insn) = 1;
/* As SP is set to (FP - fp_offset), according to the rules in
dwarf2cfi.c:dwarf2out_frame_debug_expr, CFA should be calculated
 @@ -2368,27 +2369,16 @@ aarch64_expand_epilogue (bool for_sibcall)
cfa_reg = stack_pointer_rtx;
  }
  
 -  aarch64_restore_callee_saves (DFmode, fp_offset, V0_REGNUM, V31_REGNUM);
 +  aarch64_restore_callee_saves (DFmode, frame_pointer_needed ? 0 : fp_offset,
 + V0_REGNUM, V31_REGNUM);
  
if (offset  0)
  {
if (frame_pointer_needed)
   {
 -   if (fp_offset)
 - {
 -   aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
 - R30_REGNUM);
 -   insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
 -GEN_INT (offset)));
 -   RTX_FRAME_RELATED_P (insn) = 1;
 - }
 -   else
 - {
 -   aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
 - R28_REGNUM);
 -   aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
 -   cfa_reg);
 - }
 +   aarch64_restore_callee_saves (DImode, 0, R0_REGNUM, R28_REGNUM);
 +   aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
 +   cfa_reg);
   }
else
   {
 diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c 
 b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
 index 3649527..81f0070 100644
 --- a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
 +++ b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
 @@ -12,4 +12,8 @@ t_frame_pattern_outgoing 

Re: [AArch64/GCC][14/N] Optimize epilogue when there is frame pointer

2014-07-24 Thread Jiong Wang


On 24/07/14 13:25, Richard Earnshaw wrote:

On 22/07/14 15:52, Jiong Wang wrote:

currently we are generating sub-optimal epilogue when there
is frame pointer and there is outgoing area.
  
take gcc.target/aarch64/test_frame_12.c for example:
  
the epilogue for test_12 is:
  
.L12:

sub sp, x29, #16
ldp x29, x30, [sp, 16]
add sp, sp, 432
ret
  
while the optimized version should be:
  
.L12:

add sp, x29, 0
ldp x29, x30, [sp], 416
ret

Even better would be

ldp x29, x30, [x29]
add sp, sp, #432

since now the two instructions can dual-issue.


thanks for pointing this out.

will investigate this after current stack patch set installed.

-- Jiong



R.

  
when there is frame pointer, it is set up to point to base address

of our reg save area in prologue, so in epilogue we could utilize
this feature, and skip outgoing if there is, thus we could always utilize
load write-back for stack adjustment when there is frame pointer.

ok to install?

thanks.

gcc/
* config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
outgoing area size when restore stack_pointer_rtx.

gcc/testsuite/
* gcc.target/aarch64/test_frame_12.c: Match optimized instruction sequences.


0014-AArch64-GCC-15-20-Optimize-epilogue-when-there-is-fr.patch


From 9d8cbfa071df773ef5edfed499c0dc90be8eebfa Mon Sep 17 00:00:00 2001
From: Jiong Wang jiong.w...@arm.com
Date: Tue, 17 Jun 2014 22:19:33 +0100
Subject: [PATCH 14/19] [AArch64/GCC][15/20] Optimize epilogue when there is
  frame pointer

currently we are generating sub-optimal epilogue when there
is frame pointer and there is outgoing area.

take gcc.target/aarch64/test_frame_12.c for example:

the epilogue for test_12 is:

.L12:
sub sp, x29, #16
ldp x29, x30, [sp, 16]
add sp, sp, 432
ret

while the optimized version should be:

.L12:
add sp, x29, 0
ldp x29, x30, [sp], 416
ret

when there is frame pointer, it is set up to point to base address of our
reg save area in prologue, so in epilogue we could utilize this feature,
and skip outgoing if there is, thus we could always utilize load write-back
for stack adjustment when there is frame pointer.

2014-06-16  Jiong Wang jiong.w...@arm.com
Marcus Shawcroft  marcus.shawcr...@arm.com
gcc/
   * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
   outgoing area size when restore stack_pointer_rtx.

gcc/testsuite/
   * gcc.target/aarch64/test_frame_12.c: Match optimized instruction sequences.
---
  gcc/config/aarch64/aarch64.c |   24 +++---
  gcc/testsuite/gcc.target/aarch64/test_frame_12.c |4 
  2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 425c865..65a84e8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2360,7 +2360,8 @@ aarch64_expand_epilogue (bool for_sibcall)
  {
insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
   hard_frame_pointer_rtx,
-  GEN_INT (- fp_offset)));
+  GEN_INT (0)));
+  offset = offset - fp_offset;
RTX_FRAME_RELATED_P (insn) = 1;
/* As SP is set to (FP - fp_offset), according to the rules in
 dwarf2cfi.c:dwarf2out_frame_debug_expr, CFA should be calculated
@@ -2368,27 +2369,16 @@ aarch64_expand_epilogue (bool for_sibcall)
cfa_reg = stack_pointer_rtx;
  }
  
-  aarch64_restore_callee_saves (DFmode, fp_offset, V0_REGNUM, V31_REGNUM);

+  aarch64_restore_callee_saves (DFmode, frame_pointer_needed ? 0 : fp_offset,
+   V0_REGNUM, V31_REGNUM);
  
if (offset  0)

  {
if (frame_pointer_needed)
{
- if (fp_offset)
-   {
- aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
-   R30_REGNUM);
- insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
-  GEN_INT (offset)));
- RTX_FRAME_RELATED_P (insn) = 1;
-   }
- else
-   {
- aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
-   R28_REGNUM);
- aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
- cfa_reg);
-   }
+ aarch64_restore_callee_saves (DImode, 0, R0_REGNUM, R28_REGNUM);
+ aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
+ cfa_reg);
}
else
{
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c 
b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
index 3649527..81f0070 100644
--- 

[AArch64/GCC][14/N] Optimize epilogue when there is frame pointer

2014-07-22 Thread Jiong Wang

currently we are generating sub-optimal epilogue when there
is frame pointer and there is outgoing area.

take gcc.target/aarch64/test_frame_12.c for example:

the epilogue for test_12 is:

  .L12:

  sub sp, x29, #16
  ldp x29, x30, [sp, 16]
  add sp, sp, 432
  ret

while the optimized version should be:

  .L12:

  add sp, x29, 0
  ldp x29, x30, [sp], 416
  ret

when there is frame pointer, it is set up to point to base address

of our reg save area in prologue, so in epilogue we could utilize
this feature, and skip outgoing if there is, thus we could always utilize
load write-back for stack adjustment when there is frame pointer.

ok to install?

thanks.

gcc/
  * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
  outgoing area size when restore stack_pointer_rtx.

gcc/testsuite/
  * gcc.target/aarch64/test_frame_12.c: Match optimized instruction sequences.
From 9d8cbfa071df773ef5edfed499c0dc90be8eebfa Mon Sep 17 00:00:00 2001
From: Jiong Wang jiong.w...@arm.com
Date: Tue, 17 Jun 2014 22:19:33 +0100
Subject: [PATCH 14/19] [AArch64/GCC][15/20] Optimize epilogue when there is
 frame pointer

currently we are generating sub-optimal epilogue when there
is frame pointer and there is outgoing area.

take gcc.target/aarch64/test_frame_12.c for example:

the epilogue for test_12 is:

.L12:
sub sp, x29, #16
ldp x29, x30, [sp, 16]
add sp, sp, 432
ret

while the optimized version should be:

.L12:
add sp, x29, 0
ldp x29, x30, [sp], 416
ret

when there is frame pointer, it is set up to point to base address of our
reg save area in prologue, so in epilogue we could utilize this feature,
and skip outgoing if there is, thus we could always utilize load write-back
for stack adjustment when there is frame pointer.

2014-06-16  Jiong Wang jiong.w...@arm.com
	Marcus Shawcroft  marcus.shawcr...@arm.com
gcc/
  * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
  outgoing area size when restore stack_pointer_rtx.

gcc/testsuite/
  * gcc.target/aarch64/test_frame_12.c: Match optimized instruction sequences.
---
 gcc/config/aarch64/aarch64.c |   24 +++---
 gcc/testsuite/gcc.target/aarch64/test_frame_12.c |4 
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 425c865..65a84e8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2360,7 +2360,8 @@ aarch64_expand_epilogue (bool for_sibcall)
 {
   insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
    hard_frame_pointer_rtx,
-   GEN_INT (- fp_offset)));
+   GEN_INT (0)));
+  offset = offset - fp_offset;
   RTX_FRAME_RELATED_P (insn) = 1;
   /* As SP is set to (FP - fp_offset), according to the rules in
 	 dwarf2cfi.c:dwarf2out_frame_debug_expr, CFA should be calculated
@@ -2368,27 +2369,16 @@ aarch64_expand_epilogue (bool for_sibcall)
   cfa_reg = stack_pointer_rtx;
 }
 
-  aarch64_restore_callee_saves (DFmode, fp_offset, V0_REGNUM, V31_REGNUM);
+  aarch64_restore_callee_saves (DFmode, frame_pointer_needed ? 0 : fp_offset,
+V0_REGNUM, V31_REGNUM);
 
   if (offset  0)
 {
   if (frame_pointer_needed)
 	{
-	  if (fp_offset)
-	{
-	  aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
-	R30_REGNUM);
-	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
-	   GEN_INT (offset)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	}
-	  else
-	{
-	  aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
-	R28_REGNUM);
-	  aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
-  cfa_reg);
-	}
+	  aarch64_restore_callee_saves (DImode, 0, R0_REGNUM, R28_REGNUM);
+	  aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
+  cfa_reg);
 	}
   else
 	{
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
index 3649527..81f0070 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
@@ -12,4 +12,8 @@ t_frame_pattern_outgoing (test12, 400, , 8, a[8])
 t_frame_run (test12)
 
 /* { dg-final { scan-assembler-times sub\tsp, sp, #\[0-9\]+ 1 } } */
+
+/* Check epilogue using write-back.  */
+/* { dg-final { scan-assembler-times ldp\tx29, x30, \\\[sp\\\], \[0-9\]+ 3 } } */
+
 /* { dg-final { cleanup-saved-temps } } */
-- 
1.7.9.5