Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-13 Thread Jeff Law via Gcc-patches




On 7/13/23 08:20, Manolis Tsamis wrote:



I have sent a V3 which contains a number of fixes and improvements:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624439.html
I tested the new version rebased on master and the m68k issue did not reproduce.
I don't know what exactly fixed it; do we need to know why or is it
enough that the issue is gone following some general fixes?
It is highly possible that this also fixes the x264 failure. Please
let me know if the issue persists with v3 once you're able to test.

Sounds good.  I'll test both m68k and x264 on rv64 with the latest patch.

jeff


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-13 Thread Manolis Tsamis
On Wed, Jul 12, 2023 at 5:14 PM Jeff Law  wrote:
>
>
>
> On 7/12/23 03:12, Manolis Tsamis wrote:
> > On Mon, Jul 10, 2023 at 12:58 AM Hans-Peter Nilsson  
> > wrote:
> >>
> >> On Sun, 9 Jul 2023, Hans-Peter Nilsson wrote:
> >>
> >>> On Thu, 15 Jun 2023, Manolis Tsamis wrote:
> >>>
>  This is a new RTL pass that tries to optimize memory offset calculations
>  by moving them from add immediate instructions to the memory 
>  loads/stores.
> >>
> >>> It punts on all "use" insns that are not SET.
> >>> Why not use single_set there too?
> >>
> >> Also, I don't see insn costs considered?
> >> (Also: typo "immidiate".)
> >>
> >
> > The only change that this pass does is to change offsets where
> > possible and then simplify add immediate instructions to register
> > moves.
> > I don't see how this could result in worse performance and by
> > extension I don't see where insn costs could be used.
> > Do you have any thoughts about where to use the costs?
> If the offset crosses an architectural size boundary such that the
> instruction was longer, but still valid, it could affect the cost.
>
Ok, I haven't thought about that. I will try a prototype in case we
want to include it in a next iteration of this.

> That's the most obvious case to me.  There may be others.
>
> Any progress on that m68k issue?  I've also got a report of x264 failing
> to build on riscv64 with the V2 variant, but I haven't distilled that
> down to a testcase yet.
>
I have sent a V3 which contains a number of fixes and improvements:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624439.html
I tested the new version rebased on master and the m68k issue did not reproduce.
I don't know what exactly fixed it; do we need to know why or is it
enough that the issue is gone following some general fixes?
It is highly possible that this also fixes the x264 failure. Please
let me know if the issue persists with v3 once you're able to test.

Manolis

> jeff


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/12/23 03:12, Manolis Tsamis wrote:

On Mon, Jul 10, 2023 at 12:58 AM Hans-Peter Nilsson  wrote:


On Sun, 9 Jul 2023, Hans-Peter Nilsson wrote:


On Thu, 15 Jun 2023, Manolis Tsamis wrote:


This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.



It punts on all "use" insns that are not SET.
Why not use single_set there too?


Also, I don't see insn costs considered?
(Also: typo "immidiate".)



The only change that this pass does is to change offsets where
possible and then simplify add immediate instructions to register
moves.
I don't see how this could result in worse performance and by
extension I don't see where insn costs could be used.
Do you have any thoughts about where to use the costs?
If the offset crosses an architectural size boundary such that the 
instruction was longer, but still valid, it could affect the cost.


That's the most obvious case to me.  There may be others.

Any progress on that m68k issue?  I've also got a report of x264 failing 
to build on riscv64 with the V2 variant, but I haven't distilled that 
down to a testcase yet.


jeff


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-12 Thread Manolis Tsamis
On Mon, Jul 10, 2023 at 12:58 AM Hans-Peter Nilsson  wrote:
>
> On Sun, 9 Jul 2023, Hans-Peter Nilsson wrote:
>
> > On Thu, 15 Jun 2023, Manolis Tsamis wrote:
> >
> > > This is a new RTL pass that tries to optimize memory offset calculations
> > > by moving them from add immediate instructions to the memory loads/stores.
>
> > It punts on all "use" insns that are not SET.
> > Why not use single_set there too?
>
> Also, I don't see insn costs considered?
> (Also: typo "immidiate".)
>

The only change that this pass does is to change offsets where
possible and then simplify add immediate instructions to register
moves.
I don't see how this could result in worse performance and by
extension I don't see where insn costs could be used.
Do you have any thoughts about where to use the costs?

Thanks!
Manolis

> brgds, H-P


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-12 Thread Manolis Tsamis
On Mon, Jul 10, 2023 at 12:37 AM Hans-Peter Nilsson  wrote:
>
> On Thu, 15 Jun 2023, Manolis Tsamis wrote:
>
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >   addi t4,sp,16
> >   add  t2,a6,t4
> >   shl  t3,t2,1
> >   ld   a2,0(t3)
> >   addi a2,1
> >   sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >   add  t2,a6,sp
> >   shl  t3,t2,1
> >   ld   a2,32(t3)
> >   addi a2,1
> >   sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
>
> It punts on all "use" insns that are not SET.
> Why not use single_set there too?
>

The issue was that single_set will potentially discard clobbers, but
if we have any clobbers it may be invalid to propagate through that
instruction.
Rejecting anything that is not a SET is enough to handle anything strange.
Although this can be improved (look through clobbers/use?) the
implementation will be more complicated without any obvious (large)
benefit.

Manolis

> brgds, H-P


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-09 Thread Hans-Peter Nilsson
On Sun, 9 Jul 2023, Hans-Peter Nilsson wrote:

> On Thu, 15 Jun 2023, Manolis Tsamis wrote:
> 
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.

> It punts on all "use" insns that are not SET.
> Why not use single_set there too?

Also, I don't see insn costs considered?
(Also: typo "immidiate".)

brgds, H-P


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-09 Thread Hans-Peter Nilsson
On Thu, 15 Jun 2023, Manolis Tsamis wrote:

> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
> 
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
> 
> into the following (one instruction less):
> 
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
> 
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.

It punts on all "use" insns that are not SET.
Why not use single_set there too?

brgds, H-P


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-06-17 Thread Jeff Law via Gcc-patches




On 6/15/23 11:28, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v2:
 - Made the pass target-independant instead of RISCV specific.
 - Fixed a number of bugs.
 - Add code to handle more ADD patterns as found
   in other targets (x86, aarch64).
 - Improved naming and comments.
 - Fixed bitmap memory leak.




diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
new file mode 100644
index 000..8ef0f438191
--- /dev/null
+++ b/gcc/fold-mem-offsets.cc
@@ -0,0 +1,630 @@
+/* Late RTL pass to fold memory offsets.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#define IN_TARGET_CODE 1

Do we still need this #define?





+/* Tracks which instructions can be reached through instructions that can
+   propagate offsets for folding.  */
+static bitmap_head can_fold_insns;
Is there any reason why you're using "bitmap_head" rather than just the 
generic "bitmap" type?


Also note that since you've got a class you could just put these objects 
into the class and make the routines that use them member functions. 
It's marginally cleaner than using static variables.




+
+/* Helper function that performs the actions defined by PHASE for INSN.  */
+static void
+fold_mem_offsets_driver (rtx_insn* insn, int phase)
+{
+  if (phase == FM_PHASE_COMMIT_INSNS)
So FM_PHASE_COMMIT_INSNS doesn't share any code with the other phases. 
Would it be better to just factor this into a distinct function?





+{
+  rtx mem = get_foldable_mem (insn);
+
+  if (!mem)
+   return;
+
+  rtx mem_addr = XEXP (mem, 0);
+  rtx reg;
+  HOST_WIDE_INT cur_offset;
+
+  if (REG_P (mem_addr))
+   {
+ reg = mem_addr;
+ cur_offset = 0;
+   }
+  else if (GET_CODE (mem_addr) == PLUS
+  && REG_P (XEXP (mem_addr, 0))
+  && CONST_INT_P (XEXP (mem_addr, 1)))
+   {
+ reg = XEXP (mem_addr, 0);
+ cur_offset = INTVAL (XEXP (mem_addr, 1));
+   }
+  else
+   return;
So these is common to the non-commit phases.  Would it be cleaner to 
factor it into its own function, then factor each of the non-commit 
phases into their own function which calls this common routine?





+  else if (phase == FM_PHASE_VALIDITY)
+   {
+ bitmap_head fold_insns;
+ bitmap_initialize (_insns, NULL);
Note that we have auto-bitmap types which will clean up after themselves 
so that you don't have to manage allocation/deallocation.



Overall it looks really good.  I could make an argument to include it 
now, but I think one more cycle would be best.


In the mean time, I've updated my tester to use the V2 version.

Thanks!
jeff


[PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-06-15 Thread Manolis Tsamis
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v2:
- Made the pass target-independant instead of RISCV specific.
- Fixed a number of bugs.
- Add code to handle more ADD patterns as found
  in other targets (x86, aarch64).
- Improved naming and comments.
- Fixed bitmap memory leak.

 gcc/Makefile.in   |   1 +
 gcc/common.opt|   4 +
 gcc/doc/invoke.texi   |   8 +
 gcc/fold-mem-offsets.cc   | 630 ++
 gcc/passes.def|   1 +
 .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
 gcc/tree-pass.h   |   1 +
 9 files changed, 702 insertions(+)
 create mode 100644 gcc/fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4be82e83b9e..98a59e0d207 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1423,6 +1423,7 @@ OBJS = \
fixed-value.o \
fold-const.o \
fold-const-call.o \
+   fold-mem-offsets.o \
function.o \
function-abi.o \
function-tests.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index a28ca13385a..5a793de34fa 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1248,6 +1248,10 @@ fcprop-registers
 Common Var(flag_cprop_registers) Optimization
 Perform a register copy-propagation optimization pass.
 
+ffold-mem-offsets
+Target Bool Var(flag_fold_mem_offsets) Init(1)
+Fold instructions calculating memory offsets to the memory access instruction 
if possible.
+
 fcrossjumping
 Common Var(flag_crossjumping) Optimization
 Perform cross-jumping optimization.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9ecbd32a228..b1dba4df536 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -537,6 +537,7 @@ Objective-C and Objective-C++ Dialects}.
 -fauto-inc-dec  -fbranch-probabilities
 -fcaller-saves
 -fcombine-stack-adjustments  -fconserve-stack
+-ffold-mem-offsets
 -fcompare-elim  -fcprop-registers  -fcrossjumping
 -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules
 -fcx-limited-range
@@ -14230,6 +14231,13 @@ the comparison operation before register allocation is 
complete.
 
 Enabled at levels @option{-O1}, @option{-O2}, @option{-O3}, @option{-Os}.
 
+@opindex ffold-mem-offsets
+@item -ffold-mem-offsets
+@itemx -fno-fold-mem-offsets
+Try to eliminate add instructions by folding them in memory loads/stores.
+
+Enabled at levels @option{-O2}, @option{-O3}.
+
 @opindex fcprop-registers
 @item -fcprop-registers
 After register allocation and post-register allocation instruction splitting,
diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
new file mode 100644
index 000..8ef0f438191
--- /dev/null
+++ b/gcc/fold-mem-offsets.cc
@@ -0,0 +1,630 @@
+/* Late RTL pass to fold memory offsets.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see