Re: [PATCH 23/23] fwprop: Rewrite to use RTL SSA

2020-12-16 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 11/13/20 1:24 AM, Richard Sandiford via Gcc-patches wrote:
>> This patch rewrites fwprop.c to use the RTL SSA framework.  It tries
>> as far as possible to mimic the old behaviour, even in caes where
>> that doesn't fit naturally with the new framework.  I've added ???
>> comments to mark those places, but I think “fixing” them should
>> be done separately to make bisection easier.
>>
>> In particular:
>>
>> * The old implementation iterated over uses, and after a successful
>>   substitution, the new insn's uses were added to the end of the list.
>>   The pass still processed those uses, but because it processed them at
>>   the end, it didn't fully optimise one instruction before propagating
>>   it into the next.
>>
>>   The new version follows the same approach for comparison purposes,
>>   but I'd like to drop that as a follow-on patch.
>>
>> * The old implementation operated on single use sites (DF_REF_LOCs).
>>   This doesn't work well for instructions with match_dups, where it's
>>   necessary to update both an operand and its dups at the same time.
>>   For example, attempting to substitute into a divmod instruction would
>>   fail because only the div or the mod side would be updated.
>>
>>   The new version again follows this to some extent for comparison
>>   purposes (although not exactly).  Again I'd like to drop it as a
>>   follow-on patch.
>>
>>   One difference is that if a register occurs in multiple MEM addresses
>>   in a set, the new version will try to update them all at once.  This is
>>   what causes the SVE ACLE st4* output to improve.
>>
>> Also, the old version didn't naturally guarantee termination (PR79405),
>> whereas the new one does.
>>
>> gcc/
>>  * fwprop.c: Rewrite to use the RTL SSA framework.
>>
>> gcc/testsuite/
>>  * gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: Don't
>>  expect insn updates to be deferred.
>>  * gcc.target/aarch64/sve/acle/asm/st4_s8.c: Expect the addition
>>  to be folded into the address.
>>  * gcc.target/aarch64/sve/acle/asm/st4_s8.c: Likewise.
> Consider killing the ADD_NOTES bits.

Done (and glad to see it go).

> s/eqaul/equal/ to fix a typo.

Oops, fixed.

> Naturally I'm happy at how much by-hand RTL analysis code just
> disappears with this change :-)
>
> Ideally you'll drop this in tomorrow and we can get a fresh run of all
> the targets in my tester before the weekend.  I won't be stressed if we
> see some fallout, but I don't expect much.  I'll help track them down if
> they occur.

Thanks, now pushed with the above changes.

Richard


Re: [PATCH 23/23] fwprop: Rewrite to use RTL SSA

2020-12-15 Thread Jeff Law via Gcc-patches



On 11/13/20 1:24 AM, Richard Sandiford via Gcc-patches wrote:
> This patch rewrites fwprop.c to use the RTL SSA framework.  It tries
> as far as possible to mimic the old behaviour, even in caes where
> that doesn't fit naturally with the new framework.  I've added ???
> comments to mark those places, but I think “fixing” them should
> be done separately to make bisection easier.
>
> In particular:
>
> * The old implementation iterated over uses, and after a successful
>   substitution, the new insn's uses were added to the end of the list.
>   The pass still processed those uses, but because it processed them at
>   the end, it didn't fully optimise one instruction before propagating
>   it into the next.
>
>   The new version follows the same approach for comparison purposes,
>   but I'd like to drop that as a follow-on patch.
>
> * The old implementation operated on single use sites (DF_REF_LOCs).
>   This doesn't work well for instructions with match_dups, where it's
>   necessary to update both an operand and its dups at the same time.
>   For example, attempting to substitute into a divmod instruction would
>   fail because only the div or the mod side would be updated.
>
>   The new version again follows this to some extent for comparison
>   purposes (although not exactly).  Again I'd like to drop it as a
>   follow-on patch.
>
>   One difference is that if a register occurs in multiple MEM addresses
>   in a set, the new version will try to update them all at once.  This is
>   what causes the SVE ACLE st4* output to improve.
>
> Also, the old version didn't naturally guarantee termination (PR79405),
> whereas the new one does.
>
> gcc/
>   * fwprop.c: Rewrite to use the RTL SSA framework.
>
> gcc/testsuite/
>   * gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: Don't
>   expect insn updates to be deferred.
>   * gcc.target/aarch64/sve/acle/asm/st4_s8.c: Expect the addition
>   to be folded into the address.
>   * gcc.target/aarch64/sve/acle/asm/st4_s8.c: Likewise.
Consider killing the ADD_NOTES bits.

s/eqaul/equal/ to fix a typo.

Naturally I'm happy at how much by-hand RTL analysis code just
disappears with this change :-)

Ideally you'll drop this in tomorrow and we can get a fresh run of all
the targets in my tester before the weekend.  I won't be stressed if we
see some fallout, but I don't expect much.  I'll help track them down if
they occur.

Thanks for your patience.

Jeff



[PATCH 23/23] fwprop: Rewrite to use RTL SSA

2020-11-13 Thread Richard Sandiford via Gcc-patches
This patch rewrites fwprop.c to use the RTL SSA framework.  It tries
as far as possible to mimic the old behaviour, even in caes where
that doesn't fit naturally with the new framework.  I've added ???
comments to mark those places, but I think “fixing” them should
be done separately to make bisection easier.

In particular:

* The old implementation iterated over uses, and after a successful
  substitution, the new insn's uses were added to the end of the list.
  The pass still processed those uses, but because it processed them at
  the end, it didn't fully optimise one instruction before propagating
  it into the next.

  The new version follows the same approach for comparison purposes,
  but I'd like to drop that as a follow-on patch.

* The old implementation operated on single use sites (DF_REF_LOCs).
  This doesn't work well for instructions with match_dups, where it's
  necessary to update both an operand and its dups at the same time.
  For example, attempting to substitute into a divmod instruction would
  fail because only the div or the mod side would be updated.

  The new version again follows this to some extent for comparison
  purposes (although not exactly).  Again I'd like to drop it as a
  follow-on patch.

  One difference is that if a register occurs in multiple MEM addresses
  in a set, the new version will try to update them all at once.  This is
  what causes the SVE ACLE st4* output to improve.

Also, the old version didn't naturally guarantee termination (PR79405),
whereas the new one does.

gcc/
* fwprop.c: Rewrite to use the RTL SSA framework.

gcc/testsuite/
* gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: Don't
expect insn updates to be deferred.
* gcc.target/aarch64/sve/acle/asm/st4_s8.c: Expect the addition
to be folded into the address.
* gcc.target/aarch64/sve/acle/asm/st4_s8.c: Likewise.
---
 gcc/fwprop.c  | 1698 ++---
 .../test-return-const.c.before-fwprop.c   |2 +-
 .../gcc.target/aarch64/sve/acle/asm/st4_s8.c  |8 +-
 .../gcc.target/aarch64/sve/acle/asm/st4_u8.c  |8 +-
 4 files changed, 561 insertions(+), 1155 deletions(-)

*** /tmp/9upGS6_fwprop.c2020-11-13 08:23:52.837409271 +
--- gcc/fwprop.c2020-11-13 08:05:06.490403698 +
***
*** 18,49 
  along with GCC; see the file COPYING3.  If not see
  .  */
  
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
- #include "target.h"
  #include "rtl.h"
- #include "predict.h"
  #include "df.h"
! #include "memmodel.h"
! #include "tm_p.h"
! #include "insn-config.h"
! #include "emit-rtl.h"
! #include "recog.h"
  
  #include "sparseset.h"
  #include "cfgrtl.h"
  #include "cfgcleanup.h"
  #include "cfgloop.h"
  #include "tree-pass.h"
- #include "domwalk.h"
  #include "rtl-iter.h"
! 
  
  /* This pass does simple forward propagation and simplification when an
 operand of an insn can only come from a single def.  This pass uses
!df.c, so it is global.  However, we only do limited analysis of
 available expressions.
  
 1) The pass tries to propagate the source of the def into the use,
--- 18,47 
  along with GCC; see the file COPYING3.  If not see
  .  */
  
+ #define ADD_NOTES 0
+ 
+ #define INCLUDE_ALGORITHM
+ #define INCLUDE_FUNCTIONAL
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
  #include "rtl.h"
  #include "df.h"
! #include "rtl-ssa.h"
  
  #include "sparseset.h"
+ #include "predict.h"
  #include "cfgrtl.h"
  #include "cfgcleanup.h"
  #include "cfgloop.h"
  #include "tree-pass.h"
  #include "rtl-iter.h"
! #include "target.h"
  
  /* This pass does simple forward propagation and simplification when an
 operand of an insn can only come from a single def.  This pass uses
!RTL SSA, so it is global.  However, we only do limited analysis of
 available expressions.
  
 1) The pass tries to propagate the source of the def into the use,
***
*** 60,68 
(set (subreg:SI (reg:DI 120) 0) (const_int 0))
(set (subreg:SI (reg:DI 120) 4) (const_int -1))
(set (subreg:SI (reg:DI 122) 0)
!  (ior:SI (subreg:SI (reg:DI 119) 0) (subreg:SI (reg:DI 120) 0)))
(set (subreg:SI (reg:DI 122) 4)
!  (ior:SI (subreg:SI (reg:DI 119) 4) (subreg:SI (reg:DI 120) 4)))
  
 can be simplified to the much simpler
  
--- 58,66 
(set (subreg:SI (reg:DI 120) 0) (const_int 0))
(set (subreg:SI (reg:DI 120) 4) (const_int -1))
(set (subreg:SI (reg:DI 122) 0)
!(ior:SI (subreg:SI (reg:DI 119) 0) (subreg:SI (reg:DI 120) 0)))
(set (subreg:SI (reg:DI 122) 4)
!(ior:SI (subreg:SI (reg:DI 119) 4) (subreg:SI (reg:DI 120) 4)))
  
 can be simplified to the much simpler
  
***
*** 89,95 
   (set (reg:QI 120) (subre