Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-18 Thread Eric Botcazou
> Are we in agreement that I should revert the patch?

I think that you can leave it for now in order to have some feedback (see for 
example PR rtl-optimization/82597) but ideally it should be rewritten so as to 
reuse the existing infrastructure of the pass.

-- 
Eric Botcazou


RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Michael Collison
Are we in agreement that I should revert the patch?

-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Tuesday, October 17, 2017 1:10 PM
To: Michael Collison ; Eric Botcazou 

Cc: Jeff Law ; GCC Patches ; Segher 
Boessenkool ; Kyrill Tkachov 
; nd 
Subject: RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison 
 wrote:
>Richard and Eric,
>
>I see you have objected and indicated the additional cost. Have you 
>quantified how much more expensive the pass is?

DF has known quadratic behavior in memory for certain problems. Not sure off 
head if DU and UD fall into this category. 

Richard. 

>-Original Message-
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Tuesday, October 17, 2017 4:45 AM
>To: Eric Botcazou 
>Cc: Jeff Law ; GCC Patches ; 
>Michael Collison ; Segher Boessenkool 
>; Kyrill Tkachov 
>; nd 
>Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal 
>ops
>
>On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou 
>wrote:
>>> This looks good.  OK for the trunk.
>>
>> FWIW I disagree.  The patch completely shuns the existing 
>> implementation of the pass, which is based on a forward scan within 
>> basic blocks to identify the various interesting instructions and 
>> record them, and uses full-blown def-use and use-def chains instead, 
>> which are much more costly to compute.  It's not clear to me why the
>existing implementation couldn't have been extended.
>>
>> The result is that, for targets for which the pass was initially
>written, i.e.
>> targets for which most (all) arithmetic instructions clobber the 
>> flags, the pass will be slower for absolutely no benefits, as the 
>> existing implementation would already have caught all the interesting
>cases.
>>
>> So it's again a case of a generic change made for a specific target 
>> without consideration for other, admittedly less mainstream,
>targets...
>
>I agree with Eric here.
>
>Richard.
>
>> --
>> Eric Botcazou



RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Richard Biener
On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison 
 wrote:
>Richard and Eric,
>
>I see you have objected and indicated the additional cost. Have you
>quantified how much more expensive the pass is?

DF has known quadratic behavior in memory for certain problems. Not sure off 
head if DU and UD fall into this category. 

Richard. 

>-Original Message-
>From: Richard Biener [mailto:richard.guent...@gmail.com] 
>Sent: Tuesday, October 17, 2017 4:45 AM
>To: Eric Botcazou 
>Cc: Jeff Law ; GCC Patches ;
>Michael Collison ; Segher Boessenkool
>; Kyrill Tkachov
>; nd 
>Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal
>ops
>
>On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou 
>wrote:
>>> This looks good.  OK for the trunk.
>>
>> FWIW I disagree.  The patch completely shuns the existing 
>> implementation of the pass, which is based on a forward scan within 
>> basic blocks to identify the various interesting instructions and 
>> record them, and uses full-blown def-use and use-def chains instead, 
>> which are much more costly to compute.  It's not clear to me why the
>existing implementation couldn't have been extended.
>>
>> The result is that, for targets for which the pass was initially
>written, i.e.
>> targets for which most (all) arithmetic instructions clobber the 
>> flags, the pass will be slower for absolutely no benefits, as the 
>> existing implementation would already have caught all the interesting
>cases.
>>
>> So it's again a case of a generic change made for a specific target 
>> without consideration for other, admittedly less mainstream,
>targets...
>
>I agree with Eric here.
>
>Richard.
>
>> --
>> Eric Botcazou



Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Eric Botcazou
> I see you have objected and indicated the additional cost. Have you
> quantified how much more expensive the pass is?

No, but use-def chains are known to be slow because DF is slow, see e.g. the 
comment located a few lines below the call to try_merge_compare:

  /* ??? This is one point at which one could argue that DF_REF_CHAIN would
 be useful, but it is thought to be too heavy-weight a solution here.  */

Note that the patch breaks e.g. the Visium port, because the pass now sends 
all kind of junk RTXes to the select_cc_mode target hook, which was written to 
be in exact keeping with the arithmetic patterns of the MD file.  I'm going to 
fix the breakage of course, but this shows that the patch burns a large amount 
of cycles for targets like Visium for no benefit.

Index: config/visium/visium.c
===
--- config/visium/visium.c  (revision 253767)
+++ config/visium/visium.c  (working copy)
@@ -2938,12 +2938,6 @@ visium_select_cc_mode (enum rtx_code cod
   /* This is a btst, the result is in C instead of Z.  */
   return CCCmode;
 
-case CONST_INT:
-  /* This is a degenerate case, typically an uninitialized variable.  */
-  gcc_assert (op0 == constm1_rtx);
-
-  /* ... fall through ... */
-
 case REG:
 case AND:
 case IOR:
@@ -2953,6 +2947,7 @@ visium_select_cc_mode (enum rtx_code cod
 case LSHIFTRT:
 case TRUNCATE:
 case SIGN_EXTEND:
+case ZERO_EXTEND:
   /* Pretend that the flags are set as for a COMPARE with zero.
 That's mostly true, except for the 2 right shift insns that
 will set the C flag.  But the C flag is relevant only for
@@ -2960,6 +2955,16 @@ visium_select_cc_mode (enum rtx_code cod
 when applied to a comparison with zero.  */
   return CCmode;
 
+/* ??? Cater to the junk RTXes sent by try_merge_compare.  */
+case ASM_OPERANDS:
+case CALL:
+case CONST_INT:
+case LO_SUM:
+case HIGH:
+case MEM:
+case UNSPEC:
+  return CCmode;
+
 default:
   gcc_unreachable ();


-- 
Eric Botcazou


RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Michael Collison
Richard and Eric,

I see you have objected and indicated the additional cost. Have you quantified 
how much more expensive the pass is?

-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Tuesday, October 17, 2017 4:45 AM
To: Eric Botcazou 
Cc: Jeff Law ; GCC Patches ; Michael 
Collison ; Segher Boessenkool 
; Kyrill Tkachov ; nd 

Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou  wrote:
>> This looks good.  OK for the trunk.
>
> FWIW I disagree.  The patch completely shuns the existing 
> implementation of the pass, which is based on a forward scan within 
> basic blocks to identify the various interesting instructions and 
> record them, and uses full-blown def-use and use-def chains instead, 
> which are much more costly to compute.  It's not clear to me why the existing 
> implementation couldn't have been extended.
>
> The result is that, for targets for which the pass was initially written, i.e.
> targets for which most (all) arithmetic instructions clobber the 
> flags, the pass will be slower for absolutely no benefits, as the 
> existing implementation would already have caught all the interesting cases.
>
> So it's again a case of a generic change made for a specific target 
> without consideration for other, admittedly less mainstream, targets...

I agree with Eric here.

Richard.

> --
> Eric Botcazou


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Richard Biener
On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou  wrote:
>> This looks good.  OK for the trunk.
>
> FWIW I disagree.  The patch completely shuns the existing implementation of
> the pass, which is based on a forward scan within basic blocks to identify the
> various interesting instructions and record them, and uses full-blown def-use
> and use-def chains instead, which are much more costly to compute.  It's not
> clear to me why the existing implementation couldn't have been extended.
>
> The result is that, for targets for which the pass was initially written, i.e.
> targets for which most (all) arithmetic instructions clobber the flags, the
> pass will be slower for absolutely no benefits, as the existing implementation
> would already have caught all the interesting cases.
>
> So it's again a case of a generic change made for a specific target without
> consideration for other, admittedly less mainstream, targets...

I agree with Eric here.

Richard.

> --
> Eric Botcazou


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-14 Thread Eric Botcazou
> This looks good.  OK for the trunk.

FWIW I disagree.  The patch completely shuns the existing implementation of 
the pass, which is based on a forward scan within basic blocks to identify the 
various interesting instructions and record them, and uses full-blown def-use 
and use-def chains instead, which are much more costly to compute.  It's not 
clear to me why the existing implementation couldn't have been extended.

The result is that, for targets for which the pass was initially written, i.e. 
targets for which most (all) arithmetic instructions clobber the flags, the 
pass will be slower for absolutely no benefits, as the existing implementation 
would already have caught all the interesting cases.

So it's again a case of a generic change made for a specific target without 
consideration for other, admittedly less mainstream, targets...

-- 
Eric Botcazou


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-13 Thread Jeff Law
On 09/06/2017 09:56 AM, Michael Collison wrote:
> Patch updated with all relevant comments and suggestions.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, and 
> aarch64-none-linux-gnu and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
> 
>   * compare-elim.c: Include emit-rtl.h.
>   (can_merge_compare_into_arith): New function.
>   (try_validate_parallel): Likewise.
>   (try_merge_compare): Likewise.
>   (try_eliminate_compare): Call the above when no previous clobber
>   is available.
>   (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
>   dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
>   
>   * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
Sorry for the delay.

This looks good.  OK for the trunk.

jeff


RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-09-06 Thread Michael Collison
Patch updated with all relevant comments and suggestions.

Bootstrapped and tested on arm-none-linux-gnueabihf, and aarch64-none-linux-gnu 
and x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  
Michael Collison 

* compare-elim.c: Include emit-rtl.h.
(can_merge_compare_into_arith): New function.
(try_validate_parallel): Likewise.
(try_merge_compare): Likewise.
(try_eliminate_compare): Call the above when no previous clobber
is available.
(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
dataflow problems.

2017-08-05  Kyrylo Tkachov  
Michael Collison 

* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.

-Original Message-
From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] 
Sent: Saturday, September 2, 2017 12:07 AM
To: Kyrill Tkachov 
Cc: Jeff Law ; Michael Collison ; 
gcc-patches@gcc.gnu.org; nd 
Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

Hi!

On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote:
> On 28/08/17 19:26, Jeff Law wrote:
> >On 08/10/2017 03:14 PM, Michael Collison wrote:
> >>One issue that we keep encountering on aarch64 is GCC not making 
> >>good use of the flag-setting arithmetic instructions like ADDS, 
> >>SUBS, ANDS etc. that perform an arithmetic operation and compare the 
> >>result against zero.
> >>They are represented in a fairly standard way in the backend as 
> >>PARALLEL
> >>patterns:
> >>(parallel [(set (reg x1) (op (reg x2) (reg x3)))
> >>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 
> >>0)))])

That is incorrect: the compare has to come first.  From md.texi:

@cindex @code{compare}, canonicalization of [ ... ]

@item
For instructions that inherently set a condition code register, the 
@code{compare} operator is always written as the first RTL expression of the 
@code{parallel} instruction pattern.  For example, [ ... ]

aarch64.md seems to do this correctly, fwiw.

> >>GCC isn't forming these from separate arithmetic and comparison 
> >>instructions as aggressively as it could.
> >>A particular pain point is when the result of the arithmetic insn is 
> >>used before the comparison instruction.
> >>The testcase in this patch is one such example where we have:
> >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73  ] [73])
> >> (plus:SI (reg:SI 0 x0 [ x ])
> >> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
> >>  (nil))
> >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
> >> (plus:SI (reg/v:SI 0 x0 [orig:73  ] [73])
> >> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
> >>  (nil))
> >>(insn 34 33 17 2 (set (reg:CC 66 cc)
> >> (compare:CC (reg/v:SI 0 x0 [orig:73  ] [73])
> >> (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
> >>  (nil))
> >>
> >>This scares combine away as x0 is used in insn 33 as well as the 
> >>comparison in insn 34.
> >>I think the compare-elim pass can help us here.
> >Is it the multiple use or the hard register that combine doesn't 
> >appreciate.  The latter would definitely steer us towards compare-elim.
> 
> It's the multiple use IIRC.

Multiple use, and multiple set (of x1), and more complications...

7+33 won't combine to an existing insn.

7+34 will not even be tried (insn 33 is the first use of x0, not insn 34).
But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so
7 cannot be merged into 34.

7+33+34 results in a parallel of a compare with the same invalid insn
as in the 7+33 case.  Combine would try to split it to two insns again, except 
it already has two insns (the arith and the compare).  It does not see that 
when it splits the insn it can combine the first half with the compare.

What would be needed is pulling insn 34 before insn 33 (which is fine, no 
conflicts there), and then we could combine 7+34 just fine.  But combine tries 
to be linear complexity, and it really cannot change insns around anyway.


Segher


pr5198v2.patch
Description: pr5198v2.patch


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-09-01 Thread Segher Boessenkool
Hi!

On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote:
> On 28/08/17 19:26, Jeff Law wrote:
> >On 08/10/2017 03:14 PM, Michael Collison wrote:
> >>One issue that we keep encountering on aarch64 is GCC not making good use 
> >>of the flag-setting arithmetic instructions
> >>like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and 
> >>compare the result against zero.
> >>They are represented in a fairly standard way in the backend as PARALLEL 
> >>patterns:
> >>(parallel [(set (reg x1) (op (reg x2) (reg x3)))
> >>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 
> >>0)))])

That is incorrect: the compare has to come first.  From md.texi:

@cindex @code{compare}, canonicalization of
[ ... ]

@item
For instructions that inherently set a condition code register, the
@code{compare} operator is always written as the first RTL expression of
the @code{parallel} instruction pattern.  For example,
[ ... ]

aarch64.md seems to do this correctly, fwiw.

> >>GCC isn't forming these from separate arithmetic and comparison 
> >>instructions as aggressively as it could.
> >>A particular pain point is when the result of the arithmetic insn is used 
> >>before the comparison instruction.
> >>The testcase in this patch is one such example where we have:
> >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73  ] [73])
> >> (plus:SI (reg:SI 0 x0 [ x ])
> >> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
> >>  (nil))
> >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
> >> (plus:SI (reg/v:SI 0 x0 [orig:73  ] [73])
> >> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
> >>  (nil))
> >>(insn 34 33 17 2 (set (reg:CC 66 cc)
> >> (compare:CC (reg/v:SI 0 x0 [orig:73  ] [73])
> >> (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
> >>  (nil))
> >>
> >>This scares combine away as x0 is used in insn 33 as well as the 
> >>comparison in insn 34.
> >>I think the compare-elim pass can help us here.
> >Is it the multiple use or the hard register that combine doesn't
> >appreciate.  The latter would definitely steer us towards compare-elim.
> 
> It's the multiple use IIRC.

Multiple use, and multiple set (of x1), and more complications...

7+33 won't combine to an existing insn.

7+34 will not even be tried (insn 33 is the first use of x0, not insn 34).
But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so
7 cannot be merged into 34.

7+33+34 results in a parallel of a compare with the same invalid insn
as in the 7+33 case.  Combine would try to split it to two insns again,
except it already has two insns (the arith and the compare).  It does
not see that when it splits the insn it can combine the first half with
the compare.

What would be needed is pulling insn 34 before insn 33 (which is fine, no
conflicts there), and then we could combine 7+34 just fine.  But combine
tries to be linear complexity, and it really cannot change insns around
anyway.


Segher


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-08-31 Thread Jeff Law
On 08/29/2017 02:39 AM, Kyrill Tkachov wrote:

>>> This scares combine away as x0 is used in insn 33 as well as the
>>> comparison in insn 34.
>>> I think the compare-elim pass can help us here.
>> Is it the multiple use or the hard register that combine doesn't
>> appreciate.  The latter would definitely steer us towards compare-elim.
> 
> It's the multiple use IIRC.
OK.  I can see a case where multiple use causes a problem as well.  That
non-combinable use at insn 33 could well be a problem for combining insn
7 and insn 34 from your sample RTL.


>> What about old style asms?  Do you need to explicit punt those?  I don't
>> think they carry DF info.
> 
> I think we want to bail out on those to be safe. How are they
> represented in RTL?
I think old style asms will be an insn with an ASM_INPUT as their pattern.

New sytle asms (ie, with dataflow) would be an insn with ASM_OPERANDs as
their pattern.
> 
>> Don't you also have to verify that the inputs to ARITH_INSN are
>> unchanged between ARITH_INSN and CMP_INSN?
> 
> I don't think so. As long as there is no flag clobbering instructions
> between them we should be ok.
Right.  It's really a matter of which direction you go.   Working with
your example:


> 
> Consider:
> r1 := r2 + r3  // arith_insn
> r2 := r4 + r5  // arith_insn input modified before cmp_insn
> cc := CMP r1 0 // cmp_insn
combining the r1 := and the cc:= statements essentially means one is
going to move across the r2 := statment.

If the cc assignment moves, then you  need to verify that cc and r1 are
not changed.

If the r1 assignment moves, then you need to verify that r2 and r3 are
not changed.


You're doing the former but I was thinking of the latter.


> 



Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-08-29 Thread Kyrill Tkachov

Hi Jeff,
Some comments since I wrote this patch some time ago...

On 28/08/17 19:26, Jeff Law wrote:

On 08/10/2017 03:14 PM, Michael Collison wrote:

Hi all,

One issue that we keep encountering on aarch64 is GCC not making good use of 
the flag-setting arithmetic instructions
like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the 
result against zero.
They are represented in a fairly standard way in the backend as PARALLEL 
patterns:
(parallel [(set (reg x1) (op (reg x2) (reg x3)))
(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])

GCC isn't forming these from separate arithmetic and comparison instructions as 
aggressively as it could.
A particular pain point is when the result of the arithmetic insn is used 
before the comparison instruction.
The testcase in this patch is one such example where we have:
(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73  ] [73])
 (plus:SI (reg:SI 0 x0 [ x ])
 (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
  (nil))
(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
 (plus:SI (reg/v:SI 0 x0 [orig:73  ] [73])
 (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
  (nil))
(insn 34 33 17 2 (set (reg:CC 66 cc)
 (compare:CC (reg/v:SI 0 x0 [orig:73  ] [73])
 (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
  (nil))

This scares combine away as x0 is used in insn 33 as well as the comparison in 
insn 34.
I think the compare-elim pass can help us here.

Is it the multiple use or the hard register that combine doesn't
appreciate.  The latter would definitely steer us towards compare-elim.


It's the multiple use IIRC.


This patch extends it by handling comparisons against zero, finding the 
defining instruction of the compare
and merging the comparison with the defining instruction into a PARALLEL that 
will hopefully match the form
described above. If between the comparison and the defining insn we find an 
instruction that uses the condition
registers or any control flow we bail out, and we don't cross basic blocks.
This simple technique allows us to catch cases such as this example.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and 
x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  
Michael Collison 

* compare-elim.c: Include emit-rtl.h.
(can_merge_compare_into_arith): New function.
(try_validate_parallel): Likewise.
(try_merge_compare): Likewise.
(try_eliminate_compare): Call the above when no previous clobber
is available.
(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
dataflow problems.

2017-08-05  Kyrylo Tkachov  
Michael Collison 

* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.


pr5198v1.patch


diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 7e557a2..c65d155 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "tm_p.h"
  #include "insn-config.h"
  #include "recog.h"
+#include "emit-rtl.h"
  #include "cfgrtl.h"
  #include "tree-pass.h"
  #include "domwalk.h"
@@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn 
*start)
return reg;
  }
  
+/* Return true if it is okay to merge the comparison CMP_INSN with

+   the instruction ARITH_INSN.  Both instructions are assumed to be in the
+   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
+   that there are no uses or defs of the condition flags or control flow
+   changes between the two instructions.  */
+
+static bool
+can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
+{
+  for (rtx_insn *insn = PREV_INSN (cmp_insn);
+   insn && insn != arith_insn;
+   insn = PREV_INSN (insn))
+{
+  if (!NONDEBUG_INSN_P (insn))
+   continue;
+  /* Bail if there are jumps or calls in between.  */
+  if (!NONJUMP_INSN_P (insn))
+   return false;
+
+  df_ref ref;
+  /* Find a USE of the flags register.  */
+  FOR_EACH_INSN_USE (ref, insn)
+   if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+ return false;
+
+  /* Find a DEF of the flags register.  */
+  FOR_EACH_INSN_DEF (ref, insn)
+   if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+ return false;
+}
+  return true;
+}

What about old style asms?  Do you need to explicit punt those?  I don't
think they carry DF info.


I think we want to bail out on those to be safe. How are they 
represented in RTL?



Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?


I don't think so. As long as there is no flag clobbering instructions 
between them we should be ok.


Consider:
r1 := r2 + r3  // arith_insn
r2 := r4 + r5  // arith_insn input modified before cmp_insn
cc := CMP r1 0 // cmp_insn

this would be transformed into
{r1 :=

Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-08-28 Thread Jeff Law
On 08/10/2017 03:14 PM, Michael Collison wrote:
> Hi all,
> 
> One issue that we keep encountering on aarch64 is GCC not making good use of 
> the flag-setting arithmetic instructions
> like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare 
> the result against zero.
> They are represented in a fairly standard way in the backend as PARALLEL 
> patterns:
> (parallel [(set (reg x1) (op (reg x2) (reg x3)))
>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])
> 
> GCC isn't forming these from separate arithmetic and comparison instructions 
> as aggressively as it could.
> A particular pain point is when the result of the arithmetic insn is used 
> before the comparison instruction.
> The testcase in this patch is one such example where we have:
> (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73  ] [73])
> (plus:SI (reg:SI 0 x0 [ x ])
> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
>  (nil))
> (insn 33 7 34 2 (set (reg:SI 1 x1 [77])
> (plus:SI (reg/v:SI 0 x0 [orig:73  ] [73])
> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
>  (nil))
> (insn 34 33 17 2 (set (reg:CC 66 cc)
> (compare:CC (reg/v:SI 0 x0 [orig:73  ] [73])
> (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
>  (nil))
> 
> This scares combine away as x0 is used in insn 33 as well as the comparison 
> in insn 34.
> I think the compare-elim pass can help us here.
Is it the multiple use or the hard register that combine doesn't
appreciate.  The latter would definitely steer us towards compare-elim.

> 
> This patch extends it by handling comparisons against zero, finding the 
> defining instruction of the compare
> and merging the comparison with the defining instruction into a PARALLEL that 
> will hopefully match the form
> described above. If between the comparison and the defining insn we find an 
> instruction that uses the condition
> registers or any control flow we bail out, and we don't cross basic blocks.
> This simple technique allows us to catch cases such as this example.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu 
> and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
> 
>   * compare-elim.c: Include emit-rtl.h.
>   (can_merge_compare_into_arith): New function.
>   (try_validate_parallel): Likewise.
>   (try_merge_compare): Likewise.
>   (try_eliminate_compare): Call the above when no previous clobber
>   is available.
>   (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
>   dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
>   
>   * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
> 
> 
> pr5198v1.patch
> 
> 
> diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
> index 7e557a2..c65d155 100644
> --- a/gcc/compare-elim.c
> +++ b/gcc/compare-elim.c
> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "insn-config.h"
>  #include "recog.h"
> +#include "emit-rtl.h"
>  #include "cfgrtl.h"
>  #include "tree-pass.h"
>  #include "domwalk.h"
> @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, 
> rtx_insn *start)
>return reg;
>  }
>  
> +/* Return true if it is okay to merge the comparison CMP_INSN with
> +   the instruction ARITH_INSN.  Both instructions are assumed to be in the
> +   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
> +   that there are no uses or defs of the condition flags or control flow
> +   changes between the two instructions.  */
> +
> +static bool
> +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
> +{
> +  for (rtx_insn *insn = PREV_INSN (cmp_insn);
> +   insn && insn != arith_insn;
> +   insn = PREV_INSN (insn))
> +{
> +  if (!NONDEBUG_INSN_P (insn))
> + continue;
> +  /* Bail if there are jumps or calls in between.  */
> +  if (!NONJUMP_INSN_P (insn))
> + return false;
> +
> +  df_ref ref;
> +  /* Find a USE of the flags register.  */
> +  FOR_EACH_INSN_USE (ref, insn)
> + if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +   return false;
> +
> +  /* Find a DEF of the flags register.  */
> +  FOR_EACH_INSN_DEF (ref, insn)
> + if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +   return false;
> +}
> +  return true;
> +}
What about old style asms?  Do you need to explicit punt those?  I don't
think they carry DF info.

Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?


> +
> +/* Given two SET expressions, SET_A and SET_B determine whether they form
> +   a recognizable pattern when emitted in parallel.  Return that parallel
> +   if so.  Otherwise return NULL.  This tries both permutations of SET_A
> +   and SET_B within the PARALLEL.  */
> +
> +static rtx
> +try_validate