Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
> Are we in agreement that I should revert the patch? I think that you can leave it for now in order to have some feedback (see for example PR rtl-optimization/82597) but ideally it should be rewritten so as to reuse the existing infrastructure of the pass. -- Eric Botcazou
RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Are we in agreement that I should revert the patch? -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, October 17, 2017 1:10 PM To: Michael Collison ; Eric Botcazou Cc: Jeff Law ; GCC Patches ; Segher Boessenkool ; Kyrill Tkachov ; nd Subject: RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison wrote: >Richard and Eric, > >I see you have objected and indicated the additional cost. Have you >quantified how much more expensive the pass is? DF has known quadratic behavior in memory for certain problems. Not sure off head if DU and UD fall into this category. Richard. >-Original Message- >From: Richard Biener [mailto:richard.guent...@gmail.com] >Sent: Tuesday, October 17, 2017 4:45 AM >To: Eric Botcazou >Cc: Jeff Law ; GCC Patches ; >Michael Collison ; Segher Boessenkool >; Kyrill Tkachov >; nd >Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal >ops > >On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou >wrote: >>> This looks good. OK for the trunk. >> >> FWIW I disagree. The patch completely shuns the existing >> implementation of the pass, which is based on a forward scan within >> basic blocks to identify the various interesting instructions and >> record them, and uses full-blown def-use and use-def chains instead, >> which are much more costly to compute. It's not clear to me why the >existing implementation couldn't have been extended. >> >> The result is that, for targets for which the pass was initially >written, i.e. >> targets for which most (all) arithmetic instructions clobber the >> flags, the pass will be slower for absolutely no benefits, as the >> existing implementation would already have caught all the interesting >cases. >> >> So it's again a case of a generic change made for a specific target >> without consideration for other, admittedly less mainstream, >targets... > >I agree with Eric here. > >Richard. > >> -- >> Eric Botcazou
RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison wrote: >Richard and Eric, > >I see you have objected and indicated the additional cost. Have you >quantified how much more expensive the pass is? DF has known quadratic behavior in memory for certain problems. Not sure off head if DU and UD fall into this category. Richard. >-Original Message- >From: Richard Biener [mailto:richard.guent...@gmail.com] >Sent: Tuesday, October 17, 2017 4:45 AM >To: Eric Botcazou >Cc: Jeff Law ; GCC Patches ; >Michael Collison ; Segher Boessenkool >; Kyrill Tkachov >; nd >Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal >ops > >On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou >wrote: >>> This looks good. OK for the trunk. >> >> FWIW I disagree. The patch completely shuns the existing >> implementation of the pass, which is based on a forward scan within >> basic blocks to identify the various interesting instructions and >> record them, and uses full-blown def-use and use-def chains instead, >> which are much more costly to compute. It's not clear to me why the >existing implementation couldn't have been extended. >> >> The result is that, for targets for which the pass was initially >written, i.e. >> targets for which most (all) arithmetic instructions clobber the >> flags, the pass will be slower for absolutely no benefits, as the >> existing implementation would already have caught all the interesting >cases. >> >> So it's again a case of a generic change made for a specific target >> without consideration for other, admittedly less mainstream, >targets... > >I agree with Eric here. > >Richard. > >> -- >> Eric Botcazou
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
> I see you have objected and indicated the additional cost. Have you > quantified how much more expensive the pass is? No, but use-def chains are known to be slow because DF is slow, see e.g. the comment located a few lines below the call to try_merge_compare: /* ??? This is one point at which one could argue that DF_REF_CHAIN would be useful, but it is thought to be too heavy-weight a solution here. */ Note that the patch breaks e.g. the Visium port, because the pass now sends all kind of junk RTXes to the select_cc_mode target hook, which was written to be in exact keeping with the arithmetic patterns of the MD file. I'm going to fix the breakage of course, but this shows that the patch burns a large amount of cycles for targets like Visium for no benefit. Index: config/visium/visium.c === --- config/visium/visium.c (revision 253767) +++ config/visium/visium.c (working copy) @@ -2938,12 +2938,6 @@ visium_select_cc_mode (enum rtx_code cod /* This is a btst, the result is in C instead of Z. */ return CCCmode; -case CONST_INT: - /* This is a degenerate case, typically an uninitialized variable. */ - gcc_assert (op0 == constm1_rtx); - - /* ... fall through ... */ - case REG: case AND: case IOR: @@ -2953,6 +2947,7 @@ visium_select_cc_mode (enum rtx_code cod case LSHIFTRT: case TRUNCATE: case SIGN_EXTEND: +case ZERO_EXTEND: /* Pretend that the flags are set as for a COMPARE with zero. That's mostly true, except for the 2 right shift insns that will set the C flag. But the C flag is relevant only for @@ -2960,6 +2955,16 @@ visium_select_cc_mode (enum rtx_code cod when applied to a comparison with zero. */ return CCmode; +/* ??? Cater to the junk RTXes sent by try_merge_compare. */ +case ASM_OPERANDS: +case CALL: +case CONST_INT: +case LO_SUM: +case HIGH: +case MEM: +case UNSPEC: + return CCmode; + default: gcc_unreachable (); -- Eric Botcazou
RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Richard and Eric, I see you have objected and indicated the additional cost. Have you quantified how much more expensive the pass is? -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, October 17, 2017 4:45 AM To: Eric Botcazou Cc: Jeff Law ; GCC Patches ; Michael Collison ; Segher Boessenkool ; Kyrill Tkachov ; nd Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou wrote: >> This looks good. OK for the trunk. > > FWIW I disagree. The patch completely shuns the existing > implementation of the pass, which is based on a forward scan within > basic blocks to identify the various interesting instructions and > record them, and uses full-blown def-use and use-def chains instead, > which are much more costly to compute. It's not clear to me why the existing > implementation couldn't have been extended. > > The result is that, for targets for which the pass was initially written, i.e. > targets for which most (all) arithmetic instructions clobber the > flags, the pass will be slower for absolutely no benefits, as the > existing implementation would already have caught all the interesting cases. > > So it's again a case of a generic change made for a specific target > without consideration for other, admittedly less mainstream, targets... I agree with Eric here. Richard. > -- > Eric Botcazou
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou wrote: >> This looks good. OK for the trunk. > > FWIW I disagree. The patch completely shuns the existing implementation of > the pass, which is based on a forward scan within basic blocks to identify the > various interesting instructions and record them, and uses full-blown def-use > and use-def chains instead, which are much more costly to compute. It's not > clear to me why the existing implementation couldn't have been extended. > > The result is that, for targets for which the pass was initially written, i.e. > targets for which most (all) arithmetic instructions clobber the flags, the > pass will be slower for absolutely no benefits, as the existing implementation > would already have caught all the interesting cases. > > So it's again a case of a generic change made for a specific target without > consideration for other, admittedly less mainstream, targets... I agree with Eric here. Richard. > -- > Eric Botcazou
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
> This looks good. OK for the trunk. FWIW I disagree. The patch completely shuns the existing implementation of the pass, which is based on a forward scan within basic blocks to identify the various interesting instructions and record them, and uses full-blown def-use and use-def chains instead, which are much more costly to compute. It's not clear to me why the existing implementation couldn't have been extended. The result is that, for targets for which the pass was initially written, i.e. targets for which most (all) arithmetic instructions clobber the flags, the pass will be slower for absolutely no benefits, as the existing implementation would already have caught all the interesting cases. So it's again a case of a generic change made for a specific target without consideration for other, admittedly less mainstream, targets... -- Eric Botcazou
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
On 09/06/2017 09:56 AM, Michael Collison wrote: > Patch updated with all relevant comments and suggestions. > > Bootstrapped and tested on arm-none-linux-gnueabihf, and > aarch64-none-linux-gnu and x86_64. > > Ok for trunk? > > 2017-08-05 Kyrylo Tkachov > Michael Collison > > * compare-elim.c: Include emit-rtl.h. > (can_merge_compare_into_arith): New function. > (try_validate_parallel): Likewise. > (try_merge_compare): Likewise. > (try_eliminate_compare): Call the above when no previous clobber > is available. > (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN > dataflow problems. > > 2017-08-05 Kyrylo Tkachov > Michael Collison > > * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. Sorry for the delay. This looks good. OK for the trunk. jeff
RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Patch updated with all relevant comments and suggestions. Bootstrapped and tested on arm-none-linux-gnueabihf, and aarch64-none-linux-gnu and x86_64. Ok for trunk? 2017-08-05 Kyrylo Tkachov Michael Collison * compare-elim.c: Include emit-rtl.h. (can_merge_compare_into_arith): New function. (try_validate_parallel): Likewise. (try_merge_compare): Likewise. (try_eliminate_compare): Call the above when no previous clobber is available. (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN dataflow problems. 2017-08-05 Kyrylo Tkachov Michael Collison * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. -Original Message- From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] Sent: Saturday, September 2, 2017 12:07 AM To: Kyrill Tkachov Cc: Jeff Law ; Michael Collison ; gcc-patches@gcc.gnu.org; nd Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops Hi! On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote: > On 28/08/17 19:26, Jeff Law wrote: > >On 08/10/2017 03:14 PM, Michael Collison wrote: > >>One issue that we keep encountering on aarch64 is GCC not making > >>good use of the flag-setting arithmetic instructions like ADDS, > >>SUBS, ANDS etc. that perform an arithmetic operation and compare the > >>result against zero. > >>They are represented in a fairly standard way in the backend as > >>PARALLEL > >>patterns: > >>(parallel [(set (reg x1) (op (reg x2) (reg x3))) > >>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int > >>0)))]) That is incorrect: the compare has to come first. From md.texi: @cindex @code{compare}, canonicalization of [ ... ] @item For instructions that inherently set a condition code register, the @code{compare} operator is always written as the first RTL expression of the @code{parallel} instruction pattern. For example, [ ... ] aarch64.md seems to do this correctly, fwiw. > >>GCC isn't forming these from separate arithmetic and comparison > >>instructions as aggressively as it could. > >>A particular pain point is when the result of the arithmetic insn is > >>used before the comparison instruction. > >>The testcase in this patch is one such example where we have: > >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (plus:SI (reg:SI 0 x0 [ x ]) > >> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77]) > >> (plus:SI (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 34 33 17 2 (set (reg:CC 66 cc) > >> (compare:CC (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 0 [0]))) "comb.c":4 391 {cmpsi} > >> (nil)) > >> > >>This scares combine away as x0 is used in insn 33 as well as the > >>comparison in insn 34. > >>I think the compare-elim pass can help us here. > >Is it the multiple use or the hard register that combine doesn't > >appreciate. The latter would definitely steer us towards compare-elim. > > It's the multiple use IIRC. Multiple use, and multiple set (of x1), and more complications... 7+33 won't combine to an existing insn. 7+34 will not even be tried (insn 33 is the first use of x0, not insn 34). But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so 7 cannot be merged into 34. 7+33+34 results in a parallel of a compare with the same invalid insn as in the 7+33 case. Combine would try to split it to two insns again, except it already has two insns (the arith and the compare). It does not see that when it splits the insn it can combine the first half with the compare. What would be needed is pulling insn 34 before insn 33 (which is fine, no conflicts there), and then we could combine 7+34 just fine. But combine tries to be linear complexity, and it really cannot change insns around anyway. Segher pr5198v2.patch Description: pr5198v2.patch
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Hi! On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote: > On 28/08/17 19:26, Jeff Law wrote: > >On 08/10/2017 03:14 PM, Michael Collison wrote: > >>One issue that we keep encountering on aarch64 is GCC not making good use > >>of the flag-setting arithmetic instructions > >>like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and > >>compare the result against zero. > >>They are represented in a fairly standard way in the backend as PARALLEL > >>patterns: > >>(parallel [(set (reg x1) (op (reg x2) (reg x3))) > >>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int > >>0)))]) That is incorrect: the compare has to come first. From md.texi: @cindex @code{compare}, canonicalization of [ ... ] @item For instructions that inherently set a condition code register, the @code{compare} operator is always written as the first RTL expression of the @code{parallel} instruction pattern. For example, [ ... ] aarch64.md seems to do this correctly, fwiw. > >>GCC isn't forming these from separate arithmetic and comparison > >>instructions as aggressively as it could. > >>A particular pain point is when the result of the arithmetic insn is used > >>before the comparison instruction. > >>The testcase in this patch is one such example where we have: > >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (plus:SI (reg:SI 0 x0 [ x ]) > >> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77]) > >> (plus:SI (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 34 33 17 2 (set (reg:CC 66 cc) > >> (compare:CC (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 0 [0]))) "comb.c":4 391 {cmpsi} > >> (nil)) > >> > >>This scares combine away as x0 is used in insn 33 as well as the > >>comparison in insn 34. > >>I think the compare-elim pass can help us here. > >Is it the multiple use or the hard register that combine doesn't > >appreciate. The latter would definitely steer us towards compare-elim. > > It's the multiple use IIRC. Multiple use, and multiple set (of x1), and more complications... 7+33 won't combine to an existing insn. 7+34 will not even be tried (insn 33 is the first use of x0, not insn 34). But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so 7 cannot be merged into 34. 7+33+34 results in a parallel of a compare with the same invalid insn as in the 7+33 case. Combine would try to split it to two insns again, except it already has two insns (the arith and the compare). It does not see that when it splits the insn it can combine the first half with the compare. What would be needed is pulling insn 34 before insn 33 (which is fine, no conflicts there), and then we could combine 7+34 just fine. But combine tries to be linear complexity, and it really cannot change insns around anyway. Segher
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
On 08/29/2017 02:39 AM, Kyrill Tkachov wrote: >>> This scares combine away as x0 is used in insn 33 as well as the >>> comparison in insn 34. >>> I think the compare-elim pass can help us here. >> Is it the multiple use or the hard register that combine doesn't >> appreciate. The latter would definitely steer us towards compare-elim. > > It's the multiple use IIRC. OK. I can see a case where multiple use causes a problem as well. That non-combinable use at insn 33 could well be a problem for combining insn 7 and insn 34 from your sample RTL. >> What about old style asms? Do you need to explicit punt those? I don't >> think they carry DF info. > > I think we want to bail out on those to be safe. How are they > represented in RTL? I think old style asms will be an insn with an ASM_INPUT as their pattern. New sytle asms (ie, with dataflow) would be an insn with ASM_OPERANDs as their pattern. > >> Don't you also have to verify that the inputs to ARITH_INSN are >> unchanged between ARITH_INSN and CMP_INSN? > > I don't think so. As long as there is no flag clobbering instructions > between them we should be ok. Right. It's really a matter of which direction you go. Working with your example: > > Consider: > r1 := r2 + r3 // arith_insn > r2 := r4 + r5 // arith_insn input modified before cmp_insn > cc := CMP r1 0 // cmp_insn combining the r1 := and the cc:= statements essentially means one is going to move across the r2 := statment. If the cc assignment moves, then you need to verify that cc and r1 are not changed. If the r1 assignment moves, then you need to verify that r2 and r3 are not changed. You're doing the former but I was thinking of the latter. >
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Hi Jeff, Some comments since I wrote this patch some time ago... On 28/08/17 19:26, Jeff Law wrote: On 08/10/2017 03:14 PM, Michael Collison wrote: Hi all, One issue that we keep encountering on aarch64 is GCC not making good use of the flag-setting arithmetic instructions like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the result against zero. They are represented in a fairly standard way in the backend as PARALLEL patterns: (parallel [(set (reg x1) (op (reg x2) (reg x3))) (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))]) GCC isn't forming these from separate arithmetic and comparison instructions as aggressively as it could. A particular pain point is when the result of the arithmetic insn is used before the comparison instruction. The testcase in this patch is one such example where we have: (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 ] [73]) (plus:SI (reg:SI 0 x0 [ x ]) (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} (nil)) (insn 33 7 34 2 (set (reg:SI 1 x1 [77]) (plus:SI (reg/v:SI 0 x0 [orig:73 ] [73]) (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} (nil)) (insn 34 33 17 2 (set (reg:CC 66 cc) (compare:CC (reg/v:SI 0 x0 [orig:73 ] [73]) (const_int 0 [0]))) "comb.c":4 391 {cmpsi} (nil)) This scares combine away as x0 is used in insn 33 as well as the comparison in insn 34. I think the compare-elim pass can help us here. Is it the multiple use or the hard register that combine doesn't appreciate. The latter would definitely steer us towards compare-elim. It's the multiple use IIRC. This patch extends it by handling comparisons against zero, finding the defining instruction of the compare and merging the comparison with the defining instruction into a PARALLEL that will hopefully match the form described above. If between the comparison and the defining insn we find an instruction that uses the condition registers or any control flow we bail out, and we don't cross basic blocks. This simple technique allows us to catch cases such as this example. Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and x86_64. Ok for trunk? 2017-08-05 Kyrylo Tkachov Michael Collison * compare-elim.c: Include emit-rtl.h. (can_merge_compare_into_arith): New function. (try_validate_parallel): Likewise. (try_merge_compare): Likewise. (try_eliminate_compare): Call the above when no previous clobber is available. (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN dataflow problems. 2017-08-05 Kyrylo Tkachov Michael Collison * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. pr5198v1.patch diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c index 7e557a2..c65d155 100644 --- a/gcc/compare-elim.c +++ b/gcc/compare-elim.c @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see #include "tm_p.h" #include "insn-config.h" #include "recog.h" +#include "emit-rtl.h" #include "cfgrtl.h" #include "tree-pass.h" #include "domwalk.h" @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn *start) return reg; } +/* Return true if it is okay to merge the comparison CMP_INSN with + the instruction ARITH_INSN. Both instructions are assumed to be in the + same basic block with ARITH_INSN appearing before CMP_INSN. This checks + that there are no uses or defs of the condition flags or control flow + changes between the two instructions. */ + +static bool +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn) +{ + for (rtx_insn *insn = PREV_INSN (cmp_insn); + insn && insn != arith_insn; + insn = PREV_INSN (insn)) +{ + if (!NONDEBUG_INSN_P (insn)) + continue; + /* Bail if there are jumps or calls in between. */ + if (!NONJUMP_INSN_P (insn)) + return false; + + df_ref ref; + /* Find a USE of the flags register. */ + FOR_EACH_INSN_USE (ref, insn) + if (DF_REF_REGNO (ref) == targetm.flags_regnum) + return false; + + /* Find a DEF of the flags register. */ + FOR_EACH_INSN_DEF (ref, insn) + if (DF_REF_REGNO (ref) == targetm.flags_regnum) + return false; +} + return true; +} What about old style asms? Do you need to explicit punt those? I don't think they carry DF info. I think we want to bail out on those to be safe. How are they represented in RTL? Don't you also have to verify that the inputs to ARITH_INSN are unchanged between ARITH_INSN and CMP_INSN? I don't think so. As long as there is no flag clobbering instructions between them we should be ok. Consider: r1 := r2 + r3 // arith_insn r2 := r4 + r5 // arith_insn input modified before cmp_insn cc := CMP r1 0 // cmp_insn this would be transformed into {r1 :=
Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
On 08/10/2017 03:14 PM, Michael Collison wrote: > Hi all, > > One issue that we keep encountering on aarch64 is GCC not making good use of > the flag-setting arithmetic instructions > like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare > the result against zero. > They are represented in a fairly standard way in the backend as PARALLEL > patterns: > (parallel [(set (reg x1) (op (reg x2) (reg x3))) >(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))]) > > GCC isn't forming these from separate arithmetic and comparison instructions > as aggressively as it could. > A particular pain point is when the result of the arithmetic insn is used > before the comparison instruction. > The testcase in this patch is one such example where we have: > (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 ] [73]) > (plus:SI (reg:SI 0 x0 [ x ]) > (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} > (nil)) > (insn 33 7 34 2 (set (reg:SI 1 x1 [77]) > (plus:SI (reg/v:SI 0 x0 [orig:73 ] [73]) > (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} > (nil)) > (insn 34 33 17 2 (set (reg:CC 66 cc) > (compare:CC (reg/v:SI 0 x0 [orig:73 ] [73]) > (const_int 0 [0]))) "comb.c":4 391 {cmpsi} > (nil)) > > This scares combine away as x0 is used in insn 33 as well as the comparison > in insn 34. > I think the compare-elim pass can help us here. Is it the multiple use or the hard register that combine doesn't appreciate. The latter would definitely steer us towards compare-elim. > > This patch extends it by handling comparisons against zero, finding the > defining instruction of the compare > and merging the comparison with the defining instruction into a PARALLEL that > will hopefully match the form > described above. If between the comparison and the defining insn we find an > instruction that uses the condition > registers or any control flow we bail out, and we don't cross basic blocks. > This simple technique allows us to catch cases such as this example. > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu > and x86_64. > > Ok for trunk? > > 2017-08-05 Kyrylo Tkachov > Michael Collison > > * compare-elim.c: Include emit-rtl.h. > (can_merge_compare_into_arith): New function. > (try_validate_parallel): Likewise. > (try_merge_compare): Likewise. > (try_eliminate_compare): Call the above when no previous clobber > is available. > (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN > dataflow problems. > > 2017-08-05 Kyrylo Tkachov > Michael Collison > > * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. > > > pr5198v1.patch > > > diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c > index 7e557a2..c65d155 100644 > --- a/gcc/compare-elim.c > +++ b/gcc/compare-elim.c > @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see > #include "tm_p.h" > #include "insn-config.h" > #include "recog.h" > +#include "emit-rtl.h" > #include "cfgrtl.h" > #include "tree-pass.h" > #include "domwalk.h" > @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, > rtx_insn *start) >return reg; > } > > +/* Return true if it is okay to merge the comparison CMP_INSN with > + the instruction ARITH_INSN. Both instructions are assumed to be in the > + same basic block with ARITH_INSN appearing before CMP_INSN. This checks > + that there are no uses or defs of the condition flags or control flow > + changes between the two instructions. */ > + > +static bool > +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn) > +{ > + for (rtx_insn *insn = PREV_INSN (cmp_insn); > + insn && insn != arith_insn; > + insn = PREV_INSN (insn)) > +{ > + if (!NONDEBUG_INSN_P (insn)) > + continue; > + /* Bail if there are jumps or calls in between. */ > + if (!NONJUMP_INSN_P (insn)) > + return false; > + > + df_ref ref; > + /* Find a USE of the flags register. */ > + FOR_EACH_INSN_USE (ref, insn) > + if (DF_REF_REGNO (ref) == targetm.flags_regnum) > + return false; > + > + /* Find a DEF of the flags register. */ > + FOR_EACH_INSN_DEF (ref, insn) > + if (DF_REF_REGNO (ref) == targetm.flags_regnum) > + return false; > +} > + return true; > +} What about old style asms? Do you need to explicit punt those? I don't think they carry DF info. Don't you also have to verify that the inputs to ARITH_INSN are unchanged between ARITH_INSN and CMP_INSN? > + > +/* Given two SET expressions, SET_A and SET_B determine whether they form > + a recognizable pattern when emitted in parallel. Return that parallel > + if so. Otherwise return NULL. This tries both permutations of SET_A > + and SET_B within the PARALLEL. */ > + > +static rtx > +try_validate