Hi all, I'm looking into improving usage of aarch64 conditional compare instructions and PR 67628 is one area of improvement I identified. The problem there is different tree-level behaviour for the expressions: (a > b && b <= c) && c > d vs. a > b && (b <= c && c > d)
The second variant generates worse code for aarch64 (and x86_64) because tree ifcombine doesn't do as good a job at merging basic blocks. I've tracked this down to the function ifcombine_andif in tree-ssa-ifcombine.c. With this patch I get the good codegen from that PR, both forms are combined into a single basic block and the conditional compare expansion code in ccmp.c does a good job at generating the ccmp instructions. I also see about 2% more ccmp instructions being generated for SPEC2006 with the code generally looking better as well (if I remove this whole single-conditional restriction completely I get 19% more conditional compares in SPEC2006, but that's something I want to investigate separately). The idea is that for the two expressions given above the control flow and contents of the basic blocks is the same, but when ifcombine_ifandif is called the inner_cond_bb and outer_cond_bb arguments are reversed i.e. for the first expression the inner_cond_bb contains a single comparison c > d and outer_cond_bb contains multiple comparisons a > b && b <= c. In the second case the outer cond contains the single comparison a > b and the inner block this time contains the multiple conditions a > b && b <= c, thus failing the "Only do this optimization if the inner bb contains only the conditional" check. The patch makes this condition more symmetrical by rejecting this optimisation only if neither the inner nor the outer condition blocks are simple. Unfortunately, I see a testsuite regression with this patch: FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<" The reduced part of that test is: void test1 (int x, unsigned u) { if ((1U << x) != 64 || (2 << x) != u || (x << x) != 384 || (3 << x) == 9 || (x << 14) != 98304U || (1 << x) == 14 || (3 << 2) != 12) __builtin_abort (); } The patched ifcombine pass works more or less as expected and produces fewer basic blocks. Before this patch a relevant part of the ifcombine dump for test1 is: ;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot if (x_1(D) != 6) goto <bb 6>; else goto <bb 3>; ;; basic block 3, loop depth 0, count 0, freq 9996, maybe hot _2 = 2 << x_1(D); _3 = (unsigned intD.10) _2; if (_3 != u_4(D)) goto <bb 6>; else goto <bb 4>; After this patch it is: ;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot _2 = 2 << x_1(D); _3 = (unsigned intD.10) _2; _9 = _3 != u_4(D); _10 = x_1(D) != 6; _11 = _9 | _10; if (_11 != 0) goto <bb 5>; else goto <bb 3>; The second form ends up generating worse codegen however, and the badness starts with the dom1 pass. In the unpatched case it manages to deduce that x must be 6 by the time it reaches basic block 3 and uses that information to eliminate the shift in "_2 = 2 << x_1(D)" from basic block 3 In the patched case it is unable to make that call, I think because the x != 6 condition is IORed with another test. I'm not familiar with the internals of the dom pass, so I'm not sure where to go looking for a fix for this. Is the ifcombine change a step in the right direction? If so, what would need to be done to fix the issue with the dom pass? I suppose what we want is to not combine basic blocks if the sequence and conditions of the basic blocks are such that dom can potentially exploit them, but how do we express that? Thanks, Kyrill P.S. I can provide more complete tree dumps for the examples if needed. 2015-09-22 Kyrylo Tkachov <kyrylo.tkac...@arm.com> PR tree-optimization/67628 * tree-ssa-ifcombine.c (ifcombine_ifandif): Allow optimization when either the inner or outer block contain a single conditional.
diff --git a/gcc/tree-ssa-ifcombine.c b/gcc/tree-ssa-ifcombine.c index 9f04174..bfe17ba 100644 --- a/gcc/tree-ssa-ifcombine.c +++ b/gcc/tree-ssa-ifcombine.c @@ -511,8 +511,12 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv, gimple_stmt_iterator gsi; if (!LOGICAL_OP_NON_SHORT_CIRCUIT) return false; - /* Only do this optimization if the inner bb contains only the conditional. */ - if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb))) + /* Only do this optimization if inner or outer bb contains + only the conditional. */ + if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb ( + inner_cond_bb)) + && !gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb ( + outer_cond_bb))) return false; t1 = fold_build2_loc (gimple_location (inner_cond), inner_cond_code,