[RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

Kyrill Tkachov Tue, 22 Sep 2015 06:37:19 -0700

Hi all,

I'm looking into improving usage of aarch64 conditional compare instructions 
and PR 67628
is one area of improvement I identified. The problem there is different 
tree-level behaviour
for the expressions:
(a > b && b <= c) && c > d
vs.
a > b && (b <= c && c > d)


The second variant generates worse code for aarch64 (and x86_64) because tree 
ifcombine doesn't
do as good a job at merging basic blocks.

I've tracked this down to the function ifcombine_andif in tree-ssa-ifcombine.c.
With this patch I get the good codegen from that PR, both forms are combined 
into a single basic block
and the conditional compare expansion code in ccmp.c does a good job at 
generating the ccmp instructions.
I also see about 2% more ccmp instructions being generated for SPEC2006 with 
the code generally looking
better as well (if I remove this whole single-conditional restriction 
completely I get 19% more conditional
compares in SPEC2006, but that's something I want to investigate separately).

The idea is that for the two expressions given above the control flow and 
contents of the basic blocks
is the same, but when ifcombine_ifandif is called the inner_cond_bb and 
outer_cond_bb arguments are
reversed i.e. for the first expression the inner_cond_bb contains a single 
comparison c > d
and outer_cond_bb contains multiple comparisons a > b && b <= c.
In the second case the outer cond contains the single comparison a > b and the 
inner block this
time contains the multiple conditions a > b && b <= c, thus failing the
"Only do this optimization if the inner bb contains only the conditional" check.
The patch makes this condition more symmetrical by rejecting this optimisation 
only if
neither the inner nor the outer condition blocks are simple.

Unfortunately, I see a testsuite regression with this patch:
FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"

The reduced part of that test is:
void
test1 (int x, unsigned u)
{
  if ((1U << x) != 64
      || (2 << x) != u
      || (x << x) != 384
      || (3 << x) == 9
      || (x << 14) != 98304U
      || (1 << x) == 14
      || (3 << 2) != 12)
    __builtin_abort ();
}

The patched ifcombine pass works more or less as expected and produces fewer 
basic blocks.
Before this patch a relevant part of the ifcombine dump for test1 is:
;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
  if (x_1(D) != 6)
    goto <bb 6>;
  else
    goto <bb 3>;

;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
  _2 = 2 << x_1(D);
  _3 = (unsigned intD.10) _2;
  if (_3 != u_4(D))
    goto <bb 6>;
  else
    goto <bb 4>;


After this patch it is:
;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
  _2 = 2 << x_1(D);
  _3 = (unsigned intD.10) _2;
  _9 = _3 != u_4(D);
  _10 = x_1(D) != 6;
  _11 = _9 | _10;
  if (_11 != 0)
    goto <bb 5>;
  else
    goto <bb 3>;

The second form ends up generating worse codegen however, and the badness 
starts with the dom1 pass.
In the unpatched case it manages to deduce that x must be 6 by the time it 
reaches basic block 3 and
uses that information to eliminate the shift in "_2 = 2 << x_1(D)" from basic 
block 3
In the patched case it is unable to make that call, I think because the x != 6 
condition is IORed
with another test.

I'm not familiar with the internals of the dom pass, so I'm not sure where to 
go looking for a fix for this.
Is the ifcombine change a step in the right direction? If so, what would need 
to be done to fix the issue with
the dom pass?

I suppose what we want is to not combine basic blocks if the sequence and 
conditions of the basic blocks are
such that dom can potentially exploit them, but how do we express that?

Thanks,
Kyrill

P.S. I can provide more complete tree dumps for the examples if needed.

2015-09-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR tree-optimization/67628
    * tree-ssa-ifcombine.c (ifcombine_ifandif): Allow optimization
    when either the inner or outer block contain a single conditional.

diff --git a/gcc/tree-ssa-ifcombine.c b/gcc/tree-ssa-ifcombine.c
index 9f04174..bfe17ba 100644
--- a/gcc/tree-ssa-ifcombine.c
+++ b/gcc/tree-ssa-ifcombine.c
@@ -511,8 +511,12 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 	  gimple_stmt_iterator gsi;
 	  if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
 	    return false;
-	  /* Only do this optimization if the inner bb contains only the conditional. */
-	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb)))
+	  /* Only do this optimization if inner or outer bb contains
+	     only the conditional. */
+	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (
+				     inner_cond_bb))
+	      && !gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (
+					 outer_cond_bb)))
 	    return false;
 	  t1 = fold_build2_loc (gimple_location (inner_cond),
 				inner_cond_code,

[RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

Reply via email to