On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pins...@gmail.com> wrote:
> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pins...@gmail.com> wrote:
>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>> <zhenqiang.c...@linaro.org> wrote:
>>> On 18 October 2013 00:58, Jeff Law <l...@redhat.com> wrote:
>>>> On 10/17/13 05:03, Richard Biener wrote:
>>>>>>>
>>>>>>> Is it OK for trunk?
>>>>>>
>>>>>>
>>>>>> I had a much simpler change which did basically the same from 4.7 (I
>>>>>> can update it if people think this is a better approach).
>>>>>
>>>>>
>>>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>>>> for force_gimple_operand).
>>>>
>>>> The obvious question is whether or not Andrew's simpler change picks up as
>>>> many transformations as Zhenqiang's change.  If not are the things missed
>>>> important.
>>>>
>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>
>>> Here is a rough compare:
>>>
>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>> in my patch). Root cause is that it does not skip "LABEL". The guard
>>> to do this opt should be the same the bb_has_overhead_p in my patch.
>>
>> This should be an easy change, I am working on this right now.
>>
>>>
>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>
>>>   _3 = a_2(D) > 0;
>>>   _5 = b_4(D) > 0;
>>>   _6 = _3 | _5;
>>>   _9 = c_7(D) <= 0;
>>>   _10 = ~_6;
>>>   _11 = _9 & _10;
>>>   if (_11 == 0)
>>>
>>> With my patch, it will generate
>>>
>>>   _3 = a_2(D) > 0;
>>>   _5 = b_4(D) > 0;
>>>   _6 = _3 | _5;
>>>   _9 = c_7(D) > 0;
>>>   _10 = _6 | _9;
>>>   if (_10 != 0)
>>
>> As mentioned otherwise, this seems like a missed optimization inside
>> forwprop.  When I originally wrote this code there used to be two
>> cases one for & and one for |, but this was removed sometime and I
>> just made the code evolve with that.
>
> Actually I can make a small patch (3 lines) to my current patch which
> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>
>  if (result_inv)
>    {
>      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>      result_inv = false;
>    }
>
> I will submit a new patch after some testing of my current patch which
> fixes items 1 and 2.


Here is my latest patch which adds the testcases from Zhenqiang's
patch and fixes item 1 and 2.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
(ifcombine_ifandif): Handle cases where
maybe_fold_and_comparisons fails, combining the branches
anyways.
(tree_ssa_ifcombine): Inverse the order of
the basic block walk, increases the number of combinings.
* gimple.h (gsi_start_nondebug_after_labels_bb): New function.


testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.

* gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
conditional move to be used.
* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
intermediate".




>
> Thanks,
> Andrew Pinski
>
>
>>
>>>
>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>> the basic block walk" so it can do combine recursively.
>>>
>>> But I think we need some heuristic to control the number of ifs. Move
>>> too much compares from
>>> the inner_bb to outer_bb is not good.
>>
>>
>> I think this depends on the target.  For MIPS we don't want an upper
>> bound as integer comparisons go directly to GPRs.  I wrote this patch
>> with MIPS in mind as that was the target I was working on.
>>
>>>
>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>> existing functions. So it looks much simple.
>>
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>>>>
>>>>> With that we should be able to kill the fold-const.c transform?
>>>>
>>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>
>>> That's my final goal to "kill the fold-const.c transform". I think we
>>> may combine the two changes to make a "simple" and "good" patch.
>>>
>>> Thanks!
>>> -Zhenqiang
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c    (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c    (revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* 
powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    if (b > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\&" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c        (revision 204095)
+++ testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c        (working copy)
@@ -42,7 +42,7 @@ expand_one_var (tree var, unsigned char
     abort ();
 }
 /* We should thread the jump, through an intermediate block.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "dom1"} } */
-/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) 
incoming edge;  \\(.*\\) joiner;  \\(.*\\) nocopy;" 1 "dom1"} } */
+/* { dg-final { scan-tree-dump "Threaded" "dom1"} } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) 
incoming edge;" 1 "dom1"} } */
 /* { dg-final { cleanup-tree-dump "dom1" } } */
 
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c    (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c    (revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* 
powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  if (b > 0)
+    goto L1;
+  return 0;
+L1:
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\|" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c    (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c    (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* 
powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  else
+    goto L2;
+L1:
+  if (b > 0)
+    goto L2;
+  return 5;
+L2:
+  return 6;
+}
+/* { dg-final { scan-tree-dump "\|" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c    (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c    (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* 
powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  if (b > 0)
+    goto L2;
+L1:
+  return 0;
+L2:
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\&" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/phi-opt-9.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/phi-opt-9.c       (revision 204095)
+++ testsuite/gcc.dg/tree-ssa/phi-opt-9.c       (working copy)
@@ -2,13 +2,14 @@
 /* { dg-options "-O -fdump-tree-optimized" } */
 
 int g(int,int);
+int h(int);
 int f(int t, int c)
 {
   int d = 0;
   int e = 0;
   if (t)
     {
-      d = c+1;
+      d = h(c);
       e = t;
     }
   else d = 0, e = 0;
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c    (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c    (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* 
powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b, int c)
+{
+  if (a > 0 && b > 0 && c > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "\&" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c    (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c    (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* 
powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b, int c)
+{
+  if (a > 0 || b > 0 || c > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "\\|" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: tree-ssa-ifcombine.c
===================================================================
--- tree-ssa-ifcombine.c        (revision 204095)
+++ tree-ssa-ifcombine.c        (working copy)
@@ -22,6 +22,10 @@ along with GCC; see the file COPYING3.
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+/* rtl is needed only because arm back-end requires it for
+   BRANCH_COST.  */
+#include "rtl.h"
+#include "tm_p.h"
 #include "tree.h"
 #include "basic-block.h"
 #include "tree-pretty-print.h"
@@ -32,6 +36,12 @@ along with GCC; see the file COPYING3.
 #include "ssa-iterators.h"
 #include "tree-pass.h"
 
+#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
+#define LOGICAL_OP_NON_SHORT_CIRCUIT \
+  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
+                false) >= 2)
+#endif
+
 /* This pass combines COND_EXPRs to simplify control flow.  It
    currently recognizes bit tests and comparisons in chains that
    represent logical and or logical or of two COND_EXPRs.
@@ -488,7 +498,35 @@ ifcombine_ifandif (basic_block inner_con
                                            outer_cond_code,
                                            gimple_cond_lhs (outer_cond),
                                            gimple_cond_rhs (outer_cond))))
-       return false;
+       {
+         tree t1, t2;
+         gimple_stmt_iterator gsi;
+         if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
+           return false;
+         /* Only do this optimization if the inner bb contains only the 
conditional. */
+         if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb 
(inner_cond_bb)))
+           return false;
+         t1 = fold_build2_loc (gimple_location (inner_cond),
+                               inner_cond_code,
+                               boolean_type_node,
+                               gimple_cond_lhs (inner_cond),
+                               gimple_cond_rhs (inner_cond));
+         t2 = fold_build2_loc (gimple_location (outer_cond),
+                               outer_cond_code,
+                               boolean_type_node,
+                               gimple_cond_lhs (outer_cond),
+                               gimple_cond_rhs (outer_cond));
+         t = fold_build2_loc (gimple_location (inner_cond), 
+                              TRUTH_AND_EXPR, boolean_type_node, t1, t2);
+         if (result_inv)
+           {
+             t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
+             result_inv = false;
+           }
+         gsi = gsi_for_stmt (inner_cond);
+         t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr, NULL, 
true,
+                                         GSI_SAME_STMT);
+        }
       if (result_inv)
        t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
       t = canonicalize_cond_expr_cond (t);
@@ -631,7 +669,7 @@ tree_ssa_ifcombine (void)
   bbs = single_pred_before_succ_order ();
   calculate_dominance_info (CDI_DOMINATORS);
 
-  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
+  for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--)
     {
       basic_block bb = bbs[i];
       gimple stmt = last_stmt (bb);
Index: gimple.h
===================================================================
--- gimple.h    (revision 204095)
+++ gimple.h    (working copy)
@@ -5533,6 +5533,19 @@ gsi_start_nondebug_bb (basic_block bb)
 
   return i;
 }
+/* Return a new iterator pointing to the first non-debug non-label statement in
+   basic block BB.  */
+
+static inline gimple_stmt_iterator
+gsi_start_nondebug_after_labels_bb (basic_block bb)
+{
+  gimple_stmt_iterator i = gsi_after_labels (bb);
+
+  if (!gsi_end_p (i) && is_gimple_debug (gsi_stmt (i)))
+    gsi_next_nondebug (&i);
+
+  return i;
+}
 
 /* Return a new iterator pointing to the last non-debug statement in
    basic block BB.  */

Reply via email to