Hi Bin, Thanks for the review. Please find the revised patch based on the review comments.
Thanks, Kugan On 17 May 2018 at 19:56, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah > <kugan.vivekanandara...@linaro.org> wrote: >> Hi Richard, >> >> On 6 March 2018 at 02:24, Richard Biener <richard.guent...@gmail.com> wrote: >>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah >>> <kugan.vivekanandara...@linaro.org> wrote: >>>> Hi Richard, >>>> >>>> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> >>>> wrote: >>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah >>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>> Hi Richard, >>>>>> >>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> >>>>>> wrote: >>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah >>>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>>> Hi Richard, >>>>>>>> >>>>>>>> Thanks for the review. >>>>>>>> On 25 January 2018 at 20:04, Richard Biener >>>>>>>> <richard.guent...@gmail.com> wrote: >>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah >>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I >>>>>>>>>> would like to queue this for review for next stage 1. >>>>>>>>>> >>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and >>>>>>>>>> above. >>>>>>>>>> 2. This does not distribute loop to detect popcount (like >>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please >>>>>>>>>> correct >>>>>>>>>> me if I am wrong. >>>>>>>>> >>>>>>>>> But then it has no business inside loop distribution but instead is >>>>>>>>> doing final value >>>>>>>>> replacement, right? You are pattern-matching the whole loop after >>>>>>>>> all. I think >>>>>>>>> final value replacement would already do the correct thing if you >>>>>>>>> teached number of >>>>>>>>> iteration analysis that niter for >>>>>>>>> >>>>>>>>> <bb 3> [local count: 955630224]: >>>>>>>>> # b_11 = PHI <b_5(5), b_8(6)> >>>>>>>>> _1 = b_11 + -1; >>>>>>>>> b_8 = _1 & b_11; >>>>>>>>> if (b_8 != 0) >>>>>>>>> goto <bb 6>; [89.00%] >>>>>>>>> else >>>>>>>>> goto <bb 8>; [11.00%] >>>>>>>>> >>>>>>>>> <bb 6> [local count: 850510900]: >>>>>>>>> goto <bb 3>; [100.00%] >>>>>>>> >>>>>>>> I am looking into this approach. What should be the scalar evolution >>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me >>>>>>>> and can this be represented with the scev? >>>>>>> >>>>>>> No, it's not affine and thus cannot be represented. You only need the >>>>>>> scalar evolution of the counting IV which is already handled and >>>>>>> the number of iteration analysis needs to handle the above IV - this >>>>>>> is the missing part. >>>>>> Thanks for the clarification. I am now matching this loop pattern in >>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions >>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in >>>>>> the loop preheater and setting the loop niter with this. This will be >>>>>> used by the final value replacement. Is this what you wanted? >>>>> >>>>> No, you shouldn't insert a popcount stmt but instead the niter >>>>> GENERIC tree should be a CALL_EXPR to popcount with the >>>>> appropriate argument. >>>> >>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if >>>> niter in tree_niter_desc can take such. >>>> >>>> Attached patch now does this. Also had to add support for CALL_EXPR in >>>> few places to handle niter with CALL_EXPR. Does this look OK? >>> >>> Overall this looks ok - the patch includes changes in places that I don't >>> think >>> need changes such as chrec_convert_1 or extract_ops_from_tree. >>> The expression_expensive_p change should be more specific than making >>> all calls inexpensive as well. >> >> Changed it. >> >>> >>> The verify_ssa change looks bogus, you do >>> >>> + dest = gimple_phi_result (count_phi); >>> + tree var = make_ssa_name (TREE_TYPE (dest), NULL); >>> + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); >>> + >>> + var = build_call_expr (fn, 1, src); >>> + *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var, >>> + build_int_cst (TREE_TYPE (dest), 1)); >>> >>> why do you allocate a new SSA name here? It seems unused >>> as you overwrive 'var' with the CALL_EXPR immediately. >> Changed now. >> >>> >>> I didn't review the pattern matching thoroughly nor the exact place you >>> call it. But >>> >>> + if (check_popcount_pattern (loop, &count)) >>> + { >>> + niter->assumptions = boolean_false_node; >>> + niter->control.base = NULL_TREE; >>> + niter->control.step = NULL_TREE; >>> + niter->control.no_overflow = false; >>> + niter->niter = count; >>> + niter->assumptions = boolean_true_node; >>> + niter->may_be_zero = boolean_false_node; >>> + niter->max = -1; >>> + niter->bound = NULL_TREE; >>> + niter->cmp = ERROR_MARK; >>> + return true; >>> + } >>> >>> simply setting may_be_zero to false looks fishy. >> Should I set this to (argument to popcount == zero)? > No, I think that's unnecessary. The number of iterations is computed > as: may_be_zero ? 0 : niter; > Here niter is ZERO even when may_be_zero is set to false, and niters > is computed correctly. > > I think the point is that may_be_zero is false doesn't imply that > niters is non-zero. > >> >>> Try with -fno-tree-loop-ch. >> I changed the pattern matching to handle loop without header copying >> too. Looks a bit complicated checking all the conditions. Wondering if >> this can be done in a simpler and easier to read way. >> >>> Also max should not be negative, >>> it should be the number of bits in the IV type? >> >> Changed this too. >>> >>> A related testcase could be that we can completely peel >>> a loop like the following which iterates at most 8 times: >>> >>> int a[8]; >>> void foo (unsigned char ctrl) >>> { >>> int c = 0; >>> while (ctrl) >>> { >>> ctrl = ctrl & (ctrl - 1); >>> a[c++] = ctrl; >>> } >>> } >> >> Hmm, this is an interesting test case but as I am now trying to match >> a loop which does popcount, this is not handled. >> >> >> Attaching the current version for review. > Here are some comments. > >> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c >> index 7a54c5f..f390321 100644 >> --- a/gcc/tree-ssa-loop-niter.c >> +++ b/gcc/tree-ssa-loop-niter.c >> @@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop >> *loop, edge exit, >> return (!integer_zerop (niter->assumptions)); >> } >> >> +/* See if LOOP is a popcout implementation of the form >> + >> + int c = 0; >> + while (b) { >> + b = b & (b - 1); >> + c++; >> + } >> + >> + If so, Set NITER to __builtin_popcount (b) - 1 >> + return true if we did, false otherwise. */ >> + >> +static bool >> +check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max) >> +{ >> + tree lhs, rhs; >> + tree dest; >> + gimple *and_minus_one; >> + gimple *phi; >> + int count = 0; >> + gimple *count_stmt = NULL; >> + bool adjust = true; >> + >> + if (!single_exit (loop)) >> + return false; >> + >> + /* Check loop terminating branch is like >> + if (b != 0). */ >> + gimple *stmt = last_stmt (loop->header); >> + if (!stmt >> + || gimple_code (stmt) != GIMPLE_COND >> + || !zerop (gimple_cond_rhs (stmt))) > The check doesn't fully match the comment. NE is not checked here. > Also can move below "(TREE_CODE (lhs) != SSA_NAME)" check up to this > point, making simple checks earlier. > >> + return false; >> + >> + /* Check the loop closed SSA definition for just the variable c defined in >> + loop. */ >> + basic_block bb = single_exit (loop)->dest; > single_exit is repeatedly called various times, call it once and use > the returning edge instead? I am not sure GCC is smart enough > removing repeated call instances. Same to loop_latch_edge. > >> + for (gphi_iterator gpi = gsi_start_phis (bb); >> + !gsi_end_p (gpi); gsi_next (&gpi)) >> + { >> + phi = gpi.phi (); >> + count++; >> + } >> + >> + if (count != 1) >> + return false; >> + >> + rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx); >> + if (TREE_CODE (rhs) != SSA_NAME) >> + return false; >> + count_stmt = SSA_NAME_DEF_STMT (rhs); >> + lhs = gimple_cond_lhs (stmt); >> + if (TREE_CODE (lhs) != SSA_NAME) >> + return false; >> + gimple *and_stmt = SSA_NAME_DEF_STMT (lhs); >> + >> + /* Depending on copy-header is performed, feeding PHI stmts might be in >> + the loop header or loop exit, handle this. */ >> + if (gimple_code (count_stmt) == GIMPLE_PHI) >> + { >> + tree t; >> + if (gimple_code (and_stmt) != GIMPLE_PHI >> + || gimple_bb (and_stmt) != single_exit (loop)->src >> + || gimple_bb (count_stmt) != single_exit (loop)->src) >> + return false; >> + t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx); >> + if (TREE_CODE (t) != SSA_NAME) >> + return false; >> + count_stmt = SSA_NAME_DEF_STMT (t); >> + t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx); >> + if (TREE_CODE (t) != SSA_NAME) >> + return false; >> + and_stmt = SSA_NAME_DEF_STMT (t); >> + adjust = false; >> + } >> + >> + /* Make sure we have a count by one. */ >> + if (!is_gimple_assign (count_stmt) >> + || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR) >> + || !integer_onep (gimple_assign_rhs2 (count_stmt))) >> + return false; >> + >> + /* Cheeck "b = b & (b - 1)" is calculated. */ > Typo. > >> + if (!is_gimple_assign (and_stmt) >> + || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR) >> + return false; >> + >> + lhs = gimple_assign_rhs1 (and_stmt); >> + rhs = gimple_assign_rhs2 (and_stmt); >> + if (TREE_CODE (lhs) == SSA_NAME >> + && (and_minus_one = SSA_NAME_DEF_STMT (lhs)) >> + && is_gimple_assign (and_minus_one) >> + && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR) >> + && integer_minus_onep (gimple_assign_rhs2 (and_minus_one))) >> + lhs = rhs; >> + else if (TREE_CODE (rhs) == SSA_NAME >> + && (and_minus_one = SSA_NAME_DEF_STMT (rhs)) >> + && is_gimple_assign (and_minus_one) >> + && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR) >> + && integer_minus_onep (gimple_assign_rhs2 (and_minus_one))) > Could you avoid duplication by factoring the condition into an inline > function? They are exactly the same for lhs/rhs. > >> + ; >> + else >> + return false; >> + >> + if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one)) >> + && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 >> (and_minus_one))) > Here you already got lhs correctly, so don't need to check on and_stmt > rhs directly. You can even merge this check into above one. > >> + return false; >> + >> + /* Check the recurrence. */ >> + phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one)); >> + gimple *src_phi = SSA_NAME_DEF_STMT (lhs); >> + if (gimple_code (phi) != GIMPLE_PHI >> + || gimple_code (src_phi) != GIMPLE_PHI) > I think this is redundant since you have lhs equals to > gimple_assign_rhs1 (and_minus_one). So phi == src_phi is always true? > >> + return false; >> + >> + dest = gimple_assign_lhs (count_stmt); >> + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); >> + tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge >> (loop)->dest_idx); >> + if (adjust) >> + *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), >> + build_call_expr (fn, 1, src), >> + build_int_cst (TREE_TYPE (dest), 1)); >> + else >> + *niter = build_call_expr (fn, 1, src); >> + *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest))); >> + return true; >> +} >> + >> + >> /* Like number_of_iterations_exit_assumptions, but return TRUE only if >> the niter information holds unconditionally. */ >> >> @@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge >> exit, >> gcond *stmt; >> if (!number_of_iterations_exit_assumptions (loop, exit, niter, >> &stmt, every_iteration)) >> - return false; >> + { >> + tree count; >> + HOST_WIDE_INT max; >> + if (check_popcount_pattern (loop, &count, &max)) >> + { >> + niter->assumptions = boolean_false_node; >> + niter->control.base = NULL_TREE; >> + niter->control.step = NULL_TREE; >> + niter->control.no_overflow = false; >> + niter->niter = count; >> + niter->assumptions = boolean_true_node; >> + niter->may_be_zero = boolean_false_node; >> + niter->max = max; >> + niter->bound = NULL_TREE; >> + niter->cmp = ERROR_MARK; >> + return true; >> + } > Better to merge these compound statement into check_popcount_pattern > and rename it into something like number_of_iterations_popcount. > > I wondered if the more inefficient version popcount should be checked, like: > > int count = 0; > while (x) > { > count += x & 1; > x = x >> 1; > } > > Thanks, > bin > >> + return false; >> + } >> >> if (integer_nonzerop (niter->assumptions)) >> return true; >> -- >> 2.7.4 >>
From 413aafbb5d53812546c4af5f556f362aafdb32d4 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> Date: Thu, 10 May 2018 21:41:53 +1000 Subject: [PATCH] popcount Change-Id: I2f796dd4518937495cc1855852b0dfa5c4ff1143 --- gcc/ipa-fnsummary.c | 4 + gcc/testsuite/gcc.dg/tree-ssa/popcount.c | 41 ++++++++ gcc/testsuite/gcc.dg/tree-ssa/popcount2.c | 29 ++++++ gcc/testsuite/gcc.dg/tree-ssa/popcount3.c | 28 ++++++ gcc/tree-scalar-evolution.c | 7 ++ gcc/tree-ssa-loop-ivopts.c | 10 ++ gcc/tree-ssa-loop-niter.c | 156 +++++++++++++++++++++++++++++- 7 files changed, 274 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index bdf9ba1..4dc156a 100644 --- a/gcc/ipa-fnsummary.c +++ b/gcc/ipa-fnsummary.c @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info, nonconstant_names); return p2.or_with (summary->conds, p1); } + else if (TREE_CODE (expr) == CALL_EXPR) + { + return false; + } else { debug_tree (expr); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c new file mode 100644 index 0000000..86a66cb --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +extern int foo (int); + +int PopCount (long b) { + int c = 0; + b++; + + while (b) { + b &= b - 1; + c++; + } + return c; +} +int PopCount2 (long b) { + int c = 0; + + while (b) { + b &= b - 1; + c++; + } + foo (c); + return foo (c); +} + +void PopCount3 (long b1) { + + for (long i = 0; i < b1; ++i) + { + long b = i; + int c = 0; + while (b) { + b &= b - 1; + c++; + } + foo (c); + } +} + +/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c new file mode 100644 index 0000000..52afc2d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c @@ -0,0 +1,29 @@ +/* { dg-do execute } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int +__attribute__ ((noinline, noclone)) +foo (int i, long b) +{ + int c = i; + + while (b) { + b &= b - 1; + c++; + } + return c; +} + +int main() +{ + + if (foo (1, 7) != 4) + __builtin_abort (); + if (foo (0, 0) != 0) + __builtin_abort (); + if (foo (8, 0xff) != 16) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c new file mode 100644 index 0000000..0c69d97 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c @@ -0,0 +1,28 @@ +/* { dg-do execute } */ +/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */ + +int +__attribute__ ((noinline, noclone)) +foo (long b) +{ + int c = i; + + while (b) { + b &= b - 1; + c++; + } + return c; +} + +int main() +{ + if (foo (7) != 3) + __builtin_abort (); + if (foo (0) != 0) + __builtin_abort (); + if (foo (0xff) != 8) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */ diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c index fefc9de..967b154 100644 --- a/gcc/tree-scalar-evolution.c +++ b/gcc/tree-scalar-evolution.c @@ -1984,6 +1984,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr) return expr; if (TREE_CODE (expr) == POLYNOMIAL_CHREC + || TREE_CODE (expr) == CALL_EXPR || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS) return chrec_dont_know; @@ -3472,6 +3473,7 @@ bool expression_expensive_p (tree expr) { enum tree_code code; + tree fndecl; if (is_gimple_val (expr)) return false; @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr) if (!integer_pow2p (TREE_OPERAND (expr, 1))) return true; } + if (code == CALL_EXPR + && (fndecl = get_callee_fndecl (expr)) + && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL + && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl))) + return false; switch (TREE_CODE_CLASS (code)) { diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index b313571..519649a 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr) code = TREE_CODE (expr); codeclass = TREE_CODE_CLASS (code); + if (code == CALL_EXPR) + { + tree arg; + call_expr_arg_iterator iter; + FOR_EACH_CALL_EXPR_ARG (arg, iter, expr) + if (contains_abnormal_ssa_name_p (arg)) + return true; + return false; + } + if (code == SSA_NAME) return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0; diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 7a54c5f..66f9b4f 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2430,6 +2430,156 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit, return (!integer_zerop (niter->assumptions)); } +/* Utility function to check if OP is defined by a stmt + that is a val - 1. If that is the case, set it to STMT. */ + +static bool +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt) +{ + if (TREE_CODE (op) == SSA_NAME + && (*stmt = SSA_NAME_DEF_STMT (op)) + && is_gimple_assign (*stmt) + && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR) + && val == gimple_assign_rhs1 (*stmt) + && integer_minus_onep (gimple_assign_rhs2 (*stmt))) + return true; + else + return false; +} + +/* See if LOOP is a popcout implementation of the form + + int c = 0; + while (b) { + b = b & (b - 1); + c++; + } + + If popcount pattern, update NITER accordingly. + i.e., set NITER to __builtin_popcount (b) + return true if we did, false otherwise. + + */ + +static bool +number_of_iterations_popcount (loop_p loop, struct tree_niter_desc *niter) +{ + tree rhs1, rhs2; + tree dest; + gimple *and_minus_one; + gimple *phi; + int count = 0; + gimple *count_stmt = NULL; + bool adjust = true; + edge exit; + tree iter; + HOST_WIDE_INT max; + + if (!(exit = single_exit (loop))) + return false; + + /* Check loop terminating branch is like + if (b != 0). */ + gimple *stmt = last_stmt (loop->header); + if (!stmt + || gimple_code (stmt) != GIMPLE_COND + || gimple_cond_code (stmt) != NE_EXPR + || !zerop (gimple_cond_rhs (stmt)) + || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME) + return false; + + /* Check the loop closed SSA definition for just the variable c defined in + loop. */ + basic_block bb = exit->dest; + for (gphi_iterator gpi = gsi_start_phis (bb); + !gsi_end_p (gpi); gsi_next (&gpi)) + { + phi = gpi.phi (); + count++; + } + + if (count != 1) + return false; + + rhs1 = gimple_phi_arg_def (phi, exit->dest_idx); + if (TREE_CODE (rhs1) != SSA_NAME) + return false; + count_stmt = SSA_NAME_DEF_STMT (rhs1); + gimple *and_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt)); + + /* Depending on copy-header is performed, feeding PHI stmts might be in + the loop header or loop exit, handle this. */ + if (gimple_code (count_stmt) == GIMPLE_PHI) + { + tree t; + if (gimple_code (and_stmt) != GIMPLE_PHI + || gimple_bb (and_stmt) != exit->src + || gimple_bb (count_stmt) != exit->src) + return false; + t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx); + if (TREE_CODE (t) != SSA_NAME) + return false; + count_stmt = SSA_NAME_DEF_STMT (t); + t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx); + if (TREE_CODE (t) != SSA_NAME) + return false; + and_stmt = SSA_NAME_DEF_STMT (t); + adjust = false; + } + + /* Make sure we have a count by one. */ + if (!is_gimple_assign (count_stmt) + || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR) + || !integer_onep (gimple_assign_rhs2 (count_stmt))) + return false; + + /* Check "b = b & (b - 1)" is calculated. */ + if (!is_gimple_assign (and_stmt) + || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR) + return false; + + rhs1 = gimple_assign_rhs1 (and_stmt); + rhs2 = gimple_assign_rhs2 (and_stmt); + + if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one)) + rhs1 = rhs2; + else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one)) + ; + else + return false; + + /* Check the recurrence. */ + phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one)); + gimple *src_phi = SSA_NAME_DEF_STMT (rhs2); + if (gimple_code (phi) != GIMPLE_PHI + || gimple_code (src_phi) != GIMPLE_PHI) + return false; + + dest = gimple_assign_lhs (count_stmt); + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); + tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx); + if (adjust) + iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), + build_call_expr (fn, 1, src), + build_int_cst (TREE_TYPE (dest), 1)); + else + iter = build_call_expr (fn, 1, src); + max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest))); + + niter->assumptions = boolean_false_node; + niter->control.base = NULL_TREE; + niter->control.step = NULL_TREE; + niter->control.no_overflow = false; + niter->niter = iter; + niter->assumptions = boolean_true_node; + niter->may_be_zero = boolean_false_node; + niter->max = max; + niter->bound = NULL_TREE; + niter->cmp = ERROR_MARK; + return true; +} + + /* Like number_of_iterations_exit_assumptions, but return TRUE only if the niter information holds unconditionally. */ @@ -2441,7 +2591,11 @@ number_of_iterations_exit (struct loop *loop, edge exit, gcond *stmt; if (!number_of_iterations_exit_assumptions (loop, exit, niter, &stmt, every_iteration)) - return false; + { + if (number_of_iterations_popcount (loop, niter)) + return true; + return false; + } if (integer_nonzerop (niter->assumptions)) return true; -- 2.7.4