On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah
<kugan.vivekanandara...@linaro.org> wrote:
> Hi Richard,
>
> On 6 March 2018 at 02:24, Richard Biener <richard.guent...@gmail.com> wrote:
>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
>> <kugan.vivekanandara...@linaro.org> wrote:
>>> Hi Richard,
>>>
>>> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> 
>>> wrote:
>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> 
>>>>> wrote:
>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Thanks for the review.
>>>>>>> On 25 January 2018 at 20:04, Richard Biener 
>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I
>>>>>>>>> would like to queue this for review for next stage 1.
>>>>>>>>>
>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and 
>>>>>>>>> above.
>>>>>>>>> 2. This does not distribute loop to detect popcount (like
>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct
>>>>>>>>> me if I am wrong.
>>>>>>>>
>>>>>>>> But then it has no business inside loop distribution but instead is
>>>>>>>> doing final value
>>>>>>>> replacement, right?  You are pattern-matching the whole loop after 
>>>>>>>> all.  I think
>>>>>>>> final value replacement would already do the correct thing if you
>>>>>>>> teached number of
>>>>>>>> iteration analysis that niter for
>>>>>>>>
>>>>>>>>   <bb 3> [local count: 955630224]:
>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>
>>>>>>>>   _1 = b_11 + -1;
>>>>>>>>   b_8 = _1 & b_11;
>>>>>>>>   if (b_8 != 0)
>>>>>>>>     goto <bb 6>; [89.00%]
>>>>>>>>   else
>>>>>>>>     goto <bb 8>; [11.00%]
>>>>>>>>
>>>>>>>>   <bb 6> [local count: 850510900]:
>>>>>>>>   goto <bb 3>; [100.00%]
>>>>>>>
>>>>>>> I am looking into this approach. What should be the scalar evolution
>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
>>>>>>> and can this be represented with the scev?
>>>>>>
>>>>>> No, it's not affine and thus cannot be represented.  You only need the
>>>>>> scalar evolution of the counting IV which is already handled and
>>>>>> the number of iteration analysis needs to handle the above IV - this
>>>>>> is the missing part.
>>>>> Thanks for the clarification. I am now matching this loop pattern in
>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions
>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in
>>>>> the loop preheater and setting the loop niter with this. This will be
>>>>> used by the final value replacement. Is this what you wanted?
>>>>
>>>> No, you shouldn't insert a popcount stmt but instead the niter
>>>> GENERIC tree should be a CALL_EXPR to popcount with the
>>>> appropriate argument.
>>>
>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
>>> niter in tree_niter_desc can take such.
>>>
>>> Attached patch now does this. Also had to add support for CALL_EXPR in
>>> few places to handle niter with CALL_EXPR. Does this look OK?
>>
>> Overall this looks ok - the patch includes changes in places that I don't 
>> think
>> need changes such as chrec_convert_1 or extract_ops_from_tree.
>> The expression_expensive_p change should be more specific than making
>> all calls inexpensive as well.
>
> Changed it.
>
>>
>> The verify_ssa change looks bogus, you do
>>
>> +  dest = gimple_phi_result (count_phi);
>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
>> +
>> +  var = build_call_expr (fn, 1, src);
>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
>> +                       build_int_cst (TREE_TYPE (dest), 1));
>>
>> why do you allocate a new SSA name here?  It seems unused
>> as you overwrive 'var' with the CALL_EXPR immediately.
> Changed now.
>
>>
>> I didn't review the pattern matching thoroughly nor the exact place you
>> call it.  But
>>
>> +      if (check_popcount_pattern (loop, &count))
>> +       {
>> +         niter->assumptions = boolean_false_node;
>> +         niter->control.base = NULL_TREE;
>> +         niter->control.step = NULL_TREE;
>> +         niter->control.no_overflow = false;
>> +         niter->niter = count;
>> +         niter->assumptions = boolean_true_node;
>> +         niter->may_be_zero = boolean_false_node;
>> +         niter->max = -1;
>> +         niter->bound = NULL_TREE;
>> +         niter->cmp = ERROR_MARK;
>> +         return true;
>> +       }
>>
>> simply setting may_be_zero to false looks fishy.
> Should I set this to (argument to popcount == zero)?
No, I think that's unnecessary.  The number of iterations is computed
as: may_be_zero ? 0 : niter;
Here niter is ZERO even when may_be_zero is set to false, and niters
is computed correctly.

I think the point is that may_be_zero is false doesn't imply that
niters is non-zero.

>
>> Try with -fno-tree-loop-ch.
> I changed the pattern matching to handle loop without header copying
> too. Looks a bit complicated checking all the conditions. Wondering if
> this can be done in a simpler and easier to read way.
>
>>  Also max should not be negative,
>> it should be the number of bits in the IV type?
>
> Changed this too.
>>
>> A related testcase could be that we can completely peel
>> a loop like the following which iterates at most 8 times:
>>
>> int a[8];
>> void foo (unsigned char ctrl)
>> {
>>   int c = 0;
>>   while (ctrl)
>>     {
>>        ctrl = ctrl & (ctrl - 1);
>>        a[c++] = ctrl;
>>     }
>> }
>
> Hmm, this is an interesting test case but as I am now trying to match
> a loop which does popcount, this is not handled.
>
>
> Attaching the current version for review.
Here are some comments.

> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 7a54c5f..f390321 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop 
> *loop, edge exit,
>    return (!integer_zerop (niter->assumptions));
>  }
>
> +/* See if LOOP is a popcout implementation of the form
> +
> +    int c = 0;
> +    while (b) {
> +    b = b & (b - 1);
> +    c++;
> +    }
> +
> +    If so, Set NITER to  __builtin_popcount (b) - 1
> +    return true if we did, false otherwise.  */
> +
> +static bool
> +check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max)
> +{
> +  tree lhs, rhs;
> +  tree dest;
> +  gimple *and_minus_one;
> +  gimple *phi;
> +  int count = 0;
> +  gimple *count_stmt = NULL;
> +  bool adjust = true;
> +
> +  if (!single_exit (loop))
> +    return false;
> +
> +  /* Check loop terminating branch is like
> +     if (b != 0).  */
> +  gimple *stmt = last_stmt (loop->header);
> +  if (!stmt
> +      || gimple_code (stmt) != GIMPLE_COND
> +      || !zerop (gimple_cond_rhs (stmt)))
The check doesn't fully match the comment.  NE is not checked here.
Also can move below "(TREE_CODE (lhs) != SSA_NAME)" check up to this
point, making simple checks earlier.

> +    return false;
> +
> +  /* Check the loop closed SSA definition for just the variable c defined in
> +     loop.  */
> +  basic_block bb = single_exit (loop)->dest;
single_exit is repeatedly called various times, call it once and use
the returning edge instead?  I am not sure GCC is smart enough
removing repeated call instances.  Same to loop_latch_edge.

> +  for (gphi_iterator gpi = gsi_start_phis (bb);
> +       !gsi_end_p (gpi); gsi_next (&gpi))
> +    {
> +      phi = gpi.phi ();
> +      count++;
> +    }
> +
> +  if (count != 1)
> +    return false;
> +
> +  rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx);
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +  count_stmt = SSA_NAME_DEF_STMT (rhs);
> +  lhs = gimple_cond_lhs (stmt);
> +  if (TREE_CODE (lhs) != SSA_NAME)
> +    return false;
> +  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
> +
> +  /* Depending on copy-header is performed, feeding PHI stmts might be in
> +     the loop header or loop exit, handle this.  */
> +  if (gimple_code (count_stmt) == GIMPLE_PHI)
> +    {
> +      tree t;
> +      if (gimple_code (and_stmt) != GIMPLE_PHI
> +      || gimple_bb (and_stmt) != single_exit (loop)->src
> +      || gimple_bb (count_stmt) != single_exit (loop)->src)
> +    return false;
> +      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);
> +      if (TREE_CODE (t) != SSA_NAME)
> +    return false;
> +      count_stmt = SSA_NAME_DEF_STMT (t);
> +      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
> +      if (TREE_CODE (t) != SSA_NAME)
> +    return false;
> +      and_stmt = SSA_NAME_DEF_STMT (t);
> +      adjust = false;
> +    }
> +
> +  /* Make sure we have a count by one.  */
> +  if (!is_gimple_assign (count_stmt)
> +      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)
> +      || !integer_onep (gimple_assign_rhs2 (count_stmt)))
> +    return false;
> +
> +  /* Cheeck "b = b & (b - 1)" is calculated.  */
Typo.

> +  if (!is_gimple_assign (and_stmt)
> +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
> +    return false;
> +
> +  lhs = gimple_assign_rhs1 (and_stmt);
> +  rhs = gimple_assign_rhs2 (and_stmt);
> +  if (TREE_CODE (lhs) == SSA_NAME
> +      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
> +      && is_gimple_assign (and_minus_one)
> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
> +      lhs = rhs;
> +  else if (TREE_CODE (rhs) == SSA_NAME
> +      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
> +      && is_gimple_assign (and_minus_one)
> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
Could you avoid duplication by factoring the condition into an inline
function?  They are exactly the same for lhs/rhs.

> +      ;
> +  else
> +    return false;
> +
> +  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
> +      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 
> (and_minus_one)))
Here you already got lhs correctly, so don't need to check on and_stmt
rhs directly.  You can even merge this check into above one.

> +    return false;
> +
> +  /* Check the recurrence.  */
> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
> +  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
> +  if (gimple_code (phi) != GIMPLE_PHI
> +      || gimple_code (src_phi) != GIMPLE_PHI)
I think this is redundant since you have lhs equals to
gimple_assign_rhs1 (and_minus_one).  So phi == src_phi is always true?

> +    return false;
> +
> +  dest = gimple_assign_lhs (count_stmt);
> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge 
> (loop)->dest_idx);
> +  if (adjust)
> +    *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),
> +              build_call_expr (fn, 1, src),
> +              build_int_cst (TREE_TYPE (dest), 1));
> +  else
> +    *niter = build_call_expr (fn, 1, src);
> +  *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
> +  return true;
> +}
> +
> +
>  /* Like number_of_iterations_exit_assumptions, but return TRUE only if
>     the niter information holds unconditionally.  */
>
> @@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge 
> exit,
>    gcond *stmt;
>    if (!number_of_iterations_exit_assumptions (loop, exit, niter,
>                            &stmt, every_iteration))
> -    return false;
> +    {
> +      tree count;
> +      HOST_WIDE_INT max;
> +      if (check_popcount_pattern (loop, &count, &max))
> +    {
> +      niter->assumptions = boolean_false_node;
> +      niter->control.base = NULL_TREE;
> +      niter->control.step = NULL_TREE;
> +      niter->control.no_overflow = false;
> +      niter->niter = count;
> +      niter->assumptions = boolean_true_node;
> +      niter->may_be_zero = boolean_false_node;
> +      niter->max = max;
> +      niter->bound = NULL_TREE;
> +      niter->cmp = ERROR_MARK;
> +      return true;
> +    }
Better to merge these compound statement into check_popcount_pattern
and rename it into something like number_of_iterations_popcount.

I wondered if the more inefficient version popcount should be checked, like:

int count = 0;
while (x)
{
  count += x & 1;
  x = x >> 1;
}

Thanks,
bin

> +      return false;
> +    }
>
>    if (integer_nonzerop (niter->assumptions))
>      return true;
> --
> 2.7.4
>

Reply via email to