On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
>> <kugan.vivekanandara...@linaro.org> wrote:
>>> Hi Richard,
>>>
>>> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> 
>>> wrote:
>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> 
>>>>> wrote:
>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Thanks for the review.
>>>>>>> On 25 January 2018 at 20:04, Richard Biener 
>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I
>>>>>>>>> would like to queue this for review for next stage 1.
>>>>>>>>>
>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and 
>>>>>>>>> above.
>>>>>>>>> 2. This does not distribute loop to detect popcount (like
>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct
>>>>>>>>> me if I am wrong.
>>>>>>>>
>>>>>>>> But then it has no business inside loop distribution but instead is
>>>>>>>> doing final value
>>>>>>>> replacement, right?  You are pattern-matching the whole loop after 
>>>>>>>> all.  I think
>>>>>>>> final value replacement would already do the correct thing if you
>>>>>>>> teached number of
>>>>>>>> iteration analysis that niter for
>>>>>>>>
>>>>>>>>   <bb 3> [local count: 955630224]:
>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>
>>>>>>>>   _1 = b_11 + -1;
>>>>>>>>   b_8 = _1 & b_11;
>>>>>>>>   if (b_8 != 0)
>>>>>>>>     goto <bb 6>; [89.00%]
>>>>>>>>   else
>>>>>>>>     goto <bb 8>; [11.00%]
>>>>>>>>
>>>>>>>>   <bb 6> [local count: 850510900]:
>>>>>>>>   goto <bb 3>; [100.00%]
>>>>>>>
>>>>>>> I am looking into this approach. What should be the scalar evolution
>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
>>>>>>> and can this be represented with the scev?
>>>>>>
>>>>>> No, it's not affine and thus cannot be represented.  You only need the
>>>>>> scalar evolution of the counting IV which is already handled and
>>>>>> the number of iteration analysis needs to handle the above IV - this
>>>>>> is the missing part.
>>>>> Thanks for the clarification. I am now matching this loop pattern in
>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions
>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in
>>>>> the loop preheater and setting the loop niter with this. This will be
>>>>> used by the final value replacement. Is this what you wanted?
>>>>
>>>> No, you shouldn't insert a popcount stmt but instead the niter
>>>> GENERIC tree should be a CALL_EXPR to popcount with the
>>>> appropriate argument.
>>>
>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
>>> niter in tree_niter_desc can take such.
>>>
>>> Attached patch now does this. Also had to add support for CALL_EXPR in
>>> few places to handle niter with CALL_EXPR. Does this look OK?
>>
>> Overall this looks ok - the patch includes changes in places that I don't 
>> think
>> need changes such as chrec_convert_1 or extract_ops_from_tree.
>> The expression_expensive_p change should be more specific than making
>> all calls inexpensive as well.
>>
>> The verify_ssa change looks bogus, you do
>>
>> +  dest = gimple_phi_result (count_phi);
>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
>> +
>> +  var = build_call_expr (fn, 1, src);
>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
>> +                       build_int_cst (TREE_TYPE (dest), 1));
>>
>> why do you allocate a new SSA name here?  It seems unused
>> as you overwrive 'var' with the CALL_EXPR immediately.
>>
>> I didn't review the pattern matching thoroughly nor the exact place you
>> call it.  But
>>
>> +      if (check_popcount_pattern (loop, &count))
>> +       {
>> +         niter->assumptions = boolean_false_node;
>> +         niter->control.base = NULL_TREE;
>> +         niter->control.step = NULL_TREE;
>> +         niter->control.no_overflow = false;
>> +         niter->niter = count;
>> +         niter->assumptions = boolean_true_node;
>> +         niter->may_be_zero = boolean_false_node;
>> +         niter->max = -1;
>> +         niter->bound = NULL_TREE;
>> +         niter->cmp = ERROR_MARK;
>> +         return true;
>> +       }
>>
>> simply setting may_be_zero to false looks fishy.  Try
>> with -fno-tree-loop-ch.  Also max should not be negative,
>> it should be the number of bits in the IV type?
>>
>> A related testcase could be that we can completely peel
>> a loop like the following which iterates at most 8 times:
>>
>> int a[8];
>> void foo (unsigned char ctrl)
>> {
>>   int c = 0;
>>   while (ctrl)
>>     {
>>        ctrl = ctrl & (ctrl - 1);
>>        a[c++] = ctrl;
>>     }
>> }
>>
>> This is now stage1 material so please update and re-post.  Maybe Bin has
>> further suggestions as well.
> Sorry for being late on this.  Actually I thought about popcount in
> distribution before.  More like the first patch, but handled as
> another distribution pattern rather than a special case.  It's a bit
> strange to compute and store the info in niters.  It's also not
> straight forward when/where the transformation is finally done.

It's done at final value replacement if the counter is used.  But with
it in place we should also be able to compute niters for the case
where it isn't (like the above case).  So I believe that in the end handling
this in niter analysis is more powerful.

> I haven't looked into the details so not sure how appropriate it will
> be as a distribution pattern (current ones are only about data
> references).  So I am okay with this version if it's more appropriate.

Yeah, adding distribution patterns for scalar reduction forms is certainly
appropriate but then this is the same or at least similar as final
value replacement.

Richard.

> Thanks,
> bin
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Kugan
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2018-02-08  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>
>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.
>>>     * tree-chrec.c (chrec_convert_1): Likewise.
>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.
>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.
>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.
>>>     (number_of_iterations_exit): Record niter for popcount patern.
>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-02-08  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>
>>>     * gcc.dg/tree-ssa/popcount.c: New test.
>>>
>>>
>>>>
>>>>> In general, there is also a condition gating the loop like
>>>>>
>>>>> if (b_4 != 0)
>>>>>   goto loop;
>>>>> else
>>>>>   end:
>>>>>
>>>>> This of course will not be removed by final value replacement. Since
>>>>> popcount (0) is defined, this is redundant and could be removed but
>>>>> not removed.
>>>>
>>>> Yeah, that's probably sth for another pass though.  I suppose the
>>>> end: case just uses zero in which case you'll have a PHI and you
>>>> can optimize
>>>>
>>>>   if (b != 0)
>>>>     return popcount (b);
>>>>   return 0;
>>>>
>>>> in phiopt.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Kugan
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Kugan
>>>>>>>>
>>>>>>>> is related to popcount (b_5).
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new 
>>>>>>>>> regressions.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Kugan
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>
>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>>>>>
>>>>>>>>>     PR middle-end/82479
>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.
>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.
>>>>>>>>>
>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>
>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>>>>>
>>>>>>>>>     PR middle-end/82479
>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Reply via email to