On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>> Hi Richard,
>>>>
>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> 
>>>> wrote:
>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> 
>>>>>> wrote:
>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> Thanks for the review.
>>>>>>>> On 25 January 2018 at 20:04, Richard Biener 
>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I
>>>>>>>>>> would like to queue this for review for next stage 1.
>>>>>>>>>>
>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and 
>>>>>>>>>> above.
>>>>>>>>>> 2. This does not distribute loop to detect popcount (like
>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please 
>>>>>>>>>> correct
>>>>>>>>>> me if I am wrong.
>>>>>>>>>
>>>>>>>>> But then it has no business inside loop distribution but instead is
>>>>>>>>> doing final value
>>>>>>>>> replacement, right?  You are pattern-matching the whole loop after 
>>>>>>>>> all.  I think
>>>>>>>>> final value replacement would already do the correct thing if you
>>>>>>>>> teached number of
>>>>>>>>> iteration analysis that niter for
>>>>>>>>>
>>>>>>>>>   <bb 3> [local count: 955630224]:
>>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>
>>>>>>>>>   _1 = b_11 + -1;
>>>>>>>>>   b_8 = _1 & b_11;
>>>>>>>>>   if (b_8 != 0)
>>>>>>>>>     goto <bb 6>; [89.00%]
>>>>>>>>>   else
>>>>>>>>>     goto <bb 8>; [11.00%]
>>>>>>>>>
>>>>>>>>>   <bb 6> [local count: 850510900]:
>>>>>>>>>   goto <bb 3>; [100.00%]
>>>>>>>>
>>>>>>>> I am looking into this approach. What should be the scalar evolution
>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
>>>>>>>> and can this be represented with the scev?
>>>>>>>
>>>>>>> No, it's not affine and thus cannot be represented.  You only need the
>>>>>>> scalar evolution of the counting IV which is already handled and
>>>>>>> the number of iteration analysis needs to handle the above IV - this
>>>>>>> is the missing part.
>>>>>> Thanks for the clarification. I am now matching this loop pattern in
>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions
>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in
>>>>>> the loop preheater and setting the loop niter with this. This will be
>>>>>> used by the final value replacement. Is this what you wanted?
>>>>>
>>>>> No, you shouldn't insert a popcount stmt but instead the niter
>>>>> GENERIC tree should be a CALL_EXPR to popcount with the
>>>>> appropriate argument.
>>>>
>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
>>>> niter in tree_niter_desc can take such.
>>>>
>>>> Attached patch now does this. Also had to add support for CALL_EXPR in
>>>> few places to handle niter with CALL_EXPR. Does this look OK?
>>>
>>> Overall this looks ok - the patch includes changes in places that I don't 
>>> think
>>> need changes such as chrec_convert_1 or extract_ops_from_tree.
>>> The expression_expensive_p change should be more specific than making
>>> all calls inexpensive as well.
>>>
>>> The verify_ssa change looks bogus, you do
>>>
>>> +  dest = gimple_phi_result (count_phi);
>>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
>>> +
>>> +  var = build_call_expr (fn, 1, src);
>>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
>>> +                       build_int_cst (TREE_TYPE (dest), 1));
>>>
>>> why do you allocate a new SSA name here?  It seems unused
>>> as you overwrive 'var' with the CALL_EXPR immediately.
>>>
>>> I didn't review the pattern matching thoroughly nor the exact place you
>>> call it.  But
>>>
>>> +      if (check_popcount_pattern (loop, &count))
>>> +       {
>>> +         niter->assumptions = boolean_false_node;
>>> +         niter->control.base = NULL_TREE;
>>> +         niter->control.step = NULL_TREE;
>>> +         niter->control.no_overflow = false;
>>> +         niter->niter = count;
>>> +         niter->assumptions = boolean_true_node;
>>> +         niter->may_be_zero = boolean_false_node;
>>> +         niter->max = -1;
>>> +         niter->bound = NULL_TREE;
>>> +         niter->cmp = ERROR_MARK;
>>> +         return true;
>>> +       }
>>>
>>> simply setting may_be_zero to false looks fishy.  Try
>>> with -fno-tree-loop-ch.  Also max should not be negative,
>>> it should be the number of bits in the IV type?
>>>
>>> A related testcase could be that we can completely peel
>>> a loop like the following which iterates at most 8 times:
>>>
>>> int a[8];
>>> void foo (unsigned char ctrl)
>>> {
>>>   int c = 0;
>>>   while (ctrl)
>>>     {
>>>        ctrl = ctrl & (ctrl - 1);
>>>        a[c++] = ctrl;
>>>     }
>>> }
>>>
>>> This is now stage1 material so please update and re-post.  Maybe Bin has
>>> further suggestions as well.
>> Sorry for being late on this.  Actually I thought about popcount in
>> distribution before.  More like the first patch, but handled as
>> another distribution pattern rather than a special case.  It's a bit
>> strange to compute and store the info in niters.  It's also not
>> straight forward when/where the transformation is finally done.
>
> It's done at final value replacement if the counter is used.  But with
> it in place we should also be able to compute niters for the case
> where it isn't (like the above case).  So I believe that in the end handling
> this in niter analysis is more powerful.
Yes, you are right, as distribution pattern, we would only be able to
handle standalone popcount loop.

Thanks,
bin
>
>> I haven't looked into the details so not sure how appropriate it will
>> be as a distribution pattern (current ones are only about data
>> references).  So I am okay with this version if it's more appropriate.
>
> Yeah, adding distribution patterns for scalar reduction forms is certainly
> appropriate but then this is the same or at least similar as final
> value replacement.
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Kugan
>>>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2018-02-08  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>
>>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.
>>>>     * tree-chrec.c (chrec_convert_1): Likewise.
>>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.
>>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.
>>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.
>>>>     (number_of_iterations_exit): Record niter for popcount patern.
>>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2018-02-08  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>
>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
>>>>
>>>>
>>>>>
>>>>>> In general, there is also a condition gating the loop like
>>>>>>
>>>>>> if (b_4 != 0)
>>>>>>   goto loop;
>>>>>> else
>>>>>>   end:
>>>>>>
>>>>>> This of course will not be removed by final value replacement. Since
>>>>>> popcount (0) is defined, this is redundant and could be removed but
>>>>>> not removed.
>>>>>
>>>>> Yeah, that's probably sth for another pass though.  I suppose the
>>>>> end: case just uses zero in which case you'll have a PHI and you
>>>>> can optimize
>>>>>
>>>>>   if (b != 0)
>>>>>     return popcount (b);
>>>>>   return 0;
>>>>>
>>>>> in phiopt.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks,
>>>>>> Kugan
>>>>>>
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kugan
>>>>>>>>>
>>>>>>>>> is related to popcount (b_5).
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new 
>>>>>>>>>> regressions.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kugan
>>>>>>>>>>
>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>>>>>>
>>>>>>>>>>     PR middle-end/82479
>>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.
>>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.
>>>>>>>>>>
>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>>>>>>
>>>>>>>>>>     PR middle-end/82479
>>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Reply via email to