On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> wrote: > Hi Richard, > > On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> > wrote: >> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah >> <kugan.vivekanandara...@linaro.org> wrote: >>> Hi Richard, >>> >>> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> >>> wrote: >>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah >>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>> Hi Richard, >>>>> >>>>> Thanks for the review. >>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guent...@gmail.com> >>>>> wrote: >>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah >>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I >>>>>>> would like to queue this for review for next stage 1. >>>>>>> >>>>>>> 1. This is done part of loop-distribution and effective for -O3 and >>>>>>> above. >>>>>>> 2. This does not distribute loop to detect popcount (like >>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct >>>>>>> me if I am wrong. >>>>>> >>>>>> But then it has no business inside loop distribution but instead is >>>>>> doing final value >>>>>> replacement, right? You are pattern-matching the whole loop after all. >>>>>> I think >>>>>> final value replacement would already do the correct thing if you >>>>>> teached number of >>>>>> iteration analysis that niter for >>>>>> >>>>>> <bb 3> [local count: 955630224]: >>>>>> # b_11 = PHI <b_5(5), b_8(6)> >>>>>> _1 = b_11 + -1; >>>>>> b_8 = _1 & b_11; >>>>>> if (b_8 != 0) >>>>>> goto <bb 6>; [89.00%] >>>>>> else >>>>>> goto <bb 8>; [11.00%] >>>>>> >>>>>> <bb 6> [local count: 850510900]: >>>>>> goto <bb 3>; [100.00%] >>>>> >>>>> I am looking into this approach. What should be the scalar evolution >>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me >>>>> and can this be represented with the scev? >>>> >>>> No, it's not affine and thus cannot be represented. You only need the >>>> scalar evolution of the counting IV which is already handled and >>>> the number of iteration analysis needs to handle the above IV - this >>>> is the missing part. >>> Thanks for the clarification. I am now matching this loop pattern in >>> number_of_iterations_exit when number_of_iterations_exit_assumptions >>> fails. If the pattern matches, I am inserting the _builtin_popcount in >>> the loop preheater and setting the loop niter with this. This will be >>> used by the final value replacement. Is this what you wanted? >> >> No, you shouldn't insert a popcount stmt but instead the niter >> GENERIC tree should be a CALL_EXPR to popcount with the >> appropriate argument. > > Thats what I tried earlier but ran into some ICEs. I wasn't sure if > niter in tree_niter_desc can take such. > > Attached patch now does this. Also had to add support for CALL_EXPR in > few places to handle niter with CALL_EXPR. Does this look OK?
Overall this looks ok - the patch includes changes in places that I don't think need changes such as chrec_convert_1 or extract_ops_from_tree. The expression_expensive_p change should be more specific than making all calls inexpensive as well. The verify_ssa change looks bogus, you do + dest = gimple_phi_result (count_phi); + tree var = make_ssa_name (TREE_TYPE (dest), NULL); + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); + + var = build_call_expr (fn, 1, src); + *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var, + build_int_cst (TREE_TYPE (dest), 1)); why do you allocate a new SSA name here? It seems unused as you overwrive 'var' with the CALL_EXPR immediately. I didn't review the pattern matching thoroughly nor the exact place you call it. But + if (check_popcount_pattern (loop, &count)) + { + niter->assumptions = boolean_false_node; + niter->control.base = NULL_TREE; + niter->control.step = NULL_TREE; + niter->control.no_overflow = false; + niter->niter = count; + niter->assumptions = boolean_true_node; + niter->may_be_zero = boolean_false_node; + niter->max = -1; + niter->bound = NULL_TREE; + niter->cmp = ERROR_MARK; + return true; + } simply setting may_be_zero to false looks fishy. Try with -fno-tree-loop-ch. Also max should not be negative, it should be the number of bits in the IV type? A related testcase could be that we can completely peel a loop like the following which iterates at most 8 times: int a[8]; void foo (unsigned char ctrl) { int c = 0; while (ctrl) { ctrl = ctrl & (ctrl - 1); a[c++] = ctrl; } } This is now stage1 material so please update and re-post. Maybe Bin has further suggestions as well. Thanks, Richard. > Thanks, > Kugan > > > gcc/ChangeLog: > > 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> > > * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR. > * tree-chrec.c (chrec_convert_1): Likewise. > * tree-scalar-evolution.c (expression_expensive_p): Likewise. > * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise. > * tree-ssa-loop-niter.c (check_popcount_pattern): New. > (number_of_iterations_exit): Record niter for popcount patern. > * tree-ssa.c (verify_ssa): Check stmt to be non NULL. > > gcc/testsuite/ChangeLog: > > 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> > > * gcc.dg/tree-ssa/popcount.c: New test. > > >> >>> In general, there is also a condition gating the loop like >>> >>> if (b_4 != 0) >>> goto loop; >>> else >>> end: >>> >>> This of course will not be removed by final value replacement. Since >>> popcount (0) is defined, this is redundant and could be removed but >>> not removed. >> >> Yeah, that's probably sth for another pass though. I suppose the >> end: case just uses zero in which case you'll have a PHI and you >> can optimize >> >> if (b != 0) >> return popcount (b); >> return 0; >> >> in phiopt. >> >> Richard. >> >>> Thanks, >>> Kugan >>> >>>> >>>> Richard. >>>> >>>>> Thanks, >>>>> Kugan >>>>>> >>>>>> is related to popcount (b_5). >>>>>> >>>>>> Richard. >>>>>> >>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new >>>>>>> regressions. >>>>>>> >>>>>>> Thanks, >>>>>>> Kugan >>>>>>> >>>>>>> gcc/ChangeLog: >>>>>>> >>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>> >>>>>>> PR middle-end/82479 >>>>>>> * tree-loop-distribution.c (handle_popcount): New. >>>>>>> (pass_loop_distribution::execute): Use handle_popcount. >>>>>>> >>>>>>> gcc/testsuite/ChangeLog: >>>>>>> >>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>> >>>>>>> PR middle-end/82479 >>>>>>> * gcc.dg/tree-ssa/popcount.c: New test.