Re: [PATCH 0/3][POPCOUNT]

Richard Biener Mon, 25 Jun 2018 02:46:46 -0700

On Mon, Jun 25, 2018 at 11:30 AM Bin.Cheng <amker.ch...@gmail.com> wrote:
>
> On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah
> <kugan.vivekanandara...@linaro.org> wrote:
> > Hi Bin,
> >
> > On 25 June 2018 at 13:56, Bin.Cheng <amker.ch...@gmail.com> wrote:
> >> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah
> >> <kugan.vivekanandara...@linaro.org> wrote:
> >>> Hi Bin,
> >>>
> >>> Thanks for your comments.
> >>>
> >>> On 25 June 2018 at 11:15, Bin.Cheng <amker.ch...@gmail.com> wrote:
> >>>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
> >>>> <kugan.vivekanandara...@linaro.org> wrote:
> >>>>> When we set niter with maybe_zero, currently final_value_relacement
> >>>>> will not happen due to expression_expensive_p not handling. Patch 1
> >>>>> adds this.
> >>>>>
> >>>>> With that we have the following optimized gimple.
> >>>>>
> >>>>>   <bb 2> [local count: 118111601]:
> >>>>>   if (b_4(D) != 0)
> >>>>>     goto <bb 3>; [89.00%]
> >>>>>   else
> >>>>>     goto <bb 4>; [11.00%]
> >>>>>
> >>>>>   <bb 3> [local count: 105119324]:
> >>>>>   _2 = (unsigned long) b_4(D);
> >>>>>   _9 = __builtin_popcountl (_2);
> >>>>>   c_3 = b_4(D) != 0 ? _9 : 1;
> >>>>>
> >>>>>   <bb 4> [local count: 118111601]:
> >>>>>   # c_12 = PHI <c_3(3), 0(2)>
> >>>>>
> >>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
> >>>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount
> >>>> computed should be 0.  Point is you can never get b_4(D) == 0 with
> >>>> guard condition in basic block 2.  So the result should simply be:
> >>>
> >>> When we do  calculate niter (for the copy header case), we set
> >>> may_be_zero as (which I think is correct)
> >>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
> >>>                   build_zero_cst
> >>>                   (TREE_TYPE (src)));
> >>>
> >>> Then in final_value_replacement_loop (struct loop *loop)
> >>>
> >>> for the PHI stmt for which we are going to do the final value replacement,
> >>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.
> >>>
> >>> then we do
> >>> compute_overall_effect_of_inner_loop (struct loop *loop, tree 
> >>> evolution_fn)
> >>>
> >>> where when we do chrec_apply to the polynomial_chrec with niter from
> >>> popcount which also has the may_be_zero, we end up with the 1.
> >>> Looking at this, I am not sure if this is wrong. May be I am missing 
> >>> something.
> >> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==
> >> 0?  Though it never happens in this case.
> >
> > We dont set popcount = 1. When we set niter for popcount pattern with
> > niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
> >                   build_zero_cst (TREE_TYPE (src)));
> Hmm, I think this is unnecessary and causing the weird cond_expr in
> following optimization.  What happens if you simply set it to false?


It is surely not unnecessary because we set it to non-NULL only when
the result is _not_ simply popcount() but popcount()-1.

All the above suggests that SCEV does sth wrong.  Dumps show

  (set_nb_iterations_in_loop = b_4(D) != 0 ? (unsigned long)
__builtin_popcountl ((unsigned long) b_4(D)) + 18446744073709551615 :
0))

(chrec_apply
  (varying_loop = 1
)
  (chrec = {1, +, 1}_1)
  (x = b_4(D) != 0 ? (int) ((unsigned int) __builtin_popcountl
((unsigned long) b_4(D)) + 4294967295) : 0)
  (res = b_4(D) != 0 ? __builtin_popcountl ((unsigned long) b_4(D)) : 1))
)

this is because the analysis works on a header-copied loop and thus
the IV for C is {1, +, 1}
so this is all correct and to be simply optimized by following passes
factoring in
the range of b_4(D) which was tested to be _not_zero before.

Richard.

> Thanks,
> bin
> >
> > Because of which, we have an niter in the final_value_replacement, we have
> > (gdb) p debug_tree (niter)
> >  <cond_expr 0x7ffff6a76a80
> >     type <integer_type 0x7ffff694d1f8 public unsigned DI
> >         size <integer_cst 0x7ffff692dcf0 constant 64>
> >         unit-size <integer_cst 0x7ffff692dd08 constant 8>
> >         align:64 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff694d1f8 precision:64 min <integer_cst
> > 0x7ffff694a120 0> max <integer_cst 0x7ffff692e5c0
> > 18446744073709551615>>
> >
> >     arg:0 <ne_expr 0x7ffff6a80910
> >         type <boolean_type 0x7ffff6945b28 _Bool public unsigned QI
> >             size <integer_cst 0x7ffff692dde0 constant 8>
> >             unit-size <integer_cst 0x7ffff692ddf8 constant 1>
> >             align:8 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff6945b28 precision:1 min <integer_cst
> > 0x7ffff694a048 0> max <integer_cst 0x7ffff694a078 1>>
> >
> >         arg:0 <ssa_name 0x7ffff6937a68 type <integer_type
> > 0x7ffff6945738 long int>
> >             visited var <parm_decl 0x7ffff6a79000 b>
> >             def_stmt GIMPLE_NOPvolu
> >             version:4>
> >         arg:1 <integer_cst 0x7ffff6a64720 constant 0>>
> >     arg:1 <plus_expr 0x7ffff6a808c0 type <integer_type 0x7ffff694d1f8>
> >
> >         arg:0 <nop_expr 0x7ffff6a883c0 type <integer_type 0x7ffff694d1f8>
> >
> >             arg:0 <call_expr 0x7ffff69396c8 type <integer_type
> > 0x7ffff69455e8 int>
> >
> >                 fn <addr_expr 0x7ffff6a883a0 type <pointer_type 
> > 0x7ffff6a55888>
> >                     readonly constant arg:0 <function_decl
> > 0x7ffff69ff600 __builtin_popcountl>>
> >                 arg:0 <nop_expr 0x7ffff6a88380 type <integer_type
> > 0x7ffff694d1f8>
> >                     arg:0 <ssa_name 0x7ffff6937a68>>>>
> >         arg:1 <integer_cst 0x7ffff692e5c0 constant 18446744073709551615>>
> >     arg:2 <integer_cst 0x7ffff694a120 type <integer_type
> > 0x7ffff694d1f8> constant 0>>
> >
> > Then from there then we do compute_overall_effect_of_inner_loop for
> > scalar evolution of PHI with niter we get the 1.
> >
> >>>
> >>> In this testcase, before we enter the loop we have a check for (b_4(D)
> >>>> 0). Thus, setting niter->may_be_zero is not strictly necessary but
> >>> conservatively correct (?).
> >> Yes, but not necessarily.  Setting maybe_zero could confuse following
> >> optimizations and we should avoid doing that whenever possible.  If
> >> any pass goes wrong because it's not set conservatively, it is that
> >> pass' responsibility and should be fixed accordingly.  Here IMHO, we
> >> don't need to set it.
> >
> > My patch 2 is for not setting this when we know know a_4(D) is not
> > zero in this path.
> >
> > Thanks,
> > Kugan
> >
> >
> >
> >
> >>
> >> Thanks,
> >> bin
> >>>
> >>> Thanks,
> >>> Kugan
> >>>
> >>>>
> >>>>>   <bb 2> [local count: 118111601]:
> >>>>>   if (b_4(D) != 0)
> >>>>>     goto <bb 3>; [89.00%]
> >>>>>   else
> >>>>>     goto <bb 4>; [11.00%]
> >>>>>
> >>>>>   <bb 3> [local count: 105119324]:
> >>>>>   _2 = (unsigned long) b_4(D);
> >>>>>   c_3 = __builtin_popcountl (_2);
> >>>>>
> >>>>>   <bb 4> [local count: 118111601]:
> >>>>>   # c_12 = PHI <c_3(3), 0(2)>
> >>>>
> >>>> I think this is the code generated if maybe_zero is not set?  which it
> >>>> should not be set here.
> >>>> For the same reason, it can be further optimized into:
> >>>>
> >>>>>   <bb 2> [local count: 118111601]:
> >>>>>   _2 = (unsigned long) b_4(D);
> >>>>>   c_12 = __builtin_popcountl (_2);
> >>>>>
> >>>>
> >>>>> latch execute zero times for b_4 == 0 means that the body will execute
> >>>>> ones.
> >>>> You never get zero times latch here with the aforementioned guard 
> >>>> condition.
> >>>>
> >>>> BTW, I didn't look at following patches which could be wanted 
> >>>> optimizations.
> >>>>
> >>>> Thanks,
> >>>> bin
> >>>>>
> >>>>> The issue here is, since we are checking if (b_4(D) != 0) before
> >>>>> entering the loop means we don't need to set maybe_zero. Patch 2
> >>>>> handles this.
> >>>>>
> >>>>> With that we have
> >>>>>   <bb 2> [local count: 118111601]:
> >>>>>   if (b_4(D) != 0)
> >>>>>     goto <bb 3>; [89.00%]
> >>>>>   else
> >>>>>     goto <bb 4>; [11.00%]
> >>>>>
> >>>>>   <bb 3> [local count: 105119324]:
> >>>>>   _2 = (unsigned long) b_4(D);
> >>>>>   _9 = __builtin_popcountl (_2);
> >>>>>
> >>>>>   <bb 4> [local count: 118111601]:
> >>>>>   # c_12 = PHI <0(2), _9(3)>
> >>>>>
> >>>>> As advised earlier, patch 3 adds phiopt support to remove this.
> >>>>>
> >>>>> Bootstrap and regression testing are ongoing.
> >>>>>
> >>>>> Is this OK for trunk.
> >>>>>
> >>>>> Thanks,
> >>>>> Kugan

Re: [PATCH 0/3][POPCOUNT]

Reply via email to