On Mon, Jun 25, 2018 at 11:30 AM Bin.Cheng <amker.ch...@gmail.com> wrote: > > On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah > <kugan.vivekanandara...@linaro.org> wrote: > > Hi Bin, > > > > On 25 June 2018 at 13:56, Bin.Cheng <amker.ch...@gmail.com> wrote: > >> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah > >> <kugan.vivekanandara...@linaro.org> wrote: > >>> Hi Bin, > >>> > >>> Thanks for your comments. > >>> > >>> On 25 June 2018 at 11:15, Bin.Cheng <amker.ch...@gmail.com> wrote: > >>>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah > >>>> <kugan.vivekanandara...@linaro.org> wrote: > >>>>> When we set niter with maybe_zero, currently final_value_relacement > >>>>> will not happen due to expression_expensive_p not handling. Patch 1 > >>>>> adds this. > >>>>> > >>>>> With that we have the following optimized gimple. > >>>>> > >>>>> <bb 2> [local count: 118111601]: > >>>>> if (b_4(D) != 0) > >>>>> goto <bb 3>; [89.00%] > >>>>> else > >>>>> goto <bb 4>; [11.00%] > >>>>> > >>>>> <bb 3> [local count: 105119324]: > >>>>> _2 = (unsigned long) b_4(D); > >>>>> _9 = __builtin_popcountl (_2); > >>>>> c_3 = b_4(D) != 0 ? _9 : 1; > >>>>> > >>>>> <bb 4> [local count: 118111601]: > >>>>> # c_12 = PHI <c_3(3), 0(2)> > >>>>> > >>>>> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the > >>>> No, it doesn't make much sense. when b_4(D) == 0, the popcount > >>>> computed should be 0. Point is you can never get b_4(D) == 0 with > >>>> guard condition in basic block 2. So the result should simply be: > >>> > >>> When we do calculate niter (for the copy header case), we set > >>> may_be_zero as (which I think is correct) > >>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, > >>> build_zero_cst > >>> (TREE_TYPE (src))); > >>> > >>> Then in final_value_replacement_loop (struct loop *loop) > >>> > >>> for the PHI stmt for which we are going to do the final value replacement, > >>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. > >>> > >>> then we do > >>> compute_overall_effect_of_inner_loop (struct loop *loop, tree > >>> evolution_fn) > >>> > >>> where when we do chrec_apply to the polynomial_chrec with niter from > >>> popcount which also has the may_be_zero, we end up with the 1. > >>> Looking at this, I am not sure if this is wrong. May be I am missing > >>> something. > >> I think it is wrong. How could you get popcount == 1 when b_4(D) == > >> 0? Though it never happens in this case. > > > > We dont set popcount = 1. When we set niter for popcount pattern with > > niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, > > build_zero_cst (TREE_TYPE (src))); > Hmm, I think this is unnecessary and causing the weird cond_expr in > following optimization. What happens if you simply set it to false?
It is surely not unnecessary because we set it to non-NULL only when the result is _not_ simply popcount() but popcount()-1. All the above suggests that SCEV does sth wrong. Dumps show (set_nb_iterations_in_loop = b_4(D) != 0 ? (unsigned long) __builtin_popcountl ((unsigned long) b_4(D)) + 18446744073709551615 : 0)) (chrec_apply (varying_loop = 1 ) (chrec = {1, +, 1}_1) (x = b_4(D) != 0 ? (int) ((unsigned int) __builtin_popcountl ((unsigned long) b_4(D)) + 4294967295) : 0) (res = b_4(D) != 0 ? __builtin_popcountl ((unsigned long) b_4(D)) : 1)) ) this is because the analysis works on a header-copied loop and thus the IV for C is {1, +, 1} so this is all correct and to be simply optimized by following passes factoring in the range of b_4(D) which was tested to be _not_zero before. Richard. > Thanks, > bin > > > > Because of which, we have an niter in the final_value_replacement, we have > > (gdb) p debug_tree (niter) > > <cond_expr 0x7ffff6a76a80 > > type <integer_type 0x7ffff694d1f8 public unsigned DI > > size <integer_cst 0x7ffff692dcf0 constant 64> > > unit-size <integer_cst 0x7ffff692dd08 constant 8> > > align:64 warn_if_not_align:0 symtab:0 alias-set -1 > > canonical-type 0x7ffff694d1f8 precision:64 min <integer_cst > > 0x7ffff694a120 0> max <integer_cst 0x7ffff692e5c0 > > 18446744073709551615>> > > > > arg:0 <ne_expr 0x7ffff6a80910 > > type <boolean_type 0x7ffff6945b28 _Bool public unsigned QI > > size <integer_cst 0x7ffff692dde0 constant 8> > > unit-size <integer_cst 0x7ffff692ddf8 constant 1> > > align:8 warn_if_not_align:0 symtab:0 alias-set -1 > > canonical-type 0x7ffff6945b28 precision:1 min <integer_cst > > 0x7ffff694a048 0> max <integer_cst 0x7ffff694a078 1>> > > > > arg:0 <ssa_name 0x7ffff6937a68 type <integer_type > > 0x7ffff6945738 long int> > > visited var <parm_decl 0x7ffff6a79000 b> > > def_stmt GIMPLE_NOPvolu > > version:4> > > arg:1 <integer_cst 0x7ffff6a64720 constant 0>> > > arg:1 <plus_expr 0x7ffff6a808c0 type <integer_type 0x7ffff694d1f8> > > > > arg:0 <nop_expr 0x7ffff6a883c0 type <integer_type 0x7ffff694d1f8> > > > > arg:0 <call_expr 0x7ffff69396c8 type <integer_type > > 0x7ffff69455e8 int> > > > > fn <addr_expr 0x7ffff6a883a0 type <pointer_type > > 0x7ffff6a55888> > > readonly constant arg:0 <function_decl > > 0x7ffff69ff600 __builtin_popcountl>> > > arg:0 <nop_expr 0x7ffff6a88380 type <integer_type > > 0x7ffff694d1f8> > > arg:0 <ssa_name 0x7ffff6937a68>>>> > > arg:1 <integer_cst 0x7ffff692e5c0 constant 18446744073709551615>> > > arg:2 <integer_cst 0x7ffff694a120 type <integer_type > > 0x7ffff694d1f8> constant 0>> > > > > Then from there then we do compute_overall_effect_of_inner_loop for > > scalar evolution of PHI with niter we get the 1. > > > >>> > >>> In this testcase, before we enter the loop we have a check for (b_4(D) > >>>> 0). Thus, setting niter->may_be_zero is not strictly necessary but > >>> conservatively correct (?). > >> Yes, but not necessarily. Setting maybe_zero could confuse following > >> optimizations and we should avoid doing that whenever possible. If > >> any pass goes wrong because it's not set conservatively, it is that > >> pass' responsibility and should be fixed accordingly. Here IMHO, we > >> don't need to set it. > > > > My patch 2 is for not setting this when we know know a_4(D) is not > > zero in this path. > > > > Thanks, > > Kugan > > > > > > > > > >> > >> Thanks, > >> bin > >>> > >>> Thanks, > >>> Kugan > >>> > >>>> > >>>>> <bb 2> [local count: 118111601]: > >>>>> if (b_4(D) != 0) > >>>>> goto <bb 3>; [89.00%] > >>>>> else > >>>>> goto <bb 4>; [11.00%] > >>>>> > >>>>> <bb 3> [local count: 105119324]: > >>>>> _2 = (unsigned long) b_4(D); > >>>>> c_3 = __builtin_popcountl (_2); > >>>>> > >>>>> <bb 4> [local count: 118111601]: > >>>>> # c_12 = PHI <c_3(3), 0(2)> > >>>> > >>>> I think this is the code generated if maybe_zero is not set? which it > >>>> should not be set here. > >>>> For the same reason, it can be further optimized into: > >>>> > >>>>> <bb 2> [local count: 118111601]: > >>>>> _2 = (unsigned long) b_4(D); > >>>>> c_12 = __builtin_popcountl (_2); > >>>>> > >>>> > >>>>> latch execute zero times for b_4 == 0 means that the body will execute > >>>>> ones. > >>>> You never get zero times latch here with the aforementioned guard > >>>> condition. > >>>> > >>>> BTW, I didn't look at following patches which could be wanted > >>>> optimizations. > >>>> > >>>> Thanks, > >>>> bin > >>>>> > >>>>> The issue here is, since we are checking if (b_4(D) != 0) before > >>>>> entering the loop means we don't need to set maybe_zero. Patch 2 > >>>>> handles this. > >>>>> > >>>>> With that we have > >>>>> <bb 2> [local count: 118111601]: > >>>>> if (b_4(D) != 0) > >>>>> goto <bb 3>; [89.00%] > >>>>> else > >>>>> goto <bb 4>; [11.00%] > >>>>> > >>>>> <bb 3> [local count: 105119324]: > >>>>> _2 = (unsigned long) b_4(D); > >>>>> _9 = __builtin_popcountl (_2); > >>>>> > >>>>> <bb 4> [local count: 118111601]: > >>>>> # c_12 = PHI <0(2), _9(3)> > >>>>> > >>>>> As advised earlier, patch 3 adds phiopt support to remove this. > >>>>> > >>>>> Bootstrap and regression testing are ongoing. > >>>>> > >>>>> Is this OK for trunk. > >>>>> > >>>>> Thanks, > >>>>> Kugan