Hi Bin, Thanks for your comments.
On 25 June 2018 at 11:15, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah > <kugan.vivekanandara...@linaro.org> wrote: >> When we set niter with maybe_zero, currently final_value_relacement >> will not happen due to expression_expensive_p not handling. Patch 1 >> adds this. >> >> With that we have the following optimized gimple. >> >> <bb 2> [local count: 118111601]: >> if (b_4(D) != 0) >> goto <bb 3>; [89.00%] >> else >> goto <bb 4>; [11.00%] >> >> <bb 3> [local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> _9 = __builtin_popcountl (_2); >> c_3 = b_4(D) != 0 ? _9 : 1; >> >> <bb 4> [local count: 118111601]: >> # c_12 = PHI <c_3(3), 0(2)> >> >> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the > No, it doesn't make much sense. when b_4(D) == 0, the popcount > computed should be 0. Point is you can never get b_4(D) == 0 with > guard condition in basic block 2. So the result should simply be: When we do calculate niter (for the copy header case), we set may_be_zero as (which I think is correct) niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst (TREE_TYPE (src))); Then in final_value_replacement_loop (struct loop *loop) for the PHI stmt for which we are going to do the final value replacement, we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. then we do compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn) where when we do chrec_apply to the polynomial_chrec with niter from popcount which also has the may_be_zero, we end up with the 1. Looking at this, I am not sure if this is wrong. May be I am missing something. In this testcase, before we enter the loop we have a check for (b_4(D) > 0). Thus, setting niter->may_be_zero is not strictly necessary but conservatively correct (?). Thanks, Kugan > >> <bb 2> [local count: 118111601]: >> if (b_4(D) != 0) >> goto <bb 3>; [89.00%] >> else >> goto <bb 4>; [11.00%] >> >> <bb 3> [local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> c_3 = __builtin_popcountl (_2); >> >> <bb 4> [local count: 118111601]: >> # c_12 = PHI <c_3(3), 0(2)> > > I think this is the code generated if maybe_zero is not set? which it > should not be set here. > For the same reason, it can be further optimized into: > >> <bb 2> [local count: 118111601]: >> _2 = (unsigned long) b_4(D); >> c_12 = __builtin_popcountl (_2); >> > >> latch execute zero times for b_4 == 0 means that the body will execute >> ones. > You never get zero times latch here with the aforementioned guard condition. > > BTW, I didn't look at following patches which could be wanted optimizations. > > Thanks, > bin >> >> The issue here is, since we are checking if (b_4(D) != 0) before >> entering the loop means we don't need to set maybe_zero. Patch 2 >> handles this. >> >> With that we have >> <bb 2> [local count: 118111601]: >> if (b_4(D) != 0) >> goto <bb 3>; [89.00%] >> else >> goto <bb 4>; [11.00%] >> >> <bb 3> [local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> _9 = __builtin_popcountl (_2); >> >> <bb 4> [local count: 118111601]: >> # c_12 = PHI <0(2), _9(3)> >> >> As advised earlier, patch 3 adds phiopt support to remove this. >> >> Bootstrap and regression testing are ongoing. >> >> Is this OK for trunk. >> >> Thanks, >> Kugan