Hi Bin,

Thanks for your comments.

On 25 June 2018 at 11:15, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
> <kugan.vivekanandara...@linaro.org> wrote:
>> When we set niter with maybe_zero, currently final_value_relacement
>> will not happen due to expression_expensive_p not handling. Patch 1
>> adds this.
>>
>> With that we have the following optimized gimple.
>>
>>   <bb 2> [local count: 118111601]:
>>   if (b_4(D) != 0)
>>     goto <bb 3>; [89.00%]
>>   else
>>     goto <bb 4>; [11.00%]
>>
>>   <bb 3> [local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   _9 = __builtin_popcountl (_2);
>>   c_3 = b_4(D) != 0 ? _9 : 1;
>>
>>   <bb 4> [local count: 118111601]:
>>   # c_12 = PHI <c_3(3), 0(2)>
>>
>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
> No, it doesn't make much sense.  when b_4(D) == 0, the popcount
> computed should be 0.  Point is you can never get b_4(D) == 0 with
> guard condition in basic block 2.  So the result should simply be:

When we do  calculate niter (for the copy header case), we set
may_be_zero as (which I think is correct)
niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
                  build_zero_cst
                  (TREE_TYPE (src)));

Then in final_value_replacement_loop (struct loop *loop)

for the PHI stmt for which we are going to do the final value replacement,
we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.

then we do
compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)

where when we do chrec_apply to the polynomial_chrec with niter from
popcount which also has the may_be_zero, we end up with the 1.
Looking at this, I am not sure if this is wrong. May be I am missing something.

In this testcase, before we enter the loop we have a check for (b_4(D)
> 0). Thus, setting niter->may_be_zero is not strictly necessary but
conservatively correct (?).

Thanks,
Kugan

>
>>   <bb 2> [local count: 118111601]:
>>   if (b_4(D) != 0)
>>     goto <bb 3>; [89.00%]
>>   else
>>     goto <bb 4>; [11.00%]
>>
>>   <bb 3> [local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   c_3 = __builtin_popcountl (_2);
>>
>>   <bb 4> [local count: 118111601]:
>>   # c_12 = PHI <c_3(3), 0(2)>
>
> I think this is the code generated if maybe_zero is not set?  which it
> should not be set here.
> For the same reason, it can be further optimized into:
>
>>   <bb 2> [local count: 118111601]:
>>   _2 = (unsigned long) b_4(D);
>>   c_12 = __builtin_popcountl (_2);
>>
>
>> latch execute zero times for b_4 == 0 means that the body will execute
>> ones.
> You never get zero times latch here with the aforementioned guard condition.
>
> BTW, I didn't look at following patches which could be wanted optimizations.
>
> Thanks,
> bin
>>
>> The issue here is, since we are checking if (b_4(D) != 0) before
>> entering the loop means we don't need to set maybe_zero. Patch 2
>> handles this.
>>
>> With that we have
>>   <bb 2> [local count: 118111601]:
>>   if (b_4(D) != 0)
>>     goto <bb 3>; [89.00%]
>>   else
>>     goto <bb 4>; [11.00%]
>>
>>   <bb 3> [local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   _9 = __builtin_popcountl (_2);
>>
>>   <bb 4> [local count: 118111601]:
>>   # c_12 = PHI <0(2), _9(3)>
>>
>> As advised earlier, patch 3 adds phiopt support to remove this.
>>
>> Bootstrap and regression testing are ongoing.
>>
>> Is this OK for trunk.
>>
>> Thanks,
>> Kugan

Reply via email to