Hi Bin,

Thanks for the review!!

on 2020/8/8 下午4:01, Bin.Cheng wrote:
> Hi Kewen,
> Sorry for the late reply.
> The patch's most important change is below cost computation:
> 
>> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, struct 
>> iv_cand *cand)
>>     cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
>>   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
>>
>> +  /* Consider additional step updates during unrolling.  */
>> +  if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p)
>> +    cost += (data->current_loop->estimated_unroll - 1) * cost_step;
> This is a bit strange, to me the add instructions are additional
> computation caused by unrolling+addressing_mode, rather than a native
> part in candidate itself.  Specifically, an additional cost is needed
> if candidates (without reg_offset_p) are chosen for the address type
> group/uses.

Good point, ideally it should be one additional cost for each cand set,
when we select one cand for one group, we need to check this pair need
more (estimated_unroll - 1) step costs, we probably need to care about
this during remove/replace etc.  IIUC the current IVOPTs cost framework
doesn't support this and it could increase the selection complexity and
time.  I hesitated to do it and put it to cand step cost initially instead.

I was thinking those candidates with reg_offset_p should be only used for
those reg_offset_p groups in most cases (very limited) meanwhile the others
are simply scaled up like before.  But indeed this can cover some similar
cases like one cand is only used for the compare type group which is for
loop closing, then it doesn't need more step costs for unrolling.

Do you prefer me to improve the current cost framework?

>> +
>>   /* Prefer the original ivs unless we may gain something by replacing it.
>>      The reason is to make debugging simpler; so this is not relevant for
>>      artificial ivs created by other optimization passes.  */
>>
> 
>> @@ -3654,6 +3729,14 @@ set_group_iv_cost (struct ivopts_data *data,
>>       return;
>>     }
>>
>> +  /* Since we priced more on non reg_offset IV cand step cost, we should 
>> scale
>> +     up the appropriate IV group costs.  Simply consider USE_COMPARE at the
>> +     loop exit, FIXME if multiple exits supported or no loop exit 
>> comparisons
>> +     matter.  */
>> +  if (data->consider_reg_offset_for_unroll_p
>> +      && group->vuses[0]->type != USE_COMPARE)
>> +    cost *= (HOST_WIDE_INT) data->current_loop->estimated_unroll;
> Not quite follow here, giving "pricing more on on-reg_offset IV cand"
> doesn't make much sense to me.  Also why generic type uses are not
> skipped?  We want to model the cost required for address computation,
> however, for generic type uses there is no way to save the computation
> in "address expression".  Once unrolled, the computation is always
> there?
> 

The main intention is to scale up the group/cand cost for unrolling since
we have scaled up the step costs.  The assumption is that the original
costing (without this patch) can be viewed as either for all unrolled
iterations or just one single iteration.  Since IVOPTs doesn't support
fractional costing, I interpreted it as single iterations, to emulate
unrolling scenario based on the previous step cost scaling, we need to 
scale up the cost for all computation.

In most cases, the compare type use is for loop closing, there is still
only one computation even unrolling happens, so I excluded it here.
As "FIXME", if we find some cases are off, we can further restrict it to
those USE_COMPARE uses which is exactly for loop closing.

> And what's the impact on targets supporting [base + index + offset]
> addressing mode?

Good question, I didn't notice it since power doesn't support it.
I noticed the comments of function addr_offset_valid_p only mentioning
[base + offset], I guess it excludes [base + index + offset]?
But I guess the address-based IV can work for this mode?

> 
> Given the patch is not likely to harm because rtl loop unrolling is
> (or was?) by default disabled, so I am OK once the above comments are
> addressed.
> 

Yes, it needs explicit unrolling options, excepting for some targets
wants to enable it for some cases with specific loop_unroll_adjust checks.

> I wonder if it's possible to get 10% of (all which should be unrolled)
> loops unrolled (conservatively) on gimple and enable it by default at
> O3, rather than teaching ivopts to model a future pass which not
> likely be used outside of benchmarks?
> 

Yeah, it would be nice if the unrolling happen before IVOPTs and won't
have any future unrollings to get it off.  PR[1] seems to have some
discussion on gimple unrolling.

Richi suggested driven-by-need gimple unrolling in the previous discussion[2]
on the RFC of this.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-January/537645.html

BR,
Kewen

Reply via email to