On Mon, Aug 10, 2020 at 12:27 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi Bin,
>
> Thanks for the review!!
>
> on 2020/8/8 下午4:01, Bin.Cheng wrote:
> > Hi Kewen,
> > Sorry for the late reply.
> > The patch's most important change is below cost computation:
> >
> >> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, struct 
> >> iv_cand *cand)
> >>     cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
> >>   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
> >>
> >> +  /* Consider additional step updates during unrolling.  */
> >> +  if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p)
> >> +    cost += (data->current_loop->estimated_unroll - 1) * cost_step;
> > This is a bit strange, to me the add instructions are additional
> > computation caused by unrolling+addressing_mode, rather than a native
> > part in candidate itself.  Specifically, an additional cost is needed
> > if candidates (without reg_offset_p) are chosen for the address type
> > group/uses.
>
> Good point, ideally it should be one additional cost for each cand set,
> when we select one cand for one group, we need to check this pair need
> more (estimated_unroll - 1) step costs, we probably need to care about
> this during remove/replace etc.  IIUC the current IVOPTs cost framework
> doesn't support this and it could increase the selection complexity and
> time.  I hesitated to do it and put it to cand step cost initially instead.
>
> I was thinking those candidates with reg_offset_p should be only used for
> those reg_offset_p groups in most cases (very limited) meanwhile the others
> are simply scaled up like before.  But indeed this can cover some similar
> cases like one cand is only used for the compare type group which is for
> loop closing, then it doesn't need more step costs for unrolling.
>
> Do you prefer me to improve the current cost framework?
No, I don't think it's relevant to the candidate selecting algorithm.
I was thinking about adjusting cost somehow in
determine_group_iv_cost_address. Given we don't expose selected
addressing mode in this function, you may need to do it in
get_address_cost, either way.

>
> >> +
> >>   /* Prefer the original ivs unless we may gain something by replacing it.
> >>      The reason is to make debugging simpler; so this is not relevant for
> >>      artificial ivs created by other optimization passes.  */
> >>
> >
> >> @@ -3654,6 +3729,14 @@ set_group_iv_cost (struct ivopts_data *data,
> >>       return;
> >>     }
> >>
> >> +  /* Since we priced more on non reg_offset IV cand step cost, we should 
> >> scale
> >> +     up the appropriate IV group costs.  Simply consider USE_COMPARE at 
> >> the
> >> +     loop exit, FIXME if multiple exits supported or no loop exit 
> >> comparisons
> >> +     matter.  */
> >> +  if (data->consider_reg_offset_for_unroll_p
> >> +      && group->vuses[0]->type != USE_COMPARE)
> >> +    cost *= (HOST_WIDE_INT) data->current_loop->estimated_unroll;
> > Not quite follow here, giving "pricing more on on-reg_offset IV cand"
> > doesn't make much sense to me.  Also why generic type uses are not
> > skipped?  We want to model the cost required for address computation,
> > however, for generic type uses there is no way to save the computation
> > in "address expression".  Once unrolled, the computation is always
> > there?
> >
>
> The main intention is to scale up the group/cand cost for unrolling since
> we have scaled up the step costs.  The assumption is that the original
If we adjust cost appropriately in function *group_iv_cost_address,
this would become unnecessary, right?  And naturally.
> costing (without this patch) can be viewed as either for all unrolled
> iterations or just one single iteration.  Since IVOPTs doesn't support
> fractional costing, I interpreted it as single iterations, to emulate
> unrolling scenario based on the previous step cost scaling, we need to
> scale up the cost for all computation.
>
> In most cases, the compare type use is for loop closing, there is still
> only one computation even unrolling happens, so I excluded it here.
> As "FIXME", if we find some cases are off, we can further restrict it to
> those USE_COMPARE uses which is exactly for loop closing.
>
> > And what's the impact on targets supporting [base + index + offset]
> > addressing mode?
>
> Good question, I didn't notice it since power doesn't support it.
> I noticed the comments of function addr_offset_valid_p only mentioning
> [base + offset], I guess it excludes [base + index + offset]?
> But I guess the address-based IV can work for this mode?
No, addr_offset_valid_p is only used to split address use groups.  See
get_address_cost and struct mem_address.
I forgot to ask, what about target only supports [base + offset]
addressing mode like RISC-V?  I would expect it's not affected at all.

>
> >
> > Given the patch is not likely to harm because rtl loop unrolling is
> > (or was?) by default disabled, so I am OK once the above comments are
> > addressed.
> >
>
> Yes, it needs explicit unrolling options, excepting for some targets
> wants to enable it for some cases with specific loop_unroll_adjust checks.
>
> > I wonder if it's possible to get 10% of (all which should be unrolled)
> > loops unrolled (conservatively) on gimple and enable it by default at
> > O3, rather than teaching ivopts to model a future pass which not
> > likely be used outside of benchmarks?
> >
>
> Yeah, it would be nice if the unrolling happen before IVOPTs and won't
> have any future unrollings to get it off.  PR[1] seems to have some
> discussion on gimple unrolling.
Thanks for directing me to the discussion.  I am on Wilco's side on
this problem, IMHO, It might be useful getting small loops unrolled
(at O3 by default) by simply saving induction variable stepping and
exit condition check, which can be modeled on gimple.  Especially for
RISC-V, it doesn't support index addressing mode, which means there
will be as many induction variables as distinct arrays.  Also
interleaving after unrolling is not that important, it's at high
chance that small loops eligible for interleaving are handled by
vectorizer already.

Thanks,
bin
>
> Richi suggested driven-by-need gimple unrolling in the previous discussion[2]
> on the RFC of this.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-January/537645.html
>
> BR,
> Kewen

Reply via email to