On Mon, Aug 16, 2021 at 9:16 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/8/16 下午2:49, Richard Biener wrote:
> > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> IIUC, the function vectorizable_bb_reduc_epilogue missed to
> >> consider the cost to extract the final value from the vector
> >> for reduc operations.  This patch is to add one time of
> >> vec_to_scalar cost for extracting.
> >>
> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
> >> The testing on x86_64 and aarch64 is ongoing.
> >>
> >> Is it ok for trunk?
> >
> > There's no such instruction necessary, the way the costing works
> > the result is in lane zero already.  Note the optabs are defined
> > to reduce to a scalar already.  So if your arch implements those and
> > requires such move then the backend costing needs to handle that.
> >
>
> Yes, these reduc_<operation>_scal_<mode> should have made the
> operand[0] as the final scalar result.
>
> > That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> > but for BB reductions we don't actually build a SLP node with such
> > representative stmt to pass down (yet).
> >
>
> OK, thanks for the explanation.  It explains why we cost the
> IFN_REDUC_* as one vect_stmt in loop vect but cost it as
> conservative (shuffle and reduc_op) as possible here.
>
> > I guess you're running into a integer reduction where there's
> > a vector -> gpr move missing in costing?  I suppose costing
> > vec_to_scalar works for that but in the end we should maybe
> > find a way to cost the IFN_REDUC_* ...
>
> Yeah, it's a reduction on plus, initially I wanted to adjust backend
> costing for various IFN_REDUC* (since for some variants Power has more
> than one instructions for them), then I noticed we cost the reduction
> as shuffle and reduc_op during SLP for now, I guess it's good to get
> vec_to_scalar considered here for consistency?  Then it can be removed
> together when we have a better modeling in the end?

Yeah, I guess that works for now.

Thanks,
Richard.

> BR,
> Kewen
>
> >
> > Richard.
> >
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >>         * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost 
> >> for
> >>         value extraction.
> >>
> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> >> index b9d88c2d943..841a0872afa 100644
> >> --- a/gcc/tree-vect-slp.c
> >> +++ b/gcc/tree-vect-slp.c
> >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance 
> >> instance,
> >>      return false;
> >>
> >>    /* There's no way to cost a horizontal vector reduction via REDUC_FN so
> >> -     cost log2 vector operations plus shuffles.  */
> >> +     cost log2 vector operations plus shuffles and one extraction.  */
> >>    unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
> >>    record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
> >>                     vectype, 0, vect_body);
> >>    record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
> >>                     vectype, 0, vect_body);
> >> +  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
> >> +                   vectype, 0, vect_body);
> >>    return true;
> >>  }
>
>

Reply via email to