On Mon, Aug 16, 2021 at 9:16 AM Kewen.Lin <li...@linux.ibm.com> wrote: > > Hi Richi, > > Thanks for the comments! > > on 2021/8/16 下午2:49, Richard Biener wrote: > > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> IIUC, the function vectorizable_bb_reduc_epilogue missed to > >> consider the cost to extract the final value from the vector > >> for reduc operations. This patch is to add one time of > >> vec_to_scalar cost for extracting. > >> > >> Bootstrapped & regtested on powerpc64le-linux-gnu P9. > >> The testing on x86_64 and aarch64 is ongoing. > >> > >> Is it ok for trunk? > > > > There's no such instruction necessary, the way the costing works > > the result is in lane zero already. Note the optabs are defined > > to reduce to a scalar already. So if your arch implements those and > > requires such move then the backend costing needs to handle that. > > > > Yes, these reduc_<operation>_scal_<mode> should have made the > operand[0] as the final scalar result. > > > That said, ideally we'd simply cost the IFN_REDUC_* in the backend > > but for BB reductions we don't actually build a SLP node with such > > representative stmt to pass down (yet). > > > > OK, thanks for the explanation. It explains why we cost the > IFN_REDUC_* as one vect_stmt in loop vect but cost it as > conservative (shuffle and reduc_op) as possible here. > > > I guess you're running into a integer reduction where there's > > a vector -> gpr move missing in costing? I suppose costing > > vec_to_scalar works for that but in the end we should maybe > > find a way to cost the IFN_REDUC_* ... > > Yeah, it's a reduction on plus, initially I wanted to adjust backend > costing for various IFN_REDUC* (since for some variants Power has more > than one instructions for them), then I noticed we cost the reduction > as shuffle and reduc_op during SLP for now, I guess it's good to get > vec_to_scalar considered here for consistency? Then it can be removed > together when we have a better modeling in the end?
Yeah, I guess that works for now. Thanks, Richard. > BR, > Kewen > > > > > Richard. > > > >> BR, > >> Kewen > >> ----- > >> gcc/ChangeLog: > >> > >> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost > >> for > >> value extraction. > >> > >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > >> index b9d88c2d943..841a0872afa 100644 > >> --- a/gcc/tree-vect-slp.c > >> +++ b/gcc/tree-vect-slp.c > >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance > >> instance, > >> return false; > >> > >> /* There's no way to cost a horizontal vector reduction via REDUC_FN so > >> - cost log2 vector operations plus shuffles. */ > >> + cost log2 vector operations plus shuffles and one extraction. */ > >> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype)); > >> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0], > >> vectype, 0, vect_body); > >> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0], > >> vectype, 0, vect_body); > >> + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0], > >> + vectype, 0, vect_body); > >> return true; > >> } > >