Tamar Christina <tamar.christ...@arm.com> writes: >> Tamar Christina <tamar.christ...@arm.com> writes: >> > Hi All, >> > >> > When determining issue rates we currently discount non-constant MLA >> > accumulators for Advanced SIMD but don't do it for the latency. >> > >> > This means the costs for Advanced SIMD with a constant accumulator are >> > wrong and results in us costing SVE and Advanced SIMD the same. This >> > can cauze us to vectorize with Advanced SIMD instead of SVE in some cases. >> > >> > This patch adds the same discount for SVE and Scalar as we do for issue >> > rate. >> > >> > My assumption was that on issue rate we reject all scalar constants >> > early because we take into account the extra instruction to create the >> constant? >> > Though I'd have expected this to be in prologue costs. For this >> > reason I added an extra parameter to allow me to force the check to at >> > least look for the multiplication. >> >> I'm not sure that was it. I wish I'd added a comment to say what it was >> though :( I suspect different parts of this function were written at >> different >> times, hence the inconsistency. >> >> > This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large >> > Neoverse cores. >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Ok for master? >> > >> > Thanks, >> > Tamar >> > >> > gcc/ChangeLog: >> > >> > * config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param >> > allow_constants. >> > (aarch64_adjust_stmt_cost): Use it. >> > (aarch64_vector_costs::count_ops): Likewise. >> > (aarch64_vector_costs::add_stmt_cost): Pass vinfo to >> > aarch64_adjust_stmt_cost. >> > >> > --- inline copy of patch -- >> > diff --git a/gcc/config/aarch64/aarch64.cc >> > b/gcc/config/aarch64/aarch64.cc index >> > >> 560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c9725 >> 57e2f83 >> > b63ba365fea9 100644 >> > --- a/gcc/config/aarch64/aarch64.cc >> > +++ b/gcc/config/aarch64/aarch64.cc >> > @@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum >> vect_cost_for_stmt kind, >> > or multiply-subtract sequence that might be suitable for fusing into a >> > single instruction. If VEC_FLAGS is zero, analyze the operation as >> > a scalar one, otherwise analyze it as an operation on vectors with >> > those >> > - VEC_* flags. */ >> > + VEC_* flags. When ALLOW_CONSTANTS we'll recognize all accumulators >> including >> > + constant ones. */ >> > static bool >> > aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info, >> > - unsigned int vec_flags) >> > + unsigned int vec_flags, bool allow_constants) >> > { >> > gassign *assign = dyn_cast<gassign *> (stmt_info->stmt); >> > if (!assign) >> > @@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo, >> stmt_vec_info stmt_info, >> > if (code != PLUS_EXPR && code != MINUS_EXPR) >> > return false; >> > >> > - if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign)) >> > - || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign))) >> > + if (!allow_constants >> > + && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign)) >> > + || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))) >> > return false; >> > >> > for (int i = 1; i < 3; ++i) >> > @@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo, >> stmt_vec_info stmt_info, >> > if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR) >> > continue; >> > >> > - if (vec_flags & VEC_ADVSIMD) >> > + if (!allow_constants && (vec_flags & VEC_ADVSIMD)) >> > { >> > /* Scalar and SVE code can tie the result to any FMLA input (or none, >> > although that requires a MOVPRFX for SVE). However, Advanced >> > SIMD @@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info >> *vinfo, stmt_vec_info stmt_info, >> > return false; >> > def_stmt_info = vinfo->lookup_def (rhs); >> > if (!def_stmt_info >> > - || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def) >> > + || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def >> > + || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def) >> >> Do you see vect_constant_defs in practice, or is this just for completeness? >> I would expect any constants to appear as direct operands. I don't mind >> keeping it if it's just a belt-and-braces thing though. > > In the latency case where I had allow_constants the early rejection based on > the operand itself wouldn't be rejected so in that case I still needed to > reject > them but do so after the multiply check. While they do appear as direct > operands as well they also have their own nodes, in particular for SLP so the > constants are handled as a group.
Ah, OK, thanks. > But can also check CONSTANT_CLASS_P (rhs) if that's preferrable. No, what you did is more correct. I just wasn't sure at first which case it was handling. Thanks, Richard