Tamar Christina <tamar.christ...@arm.com> writes:
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > When determining issue rates we currently discount non-constant MLA
>> > accumulators for Advanced SIMD but don't do it for the latency.
>> >
>> > This means the costs for Advanced SIMD with a constant accumulator are
>> > wrong and result in us costing SVE and Advanced SIMD the same.  This
>> > can cause us to vectorize with Advanced SIMD instead of SVE in some cases.
>> >
>> > This patch adds the same discount for SVE and Scalar as we do for issue 
>> > rate.
>> >
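>> > As a concrete, made-up example of the kind of statement involved (not
>> > taken from fotonik3d_r or the patch itself):
>> >
>> > void
>> > f (float *restrict out, const float *a, const float *b, int n)
>> > {
>> >   for (int i = 0; i < n; i++)
>> >     /* A PLUS_EXPR fed by a MULT_EXPR, with a constant as the other
>> >        operand: an FMLA candidate whose accumulator is a constant.  On
>> >        Advanced SIMD the FMLA result is tied to the accumulator input, so
>> >        the constant needs an extra move, which is what the issue-rate
>> >        discount (and with this patch the latency cost) accounts for.  */
>> >     out[i] = a[i] * b[i] + 3.0f;
>> > }
>> >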
>> > My assumption was that on issue rate we reject all scalar constants
>> > early because we take into account the extra instruction to create the
>> > constant?
>> > Though I'd have expected this to be in prologue costs.  For this
>> > reason I added an extra parameter to allow me to force the check to at
>> > least look for the multiplication.
>> 
>> I'm not sure that was it.  I wish I'd added a comment to say what it was
>> though :(  I suspect different parts of this function were written at
>> different times, hence the inconsistency.
>> 
>> > This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large
>> > Neoverse cores.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >    * config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param
>> >    allow_constants.
>> >    (aarch64_adjust_stmt_cost): Use it.
>> >    (aarch64_vector_costs::count_ops): Likewise.
>> >    (aarch64_vector_costs::add_stmt_cost): Pass vinfo to
>> >    aarch64_adjust_stmt_cost.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c972557e2f83b63ba365fea9 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum vect_cost_for_stmt kind,
>> >     or multiply-subtract sequence that might be suitable for fusing into a
>> >     single instruction.  If VEC_FLAGS is zero, analyze the operation as
>> >     a scalar one, otherwise analyze it as an operation on vectors with those
>> > -   VEC_* flags.  */
>> > +   VEC_* flags.  When ALLOW_CONSTANTS is true we'll recognize all
>> > +   accumulators, including constant ones.  */
>> >  static bool
>> >  aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
>> > -                  unsigned int vec_flags)
>> > +                  unsigned int vec_flags, bool allow_constants)
>> >  {
>> >    gassign *assign = dyn_cast<gassign *> (stmt_info->stmt);
>> >    if (!assign)
>> > @@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
>> >    if (code != PLUS_EXPR && code != MINUS_EXPR)
>> >      return false;
>> >
>> > -  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
>> > -      || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
>> > +  if (!allow_constants
>> > +      && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
>> > +    || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign))))
>> >      return false;
>> >
>> >    for (int i = 1; i < 3; ++i)
>> > @@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
>> >        if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
>> >    continue;
>> >
>> > -      if (vec_flags & VEC_ADVSIMD)
>> > +      if (!allow_constants && (vec_flags & VEC_ADVSIMD))
>> >    {
>> >      /* Scalar and SVE code can tie the result to any FMLA input (or none,
>> >         although that requires a MOVPRFX for SVE).  However, Advanced SIMD
>> > @@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
>> >        return false;
>> >      def_stmt_info = vinfo->lookup_def (rhs);
>> >      if (!def_stmt_info
>> > -        || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
>> > +        || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
>> > +        || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
>> 
>> Do you see vect_constant_defs in practice, or is this just for completeness?
>> I would expect any constants to appear as direct operands.  I don't mind
>> keeping it if it's just a belt-and-braces thing though.
>
> In the latency case, where I had allow_constants set, the early rejection
> based on the operand itself wouldn't happen, so in that case I still needed
> to reject them, but after the multiply check.  While they do appear as direct
> operands as well, they also have their own nodes, in particular for SLP, so
> the constants are handled as a group.
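>
> As a rough, made-up illustration (not from the benchmark) of what I mean:
>
> void
> g (double *restrict out, const double *a, const double *b, int n)
> {
>   for (int i = 0; i < n; i += 2)
>     {
>       /* Under SLP the two constant addends are grouped into a single
>          constant node (vect_constant_def) feeding the vectorized add,
>          even though each also appears as a direct operand of its own
>          scalar statement.  */
>       out[i] = a[i] * b[i] + 3.0;
>       out[i + 1] = a[i + 1] * b[i + 1] + 5.0;
>     }
> }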

Ah, OK, thanks.

> But I can also check CONSTANT_CLASS_P (rhs) if that's preferable.

No, what you did is more correct.  I just wasn't sure at first which case
it was handling.

Thanks,
Richard
