scatter decode overhead into account

Richard Sandiford Fri, 02 Aug 2024 04:50:13 -0700

Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Tamar Christina <tamar.christ...@arm.com>
>> Sent: Thursday, August 1, 2024 9:51 AM
>> To: Richard Sandiford <richard.sandif...@arm.com>
>> Cc: Kyrylo Tkachov <ktkac...@nvidia.com>; gcc-patches@gcc.gnu.org; nd
>> <n...@arm.com>; Richard Earnshaw <richard.earns...@arm.com>; Marcus
>> Shawcroft <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org
>> Subject: RE: [PATCH 8/8]AArch64: take gather/scatter decode overhead into
>> account
>>
>> > -----Original Message-----
>> > From: Richard Sandiford <richard.sandif...@arm.com>
>> > Sent: Wednesday, July 31, 2024 7:17 PM
>> > To: Tamar Christina <tamar.christ...@arm.com>
>> > Cc: Kyrylo Tkachov <ktkac...@nvidia.com>; gcc-patches@gcc.gnu.org; nd
>> > <n...@arm.com>; Richard Earnshaw <richard.earns...@arm.com>; Marcus
>> > Shawcroft <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org
>> > Subject: Re: [PATCH 8/8]AArch64: take gather/scatter decode overhead into
>> > account
>> >
>> > Tamar Christina <tamar.christ...@arm.com> writes:
>> > > @@ -289,6 +293,12 @@ struct sve_vec_cost : simd_vec_cost
>> > >    const int gather_load_x32_cost;
>> > >    const int gather_load_x64_cost;
>> > >
>> > > +  /* Additional loop initialization cost of using a gather load 
>> > > instruction.  The
>> x32
>> >
>> > Sorry for the trivia, but: long line.
>>
>> Yeah, noticed it after I sent out the patch 😊
>>
>> >
>> > > +     value is for loads of 32-bit elements and the x64 value is for 
>> > > loads of
>> > > +     64-bit elements.  */
>> > > +  const int gather_load_x32_init_cost;
>> > > +  const int gather_load_x64_init_cost;
>> > > +
>> > >    /* The per-element cost of a scatter store.  */
>> > >    const int scatter_store_elt_cost;
>> > >  };
>> > > [...]
>> > > @@ -17291,6 +17295,20 @@ aarch64_vector_costs::add_stmt_cost (int
>> count,
>> > vect_cost_for_stmt kind,
>> > >   stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
>> > >                                                   stmt_info, vectype,
>> > >                                                   where, stmt_cost);
>> > > +
>> > > +      /* Check if we've seen an SVE gather/scatter operation and which 
>> > > size.  */
>> > > +      if (kind == scalar_load
>> > > +   && aarch64_sve_mode_p (TYPE_MODE (vectype))
>> > > +   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) ==
>> > VMAT_GATHER_SCATTER)
>> > > + {
>> > > +   const sve_vec_cost *sve_costs = aarch64_tune_params.vec_costs->sve;
>> >
>> > I think we need to check whether this is nonnull, since not all tuning
>> > targets provide SVE costs.
>>
>> Will do, but I thought since this was in a block that checked
>> aarch64_use_new_vector_costs_p ()
>> that the SVE costs were required.  What does that predicate mean then?
>
> Ah nevermind, just realized it just means to apply the new vector cost 
> routines,
> I hadn't realized that we supported new costing for non-SVE models as well.
>
> Will add the check.


Yeah, the eventual plan was to remove aarch64_use_new_vector_costs_p
and make everything use the same path (without any changes to the
tuning structures being needed).  But it's never bubbled high enough
up the to-do list.

The reason for adding aarch64_use_new_vector_costs_p originally was
because the associated cost changes were made late in a release cycle,
and I didn't want to change the code for anything that didn't actively
need the new costs.

Thanks,
Richard

Re: [PATCH 8/8]AArch64: take gather/scatter decode overhead into account

Reply via email to