RE: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent

Tamar Christina Sun, 20 Jul 2025 00:33:00 -0700

> -----Original Message-----
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Sent: Friday, July 18, 2025 5:48 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Sandiford
> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew
> Pinski <pins...@gmail.com>
> Subject: Re: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic 
> consistent
> 
> Hi Tamar,
> 
> > On 18 Jul 2025, at 18:25, Tamar Christina <tamar.christ...@arm.com> wrote:
> >
> > Hi Kyrill,
> >
> >> -----Original Message-----
> >> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> >> Sent: Friday, July 18, 2025 10:40 AM
> >> To: GCC Patches <gcc-patches@gcc.gnu.org>
> >> Cc: Tamar Christina <tamar.christ...@arm.com>; Richard Sandiford
> >> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew
> >> Pinski <pins...@gmail.com>
> >> Subject: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent
> >>
> >> Hi all,
> >>
> >> The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets
> >> the cost unconditionally to the movi, dup or extract fields of extra_cost,
> >> when the normal practice in that function is to use extra_cost only when 
> >> speed
> >> is set.  When speed is false the function should estimate the size cost 
> >> only.
> >> This patch makes the logic consistent by using the extra_cost fields to
> >> increment the cost when speed is set.  This requires reducing the 
> >> extra_cost
> values
> >> of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn 
> >> being
> >> costed
> >> has a cost of COSTS_N_INSNS (1) at the start of the function.  The cost 
> >> tables
> for
> >> the CPUs are updated in line with this.
> >>
> >
> > I've always personally interpreted the "extra" costs structures to mean
> "additional costs"
> > not extra costs on top of a baseline costs, hence the = vs +=.
> >
> > I think this diverged between the vector and scalar variants, so I'm not 
> > against
> this but,
> 
> Yes, the scalar parts definitely use the += form to treat extra_costs as 
> speed-related
> differentiators.
> The comment close to the start says:
>   /* By default, assume that everything has equivalent cost to the
>      cheapest instruction.  Any additional costs are applied as a delta
>      above this default.  */
>


Yeah in the original patch[1] it was += but I changed it because it wasn't
clear that the costing would be consistent.  There were/are many cases
where the costs would be reset.  It looks like they're not cases where we'd
expect to find a constant.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582630.html

> >
> > I'm not sure I understand why for SPEED we don't take into account that we 
> > still
> want a single instruction.
> 
> TBH the extra cost tables can be very fragile and often what we want is a 
> binary
> yes/no codegen decision that could be encoded as an AARCH64_EXTRA_TUNE flag
> instead.
> Though in the case of patch 2/2 they do seem to work exactly as desired: a
> realistic-looking CPU-specific cost number derived from the latency driving a
> correct codegen decision for different CPUs and -Os.
> 
> >
> > i.e. even for -Os we still want movi x, #1 over a literal load.  I think 
> > all these
> instructions are better
> > for both codesize and speed but I'm probably missing something?
> 
> Now for -Os these cases will all get the cheapest COSTS_N_INSNS (1), which 
> for -
> Os is the “cheapest” cost. I think that should prefer the optimizations that 
> were
> introduced with the code that added them?

Agreed, though my concern is that by having the same cost for RTXes that we're
then fully reliant on whatever RTX a target independent optimization chooses for
equal costs. But I see our current costing for !speed seems to be just don't 
cost
most things.  So this just makes it more consistent I guess..

After checking some of the original testcases we were discussing during the 
patch
I think this is OK.

So OK unless someone else disagrees.

Thanks,
Tamar 

> Thanks,
> Kyrill
> 
> >
> > Thanks,
> > Tamar
> >
> >> With these changes the testsuite is unaffected so no different costing
> >> decisions are made and this patch is just a cleanup.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Ok for trunk?
> >> Thanks
> >> Kyrill
> >>
> >> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> >>
> >> gcc/
> >>
> >> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values
> >> only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT
> >> cases.
> >> * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
> >> thunderx_extra_costs, thunderx2t99_extra_costs,
> >> thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs,
> >> ampere1_extra_costs, ampere1a_extra_costs, ampere1b_extra_costs):
> >> Reduce cost of movi, dup, extract fields by COSTS_N_INSNS (1).
> >> * config/arm/aarch-cost-tables.h (generic_extra_costs,
> >> cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs,
> >> exynosm1_extra_costs, xgene1_extra_costs): Likewise.
>

RE: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent

Reply via email to