> -----Original Message----- > From: Kyrylo Tkachov <ktkac...@nvidia.com> > Sent: Friday, July 18, 2025 5:48 PM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Sandiford > <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew > Pinski <pins...@gmail.com> > Subject: Re: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic > consistent > > Hi Tamar, > > > On 18 Jul 2025, at 18:25, Tamar Christina <tamar.christ...@arm.com> wrote: > > > > Hi Kyrill, > > > >> -----Original Message----- > >> From: Kyrylo Tkachov <ktkac...@nvidia.com> > >> Sent: Friday, July 18, 2025 10:40 AM > >> To: GCC Patches <gcc-patches@gcc.gnu.org> > >> Cc: Tamar Christina <tamar.christ...@arm.com>; Richard Sandiford > >> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew > >> Pinski <pins...@gmail.com> > >> Subject: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent > >> > >> Hi all, > >> > >> The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets > >> the cost unconditionally to the movi, dup or extract fields of extra_cost, > >> when the normal practice in that function is to use extra_cost only when > >> speed > >> is set. When speed is false the function should estimate the size cost > >> only. > >> This patch makes the logic consistent by using the extra_cost fields to > >> increment the cost when speed is set. This requires reducing the > >> extra_cost > values > >> of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn > >> being > >> costed > >> has a cost of COSTS_N_INSNS (1) at the start of the function. The cost > >> tables > for > >> the CPUs are updated in line with this. > >> > > > > I've always personally interpreted the "extra" costs structures to mean > "additional costs" > > not extra costs on top of a baseline costs, hence the = vs +=. > > > > I think this diverged between the vector and scalar variants, so I'm not > > against > this but, > > Yes, the scalar parts definitely use the += form to treat extra_costs as > speed-related > differentiators. > The comment close to the start says: > /* By default, assume that everything has equivalent cost to the > cheapest instruction. Any additional costs are applied as a delta > above this default. */ >
Yeah in the original patch[1] it was += but I changed it because it wasn't clear that the costing would be consistent. There were/are many cases where the costs would be reset. It looks like they're not cases where we'd expect to find a constant. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582630.html > > > > I'm not sure I understand why for SPEED we don't take into account that we > > still > want a single instruction. > > TBH the extra cost tables can be very fragile and often what we want is a > binary > yes/no codegen decision that could be encoded as an AARCH64_EXTRA_TUNE flag > instead. > Though in the case of patch 2/2 they do seem to work exactly as desired: a > realistic-looking CPU-specific cost number derived from the latency driving a > correct codegen decision for different CPUs and -Os. > > > > > i.e. even for -Os we still want movi x, #1 over a literal load. I think > > all these > instructions are better > > for both codesize and speed but I'm probably missing something? > > Now for -Os these cases will all get the cheapest COSTS_N_INSNS (1), which > for - > Os is the “cheapest” cost. I think that should prefer the optimizations that > were > introduced with the code that added them? Agreed, though my concern is that by having the same cost for RTXes that we're then fully reliant on whatever RTX a target independent optimization chooses for equal costs. But I see our current costing for !speed seems to be just don't cost most things. So this just makes it more consistent I guess.. After checking some of the original testcases we were discussing during the patch I think this is OK. So OK unless someone else disagrees. Thanks, Tamar > Thanks, > Kyrill > > > > > Thanks, > > Tamar > > > >> With these changes the testsuite is unaffected so no different costing > >> decisions are made and this patch is just a cleanup. > >> > >> Bootstrapped and tested on aarch64-none-linux-gnu. > >> Ok for trunk? > >> Thanks > >> Kyrill > >> > >> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com> > >> > >> gcc/ > >> > >> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values > >> only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT > >> cases. > >> * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, > >> thunderx_extra_costs, thunderx2t99_extra_costs, > >> thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs, > >> ampere1_extra_costs, ampere1a_extra_costs, ampere1b_extra_costs): > >> Reduce cost of movi, dup, extract fields by COSTS_N_INSNS (1). > >> * config/arm/aarch-cost-tables.h (generic_extra_costs, > >> cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs, > >> exynosm1_extra_costs, xgene1_extra_costs): Likewise. >