Hi Tamar, > On 18 Jul 2025, at 18:25, Tamar Christina <tamar.christ...@arm.com> wrote: > > Hi Kyrill, > >> -----Original Message----- >> From: Kyrylo Tkachov <ktkac...@nvidia.com> >> Sent: Friday, July 18, 2025 10:40 AM >> To: GCC Patches <gcc-patches@gcc.gnu.org> >> Cc: Tamar Christina <tamar.christ...@arm.com>; Richard Sandiford >> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew >> Pinski <pins...@gmail.com> >> Subject: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent >> >> Hi all, >> >> The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets >> the cost unconditionally to the movi, dup or extract fields of extra_cost, >> when the normal practice in that function is to use extra_cost only when >> speed >> is set. When speed is false the function should estimate the size cost only. >> This patch makes the logic consistent by using the extra_cost fields to >> increment the cost when speed is set. This requires reducing the extra_cost >> values >> of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn being >> costed >> has a cost of COSTS_N_INSNS (1) at the start of the function. The cost >> tables for >> the CPUs are updated in line with this. >> > > I've always personally interpreted the "extra" costs structures to mean > "additional costs" > not extra costs on top of a baseline costs, hence the = vs +=. > > I think this diverged between the vector and scalar variants, so I'm not > against this but,
Yes, the scalar parts definitely use the += form to treat extra_costs as speed-related differentiators. The comment close to the start says: /* By default, assume that everything has equivalent cost to the cheapest instruction. Any additional costs are applied as a delta above this default. */ > > I'm not sure I understand why for SPEED we don't take into account that we > still want a single instruction. TBH the extra cost tables can be very fragile and often what we want is a binary yes/no codegen decision that could be encoded as an AARCH64_EXTRA_TUNE flag instead. Though in the case of patch 2/2 they do seem to work exactly as desired: a realistic-looking CPU-specific cost number derived from the latency driving a correct codegen decision for different CPUs and -Os. > > i.e. even for -Os we still want movi x, #1 over a literal load. I think all > these instructions are better > for both codesize and speed but I'm probably missing something? Now for -Os these cases will all get the cheapest COSTS_N_INSNS (1), which for -Os is the “cheapest” cost. I think that should prefer the optimizations that were introduced with the code that added them? Thanks, Kyrill > > Thanks, > Tamar > >> With these changes the testsuite is unaffected so no different costing >> decisions are made and this patch is just a cleanup. >> >> Bootstrapped and tested on aarch64-none-linux-gnu. >> Ok for trunk? >> Thanks >> Kyrill >> >> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com> >> >> gcc/ >> >> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values >> only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT >> cases. >> * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, >> thunderx_extra_costs, thunderx2t99_extra_costs, >> thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs, >> ampere1_extra_costs, ampere1a_extra_costs, ampere1b_extra_costs): >> Reduce cost of movi, dup, extract fields by COSTS_N_INSNS (1). >> * config/arm/aarch-cost-tables.h (generic_extra_costs, >> cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs, >> exynosm1_extra_costs, xgene1_extra_costs): Likewise.