Hi Tamar,

> On 18 Jul 2025, at 18:25, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi Kyrill,
>
>> -----Original Message-----
>> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> Sent: Friday, July 18, 2025 10:40 AM
>> To: GCC Patches <gcc-patches@gcc.gnu.org>
>> Cc: Tamar Christina <tamar.christ...@arm.com>; Richard Sandiford
>> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew
>> Pinski <pins...@gmail.com>
>> Subject: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent
>>
>> Hi all,
>>
>> The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets
>> the cost unconditionally to the movi, dup or extract fields of extra_cost,
>> when the normal practice in that function is to use extra_cost only when
>> speed is set.  When speed is false the function should estimate the size
>> cost only.  This patch makes the logic consistent by using the extra_cost
>> fields to increment the cost when speed is set.  This requires reducing
>> the extra_cost values of the movi, dup and extract fields by
>> COSTS_N_INSNS (1), as every insn being costed has a cost of
>> COSTS_N_INSNS (1) at the start of the function.  The cost tables for the
>> CPUs are updated in line with this.
>>
>
> I've always personally interpreted the "extra" costs structures to mean
> "additional costs", not extra costs on top of a baseline cost, hence the
> = vs +=.
> I think this diverged between the vector and scalar variants, so I'm not
> against this, but

Yes, the scalar parts definitely use the += form to treat extra_costs as 
speed-related differentiators.
The comment near the start of the function says:
  /* By default, assume that everything has equivalent cost to the
     cheapest instruction.  Any additional costs are applied as a delta
     above this default.  */

>
> I'm not sure I understand why for SPEED we don't take into account that we 
> still want a single instruction.

TBH the extra cost tables can be very fragile; often what we want is a binary
yes/no codegen decision that could be encoded as an AARCH64_EXTRA_TUNE flag
instead.
Though in the case of patch 2/2 they do seem to work exactly as desired: a
realistic-looking CPU-specific cost number derived from the latency driving a
correct codegen decision for different CPUs and -Os.

>
> i.e. even for -Os we still want movi x, #1 over a literal load.  I think all 
> these instructions are better
> for both codesize and speed but I'm probably missing something?

For -Os these cases will all get the baseline COSTS_N_INSNS (1), which is the
cheapest possible cost at -Os, so I think that should still prefer the
optimizations that were introduced with the code that added them?
Thanks,
Kyrill

>
> Thanks,
> Tamar
>
>> With these changes the testsuite is unaffected so no different costing
>> decisions are made and this patch is just a cleanup.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Ok for trunk?
>> Thanks
>> Kyrill
>>
>> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>>
>> gcc/
>>
>> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values
>> only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT
>> cases.
>> * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
>> thunderx_extra_costs, thunderx2t99_extra_costs,
>> thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs,
>> ampere1_extra_costs, ampere1a_extra_costs, ampere1b_extra_costs):
>> Reduce cost of movi, dup, extract fields by COSTS_N_INSNS (1).
>> * config/arm/aarch-cost-tables.h (generic_extra_costs,
>> cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs,
>> exynosm1_extra_costs, xgene1_extra_costs): Likewise.
