Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> Sent: Friday, July 18, 2025 5:48 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Sandiford
>> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew
>> Pinski <pins...@gmail.com>
>> Subject: Re: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic 
>> consistent
>> 
>> Hi Tamar,
>> 
>> > On 18 Jul 2025, at 18:25, Tamar Christina <tamar.christ...@arm.com> wrote:
>> >
>> > Hi Kyrill,
>> >
>> >> -----Original Message-----
>> >> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> >> Sent: Friday, July 18, 2025 10:40 AM
>> >> To: GCC Patches <gcc-patches@gcc.gnu.org>
>> >> Cc: Tamar Christina <tamar.christ...@arm.com>; Richard Sandiford
>> >> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew
>> >> Pinski <pins...@gmail.com>
>> >> Subject: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic 
>> >> consistent
>> >>
>> >> Hi all,
>> >>
>> >> The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets
>> >> the cost unconditionally to the movi, dup or extract fields of extra_cost,
>> >> when the normal practice in that function is to use extra_cost only when 
>> >> speed
>> >> is set.  When speed is false the function should estimate the size cost 
>> >> only.
>> >> This patch makes the logic consistent by using the extra_cost fields to
>> >> increment the cost when speed is set.  This requires reducing the 
>> >> extra_cost
>> values
>> >> of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn 
>> >> being
>> >> costed
>> >> has a cost of COSTS_N_INSNS (1) at the start of the function.  The cost 
>> >> tables
>> for
>> >> the CPUs are updated in line with this.
>> >>
>> >
>> > I've always personally interpreted the "extra" costs structures to mean
>> "additional costs"
>> > not extra costs on top of a baseline costs, hence the = vs +=.
>> >
>> > I think this diverged between the vector and scalar variants, so I'm not 
>> > against
>> this but,
>> 
>> Yes, the scalar parts definitely use the += form to treat extra_costs as 
>> speed-related
>> differentiators.
>> The comment close to the start says:
>>   /* By default, assume that everything has equivalent cost to the
>>      cheapest instruction.  Any additional costs are applied as a delta
>>      above this default.  */
>> 
>
> Yeah in the original patch[1] it was += but I changed it because it wasn't
> clear that the costing would be consistent.  There were/are many cases
> where the costs would be reset.  It looks like they're not cases where we'd
> expect to find a constant.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582630.html
>
>> >
>> > I'm not sure I understand why for SPEED we don't take into account that we 
>> > still
>> want a single instruction.
>> 
>> TBH the extra cost tables can be very fragile and often what we want is a 
>> binary
>> yes/no codegen decision that could be encoded as an AARCH64_EXTRA_TUNE flag
>> instead.
>> Though in the case of patch 2/2 they do seem to work exactly as desired: a
>> realistic-looking CPU-specific cost number derived from the latency driving a
>> correct codegen decision for different CPUs and -Os.
>> 
>> >
>> > i.e. even for -Os we still want movi x, #1 over a literal load.  I think 
>> > all these
>> instructions are better
>> > for both codesize and speed but I'm probably missing something?
>> 
>> Now for -Os these cases will all get the cheapest COSTS_N_INSNS (1), which 
>> for -
>> Os is the “cheapest” cost. I think that should prefer the optimizations that 
>> were
>> introduced with the code that added them?
>
> Agreed, though my concern is that by having the same cost for RTXes that we're
> then fully reliant on whatever RTX a target independent optimization chooses 
> for
> equal costs. But I see our current costing for !speed seems to be just don't 
> cost
> most things.  So this just makes it more consistent I guess..
>
> After checking some of the original testcases we were discussing during the 
> patch
> I think this is OK.
>
> So OK unless someone else disagrees.

LGTM too FWIW.

Richard

Reply via email to