Tamar Christina <tamar.christ...@arm.com> writes: >> -----Original Message----- >> From: Kyrylo Tkachov <ktkac...@nvidia.com> >> Sent: Friday, July 18, 2025 5:48 PM >> To: Tamar Christina <tamar.christ...@arm.com> >> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Sandiford >> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew >> Pinski <pins...@gmail.com> >> Subject: Re: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic >> consistent >> >> Hi Tamar, >> >> > On 18 Jul 2025, at 18:25, Tamar Christina <tamar.christ...@arm.com> wrote: >> > >> > Hi Kyrill, >> > >> >> -----Original Message----- >> >> From: Kyrylo Tkachov <ktkac...@nvidia.com> >> >> Sent: Friday, July 18, 2025 10:40 AM >> >> To: GCC Patches <gcc-patches@gcc.gnu.org> >> >> Cc: Tamar Christina <tamar.christ...@arm.com>; Richard Sandiford >> >> <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew >> >> Pinski <pins...@gmail.com> >> >> Subject: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic >> >> consistent >> >> >> >> Hi all, >> >> >> >> The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets >> >> the cost unconditionally to the movi, dup or extract fields of extra_cost, >> >> when the normal practice in that function is to use extra_cost only when >> >> speed >> >> is set. When speed is false the function should estimate the size cost >> >> only. >> >> This patch makes the logic consistent by using the extra_cost fields to >> >> increment the cost when speed is set. This requires reducing the >> >> extra_cost >> values >> >> of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn >> >> being >> >> costed >> >> has a cost of COSTS_N_INSNS (1) at the start of the function. The cost >> >> tables >> for >> >> the CPUs are updated in line with this. >> >> >> > >> > I've always personally interpreted the "extra" costs structures to mean >> "additional costs" >> > not extra costs on top of a baseline costs, hence the = vs +=. >> > >> > I think this diverged between the vector and scalar variants, so I'm not >> > against >> this but, >> >> Yes, the scalar parts definitely use the += form to treat extra_costs as >> speed-related >> differentiators. >> The comment close to the start says: >> /* By default, assume that everything has equivalent cost to the >> cheapest instruction. Any additional costs are applied as a delta >> above this default. */ >> > > Yeah in the original patch[1] it was += but I changed it because it wasn't > clear that the costing would be consistent. There were/are many cases > where the costs would be reset. It looks like they're not cases where we'd > expect to find a constant. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582630.html > >> > >> > I'm not sure I understand why for SPEED we don't take into account that we >> > still >> want a single instruction. >> >> TBH the extra cost tables can be very fragile and often what we want is a >> binary >> yes/no codegen decision that could be encoded as an AARCH64_EXTRA_TUNE flag >> instead. >> Though in the case of patch 2/2 they do seem to work exactly as desired: a >> realistic-looking CPU-specific cost number derived from the latency driving a >> correct codegen decision for different CPUs and -Os. >> >> > >> > i.e. even for -Os we still want movi x, #1 over a literal load. I think >> > all these >> instructions are better >> > for both codesize and speed but I'm probably missing something? >> >> Now for -Os these cases will all get the cheapest COSTS_N_INSNS (1), which >> for - >> Os is the “cheapest” cost. I think that should prefer the optimizations that >> were >> introduced with the code that added them? > > Agreed, though my concern is that by having the same cost for RTXes that we're > then fully reliant on whatever RTX a target independent optimization chooses > for > equal costs. But I see our current costing for !speed seems to be just don't > cost > most things. So this just makes it more consistent I guess.. > > After checking some of the original testcases we were discussing during the > patch > I think this is OK. > > So OK unless someone else disagrees.
LGTM too FWIW. Richard