Re: [pushed] [RA]: Improve cost calculation of pseudos with equivalences
On 9/14/23 09:28, Vladimir Makarov via Gcc-patches wrote:
> I've committed the following patch.  The reason for this patch is explained in its commit message.
>
> The patch was successfully bootstrapped and tested on x86-64, aarch64, and ppc64le.
>
> commit 3c834d85f2ec42c60995c2b678196a06cb744959
> Author: Vladimir N. Makarov
> Date:   Thu Sep 14 10:26:48 2023 -0400
>
>     [RA]: Improve cost calculation of pseudos with equivalences
>
>     RISCV target developers reported that RA can spill pseudo used in a loop although there are enough registers to assign.  It happens when the pseudo has an equivalence outside the loop and the equivalence is not merged into insns using the pseudo.  IRA sets up that memory cost to zero when the pseudo has an equivalence and it means that the pseudo will be probably spilled.  This approach worked well for i686 (different approaches were benchmarked long time ago on spec2k).  Although common sense says that the code is wrong and this was confirmed by RISCV developers.
>
>     I've tried the following patch on I7-9700k and it improved spec17 fp by 1.5% (21.1 vs 20.8) although spec17 int is a bit worse by 0.45% (8.54 vs 8.58).  The average generated code size is practically the same (0.001% difference).
>
>     In the future we probably need to try more sophisticated cost calculation which should take into account that the equiv can not be combined in usage insns and the costs of reloads because of this.
>
>     gcc/ChangeLog:
>
>             * ira-costs.cc (find_costs_and_classes): Decrease memory cost by equiv savings.

Thanks for diving into this!

What's rather strange is when I do an A/B test with this patch on RISC-V it appears to be a pretty consistent loss for integer code.  This would seem to match your findings on x86 as well.

I still need to dig into it more deeply, but I see higher ALU as well as higher load/store traffic.  The load/store traffic in the one case I've looked at so far (omnetpp) appears to be prologue/epilogue related.  Essentially we're using an additional callee saved register on paths that don't trigger at runtime.

Jeff

[pushed] [RA]: Improve cost calculation of pseudos with equivalences
I've committed the following patch.  The reason for this patch is explained in its commit message.

The patch was successfully bootstrapped and tested on x86-64, aarch64, and ppc64le.

commit 3c834d85f2ec42c60995c2b678196a06cb744959
Author: Vladimir N. Makarov
Date:   Thu Sep 14 10:26:48 2023 -0400

    [RA]: Improve cost calculation of pseudos with equivalences

    RISCV target developers reported that RA can spill pseudo used in a loop although there are enough registers to assign.  It happens when the pseudo has an equivalence outside the loop and the equivalence is not merged into insns using the pseudo.  IRA sets up that memory cost to zero when the pseudo has an equivalence and it means that the pseudo will be probably spilled.  This approach worked well for i686 (different approaches were benchmarked long time ago on spec2k).  Although common sense says that the code is wrong and this was confirmed by RISCV developers.

    I've tried the following patch on I7-9700k and it improved spec17 fp by 1.5% (21.1 vs 20.8) although spec17 int is a bit worse by 0.45% (8.54 vs 8.58).  The average generated code size is practically the same (0.001% difference).

    In the future we probably need to try more sophisticated cost calculation which should take into account that the equiv can not be combined in usage insns and the costs of reloads because of this.

    gcc/ChangeLog:

            * ira-costs.cc (find_costs_and_classes): Decrease memory cost by equiv savings.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index d9e700e8947..8c93ace5094 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1947,15 +1947,8 @@ find_costs_and_classes (FILE *dump_file)
             }
           if (i >= first_moveable_pseudo && i < last_moveable_pseudo)
             i_mem_cost = 0;
-          else if (equiv_savings < 0)
-            i_mem_cost = -equiv_savings;
-          else if (equiv_savings > 0)
-            {
-              i_mem_cost = 0;
-              for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-                i_costs[k] += equiv_savings;
-            }
-
+          else
+            i_mem_cost -= equiv_savings;
           best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
           best = ALL_REGS;
           alt_class = NO_REGS;
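
For readers who want to see the effect of the hunk in isolation, here is a minimal standalone sketch of the two adjustments.  It is not the GCC code: the PseudoCosts struct and the old_adjust/new_adjust helpers are hypothetical names made up for illustration, the moveable-pseudo branch is left out, and the cost numbers in the note below are invented.  Only the arithmetic on i_mem_cost, i_costs[k] and equiv_savings is taken from the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-pseudo costs that
// find_costs_and_classes works on (i_mem_cost and i_costs[k] in the diff).
struct PseudoCosts {
  int mem_cost;                  // cost of keeping the pseudo in memory
  std::vector<int> class_costs;  // cost per candidate register class
};

// Logic removed by the patch: positive savings zero the memory cost and
// penalize every register class; negative savings overwrite the memory cost.
PseudoCosts old_adjust(PseudoCosts c, int equiv_savings) {
  if (equiv_savings < 0) {
    c.mem_cost = -equiv_savings;
  } else if (equiv_savings > 0) {
    c.mem_cost = 0;
    for (int &k : c.class_costs)
      k += equiv_savings;
  }
  return c;
}

// Logic added by the patch: the memory cost is only reduced by the savings,
// and the register class costs are left untouched.
PseudoCosts new_adjust(PseudoCosts c, int equiv_savings) {
  c.mem_cost -= equiv_savings;
  return c;
}
```

With made-up inputs of mem_cost = 8000, class costs {2000, 3000} and equiv_savings = 1000, old_adjust yields a memory cost of 0 against class costs {3000, 4000}, so memory looks cheaper than any register class.  new_adjust yields a memory cost of 7000 against the unchanged {2000, 3000}, so a register still looks cheaper.  That is the spill-in-loop behaviour the commit message describes, under these assumed numbers.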