https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-05-09
             Status|UNCONFIRMED                 |NEW
                 CC|                            |acoplan at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization, ra

--- Comment #1 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed.

There is a lot to unpack here.  Of course, the include isn't needed in this
testcase and the problem can be seen more clearly with a slightly smaller
array size:

typedef struct { int arr[16]; } S;

void g (S *);
void h (S);

void f(int x)
{
  S s;
  g (&s);
  h (s);
}

In this case sizeof(S) = 64 so we should be able to do the copy with 2 LDPs +
2 STPs.

So just for clarity, the missed ldp/stp started when we turned off the early
ldp/stp formation in memcpy expansion, i.e. with
r14-9373-g19b23bf3c32df3cbb96b3d898a1d7142f7bea4a0 .

However, things already started to regress earlier for this testcase with
r14-4944-gf55cdce3f8dd8503e080e35be59c5f5390f6d95e i.e.

commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e
Author: Vladimir N. Makarov <vmaka...@redhat.com>
Date:   Thu Oct 26 14:50:40 2023

    [RA]: Modfify cost calculation for dealing with equivalences

before that RA change we get:

f:
        stp     x29, x30, [sp, -144]!
        mov     x29, sp
        add     x0, sp, 80
        bl      g
        ldp     q29, q28, [sp, 80]
        add     x0, sp, 16
        ldp     q31, q30, [sp, 112]
        stp     q29, q28, [sp, 16]
        stp     q31, q30, [sp, 48]
        bl      h
        ldp     x29, x30, [sp], 144
        ret

and afterwards we get:

f:
        stp     x29, x30, [sp, -160]!
        mov     x29, sp
        str     x19, [sp, 16]
        add     x19, sp, 96
        mov     x0, x19
        bl      g
        add     x0, sp, 32
        ldp     q29, q28, [x19]
        ldp     q31, q30, [x19, 32]
        stp     q29, q28, [x0]
        stp     q31, q30, [x0, 32]
        bl      h
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 160
        ret

which is really not great as now we have a save/restore of x19 and the
accesses end up using different (non-sp) registers which I suspect doesn't
help with the ldp/stp formation (on trunk).

I will try to give a detailed analysis on what goes wrong with the ldp/stp formation at the RTL level shortly (there are a lot of different issues), but I think that RA change is a contributing factor.
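
For reference, the "2 LDPs + 2 STPs" figure above follows from simple size arithmetic. The C below is only a sketch of that arithmetic and is not taken from GCC: the helper q_pairs_needed and the constants Q_REG_BYTES and Q_PAIR_BYTES are hypothetical names introduced here. It assumes a 4-byte int (so sizeof(S) is 64) and relies on an AArch64 Q register holding 16 bytes, so that one LDP or STP of Q registers moves 32 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Same struct as in the reduced testcase. */
typedef struct { int arr[16]; } S;

/* An AArch64 Q register is 128 bits wide, i.e. 16 bytes, so a single
   LDP/STP of two Q registers transfers 32 bytes.  These names are
   illustrative only. */
enum { Q_REG_BYTES = 16, Q_PAIR_BYTES = 2 * Q_REG_BYTES };

/* Hypothetical helper: number of Q-register pair accesses needed to
   cover `bytes` bytes, rounding up for a partial final pair. */
static size_t q_pairs_needed(size_t bytes)
{
    return (bytes + Q_PAIR_BYTES - 1) / Q_PAIR_BYTES;
}
```

With a 4-byte int, q_pairs_needed(sizeof(S)) evaluates to 2, so the copy needs two pair loads plus two pair stores, which matches the four q-register ldp/stp instructions in both listings above.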