https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991

Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-05-09
             Status|UNCONFIRMED                 |NEW
                 CC|                            |acoplan at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization, ra

--- Comment #1 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed.  There is a lot to unpack here.  Of course, the include isn't needed
in this testcase and the problem can be seen more clearly with a slightly
smaller array size:

typedef struct { int arr[16]; } S;

void g (S *);
void h (S);
void f(int x)
{
  S s;
  g (&s);
  h (s);
}

In this case sizeof(S) = 64, so we should be able to do the copy with 2 LDPs +
2 STPs of Q registers (each pair moves 32 bytes).
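
As a quick sanity check of that arithmetic, here is a minimal sketch. It
assumes a 4-byte int (as on AArch64 LP64) and 16-byte Q registers; the names
Q_REG_BYTES, PAIR_BYTES and pairs_needed are made up for illustration and are
not anything from GCC:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int arr[16]; } S;

/* Assumption: one Q register holds 16 bytes, so one LDP/STP of two
   Q registers moves 32 bytes.  */
enum { Q_REG_BYTES = 16, PAIR_BYTES = 2 * Q_REG_BYTES };

/* Number of Q-register pair accesses needed to cover `bytes` bytes,
   rounding up.  Hypothetical helper, for illustration only.  */
static size_t pairs_needed (size_t bytes)
{
  return (bytes + PAIR_BYTES - 1) / PAIR_BYTES;
}

/* Holds wherever int is 4 bytes.  */
_Static_assert (sizeof (S) == 64, "S is expected to be 64 bytes");
```

With these definitions pairs_needed (sizeof (S)) evaluates to 2, i.e. two
loads plus two stores for the whole copy.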

To be clear, the missed ldp/stp started when we turned off the early ldp/stp
formation in memcpy expansion, i.e. with
r14-9373-g19b23bf3c32df3cbb96b3d898a1d7142f7bea4a0.

However, things had already started to regress earlier for this testcase, with
r14-4944-gf55cdce3f8dd8503e080e35be59c5f5390f6d95e, i.e.:

commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e
Author: Vladimir N. Makarov <vmaka...@redhat.com>
Date:   Thu Oct 26 14:50:40 2023

    [RA]: Modfify cost calculation for dealing with equivalences

Before that RA change we get:

f:
        stp     x29, x30, [sp, -144]!
        mov     x29, sp
        add     x0, sp, 80
        bl      g
        ldp     q29, q28, [sp, 80]
        add     x0, sp, 16
        ldp     q31, q30, [sp, 112]
        stp     q29, q28, [sp, 16]
        stp     q31, q30, [sp, 48]
        bl      h
        ldp     x29, x30, [sp], 144
        ret

and afterwards we get:

f:
        stp     x29, x30, [sp, -160]!
        mov     x29, sp
        str     x19, [sp, 16]
        add     x19, sp, 96
        mov     x0, x19
        bl      g
        add     x0, sp, 32
        ldp     q29, q28, [x19]
        ldp     q31, q30, [x19, 32]
        stp     q29, q28, [x0]
        stp     q31, q30, [x0, 32]
        bl      h
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 160
        ret

This is worse: we now have a save/restore of x19, and the accesses end up
using different (non-sp) base registers, which I suspect doesn't help with the
ldp/stp formation (on trunk).
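
For reference, the frame sizes in the two listings can be reconstructed from
the offsets they use. The following is only a sketch of that bookkeeping: the
constant names are hypothetical, and the 16-byte size of the x19 slot is
inferred from the listing (x19 is stored at sp+16 and the copy starts at
sp+32), presumably padding to keep 16-byte alignment:

```c
#include <assert.h>

enum {
  S_BYTES        = 64, /* sizeof(S): 16 ints of 4 bytes each */
  FP_LR_BYTES    = 16, /* x29/x30 saved by the prologue stp */
  X19_SLOT_BYTES = 16  /* 8-byte x19 save; slot size inferred from offsets */
};

/* Before the RA change: [sp, 0] x29/x30, [sp, 16] copy passed to h,
   [sp, 80] s.  */
enum { FRAME_BEFORE = FP_LR_BYTES + 2 * S_BYTES };

/* After the RA change: [sp, 0] x29/x30, [sp, 16] x19, [sp, 32] copy
   passed to h, [sp, 96] s.  */
enum { FRAME_AFTER = FP_LR_BYTES + X19_SLOT_BYTES + 2 * S_BYTES };

_Static_assert (FRAME_BEFORE == 144, "matches stp x29, x30, [sp, -144]!");
_Static_assert (FRAME_AFTER == 160, "matches stp x29, x30, [sp, -160]!");
```

So the extra 16 bytes of frame are accounted for entirely by the x19 spill
slot.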

I will try to give a detailed analysis of what goes wrong with the ldp/stp
formation at the RTL level shortly (there are a lot of different issues), but I
think that RA change is a contributing factor.
