https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115097
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |15.0
          Component|c                           |tree-optimization
                 CC|                            |jamborm at gcc dot gnu.org
   Last reconfirmed|                            |2024-05-15
             Status|UNCONFIRMED                 |NEW
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The IL difference is

struct A test1 (struct A & a)
{
  struct A D.2842;

  <bb 2> [local count: 1073741824]:
  D.2842 = MEM[(const struct A &)a_2(D)];
  return D.2842;

vs

struct A test2 (struct A & a)
{
  struct A D.2873;
  struct A retval.4;

  <bb 2> [local count: 1073741824]:
  D.2873 = MEM[(const struct A &)a_2(D)];
  retval.4 = D.2873;
  return retval.4;

so there's an additional aggregate copy.  With -O2, SRA scalarizes that copy
and we're not able to elide the resulting code on RTL, while without SRA we
handle this fine.  SRA makes test2 into

struct A test2 (struct A & a)
{
  short int SR.12;
  int SR.11;
  struct A retval.4;

  <bb 2> [local count: 1073741824]:
  SR.11_3 = MEM[(const struct A &)a_2(D)].a;
  SR.12_6 = MEM[(const struct A &)a_2(D)].b;
  retval.4.a = SR.11_3;
  retval.4.b = SR.12_6;
  return retval.4;

The extra copy is introduced during gimplification; the GENERIC looks the
same (but of course there's a hidden difference):

;; Function A test1(A&) (null)
;; enabled by -tree-original

<<cleanup_point return <retval> = TARGET_EXPR <D.2823, *(const struct A &) a>>>;

;; Function A test2(A&&) (null)
;; enabled by -tree-original

<<cleanup_point return <retval> = TARGET_EXPR <D.2833, *(const struct A &) a>>>;
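The original test case is not quoted in this comment. Below is a hypothetical reproducer that may help when comparing the dumps. The function names test1/test2 and the signatures A test1(A&) / A test2(A&&) come from the dumps above. The layout of struct A (an int member a and a short member b) is an assumption inferred from the SRA output, where SR.11 is an int loaded from .a and SR.12 a short loaded from .b. The helper test2_scalarized is also hypothetical. It is a source-level picture of what SRA does to test2, not GCC's actual code.

```cpp
#include <cassert>

// Assumed layout, inferred from the SRA dump (int .a, short .b).
struct A {
  int a;
  short b;
};

// Lvalue-reference parameter: GIMPLE shows one aggregate copy into D.2842.
A test1(A& a) { return a; }

// Rvalue-reference parameter: gimplification adds a second aggregate copy
// (retval.4 = D.2873) before the return.
A test2(A&& a) { return a; }

// Hypothetical source-level equivalent of test2 after SRA: the aggregate
// copy is split into one scalar load and one scalar store per field.
A test2_scalarized(A&& a) {
  int sr_a = a.a;    // corresponds to SR.11_3 = MEM[...].a
  short sr_b = a.b;  // corresponds to SR.12_6 = MEM[...].b
  A retval;
  retval.a = sr_a;   // corresponds to retval.4.a = SR.11_3
  retval.b = sr_b;   // corresponds to retval.4.b = SR.12_6
  return retval;
}
```

All three functions return the same field values. The difference the bug is about shows up only in the generated code. Compiling with g++ -O2 -S, with and without -fno-tree-sra, should let you compare the assembly for test1 and test2. Adding -fdump-tree-original shows the GENERIC quoted above. Exact output will vary with the GCC version.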