https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |aagarwa at gcc dot gnu.org,
                   |        |amonakov at gcc dot gnu.org

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Pontakorn Prasertsuk from comment #12)
> I notice that GCC also does not optimize this case:
> https://godbolt.org/z/7oGqjqqz4

Yes.  To quote:

#include <array>
#include <cstdint>
#include <cstdlib>
#include <iostream>

struct MyClass {
  std::array<uint64_t, 6> arr;
};

MyClass globalA;

// Prevent optimization
void sink(MyClass *m) { std::cout << m->arr[0] << std::endl; }

void __attribute__((noinline)) gg(MyClass &a) {
  MyClass c = a;
  MyClass *b = (MyClass *)malloc(sizeof(MyClass));
  *b = c;
  sink(b);
}

and we do RTL expansion from

  <bb 2> [local count: 1073741824]:
  vect_c_arr__M_elems_0_6.31_25 = MEM <vector(2) long unsigned int> [(long unsigned int *)a_2(D)];
  vect_c_arr__M_elems_0_6.32_27 = MEM <vector(2) long unsigned int> [(long unsigned int *)a_2(D) + 16B];
  vect_c_arr__M_elems_0_6.33_29 = MEM <vector(2) long unsigned int> [(long unsigned int *)a_2(D) + 32B];
  b_4 = malloc (48);
  MEM <vector(2) long unsigned int> [(long unsigned int *)b_4] = vect_c_arr__M_elems_0_6.31_25;
  MEM <vector(2) long unsigned int> [(long unsigned int *)b_4 + 16B] = vect_c_arr__M_elems_0_6.32_27;
  MEM <vector(2) long unsigned int> [(long unsigned int *)b_4 + 32B] = vect_c_arr__M_elems_0_6.33_29;
  sink (b_4); [tail call]

Note that the temporary was elided, but we specifically avoid TER (some magic scheduling of stmts within a basic block) crossing function calls, and there is no optimization phase that would try to optimize register pressure across calls.  In this case we want to sink the loads across the call; in other cases we want to avoid doing so.  In the end this would be a job for a late-running pass that factors in things like register pressure and the set of call-clobbered registers.
I'll note that -fschedule-insns doesn't seem to have any effect here, but I also remember that scheduling around calls was recently fiddled with, specifically in r13-5154-g733a1b777f16cd, which restricts motion even with -fsched-pressure (I'm not sure how that honors call-clobbered regs).  In the above case the GPR for a_2(D) would be needed after the call (but there are non-call-clobbered GPRs to hold it), while the three data vectors in xmm registers would no longer be live across the call (and all vector registers are call-clobbered on x86).  Of course I'm not sure at all whether RTL scheduling can disambiguate against a 'malloc' call.