------- Comment #1 from rguenth at gcc dot gnu dot org 2010-04-22 09:07 -------

Hm, frob1 looks like

_Z5frob1RK5foo_tRS_:
.LFB18:
        movss   (%rdi), %xmm3
        movss   4(%rdi), %xmm2
        movaps  %xmm3, %xmm4
        movaps  %xmm2, %xmm0
        mulss   %xmm3, %xmm4
        movss   8(%rdi), %xmm1
        mulss   %xmm2, %xmm0
        addss   %xmm4, %xmm0
        movaps  %xmm1, %xmm4
        mulss   %xmm1, %xmm4
        addss   %xmm4, %xmm0
        rsqrtss %xmm0, %xmm4
        mulss   %xmm4, %xmm0
        mulss   %xmm4, %xmm0
        mulss   .LC1(%rip), %xmm4
        addss   .LC0(%rip), %xmm0
        mulss   %xmm4, %xmm0
        mulss   %xmm0, %xmm3
        mulss   %xmm0, %xmm2
        mulss   %xmm1, %xmm0
        movss   %xmm3, (%rsi)
        movss   %xmm2, 4(%rsi)
        movss   %xmm0, 8(%rsi)
        ret

and frob2 like

_Z5frob2RK5bar_tRS_:
.LFB19:
        movss   (%rdi), %xmm3
        movss   4(%rdi), %xmm2
        movaps  %xmm3, %xmm4
        movaps  %xmm2, %xmm0
        mulss   %xmm3, %xmm4
        movss   8(%rdi), %xmm1
        mulss   %xmm2, %xmm0
        addss   %xmm4, %xmm0
        movaps  %xmm1, %xmm4
        mulss   %xmm1, %xmm4
        addss   %xmm4, %xmm0
        rsqrtss %xmm0, %xmm4
        mulss   %xmm4, %xmm0
        mulss   %xmm4, %xmm0
        mulss   .LC1(%rip), %xmm4
        addss   .LC0(%rip), %xmm0
        mulss   %xmm4, %xmm0
        mulss   %xmm0, %xmm3
        mulss   %xmm0, %xmm2
        mulss   %xmm1, %xmm0
        movss   %xmm3, -24(%rsp)
        movss   %xmm2, -20(%rsp)
        movq    -24(%rsp), %rax
        movss   %xmm0, -16(%rsp)
        movq    %rax, (%rsi)
        movl    -16(%rsp), %eax
        movl    %eax, 8(%rsi)
        ret

so it's an aggregate copy that is not scalarized in frob2:

  b_1(D)->x = D.2444_20;
  b_1(D)->y = D.2443_19;
  b_1(D)->z = D.2442_18;
  return;

vs.

  D.2464.m[0] = D.2473_20;
  D.2464.m[1] = D.2472_19;
  D.2464.m[2] = D.2471_18;
  *b_1(D) = D.2464;
  return;

all inlining happens during early inlining and frob1 and frob2 are
reasonably similar after early inlining.  But then we have early SRA
which does

;; Function void frob1(const foo_t&, foo_t&) (_Z5frob1RK5foo_tRS_)

Candidate (2452): D.2452
Candidate (2434): v
Candidate (2435): D.2435
Will attempt to totally scalarize D.2435 (UID: 2435):
Will attempt to totally scalarize D.2452 (UID: 2452):
Marking v offset: 0, size: 32:  to be replaced.
Marking v offset: 32, size: 32:  to be replaced.
Marking v offset: 64, size: 32:  to be replaced.
...

;; Function void frob2(const bar_t&, bar_t&) (_Z5frob2RK5bar_tRS_)

Candidate (2481): D.2481
Candidate (2464): D.2464
Candidate (2463): v
Marking v offset: 0, size: 32:  to be replaced.
Marking v offset: 32, size: 32:  to be replaced.
Marking v offset: 64, size: 32:  to be replaced.
...
! Disqualifying D.2464 - No scalar replacements to be created.

so it doesn't consider the struct with the array for total scalarization
for some reason.  Martin?


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-04-22 09:07:40
               date|                            |
            Summary|4.5.0 regression, array vs  |[4.5 Regression] array vs
                   |members, dead code removal  |members, total scalarization
                   |issues                      |issues
   Target Milestone|---                         |4.5.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43846
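
For reference, a minimal sketch of the kind of source that would give the
two shapes discussed above.  This is a guess, not the reporter's test case:
only the names foo_t, bar_t, frob1, frob2 and the members x, y, z and m[]
come from the dumps; the helper normalized() and check_frobs_agree() are
hypothetical, and the normalization itself is only inferred from the
rsqrtss sequence in the assembly.

```cpp
#include <cassert>
#include <cmath>

// Assumed layouts: three named float members vs. one three-element array.
// Both are 96 bits, matching the offset 0/32/64, size 32 SRA markings.
struct foo_t { float x, y, z; };
struct bar_t { float m[3]; };

// Hypothetical helpers: build the result in a by-value temporary, so that
// the store into the output reference starts out as an aggregate copy.
static inline foo_t normalized(const foo_t &v)
{
    const float s = 1.0f / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    foo_t r = { v.x * s, v.y * s, v.z * s };
    return r;
}

static inline bar_t normalized(const bar_t &v)
{
    const float s =
        1.0f / std::sqrt(v.m[0] * v.m[0] + v.m[1] * v.m[1] + v.m[2] * v.m[2]);
    bar_t r = { { v.m[0] * s, v.m[1] * s, v.m[2] * s } };
    return r;
}

// Signatures as given by the mangled names in the dumps.
void frob1(const foo_t &a, foo_t &b) { b = normalized(a); }
void frob2(const bar_t &a, bar_t &b) { b = normalized(a); }

// Usage: both variants should compute the same values; only the code
// generated for the final store is expected to differ.
void check_frobs_agree()
{
    const foo_t fa = { 3.0f, 0.0f, 4.0f };
    const bar_t ba = { { 3.0f, 0.0f, 4.0f } };
    foo_t fb;
    bar_t bb;
    frob1(fa, fb);
    frob2(ba, bb);
    assert(std::fabs(fb.x - bb.m[0]) < 1e-6f);
    assert(std::fabs(fb.y - bb.m[1]) < 1e-6f);
    assert(std::fabs(fb.z - bb.m[2]) < 1e-6f);
}
```

With a source of this shape, the difference between the two functions
would be confined to how the temporary is stored into b, which is where
the assembly listings and the SRA dumps above diverge.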