------- Comment #1 from rguenth at gcc dot gnu dot org  2010-04-22 09:07 -------
Hm, frob1 looks like

_Z5frob1RK5foo_tRS_:
.LFB18:
        movss   (%rdi), %xmm3
        movss   4(%rdi), %xmm2
        movaps  %xmm3, %xmm4
        movaps  %xmm2, %xmm0
        mulss   %xmm3, %xmm4
        movss   8(%rdi), %xmm1
        mulss   %xmm2, %xmm0
        addss   %xmm4, %xmm0
        movaps  %xmm1, %xmm4
        mulss   %xmm1, %xmm4
        addss   %xmm4, %xmm0
        rsqrtss %xmm0, %xmm4
        mulss   %xmm4, %xmm0
        mulss   %xmm4, %xmm0
        mulss   .LC1(%rip), %xmm4
        addss   .LC0(%rip), %xmm0
        mulss   %xmm4, %xmm0
        mulss   %xmm0, %xmm3
        mulss   %xmm0, %xmm2
        mulss   %xmm1, %xmm0
        movss   %xmm3, (%rsi)
        movss   %xmm2, 4(%rsi)
        movss   %xmm0, 8(%rsi)
        ret

and frob2 like

_Z5frob2RK5bar_tRS_:
.LFB19:
        movss   (%rdi), %xmm3
        movss   4(%rdi), %xmm2
        movaps  %xmm3, %xmm4
        movaps  %xmm2, %xmm0
        mulss   %xmm3, %xmm4
        movss   8(%rdi), %xmm1
        mulss   %xmm2, %xmm0
        addss   %xmm4, %xmm0
        movaps  %xmm1, %xmm4
        mulss   %xmm1, %xmm4
        addss   %xmm4, %xmm0
        rsqrtss %xmm0, %xmm4
        mulss   %xmm4, %xmm0
        mulss   %xmm4, %xmm0
        mulss   .LC1(%rip), %xmm4
        addss   .LC0(%rip), %xmm0
        mulss   %xmm4, %xmm0
        mulss   %xmm0, %xmm3
        mulss   %xmm0, %xmm2
        mulss   %xmm1, %xmm0
        movss   %xmm3, -24(%rsp)
        movss   %xmm2, -20(%rsp)
        movq    -24(%rsp), %rax
        movss   %xmm0, -16(%rsp)
        movq    %rax, (%rsi)
        movl    -16(%rsp), %eax
        movl    %eax, 8(%rsi)
        ret

so frob2 keeps an aggregate copy that is not scalarized.  frob1 ends with
direct stores through the reference:

  b_1(D)->x = D.2444_20;
  b_1(D)->y = D.2443_19;
  b_1(D)->z = D.2442_18;
  return;

vs. frob2, which fills a temporary and then copies the whole aggregate:

  D.2464.m[0] = D.2473_20;
  D.2464.m[1] = D.2472_19;
  D.2464.m[2] = D.2471_18;
  *b_1(D) = D.2464;
  return;

All inlining happens during early inlining, and frob1 and frob2 are reasonably
similar after it.

But then early SRA runs, and its dump shows

;; Function void frob1(const foo_t&, foo_t&) (_Z5frob1RK5foo_tRS_)

Candidate (2452): D.2452
Candidate (2434): v
Candidate (2435): D.2435
Will attempt to totally scalarize D.2435 (UID: 2435):
Will attempt to totally scalarize D.2452 (UID: 2452):
Marking v offset: 0, size: 32:  to be replaced.
Marking v offset: 32, size: 32:  to be replaced.
Marking v offset: 64, size: 32:  to be replaced.
...

;; Function void frob2(const bar_t&, bar_t&) (_Z5frob2RK5bar_tRS_)

Candidate (2481): D.2481
Candidate (2464): D.2464
Candidate (2463): v
Marking v offset: 0, size: 32:  to be replaced.
Marking v offset: 32, size: 32:  to be replaced.
Marking v offset: 64, size: 32:  to be replaced.
...
! Disqualifying D.2464 - No scalar replacements to be created.

so for some reason early SRA doesn't consider the struct with the array
member (D.2464) for total scalarization, and then disqualifies it because no
scalar replacements would be created.  Martin?
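
For reference, a minimal sketch of the kind of source that would show this
difference.  It is reconstructed from the mangled names and the GIMPLE above,
not taken from the original testcase: the struct layouts, the helper names
(dot, scale) and the exact shape of the code are assumptions.  The rsqrtss in
the listings suggests the original was built with something like -O2
-ffast-math, but this comment does not state the flags.

```cpp
#include <cmath>

// Assumed layout: three named float members (matches b->x, b->y, b->z).
struct foo_t {
    float x, y, z;
};

// Assumed layout: the same data held in an array member (matches D.2464.m[i]).
struct bar_t {
    float m[3];
};

// Hypothetical helpers; both are small enough to be early-inlined.
static inline float dot(const foo_t& a, const foo_t& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static inline float dot(const bar_t& a, const bar_t& b) {
    return a.m[0] * b.m[0] + a.m[1] * b.m[1] + a.m[2] * b.m[2];
}

static inline foo_t scale(const foo_t& v, float s) {
    foo_t r = { v.x * s, v.y * s, v.z * s };
    return r;
}

static inline bar_t scale(const bar_t& v, float s) {
    bar_t r = { { v.m[0] * s, v.m[1] * s, v.m[2] * s } };
    return r;
}

// Normalize a into b; the returned temporary is assigned as a whole aggregate.
void frob1(const foo_t& a, foo_t& b) {
    b = scale(a, 1.0f / std::sqrt(dot(a, a)));
}

// Same computation on the array-based struct.
void frob2(const bar_t& a, bar_t& b) {
    b = scale(a, 1.0f / std::sqrt(dot(a, a)));
}
```

Compiling something like this with -fdump-tree-esra-details should give early
SRA dumps comparable to the ones quoted above, where the temporary in the
bar_t variant is the one that gets disqualified.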


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-04-22 09:07:40
               date|                            |
            Summary|4.5.0 regression, array vs  |[4.5 Regression] array vs
                   |members, dead code removal  |members, total scalarization
                   |issues                      |issues
   Target Milestone|---                         |4.5.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43846
