https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47582

--- Comment #2 from Tony Poppleton <tony.poppleton at gmail dot com> ---
Re-testing this with GCC 5.1, the code appears to be even less efficient in
both cases:

DM1:
.LFB0:
        .cfi_startproc
        movss   b(%rip), %xmm0
        xorl    %eax, %eax
        movss   %xmm0, a(%rip)
        movss   b+4(%rip), %xmm0
        movss   %xmm0, a+4(%rip)
        movss   b+8(%rip), %xmm0
        movss   %xmm0, a+8(%rip)
        movss   b+12(%rip), %xmm0
        movss   %xmm0, a+12(%rip)
        movss   b+16(%rip), %xmm0
        movss   %xmm0, a+16(%rip)
        ret
        .cfi_endproc

.LFB0:
        .cfi_startproc
        movq    b(%rip), %rax
        movq    %rax, a(%rip)
        movq    b+8(%rip), %rax
        movq    %rax, a+8(%rip)
        movl    b+16(%rip), %eax
        movl    %eax, a+16(%rip)
        xorl    %eax, %eax
        ret
        .cfi_endproc

Why is the "xorl" appearing in both cases?  Should this be logged as a separate
bug?

Incidentally, compiling with -O1 produces the same code as -O2 on older GCCs
(as in the description comment above).

My total guess is that it is due to a and b not having any initial values, and
that an optimization that takes value ranges into account is getting confused.
