------- Comment #6 from ubizjak at gmail dot com  2008-11-17 18:11 -------
I think that

        addps   .LC10(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC11(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC12(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC13(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC14(%rip), %xmm0
        mulps   %xmm1, %xmm0

is the bottleneck. Perhaps we should split implicit memory operands out of the
insn by some generic peephole (if a free register is available) and schedule the
loads appropriately, as sketched below.
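For example (a sketch only; the choice of %xmm2-%xmm6 is an assumption and
would depend on which registers are actually free at this point), the split
form would turn each memory-operand addps into an independent load plus a
register-register add, so the scheduler can issue the loads well ahead of the
dependency chain:

        # loads no longer tied to the addps insns; free to schedule early
        movaps  .LC10(%rip), %xmm2
        movaps  .LC11(%rip), %xmm3
        movaps  .LC12(%rip), %xmm4
        movaps  .LC13(%rip), %xmm5
        movaps  .LC14(%rip), %xmm6
        # the dependency chain is now pure register arithmetic
        addps   %xmm2, %xmm0
        mulps   %xmm1, %xmm0
        addps   %xmm3, %xmm0
        mulps   %xmm1, %xmm0
        addps   %xmm4, %xmm0
        mulps   %xmm1, %xmm0
        addps   %xmm5, %xmm0
        mulps   %xmm1, %xmm0
        addps   %xmm6, %xmm0
        mulps   %xmm1, %xmm0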

OTOH, the loop optimizer should detect these invariant loads and hoist them out
of the loop.
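That would look something like this (again just a sketch; the loop label is
made up, and it assumes the sequence above sits in a loop body and that the
coefficient registers stay free across the whole loop):

        # preheader: the .LC1x loads are loop-invariant, execute them once
        movaps  .LC10(%rip), %xmm2
        movaps  .LC11(%rip), %xmm3
        movaps  .LC12(%rip), %xmm4
        movaps  .LC13(%rip), %xmm5
        movaps  .LC14(%rip), %xmm6
.L_loop:
        addps   %xmm2, %xmm0            # was: addps .LC10(%rip), %xmm0
        mulps   %xmm1, %xmm0
        # ... remaining addps/mulps pairs using %xmm3-%xmm6, then the rest
        # of the loop body and the backedge:
        jne     .L_loop                 # hypothetical loop back-edge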


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38134
