------- Comment #6 from ubizjak at gmail dot com 2008-11-17 18:11 ------- I think that
        addps   .LC10(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC11(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC12(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC13(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC14(%rip), %xmm0
        mulps   %xmm1, %xmm0

is the bottleneck. Perhaps we should split implicit memory operands out of the insn by some generic peephole (if a register is available) and schedule the loads appropriately. OTOH, the loop optimizer should detect invariant loads and move them out of the loop.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38134