Toon Moene wrote:
I wrote:
OK, so it is an alignment issue (with -mtune=barcelona):
.L6:
        movups  0(%rbp,%rax), %xmm0
        movups  (%rbx,%rax), %xmm1
        incl    %ecx
        addps   %xmm1, %xmm0
        movaps  %xmm0, (%r8,%rax)
        addq    $16, %rax
        cmpl    %r10d, %ecx
        jb      .L6
Once this problem is solved (or at least once we have determined how it
could be solved), we can move on to the next one: the extraneous
induction variable %ecx.
There are two ways to deal with it:
1. Eliminate it in favor of the other induction variable, %rax, which
counts in the same direction (upwards, in steps of 16), and keep that
variable's limit around for the loop test.
Just for completeness - gcc *does* know how to do this; it just doesn't
work when vectorizing.
This is what I get when compiling with -O2 -S:
.L3:
        movss   (%rdi,%rax), %xmm0
        addss   (%rsi,%rax), %xmm0
        movss   %xmm0, (%rdx,%rax)
        addq    $4, %rax
        cmpq    %rcx, %rax
        jne     .L3
Note how %rax remains as the sole induction variable.
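
To make the difference between the two loops concrete, here is a rough C
sketch of the two loop shapes. It is not the original source (which isn't
shown here); the function names and the element-wise float add are
assumptions chosen to mirror the assembly. The first function carries both
a counter and a byte offset, like the vectorized loop with %ecx and %rax;
the second keeps only the byte offset and tests it against a precomputed
limit, like the -O2 loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for the loop under discussion: c[i] = a[i] + b[i]. */

/* Two induction variables: a counter (i) and a byte offset (off).
   The counter is used only for the loop test. */
void add_two_ivs(const float *a, const float *b, float *c, size_t n)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        const float x = *(const float *)((const char *)a + off);
        const float y = *(const float *)((const char *)b + off);
        *(float *)((char *)c + off) = x + y;
        off += sizeof(float);
    }
}

/* One induction variable: the counter is eliminated and the byte
   offset is compared against its own limit, computed once up front. */
void add_one_iv(const float *a, const float *b, float *c, size_t n)
{
    const size_t limit = n * sizeof(float);
    for (size_t off = 0; off != limit; off += sizeof(float)) {
        const float x = *(const float *)((const char *)a + off);
        const float y = *(const float *)((const char *)b + off);
        *(float *)((char *)c + off) = x + y;
    }
}
```

Both functions compute the same result; the second simply has one fewer
value to update and keep live per iteration.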
--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html