Toon Moene wrote:

I wrote:

OK, so it is an alignment issue (with -mtune=barcelona):

.L6:
        movups  0(%rbp,%rax), %xmm0
        movups  (%rbx,%rax), %xmm1
        incl    %ecx
        addps   %xmm1, %xmm0
        movaps  %xmm0, (%r8,%rax)
        addq    $16, %rax
        cmpl    %r10d, %ecx
        jb      .L6

Once this problem is solved (or at least once we have determined how it could be solved), we move on to the next one: the extraneous induction variable %ecx.
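To make the redundancy concrete, here is a rough C rendering of the .L6 loop. It assumes the source was something like c[i] = a[i] + b[i] over float arrays. The names add_two_ivs, nvec, off and count are hypothetical, and the four-iteration inner loop stands in for the single addps. The loop carries two induction variables that advance in lockstep: a chunk counter (like %ecx) and an offset (like %rax).

```c
#include <stddef.h>

/* Hypothetical C rendering of the vectorized loop at .L6.
   Assumes float arrays and handles only whole 4-float (16-byte) chunks;
   gcc's prologue/epilogue for leftover elements is not shown. */
void add_two_ivs(const float *a, const float *b, float *c, size_t n)
{
    size_t nvec = n / 4;      /* number of 16-byte chunks, like %r10d */
    size_t off = 0;           /* element offset; %rax counts the same in bytes */
    unsigned int count = 0;   /* extraneous chunk counter, like %ecx */

    /* gcc tests at the bottom (jb .L6) behind a separate guard; a
       top-tested loop is used here so that n < 4 is safe. */
    while (count < nvec) {
        for (int lane = 0; lane < 4; lane++)   /* stands in for one addps */
            c[off + lane] = a[off + lane] + b[off + lane];
        count++;                               /* incl %ecx */
        off += 4;                              /* addq $16, %rax */
    }
}
```

Both count and off are derived from the same trip count, so either one can be computed from the other. That is what makes %ecx removable.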

There are two ways to deal with it:

1. Eliminate it with respect to the other induction variable that
   counts in the same direction (upwards, in steps of 16) and remember
   that induction variable's (%rax) limit.
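Option 1 can be sketched in C as follows. This is a minimal illustration under the same assumptions as before (float arrays, c[i] = a[i] + b[i]), not gcc's actual transformation. The limit for the offset is computed once before the loop, and the offset alone controls the exit test. The names add_one_iv and limit are hypothetical.

```c
#include <stddef.h>

/* Sketch of option 1: drop the chunk counter and compare the offset
   against a precomputed limit. Only whole 4-float chunks are handled;
   a scalar epilogue would take care of any remainder. */
void add_one_iv(const float *a, const float *b, float *c, size_t n)
{
    size_t limit = (n / 4) * 4;   /* offset limit, computed once (in elements) */

    /* off is the only loop-level induction variable; in the machine code
       it would be a byte offset stepping by 16 with a byte limit. */
    for (size_t off = 0; off < limit; off += 4)
        for (int lane = 0; lane < 4; lane++)   /* stands in for one addps */
            c[off + lane] = a[off + lane] + b[off + lane];
}
```

In assembly terms this would leave only addq $16, %rax followed by a cmpq of %rax against the precomputed limit, freeing %ecx and dropping the incl.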

Just for completeness: gcc *does* know how to do this; it just doesn't apply it when vectorizing.

This is what I get when compiling with -O2 -S:

.L3:
        movss   (%rdi,%rax), %xmm0
        addss   (%rsi,%rax), %xmm0
        movss   %xmm0, (%rdx,%rax)
        addq    $4, %rax
        cmpq    %rcx, %rax
        jne     .L3

Note how %rax remains as sole induction variable.
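For comparison, here is a hypothetical C equivalent of the scalar .L3 loop. It assumes a, b and c arrive in %rdi, %rsi and %rdx as under the x86-64 System V calling convention. The element index has been strength-reduced to a byte offset (%rax) that is compared against a byte limit (%rcx), so no separate counter remains. The function and variable names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical C equivalent of the -O2 scalar loop at .L3.
   A single byte offset both addresses the arrays and controls the exit. */
void add_scalar(const float *a, const float *b, float *c, size_t n)
{
    const char *pa = (const char *)a;
    const char *pb = (const char *)b;
    char *pc = (char *)c;
    size_t limit = n * sizeof(float);   /* byte limit, like %rcx */

    for (size_t off = 0; off != limit; off += sizeof(float)) {  /* addq $4; cmpq; jne */
        float x = *(const float *)(pa + off);   /* movss (%rdi,%rax), %xmm0 */
        x += *(const float *)(pb + off);        /* addss (%rsi,%rax), %xmm0 */
        *(float *)(pc + off) = x;               /* movss %xmm0, (%rdx,%rax) */
    }
}
```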

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
