http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499
--- Comment #6 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-12-11 14:14:01 UTC ---

> I think you are looking at the scalar epilogue. The number of iterations is
> unknown, so we need an epilogue loop for the case that number of iterations is
> not a multiple of 4.

While investigating pr51597, I have found that vectorized loops in programs as simple as

    subroutine spmmult(x,b,ad)
    implicit none
    integer, parameter :: nxyz=1008315
    real(8),dimension(nxyz):: x,b,ad
    b = ad*x
    end subroutine spmmult
    !=========================================

always come with an additional non-vectorized loop. There is a vectorized one:

    L3:
        movsd   (%r9,%rax), %xmm1
        addq    $1, %rcx
        movapd  (%r10,%rax), %xmm0
        movhpd  8(%r9,%rax), %xmm1
        mulpd   %xmm1, %xmm0
        movlpd  %xmm0, (%r8,%rax)
        movhpd  %xmm0, 8(%r8,%rax)
        addq    $16, %rax
        cmpq    $504156, %rcx
        jbe     L3

and a non-vectorized one:

    L5:
        movsd   -8(%rdi,%rax,8), %xmm0
        mulsd   -8(%rdx,%rax,8), %xmm0
        movsd   %xmm0, -8(%rsi,%rax,8)
        addq    $1, %rax
        cmpq    %rcx, %rax
        jne     L5

This happens even when the above loops are unrolled. How can the loop L5 be unrolled if it is only there as a "scalar epilogue"?
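The structure described in the quoted reply (a main loop that handles several elements per trip, followed by a scalar epilogue for the leftover iterations) can be sketched as a small Python model. This is an illustration under stated assumptions, not GCC's generated code. The vectorization factor VF = 2 is assumed from the packed mulpd in L3, which multiplies two doubles per instruction. The name spmmult_model is hypothetical.

```python
# Assumed vectorization factor: mulpd in L3 multiplies two doubles per
# instruction, so this model processes VF = 2 elements per "vector" trip.
VF = 2


def spmmult_model(ad, x):
    """Hypothetical model of b = ad*x as a main loop plus a scalar epilogue.

    Returns (b, vector_trips, epilogue_trips) so the split between the
    two loops is visible.
    """
    n = len(ad)
    b = [0.0] * n
    vector_trips = 0
    epilogue_trips = 0

    i = 0
    # Main loop (cf. L3): handles VF elements per trip while VF remain.
    while i + VF <= n:
        for j in range(VF):  # stands in for one packed multiply
            b[i + j] = ad[i + j] * x[i + j]
        i += VF
        vector_trips += 1

    # Scalar epilogue (cf. L5): the n % VF leftover elements, one per trip.
    while i < n:
        b[i] = ad[i] * x[i]
        i += 1
        epilogue_trips += 1

    return b, vector_trips, epilogue_trips


if __name__ == "__main__":
    nxyz = 1008315  # array size from the Fortran test case
    # In this model: (main-loop trips, epilogue trips)
    print(divmod(nxyz, VF))  # (504157, 1)
    b, v, e = spmmult_model([float(k) for k in range(1, 8)], [2.0] * 7)
    print(v, e, b[-1])  # 3 1 14.0
```

In this model the epilogue runs at most VF - 1 = 1 time per call, which is what makes an unrolled L5 look surprising. The sketch deliberately ignores any alignment peeling or loop versioning the compiler may also perform, so it should not be read as a complete account of GCC's output.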