http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179

--- Comment #7 from Uros Bizjak <ubizjak at gmail dot com> 2011-11-22 22:00:36 UTC ---
(In reply to comment #3)
> Your testcase doesn't resemble the original; the inner for loops need
> their iteration variable to be cleared.

Ah, indeed... fingers were too fast.

One additional data point: with -O2 -ftree-vectorize -mfma4 -mavx and all
loops present, the inner loop compiles to:

        movslq  %r8d, %rax
        movl    $C+32, %edx
        xorl    %esi, %esi
        leaq    B(,%rax,8), %rcx
        movl    $C, %eax
.L3:
>>      vmovsd  80(%rcx), %xmm1
        addl    $2, %esi
        vmovapd A(%rdi), %ymm0
>>      vmovddup        %xmm1, %xmm1
        vbroadcastsd    (%rcx), %ymm2
        addq    $160, %rcx
>>      vinsertf128     $1, %xmm1, %ymm1, %ymm1
        vfmaddpd        (%rax), %ymm2, %ymm0, %ymm2
        vmovapd %ymm2, (%rax)
        addq    $64, %rax
        vfmaddpd        (%rdx), %ymm1, %ymm0, %ymm0
        vmovapd %ymm0, (%rdx)
        addq    $64, %rdx
        cmpl    $10, %esi
        jne     .L3

The three instructions marked with ">>" (vmovsd, vmovddup, vinsertf128) could
be just "vbroadcastsd 80(%rcx), %ymm1". For some reason, the combine pass does
not form it.
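To make the equivalence concrete, here is a small scalar C model (not GCC code, not intrinsics) of what the ">>" sequence and a single vbroadcastsd leave in a YMM register. It assumes VEX encoding semantics: vmovsd from memory and the 128-bit vmovddup zero the upper lanes, and vinsertf128 $1 keeps the low half of its ymm source while replacing the high half. The type ymm_t, the model_* helpers and sequences_agree are hypothetical names for illustration; p stands for the address 80(%rcx).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar model of a 256-bit YMM register: four double lanes. */
typedef struct { double lane[4]; } ymm_t;

/* vmovsd m64, %xmmN (VEX): lane 0 = mem, remaining lanes zeroed. */
static ymm_t model_vmovsd(const double *mem) {
    ymm_t r = {{ *mem, 0.0, 0.0, 0.0 }};
    return r;
}

/* vmovddup %xmmN, %xmmN (VEX.128): lanes 0 and 1 = src lane 0, upper half zeroed. */
static ymm_t model_vmovddup(ymm_t src) {
    ymm_t r = {{ src.lane[0], src.lane[0], 0.0, 0.0 }};
    return r;
}

/* vinsertf128 $1, %xmmS, %ymmA, %ymmD: low half from ymmA, high half from xmmS. */
static ymm_t model_vinsertf128_hi(ymm_t a, ymm_t s) {
    ymm_t r = {{ a.lane[0], a.lane[1], s.lane[0], s.lane[1] }};
    return r;
}

/* vbroadcastsd m64, %ymmN: all four lanes = mem. */
static ymm_t model_vbroadcastsd(const double *mem) {
    ymm_t r = {{ *mem, *mem, *mem, *mem }};
    return r;
}

/* Returns 1 if the three-instruction sequence and the single broadcast
   produce bit-identical registers for the double at p. */
int sequences_agree(const double *p) {
    assert(p != NULL);
    ymm_t v = model_vmovsd(p);       /* vmovsd 80(%rcx), %xmm1 */
    v = model_vmovddup(v);           /* vmovddup %xmm1, %xmm1 */
    v = model_vinsertf128_hi(v, v);  /* vinsertf128 $1, %xmm1, %ymm1, %ymm1 */
    ymm_t b = model_vbroadcastsd(p); /* vbroadcastsd 80(%rcx), %ymm1 */
    return memcmp(&v, &b, sizeof v) == 0;
}
```

Comparing with memcmp checks the lanes bit for bit, so the result does not depend on floating-point equality rules (e.g. for NaN inputs).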
