http://llvm.org/bugs/show_bug.cgi?id=17803
Sanjay <[email protected]> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED CC| |[email protected] Resolution|--- |FIXED --- Comment #3 from Sanjay <[email protected]> --- This looks ideal with clang 3.5 (r205798): bin$ ./clang -v clang version 3.5.0 (trunk 205798) (llvm/trunk 205792) Target: x86_64-apple-darwin13.1.0 Thread model: posix bin$ ./clang xor.c -O3 -fomit-frame-pointer -o - -S -march=core-avx2 ... ## BB#0: ## %entry vmovaps LCPI0_0(%rip), %ymm0 vxorps (%rdi), %ymm0, %ymm1 vxorps 32(%rdi), %ymm0, %ymm2 vxorps 64(%rdi), %ymm0, %ymm3 vxorps 96(%rdi), %ymm0, %ymm0 vmovups %ymm1, (%rdi) vmovups %ymm2, 32(%rdi) vmovups %ymm3, 64(%rdi) vmovups %ymm0, 96(%rdi) vzeroupper retq With a plain 'avx' target rather than 'avx2', we go through all kinds of gymnastics to avoid unaligned 32-bit load or store, but the leftover loop problem is still fixed: $ ./clang xor.c -O3 -fomit-frame-pointer -o - -S -march=corei7-avx ... ## BB#0: ## %entry vmovups (%rdi), %xmm0 vmovups 32(%rdi), %xmm1 vmovups 64(%rdi), %xmm2 vmovups 96(%rdi), %xmm3 vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 vinsertf128 $1, 48(%rdi), %ymm1, %ymm1 vinsertf128 $1, 80(%rdi), %ymm2, %ymm2 vinsertf128 $1, 112(%rdi), %ymm3, %ymm3 vmovaps LCPI0_0(%rip), %ymm4 vxorps %ymm4, %ymm0, %ymm0 vxorps %ymm4, %ymm1, %ymm1 vxorps %ymm4, %ymm2, %ymm2 vxorps %ymm4, %ymm3, %ymm3 vextractf128 $1, %ymm0, %xmm4 vmovups %xmm4, 16(%rdi) vmovups %xmm0, (%rdi) vextractf128 $1, %ymm1, %xmm0 vmovups %xmm0, 48(%rdi) vmovups %xmm1, 32(%rdi) vextractf128 $1, %ymm2, %xmm0 vmovups %xmm0, 80(%rdi) vmovups %xmm2, 64(%rdi) vextractf128 $1, %ymm3, %xmm0 vmovups %xmm0, 112(%rdi) vmovups %xmm3, 96(%rdi) vzeroupper retq -- You are receiving this mail because: You are on the CC list for the bug.
_______________________________________________
LLVMbugs mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs
