http://llvm.org/bugs/show_bug.cgi?id=17803
Sanjay <[email protected]> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED CC| |[email protected] Resolution|--- |FIXED --- Comment #3 from Sanjay <[email protected]> --- This looks ideal with clang 3.5 (r205798): bin$ ./clang -v clang version 3.5.0 (trunk 205798) (llvm/trunk 205792) Target: x86_64-apple-darwin13.1.0 Thread model: posix bin$ ./clang xor.c -O3 -fomit-frame-pointer -o - -S -march=core-avx2 ... ## BB#0: ## %entry vmovaps LCPI0_0(%rip), %ymm0 vxorps (%rdi), %ymm0, %ymm1 vxorps 32(%rdi), %ymm0, %ymm2 vxorps 64(%rdi), %ymm0, %ymm3 vxorps 96(%rdi), %ymm0, %ymm0 vmovups %ymm1, (%rdi) vmovups %ymm2, 32(%rdi) vmovups %ymm3, 64(%rdi) vmovups %ymm0, 96(%rdi) vzeroupper retq With a plain 'avx' target rather than 'avx2', we go through all kinds of gymnastics to avoid unaligned 32-bit load or store, but the leftover loop problem is still fixed: $ ./clang xor.c -O3 -fomit-frame-pointer -o - -S -march=corei7-avx ... ## BB#0: ## %entry vmovups (%rdi), %xmm0 vmovups 32(%rdi), %xmm1 vmovups 64(%rdi), %xmm2 vmovups 96(%rdi), %xmm3 vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 vinsertf128 $1, 48(%rdi), %ymm1, %ymm1 vinsertf128 $1, 80(%rdi), %ymm2, %ymm2 vinsertf128 $1, 112(%rdi), %ymm3, %ymm3 vmovaps LCPI0_0(%rip), %ymm4 vxorps %ymm4, %ymm0, %ymm0 vxorps %ymm4, %ymm1, %ymm1 vxorps %ymm4, %ymm2, %ymm2 vxorps %ymm4, %ymm3, %ymm3 vextractf128 $1, %ymm0, %xmm4 vmovups %xmm4, 16(%rdi) vmovups %xmm0, (%rdi) vextractf128 $1, %ymm1, %xmm0 vmovups %xmm0, 48(%rdi) vmovups %xmm1, 32(%rdi) vextractf128 $1, %ymm2, %xmm0 vmovups %xmm0, 80(%rdi) vmovups %xmm2, 64(%rdi) vextractf128 $1, %ymm3, %xmm0 vmovups %xmm0, 112(%rdi) vmovups %xmm3, 96(%rdi) vzeroupper retq -- You are receiving this mail because: You are on the CC list for the bug.
_______________________________________________
LLVMbugs mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs
