[cfe-users] Clang vectorization

Michael Hamburg Tue, 17 Mar 2015 16:59:49 -0700

(resent because first one is held?)

Hello cfe-users,


I’m trying to get clang (or GCC for that matter) to vectorize a very simple 
loop, and I’m wondering what I’m doing wrong.  I’d rather write the loop as a 
loop instead of using intrinsics or the clang vector extensions, because I want 
the code to be portable.  Pragmas and magic attributes are also undesirable, 
but they’re better than intrinsics.

This file is representative of what I’m trying to do.  I’m compiling with -O3 
-std=c99 -mavx2, but the same issues should apply for other vector settings.
"""
#include <stdint.h>

typedef struct this_should_totally_be_a_vector {
   uint64_t limb[8];
} __attribute__((aligned(32))) a_vector;

void add(a_vector *a, const a_vector *b) {
   for (int i=0; i<8; i++) a->limb[i] += b->limb[i];
}

void mac(a_vector *a, const a_vector *b) {
   const a_vector c = {{0,1,2,3,4,5,6,7}};
   for (int i=0; i<8; i++) a->limb[i] += b->limb[i] + 3*c.limb[i];
}
"""

Can someone suggest flags, pragmas, attributes, etc. that would make these
functions produce good code?  I’m seeing lots of problems.  For now I’m testing
with the clang 3.6 release.

For starters, the compiler is unable to determine that there is no
loop-carried dependency, and therefore unrolls the loop instead of vectorizing
it.  When passed #pragma clang loop unroll(disable) vectorize(enable), it is
still unable to determine that there is no dependency, and so it inserts a
runtime overlap check that branches to a scalar version if a is close to b.
Furthermore, it ignores the alignment hint and uses vmovdqu for everything,
though maybe that doesn’t actually cost any performance.  In fact, there cannot
be a loop-carried dependency, both because of the alignment and because the
arrays are in structs: a and b either point to the same struct, in which case
each iteration reads and writes only its own limb, or to non-overlapping ones.
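For concreteness, here is add with the loop hint described above applied.  The
name add_hinted is hypothetical (so it can sit next to the original add), and
the #if guard is an assumption of mine to keep other compilers from warning
about an unknown pragma.  The test calls it with a == b to illustrate that the
aliased case is still dependency-free: each iteration touches only its own limb.

```c
#include <stdint.h>

typedef struct this_should_totally_be_a_vector {
   uint64_t limb[8];
} __attribute__((aligned(32))) a_vector;

/* Same loop as add(), with the Clang loop hint applied.  vectorize(enable)
   only requests vectorization; it does not tell Clang that a and b cannot
   partially overlap, so a runtime overlap check may still be emitted. */
void add_hinted(a_vector *a, const a_vector *b) {
#if defined(__clang__)
#pragma clang loop unroll(disable) vectorize(enable)
#endif
   for (int i = 0; i < 8; i++) a->limb[i] += b->limb[i];
}
```

Calling add_hinted(&x, &x) simply doubles every limb, which is the same result
a fully vectorized loop would produce.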

Clang vectorizes add properly if a is declared __restrict__, but in the real
code it is possible that a == b, so I’d rather not say __restrict__ if I don’t
have to (especially since the code may be inlined, possibly causing alias
analysis to break).  GCC has #pragma GCC ivdep, which makes it vectorize
properly, but does Clang have any equivalent to #pragma ivdep?  Also,
__restrict__ still doesn’t give me vmovdqa.
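A sketch of the two variants mentioned above, side by side.  The names
add_ivdep and add_restrict are hypothetical; #pragma GCC ivdep is GCC's own
pragma (available since GCC 4.9, as far as I know), guarded here so only GCC
sees it.  The comment on add_restrict spells out why a == b is a real problem
for the __restrict__ version and not for the ivdep one.

```c
#include <stdint.h>

typedef struct this_should_totally_be_a_vector {
   uint64_t limb[8];
} __attribute__((aligned(32))) a_vector;

/* GCC variant: ivdep asserts there are no loop-carried dependencies that
   would prevent SIMD execution, so no runtime overlap check is needed.
   a == b is still fine, since each iteration only touches limb[i]. */
void add_ivdep(a_vector *a, const a_vector *b) {
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
   for (int i = 0; i < 8; i++) a->limb[i] += b->limb[i];
}

/* __restrict__ variant: promises that the limbs modified through a are
   never accessed through b, so add_restrict(&x, &x) is undefined behavior
   even though the loop body would happen to compute x + x. */
void add_restrict(a_vector *__restrict__ a, const a_vector *b) {
   for (int i = 0; i < 8; i++) a->limb[i] += b->limb[i];
}
```

This is why __restrict__ is a stronger promise than the code can honestly
make: ivdep only talks about dependencies between iterations, while restrict
rules out aliasing entirely.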

For mac, with __restrict__ (again undesirable) I get decent 2-way vectorized
SSE3 code, which isn’t bad I guess, but I’d rather the compiler automatically
produced 4-way AVX2 code.  If I add #pragma clang loop unroll(disable)
vectorize(enable), I get
"""
        vmovdqa mac.c(%rip), %ymm0
        vpbroadcastq    .LCPI2_0(%rip), %ymm1
        vpmuludq        %ymm1, %ymm0, %ymm2
        vpxor   %ymm3, %ymm3, %ymm3
        vpmuludq        %ymm3, %ymm0, %ymm4
        vpsllq  $32, %ymm4, %ymm4
        vpaddq  %ymm4, %ymm2, %ymm2
        vpsrlq  $32, %ymm0, %ymm0
        vpmuludq        %ymm1, %ymm0, %ymm0
        vpsllq  $32, %ymm0, %ymm0
        vpaddq  %ymm0, %ymm2, %ymm0
        vpaddq  (%rsi), %ymm0, %ymm0
        vpaddq  (%rdi), %ymm0, %ymm0
        vmovdqu %ymm0, (%rdi)
        vmovdqa mac.c+32(%rip), %ymm0
        vpmuludq        %ymm1, %ymm0, %ymm2
        vpmuludq        %ymm3, %ymm0, %ymm3
        vpsllq  $32, %ymm3, %ymm3
        vpaddq  %ymm3, %ymm2, %ymm2
        vpsrlq  $32, %ymm0, %ymm0
        vpmuludq        %ymm1, %ymm0, %ymm0
        vpsllq  $32, %ymm0, %ymm0
        vpaddq  %ymm0, %ymm2, %ymm0
        vpaddq  32(%rsi), %ymm0, %ymm0
        vpaddq  32(%rdi), %ymm0, %ymm0
        vmovdqu %ymm0, 32(%rdi)
        vzeroupper
        retq
"""
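In other words, clang has failed to propagate the constants: it loads c from
memory and multiplies it by 3 at runtime, and because AVX2 has no 64-bit lane
multiply, each 64-bit multiply is lowered to a vpmuludq/vpsrlq/vpsllq/vpaddq
sequence.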
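To make the listing easier to read, here is a scalar model of what one 64-bit
lane of that multiply sequence computes.  The helper names pmuludq and
mul64_lowered are hypothetical; the point is only that x * y mod 2^64 is
assembled from three 32x32->64 partial products, which is exactly the shape of
the vpmuludq/vpsrlq/vpsllq/vpaddq run above (the zeroed %ymm3 plays the role
of y >> 32, which is 0 for y = 3).

```c
#include <stdint.h>

/* One lane of vpmuludq: multiply the low 32 bits of each operand,
   keeping the full 64-bit product. */
static uint64_t pmuludq(uint64_t x, uint64_t y) {
    return (x & 0xffffffffu) * (y & 0xffffffffu);
}

/* x * y mod 2^64 from 32-bit halves.  hi(x) * hi(y) only contributes to
   bits 64 and up, so it is dropped. */
uint64_t mul64_lowered(uint64_t x, uint64_t y) {
    uint64_t lo_lo = pmuludq(x, y);              /* vpmuludq              */
    uint64_t lo_hi = pmuludq(x, y >> 32) << 32;  /* vpmuludq, vpsllq      */
    uint64_t hi_lo = pmuludq(x >> 32, y) << 32;  /* vpsrlq, vpmuludq, vpsllq */
    return lo_lo + lo_hi + hi_lo;                /* vpaddq                */
}
```

Since c is a compile-time constant, all of this work could have been folded
away into a constant table.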

Can anyone help me get decent, portable code out of this?  GCC performs well on 
add with #pragma GCC ivdep, but it also does silly things with mul.
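One portable way to take constant propagation out of the picture is to fold
3*c by hand, so the loop body is nothing but adds.  This is an untested
assumption, not something checked against clang 3.6's output, and it does not
by itself settle the aliasing question; mac_folded and three_c are
hypothetical names.

```c
#include <stdint.h>

typedef struct this_should_totally_be_a_vector {
   uint64_t limb[8];
} __attribute__((aligned(32))) a_vector;

/* 3 * {0,1,2,3,4,5,6,7}, computed by hand instead of at runtime. */
static const a_vector three_c = {{0, 3, 6, 9, 12, 15, 18, 21}};

/* Same result as mac(): a->limb[i] += b->limb[i] + 3*i. */
void mac_folded(a_vector *a, const a_vector *b) {
   for (int i = 0; i < 8; i++) a->limb[i] += b->limb[i] + three_c.limb[i];
}
```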

Is there a way to do this which doesn’t depend on intrinsics or extensions?  If 
I absolutely have to write this with intrinsics or extensions, is there a nice 
way to do it which doesn’t change the struct definition and doesn’t break 
strict aliasing?
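For the extensions route, one common pattern (a sketch, assuming the GCC/Clang
vector_size extension is acceptable) keeps the struct exactly as it is and
moves data in and out of vector temporaries with memcpy.  Copying bytes
instead of casting the limb array to a vector pointer keeps the code within
the strict aliasing rules.  The names u64x4 and add_ext are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct this_should_totally_be_a_vector {
   uint64_t limb[8];
} __attribute__((aligned(32))) a_vector;

/* GCC/Clang vector extension: four 64-bit lanes, 32 bytes in total. */
typedef uint64_t u64x4 __attribute__((vector_size(32)));

/* Two 4-lane adds cover the 8 limbs.  Fixed-size memcpy calls like these
   are typically compiled to plain vector loads and stores.  a == b is fine:
   both inputs are loaded before the result is stored. */
void add_ext(a_vector *a, const a_vector *b) {
   for (int i = 0; i < 8; i += 4) {
      u64x4 va, vb;
      memcpy(&va, &a->limb[i], sizeof va);
      memcpy(&vb, &b->limb[i], sizeof vb);
      va += vb;
      memcpy(&a->limb[i], &va, sizeof va);
   }
}
```

Without -mavx2 the compiler should still accept this and split each 32-byte
vector into narrower operations, so the same source stays portable across
GCC and Clang targets.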

Thanks a lot,
— Mike
_______________________________________________
cfe-users mailing list
cfe-users@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-users
