On Fri, Jan 03, 2014 at 05:04:55PM +0100, Toon Moene wrote:
> I am trying to figure out how the top-consuming routines in our
> weather models will be compiled when using AVX512 instructions (and
> their 32 512 bit registers).
>
> I thought an up-to-date trunk version of gcc, using the command line:
>
> <...>/gfortran -Ofast -S -mavx2 -mavx512f <source code>
>
> would do that.
>
> Unfortunately, I do not see any use of the new zmm.. registers,
> which might mean that AVX512 isn't used yet.
>
> This is how the nightly build job builds the trunk gfortran compiler:
>
> configure --prefix=/home/toon/compilers/install --with-gnu-as
> --with-gnu-ld --enable-languages=fortran<,other-language>
> --disable-multilib --disable-nls --with-arch=core-avx2
> --with-tune=core-avx2
>
> Is it the --with-arch=core-avx2 ? Or perhaps the --with-gnu-as
> --with-gnu-ld (because the installed ones do not support AVX512 yet
> ?).

You shouldn't need an assembler with AVX512 support just for -S.
If I try, say, a simple:
void f1 (int *__restrict e, int *__restrict f)
{ int i; for (i = 0; i < 1024; i++) e[i] = f[i] * 7; }
void f2 (int *__restrict e, int *__restrict f)
{ int i; for (i = 0; i < 1024; i++) e[i] = f[i]; }
with -O2 -ftree-vectorize -mavx512f, I get:
	vmovdqa64	.LC0(%rip), %zmm1
	xorl	%eax, %eax
	.p2align 4,,10
	.p2align 3
.L2:
	vpmulld	(%rsi,%rax), %zmm1, %zmm0
	vmovdqu32	%zmm0, (%rdi,%rax)
	addq	$64, %rax
	cmpq	$4096, %rax
	jne	.L2
	rep; ret
and
	xorl	%eax, %eax
	.p2align 4,,10
	.p2align 3
.L6:
	vmovdqu64	(%rsi,%rax), %zmm0
	vmovdqu32	%zmm0, (%rdi,%rax)
	addq	$64, %rax
	cmpq	$4096, %rax
	jne	.L6
	rep; ret
If something hasn't been vectorized, you can look at the
-fdump-tree-vect-details dump to see why it hasn't been.
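
As a minimal sketch of the kind of contrast that dump helps with, assume a
hypothetical test.c like the one below. The names scale and running_sum are
made up for illustration and are not taken from your code. I'd expect the
first loop to be vectorized and the second to be reported as not vectorized
because of the dependence on e[i - 1], though the exact wording in the dump
varies between GCC versions:

```c
#define N 1024

/* Iterations are independent and the pointers are marked __restrict,
   so this loop is a candidate for vectorization.  */
void
scale (int *__restrict e, const int *__restrict f)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = f[i] * 7;
}

/* Each iteration reads the value stored by the previous one
   (e[i - 1]), i.e. a loop-carried dependence.  */
void
running_sum (int *e, const int *f)
{
  int i;
  e[0] = f[0];
  for (i = 1; i < N; i++)
    e[i] = e[i - 1] + f[i];
}
```

Compiling that with -O2 -ftree-vectorize -mavx512f -S
-fdump-tree-vect-details should leave a dump file with ".vect" in its name
next to the .s output. No main is needed when you only use -S.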
Jakub