https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64655
Bug ID: 64655
Summary: Vectorizer is always using load aligned instructions with objects with the "aligned" attribute
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: adrien at guinet dot me

I think there is an issue with the way gcc uses the __attribute__ information when generating SIMD load instructions during auto-vectorization. For instance, take a structure like this one:

    struct A {
      uint32_t i[N] __attribute__((aligned(32)));
      uint32_t j[N] __attribute__((aligned(32)));
      uint32_t r[N] __attribute__((aligned(32)));
    };

and a loop like this one:

    struct A *a = (struct A*) malloc(sizeof(struct A));
    for (size_t i = 0; i < N; i++) {
      a->r[i] = a->i[i] + a->j[i];
    }

The vectorizer then uses "load aligned" instructions. The issue is that, even though i, j and r are aligned *inside* the A structure, nothing tells gcc that the "a" pointer itself is actually correctly aligned.

You can reproduce this using the attached "test_align.c" test case, compiling it like this (using AVX2, for instance, which needs 32-byte alignment):

    $ gcc -O3 -march=core-avx2 test_align.c -std=c99 -o test_align

Running it on a computer supporting AVX2 will segfault. If you don't have AVX2 on your machine, you can use the excellent Intel SDE (https://software.intel.com/en-us/articles/intel-software-development-emulator), which uses PIN to "emulate" AVX2. Running the sample under SDE even tells us this:

    $ sde64 -- ./test_align
    SDE ERROR: TID: 0 executed instruction with an unaligned memory reference to address 0x602ab0
    INSTR: 0x000400550: IFORM: VMOVDQA_YMMqq_MEMqq :: vmovdqa ymm0, ymmword ptr [rax+0x1aa0]
    IMAGE: /tmp/test_align
    FUNCTION: main

Indeed, if we look at the generated assembly, we see:

    [... init code with rand() calls ..., then, without any guard checking pointer alignment:]
        lea     rdx, [r13+1A80h]
        mov     rax, r13
    loc_400550:
        vmovdqa ymm0, ymmword ptr [rax+1AA0h]    <- aligned load
        vpaddd  ymm0, ymm0, ymmword ptr [rax]     <- load folded into the add (VEX-encoded, does not fault on misalignment)
        add     rax, 20h
        vmovdqa ymmword ptr [rax+3520h], ymm0    <- aligned store
        cmp     rax, rdx
        jnz     short loc_400550

If we remove the aligned attributes, ending up with this structure:

    struct A {
      uint32_t i[N];
      uint32_t j[N];
      uint32_t r[N];
    };

then gcc generates guards that check for unaligned pointers, and everything runs fine!

Note that clang uses unaligned loads even with the aligned attributes (and thus the binary does not segfault). The disassembly of the binary produced by clang 3.5 is:

        mov     rax, 0FFFFFFFFFFFFF960h
    loc_400670:
        vmovdqu ymm0, ymmword ptr [r14+rax*4+3510h]
        vpaddd  ymm0, ymm0, ymmword ptr [r14+rax*4+1A80h]
        vmovdqu ymmword ptr [r14+rax*4+4FA0h], ymm0
        add     rax, 8
        jnz     short loc_400670

Bug seen in GCC 4.8.3 and GCC 4.9.2.

Thanks for any thoughts about this!

Regards,
Adrien.
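A minimal workaround sketch, assuming the goal is simply to make the allocation match what the vectorized loop expects. Because every member carries aligned(32), the struct type itself has 32-byte alignment, while malloc only guarantees alignment suitable for fundamental types (16 bytes on x86-64 glibc). The SDE log fits this: the faulting address 0x602ab0 minus the 0x1aa0 offset gives a = 0x601010, which is 16-byte but not 32-byte aligned. Allocating with C11 aligned_alloc gives the 32-byte alignment the vmovdqa instructions need. The value N = 1700 and the helper names is_aligned, alloc_a and add_arrays are hypothetical (the attached test_align.c is not reproduced here), and the code needs C11 (e.g. -std=c11 instead of -std=c99).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: N is not given in the report; 1700 is used for illustration. */
#define N 1700

struct A {
    uint32_t i[N] __attribute__((aligned(32)));
    uint32_t j[N] __attribute__((aligned(32)));
    uint32_t r[N] __attribute__((aligned(32)));
};

/* The member attributes raise the alignment of the whole struct to 32. */
_Static_assert(_Alignof(struct A) == 32, "struct A should be 32-byte aligned");
/* C11 aligned_alloc expects a size that is a multiple of the alignment. */
_Static_assert(sizeof(struct A) % _Alignof(struct A) == 0, "size not a multiple of alignment");

/* Hypothetical helper: nonzero if p is a multiple of align. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: allocate a struct A at an address honouring its
   32-byte alignment, unlike malloc(sizeof(struct A)). */
static struct A *alloc_a(void)
{
    struct A *a = aligned_alloc(_Alignof(struct A), sizeof(struct A));
    assert(a == NULL || is_aligned(a, _Alignof(struct A)));
    return a;
}

/* The loop from the report; aligned vector loads/stores are valid
   once a really is 32-byte aligned. */
static void add_arrays(struct A *a)
{
    for (size_t k = 0; k < N; k++)
        a->r[k] = a->i[k] + a->j[k];
}
```

Calling alloc_a() in place of malloc(sizeof(struct A)) in the snippet above keeps the aligned instructions legal; memory from aligned_alloc is released with free() as usual. On POSIX systems, posix_memalign is an equivalent alternative.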