https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451
Bug ID: 107451
Summary: Segmentation fault with vectorized code.
Product: gcc
Version: 11.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: bartoldeman at users dot sourceforge.net
Target Milestone: ---

Created attachment 53785
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53785&action=edit
Test case

The following code:

double dot(int n, const double *x, int inc_x, const double *y)
{
    int i, ix;
    double dot[4] = { 0.0, 0.0, 0.0, 0.0 } ;

    ix=0;
    for(i = 0; i < n; i++) {
        dot[0] += x[ix]   * y[ix] ;
        dot[1] += x[ix+1] * y[ix+1] ;
        dot[2] += x[ix]   * y[ix+1] ;
        dot[3] += x[ix+1] * y[ix] ;
        ix += inc_x ;
    }

    return dot[0] + dot[1] + dot[2] + dot[3];
}

int main(void)
{
    double x = 0, y = 0;
    return dot(1, &x, 4096*4096, &y);
}

crashes with (on Linux x86-64)

$ gcc -O2 -ftree-vectorize -march=haswell crash.c -o crash
$ ./crash
Segmentation fault

for GCC 11.3.0 and also the current prerelease (gcc version 11.3.1 20221021), and also when patched with the patches from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212.

The loop code assembly is as follows:

  18:   c5 f9 10 1e             vmovupd (%rsi),%xmm3
  1c:   c5 f9 10 21             vmovupd (%rcx),%xmm4
  20:   ff c2                   inc    %edx
  22:   c4 e3 65 18 0c 06 01    vinsertf128 $0x1,(%rsi,%rax,1),%ymm3,%ymm1
  29:   c4 e3 5d 18 04 01 01    vinsertf128 $0x1,(%rcx,%rax,1),%ymm4,%ymm0
  30:   48 01 c6                add    %rax,%rsi
  33:   48 01 c1                add    %rax,%rcx
  36:   c4 e3 fd 01 c9 11       vpermpd $0x11,%ymm1,%ymm1
  3c:   c4 e3 fd 01 c0 14       vpermpd $0x14,%ymm0,%ymm0
  42:   c4 e2 f5 b8 d0          vfmadd231pd %ymm0,%ymm1,%ymm2
  47:   39 fa                   cmp    %edi,%edx
  49:   75 cd                   jne    18 <dot+0x18>

What happens here is that the vinsertf128 instructions load the elements belonging to the next loop iteration and put them in the high halves of ymm0 and ymm1. The vpermpd instructions then throw away those high halves again, so e.g. they turn 1,2,3,4 into 2,1,2,1 and 1,2,2,1 respectively.
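The lane selection described above can be checked with a small scalar model of vpermpd. This is only a sketch: the function names permpd_model and permpd_reads_high_half are made up for illustration, and the model assumes the usual imm8 encoding in which destination lane i takes source lane (imm8 >> 2*i) & 3. It needs no AVX2 hardware.

```c
#include <assert.h>

/* Scalar model of vpermpd with an 8-bit immediate (hypothetical helper):
   destination lane i is source lane (imm8 >> (2*i)) & 3. */
void permpd_model(const double src[4], unsigned imm8, double dst[4])
{
    assert(imm8 <= 0xffu);
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm8 >> (2 * i)) & 3u];
}

/* Returns 1 if any destination lane is taken from the upper 128-bit half
   of the source (lanes 2 or 3), 0 if only the lower half is read. */
int permpd_reads_high_half(unsigned imm8)
{
    assert(imm8 <= 0xffu);
    for (int i = 0; i < 4; i++)
        if (((imm8 >> (2 * i)) & 3u) >= 2u)
            return 1;
    return 0;
}
```

Feeding {1, 2, 3, 4} through permpd_model with the immediates 0x11 and 0x14 from the listing reproduces the two patterns quoted above, and permpd_reads_high_half returns 0 for both, i.e. the lanes filled by vinsertf128 are never consumed.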
So the result is correct, but the superfluous vinsertf128 instructions access memory potentially past the end of x or y and thus produce a segfault.

Related issue (coming from OpenBLAS):
https://github.com/easybuilders/easybuild-easyconfigs/issues/16387

May also be related:
https://github.com/xianyi/OpenBLAS/issues/3740#issuecomment-1233899834
(that particular comment shows very similar code, but it is for GCC 12, which vectorizes by default; OpenBLAS worked around this by disabling the tree vectorizer there, but only on macOS and Windows).
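To put a number on the out-of-bounds access for the test case, here is a rough sketch of the index arithmetic. It assumes that %rax holds the byte stride inc_x * sizeof(double), which is consistent with the "add %rax,%rsi" in the listing but is not stated explicitly in the report. The function names are hypothetical.

```c
#include <assert.h>

/* Highest index of x (or y) that the C source reads, for n >= 1 and
   inc_x >= 0: the last iteration reads elements ix and ix+1 with
   ix = (n-1)*inc_x. */
long long last_index_source(long long n, long long inc_x)
{
    assert(n >= 1 && inc_x >= 0);
    return (n - 1) * inc_x + 1;
}

/* Highest index touched by the vectorized loop in the listing, assuming
   each iteration's vinsertf128 also loads the pair one stride ahead. */
long long last_index_vectorized(long long n, long long inc_x)
{
    assert(n >= 1 && inc_x >= 0);
    return n * inc_x + 1;
}

/* Distance in bytes between the last element the source reads and the
   last element the vectorized loop reads. */
long long overread_bytes(long long n, long long inc_x)
{
    return (last_index_vectorized(n, inc_x) - last_index_source(n, inc_x))
           * (long long)sizeof(double);
}
```

For the test case (n = 1, inc_x = 4096*4096) and 8-byte doubles, overread_bytes gives 134217728, i.e. the extra load lands 128 MiB beyond the data the source actually reads, which would explain the fault.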