http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51357
Bug #: 51357
Summary: Simple program crash when enabling AVX
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: critical
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cdub...@gmail.com

Created attachment 25953
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25953
Preprocessed version

This simple program (I attached the preprocessed version):

#include <stdlib.h>
#include <stdio.h>

void conv(const float* x, const int m, const float* y, const int n, float* z)
{
    int i, j;
    float sum;
    for (i = 0; i < m - n + 1; ++i) {
        sum = 0.0f;
        for (j = 0; j < n - 3; j += 4) {
            sum += x[i + j + 0] * y[j + 0];
            sum += x[i + j + 1] * y[j + 1];
            sum += x[i + j + 2] * y[j + 2];
            sum += x[i + j + 3] * y[j + 3];
        }
        z[i] = sum;
    }
}

int main()
{
    const int m = 128000;
    const int n = 64;
    float* x = (float*)malloc(m * sizeof(float));
    float* y = (float*)malloc(n * sizeof(float));
    float* z = (float*)malloc((m - n + 1) * sizeof(float));
    conv(x, m, y, n, z);
    printf("%f\n", z[0]);
    free(x);
    free(y);
    free(z);
}

crashes when I compile it with gcc 4.6.2 (and gcc 4.6.1) with optimizations turned on (-O2) and AVX enabled (-march=corei7-avx). It works without optimizations (or with only -Os) or without AVX (-march=corei7). I use the latest Apple clang assembler (included with Xcode 4.2.1).

Here is the output of gcc -v:

MacBook-Pro-2:main charles$ gcc-4.6 -v -save-temps -march=corei7-avx -O2 main.cpp
Using built-in specs.
COLLECT_GCC=gcc-4.6
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/4.6.2/gcc/libexec/gcc/x86_64-apple-darwin11.2.0/4.6.2/lto-wrapper
Target: x86_64-apple-darwin11.2.0
Configured with: ../configure --enable-languages=c,c++,fortran,java,objc,obj-c++ --prefix=/usr/local/Cellar/gcc/4.6.2/gcc --datarootdir=/usr/local/Cellar/gcc/4.6.2/share --bindir=/usr/local/Cellar/gcc/4.6.2/bin --program-suffix=-4.6 --with-gmp=/usr/local/Cellar/gmp/5.0.2 --with-mpfr=/usr/local/Cellar/mpfr/3.1.0 --with-mpc=/usr/local/Cellar/libmpc/0.9 --with-system-zlib --enable-stage1-checking --enable-plugin --enable-lto --disable-nls --disable-fully-dynamic-string
Thread model: posix
gcc version 4.6.2 (GCC)
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.7.2' '-v' '-save-temps' '-march=corei7-avx' '-O2'
 /usr/local/Cellar/gcc/4.6.2/gcc/libexec/gcc/x86_64-apple-darwin11.2.0/4.6.2/cc1plus -E -quiet -v -D__DYNAMIC__ main.cpp -fPIC -mmacosx-version-min=10.7.2 -march=corei7-avx -O2 -fpch-preprocess -o main.ii
ignoring nonexistent directory "/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/../../../../x86_64-apple-darwin11.2.0/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/../../../../include/c++/4.6.2
 /usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/../../../../include/c++/4.6.2/x86_64-apple-darwin11.2.0
 /usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/../../../../include/c++/4.6.2/backward
 /usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/include
 /usr/local/include
 /usr/local/Cellar/gcc/4.6.2/gcc/include
 /usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/include-fixed
 /usr/include
 /System/Library/Frameworks
 /Library/Frameworks
End of search list.
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.7.2' '-v' '-save-temps' '-march=corei7-avx' '-O2'
 /usr/local/Cellar/gcc/4.6.2/gcc/libexec/gcc/x86_64-apple-darwin11.2.0/4.6.2/cc1plus -fpreprocessed main.ii -fPIC -quiet -dumpbase main.cpp -mmacosx-version-min=10.7.2 -march=corei7-avx -auxbase main -O2 -version -o main.s
GNU C++ (GCC) version 4.6.2 (x86_64-apple-darwin11.2.0)
 compiled by GNU C version 4.6.2, GMP version 5.0.2, MPFR version 3.1.0, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (GCC) version 4.6.2 (x86_64-apple-darwin11.2.0)
 compiled by GNU C version 4.6.2, GMP version 5.0.2, MPFR version 3.1.0, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: b1d97ab7cd7bb3a1ba06b8ac1d280faa
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.7.2' '-v' '-save-temps' '-march=corei7-avx' '-O2'
 as -arch x86_64 -force_cpusubtype_ALL -o main.o main.s
COMPILER_PATH=/usr/local/Cellar/gcc/4.6.2/gcc/libexec/gcc/x86_64-apple-darwin11.2.0/4.6.2/:/usr/local/Cellar/gcc/4.6.2/gcc/libexec/gcc/x86_64-apple-darwin11.2.0/4.6.2/:/usr/local/Cellar/gcc/4.6.2/gcc/libexec/gcc/x86_64-apple-darwin11.2.0/:/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/:/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/
LIBRARY_PATH=/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/:/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/../../../:/usr/lib/
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.7.2' '-v' '-save-temps' '-march=corei7-avx' '-O2'
 /usr/local/Cellar/gcc/4.6.2/gcc/libexec/gcc/x86_64-apple-darwin11.2.0/4.6.2/collect2 -dynamic -arch x86_64 -macosx_version_min 10.7.2 -weak_reference_mismatches non-weak -o a.out -lcrt1.10.5.o -L/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2 -L/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/../../..
main.o -no_compact_unwind -lSystem -lgcc_ext.10.5 -lgcc -lSystem -v
collect2 version 4.6.2 (x86_64 Darwin)
/usr/bin/ld -dynamic -arch x86_64 -macosx_version_min 10.7.2 -weak_reference_mismatches non-weak -o a.out -lcrt1.10.5.o -L/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2 -L/usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2/../../.. main.o -no_compact_unwind -lSystem -lgcc_ext.10.5 -lgcc -lSystem -v
@(#)PROGRAM:ld PROJECT:ld64-127.2
Library search paths:
 /usr/local/Cellar/gcc/4.6.2/gcc/lib/gcc/x86_64-apple-darwin11.2.0/4.6.2
 /usr/local/Cellar/gcc/4.6.2/gcc/lib
 /usr/lib
 /usr/local/lib
Framework search paths:
 /Library/Frameworks/
 /System/Library/Frameworks/

It works if I do not unroll the inner loop, I guess because it is not using AVX in that case. Versions of the program using SSE or AVX intrinsics crash in the same conditions.
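
For reference, here is a minimal sketch of what the non-unrolled variant mentioned above might look like, next to the unrolled loop from the report. The names conv_rolled, conv_unrolled and count_mismatches are hypothetical and do not appear in the attachment. The sketch assumes n is a multiple of 4 (it is 64 in the report), since the unrolled loop otherwise skips the trailing elements. Unlike the original main(), it fills the inputs with small integer values so that both variants should produce identical results. It is not a confirmed workaround and says nothing about the cause of the crash.

```c
#include <stdlib.h>

/* Inner loop unrolled by four, as in the report.
   Assumes n is a multiple of 4. */
void conv_unrolled(const float* x, const int m, const float* y, const int n, float* z)
{
    int i, j;
    float sum;
    for (i = 0; i < m - n + 1; ++i) {
        sum = 0.0f;
        for (j = 0; j < n - 3; j += 4) {
            sum += x[i + j + 0] * y[j + 0];
            sum += x[i + j + 1] * y[j + 1];
            sum += x[i + j + 2] * y[j + 2];
            sum += x[i + j + 3] * y[j + 3];
        }
        z[i] = sum;
    }
}

/* Same computation with a plain inner loop (hypothetical name). */
void conv_rolled(const float* x, const int m, const float* y, const int n, float* z)
{
    int i, j;
    float sum;
    for (i = 0; i < m - n + 1; ++i) {
        sum = 0.0f;
        for (j = 0; j < n; ++j) {
            sum += x[i + j] * y[j];
        }
        z[i] = sum;
    }
}

/* Hypothetical helper: runs both variants on the same initialized input
   and returns the number of differing outputs, or -1 if allocation fails.
   Inputs are small integers, so every sum is exactly representable. */
int count_mismatches(const int m, const int n)
{
    const int out = m - n + 1;
    float* x = (float*)malloc(m * sizeof(float));
    float* y = (float*)malloc(n * sizeof(float));
    float* za = (float*)malloc(out * sizeof(float));
    float* zb = (float*)malloc(out * sizeof(float));
    int i, bad = 0;

    if (!x || !y || !za || !zb) {
        free(x); free(y); free(za); free(zb);
        return -1;
    }
    for (i = 0; i < m; ++i) x[i] = (float)(i % 7);
    for (i = 0; i < n; ++i) y[i] = (float)(i % 5);

    conv_unrolled(x, m, y, n, za);
    conv_rolled(x, m, y, n, zb);

    for (i = 0; i < out; ++i) {
        if (za[i] != zb[i]) ++bad;
    }
    free(x); free(y); free(za); free(zb);
    return bad;
}
```

Calling count_mismatches(128000, 64) from main() and compiling once with -O2 -march=corei7-avx and once with -O2 -march=corei7 would be one way to check whether only the unrolled variant is affected.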