https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68128
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> --- Can't reproduce, at least not on an i7-5960X (thus OMP_NUM_THREADS=16). cutcp built with gcc -Ofast -fopenmp performs roughly the same with all of 4.6, 4.8, 5.1 and 6. The only thing that reliably helps (but only by something like 3-4%) is defining __INTEL_COMPILER: the benchmark uses different code for ICC than for other compilers, and the non-ICC code uses atomics that the ICC code does not.
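
For illustration only, here is a minimal sketch of the kind of compiler-conditional code described above. It is not taken from the cutcp sources: the function name scatter_add and its parameters are hypothetical, and the real benchmark may structure its ICC and non-ICC paths quite differently. The sketch assumes the difference comes down to whether a shared update is guarded by an OpenMP atomic.

```c
#include <stddef.h>

/* Hypothetical helper (not from the benchmark): add n contributions
   into a shared grid from an OpenMP parallel loop. */
void scatter_add(float *grid, const size_t *index, const float *value,
                 size_t n)
{
    long i;
#pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
#ifndef __INTEL_COMPILER
        /* Non-ICC path: guard the shared update with an atomic. */
#pragma omp atomic
#endif
        /* With __INTEL_COMPILER defined, the update is unguarded. */
        grid[index[i]] += value[i];
    }
}
```

Building such a file with gcc -Ofast -fopenmp takes the atomic path, while adding -D__INTEL_COMPILER selects the unguarded one. Whether dropping the atomic is actually safe depends on how the real code partitions its work, which this sketch does not attempt to model.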