https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66866
Bug ID: 66866
Summary: [miscompile] incorrect load address on manual vector shuffle
Product: gcc
Version: 5.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---

The following testcase fails at -O2:

#include <xmmintrin.h>

typedef short A __attribute__((__may_alias__));

short extr(const __m128i &d, int index) {
  return reinterpret_cast<const A *>(&d)[index];
}
A &extr(__m128i &d, int index) {
  return reinterpret_cast<A *>(&d)[index];
}

__m128i shuf(const __m128i v) {
  __m128i r;
  for (int i = 0; i + 4 <= 8; i += 4) {
    extr(r, i + 0) = extr(v, i + 1);
    extr(r, i + 1) = extr(v, i + 0);
    extr(r, i + 2) = extr(v, i + 3);
    extr(r, i + 3) = extr(v, i + 2);
  }
  return r;
}

int main() {
  __attribute__((aligned(16))) short mem[8];
  *reinterpret_cast<__m128i *>(mem) =
      shuf(_mm_setr_epi16(0, 1, 2, 3, 4, 5, 6, 7));
  if (mem[0] == 1 && mem[1] == 0 && mem[2] == 3 && mem[3] == 2 &&
      mem[4] == 5 && mem[5] == 0 && mem[6] == 7 && mem[7] == 6) {
    abort();
  }
  return 0;
}

Here's a little survey:

for CXX in /opt/*/bin/{g++,clang++}; do
  echo -n "$CXX: "
  $CXX -O2 testcase.cpp && ./a.out && echo passed || echo failed
done

/opt/gcc-4.5.2/bin/g++: passed
/opt/gcc-4.5.3/bin/g++: passed
/opt/gcc-4.5.4/bin/g++: passed
/opt/gcc-4.6.0/bin/g++: passed
/opt/gcc-4.6.1/bin/g++: passed
/opt/gcc-4.6.3/bin/g++: passed
/opt/gcc-4.7.0/bin/g++: failed
/opt/gcc-4.7.1/bin/g++: failed
/opt/gcc-4.7.2/bin/g++: failed
/opt/gcc-4.8.0/bin/g++: failed
/opt/gcc-4.8.2/bin/g++: failed
/opt/gcc-4.9.0/bin/g++: failed
/opt/gcc-4.9.1/bin/g++: failed
/opt/gcc-5.1.0/bin/g++: failed
/opt/gcc-6-snapshot/bin/g++: failed
/opt/clang-3.2/bin/clang++: passed
/opt/clang-3.3/bin/clang++: passed
/opt/clang-3.4/bin/clang++: passed
/opt/clang-3.5/bin/clang++: passed
/opt/clang-3.6/bin/clang++: passed
/opt/clang-master/bin/clang++: passed

The value at index 5 is assigned incorrectly from v[0] instead of v[4].
The issue goes away if I manually unroll the loop.
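
For comparison, a manually unrolled variant might look like the sketch below. This is an illustration and not the reporter's actual code: the names shuf_unrolled and shuffled_lanes are hypothetical, and the sketch includes <emmintrin.h> directly and stores the result with _mm_storeu_si128 rather than through a pointer cast. The two loop iterations (i = 0 and i = 4) are written out by hand, so every lane index is a compile-time constant.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_setr_epi16, _mm_storeu_si128
#include <array>

typedef short A __attribute__((__may_alias__));

// Read one 16-bit lane of a vector (same helper as in the testcase).
inline short extr(const __m128i &d, int index) {
  return reinterpret_cast<const A *>(&d)[index];
}
// Writable reference to one 16-bit lane of a vector.
inline A &extr(__m128i &d, int index) {
  return reinterpret_cast<A *>(&d)[index];
}

// Hypothetical name: the shuffle from the testcase with the loop body
// written out for i = 0 and i = 4. Adjacent lanes are swapped pairwise.
__m128i shuf_unrolled(const __m128i v) {
  __m128i r;
  // i = 0
  extr(r, 0) = extr(v, 1);
  extr(r, 1) = extr(v, 0);
  extr(r, 2) = extr(v, 3);
  extr(r, 3) = extr(v, 2);
  // i = 4
  extr(r, 4) = extr(v, 5);
  extr(r, 5) = extr(v, 4);
  extr(r, 6) = extr(v, 7);
  extr(r, 7) = extr(v, 6);
  return r;
}

// Hypothetical helper: run the shuffle on the testcase input and copy the
// eight lanes into an array so they can be inspected.
std::array<short, 8> shuffled_lanes() {
  const __m128i r = shuf_unrolled(_mm_setr_epi16(0, 1, 2, 3, 4, 5, 6, 7));
  std::array<short, 8> out;
  _mm_storeu_si128(reinterpret_cast<__m128i *>(out.data()), r);
  return out;
}
```

With a correct compilation the lanes should come out as 1, 0, 3, 2, 5, 4, 7, 6; the miscompiled loop version yields 0 in lane 5 instead of 4.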