https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109519
Bug ID: 109519
Summary: aarch64: wrong code with NEON intrinsics on gcc-10 and later
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: spop at gcc dot gnu.org
Target Milestone: ---

Steps to reproduce:

$ git clone https://github.com/sebpop/bitshuffle.git -b gcc-10-bug
$ cd bitshuffle/reproduce
$ make
$ ./a.out

The expected output is produced by gcc-7, gcc-9, and clang-15:

16384 4 14 16 33 39 45 51 57 67 102 108 120 126 128 134 138 140 [...]

gcc-9 is the last version of gcc I tested that works.

gcc-10 produces the following output:

./a.out
16384 0 0 0 0 39 45 51 57

gcc-11 and gcc-trunk produce the following output:

./a.out
16384 0 0 0 0 0 0 0

The output is also correct when the second-to-last patch in the git repo is removed:
https://github.com/kiyo-masui/bitshuffle/pull/140

This patch exposes the bug in gcc: it replaces scalar computations with NEON intrinsics to translate the SSE2 move_mask instructions to NEON.
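
For readers unfamiliar with the operation, here is a minimal sketch of what a move_mask translation computes. It is not the code from the bitshuffle patch. The function names (movemask_scalar, movemask_neon, movemask_selfcheck) are hypothetical, and the NEON variant is only one of several possible formulations. SSE2's _mm_movemask_epi8 packs the most significant bit of each of 16 bytes into a 16-bit integer; the scalar loop does this directly, and the NEON version is compiled only on aarch64.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: bit i of the result is the most significant bit of
   byte i, matching the semantics of SSE2's _mm_movemask_epi8. */
static inline uint16_t movemask_scalar(const uint8_t bytes[16])
{
    uint16_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint16_t)((bytes[i] >> 7) << i);
    return mask;
}

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>

/* One possible NEON formulation (illustrative, not taken from the patch). */
static inline uint16_t movemask_neon(const uint8_t bytes[16])
{
    static const uint8_t weights[16] = {1, 2, 4, 8, 16, 32, 64, 128,
                                        1, 2, 4, 8, 16, 32, 64, 128};
    uint8x16_t v = vld1q_u8(bytes);
    /* 0xFF in lanes whose top bit is set, 0x00 elsewhere. */
    uint8x16_t msb = vcgeq_u8(v, vdupq_n_u8(0x80));
    /* Keep a distinct power of two per lane, then sum each 8-lane half. */
    uint8x16_t bits = vandq_u8(msb, vld1q_u8(weights));
    return (uint16_t)(vaddv_u8(vget_low_u8(bits)) |
                      (vaddv_u8(vget_high_u8(bits)) << 8));
}
#endif

/* Checks the scalar version against a hand-computed mask and, on aarch64,
   checks that the NEON version agrees with the scalar one. */
static inline void movemask_selfcheck(void)
{
    const uint8_t sample[16] = {0x80, 0x7F, 0xFF, 0x00, 0x81, 0x01, 0x00, 0x00,
                                0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xC0};
    /* Top bit is set in bytes 0, 2, 4 and 15. */
    assert(movemask_scalar(sample) == 0x8015);
#if defined(__aarch64__) && defined(__ARM_NEON)
    assert(movemask_neon(sample) == movemask_scalar(sample));
#endif
}
```

Comparing the two variants on the same input, as movemask_selfcheck does, is one way to narrow down whether a miscompilation affects the intrinsic path. Whether this particular formulation triggers the bug reported here is not known.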