https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114521

Bug ID: 114521
Summary: aarch64: wrong code with Neon ld1/st1x4 intrinsics gcc-11 and earlier
Product: gcc
Version: 11.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jswinney at amazon dot com
Target Milestone: ---

Created attachment 57831
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57831&action=edit
patch to fix the broken test

Using a half-width, 4-register AArch64 Neon load intrinsic results in an incorrect stack spill, or at least incorrect offsets into the resulting stack spill. This happens at every optimization level, -O0 through -O3.

```
#include <arm_neon.h>
#include <inttypes.h>
#include <stdio.h>

uint8x8_t global[4] = {0};

void test(const uint8_t* arr) {
    /* Half-width (64-bit) load into four registers. */
    const uint8x8x4_t parr = vld1_u8_x4(arr);
    global[0] = parr.val[0];
    global[1] = parr.val[1];
    global[2] = parr.val[2];
    global[3] = parr.val[3];
}

int main() {
    const uint8_t arr[32] = {
        0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
        0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
        0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
        0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
    };

    /* Print the globals before the load (all zero). */
    for (int i = 0; i < 4; i++) {
        printf("%" PRIx64 " ", (uint64_t) global[i]);
    }
    printf("\n");

    test(arr);

    /* Print the globals after the load. */
    for (int i = 0; i < 4; i++) {
        printf("%" PRIx64 " ", (uint64_t) global[i]);
    }
    printf("\n");
    return 0;
}
```

From the compiled "test" function above, the compiler emits the correct half-width load instruction followed by a full-width store:

```
test(unsigned char const*):
        ld1     {v0.8b - v3.8b}, [x0]
        sub     sp, sp, #64
        ...
        st1     {v0.16b - v3.16b}, [sp]
```

This issue is corrected by a change in gcc-12:
66f206b85395c273980e2b81a54dbddc4897e4a7

Additionally, the test used to verify this code silently ignores the error. I have attached a patch which fixes the test.

```
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-amazon-linux/11/lto-wrapper
Target: aarch64-amazon-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://github.com/amazonlinux/amazon-linux-2022 --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.4.1-20230605/obj-aarch64-amazon-linux/isl-install --enable-gnu-indirect-function --with-tune=neoverse-n1 --with-arch=armv8.2-a+crypto --build=aarch64-amazon-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.1 20230605 (Red Hat 11.4.1-2) (GCC)
```
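
To illustrate why a half-width load followed by a full-width store can yield wrong values, here is a portable model in plain C. It is a sketch, not the compiler's actual behavior: the helper names (`model_ld1_half`, `model_st1_full`, `model_st1_half`, `read_val`) are hypothetical, and it assumes that `parr.val[i]` is later read from the spill area at an 8-byte stride, which the report suggests but does not show.

```c
#include <stdint.h>
#include <string.h>

enum { NREGS = 4, HALF = 8, FULL = 16 };

/* Model of `ld1 {v0.8b - v3.8b}, [x0]`: each 128-bit register gets
   8 consecutive source bytes in its low half; the high half is zeroed. */
static void model_ld1_half(uint8_t regs[NREGS][FULL], const uint8_t *src)
{
    for (int r = 0; r < NREGS; r++) {
        memset(regs[r], 0, FULL);
        memcpy(regs[r], src + r * HALF, HALF);
    }
}

/* Model of `st1 {v0.16b - v3.16b}, [sp]`: all 16 bytes of each register
   are stored back to back, 64 bytes in total. */
static void model_st1_full(uint8_t spill[NREGS * FULL],
                           uint8_t regs[NREGS][FULL])
{
    for (int r = 0; r < NREGS; r++) {
        memcpy(spill + r * FULL, regs[r], FULL);
    }
}

/* Model of the matching half-width store `st1 {v0.8b - v3.8b}, [sp]`:
   only the low 8 bytes of each register are stored, 32 bytes in total. */
static void model_st1_half(uint8_t spill[NREGS * HALF],
                           uint8_t regs[NREGS][FULL])
{
    for (int r = 0; r < NREGS; r++) {
        memcpy(spill + r * HALF, regs[r], HALF);
    }
}

/* ASSUMPTION: val[i] is read back from the spill at an 8-byte stride,
   as the in-memory layout of uint8x8x4_t would imply. */
static uint64_t read_val(const uint8_t *spill, int i)
{
    uint64_t v;
    memcpy(&v, spill + i * HALF, sizeof v);
    return v;
}
```

Under this model, after `model_ld1_half` and `model_st1_full`, `read_val(spill, 1)` returns the zeroed upper half of the first register and `read_val(spill, 2)` returns the data that belongs to `val[1]`. Swapping in `model_st1_half` makes every `read_val(spill, i)` match the corresponding 8 source bytes.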