https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66917
Bug ID: 66917
Summary: ARM: NEON: memcpy compiles to vld1 and vst1 with incorrect alignment
Product: gcc
Version: 4.9.3
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mhw at netris dot org
Target Milestone: ---

Created attachment 36009
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36009&action=edit
Minimal example code that is miscompiled

The attached C source code is miscompiled by GCC 4.9.3 with -O3 and
-mfpu=neon. Calls to memcpy between a uint8_t* parameter and a local
variable are compiled into vld1.64 and vst1.64 instructions with an
alignment field that specifies that the memory address (the uint8_t*
parameter) is aligned on a 64-bit boundary, although there is no basis
for assuming such an alignment.

Here's the C code:

#include <stdint.h>
#include <string.h>

void
test_neon_load_store_alignment (const uint8_t *ap, const uint8_t *bp,
                                uint8_t *outp)
{
  union { uint64_t u[2]; uint8_t c[16]; } a, b;

  memcpy (a.c, ap, 16);
  memcpy (b.c, bp, 16);
  a.u[0] ^= b.u[0];
  a.u[1] ^= b.u[1];
  memcpy (outp, a.c, 16);
}

When natively compiled with -S -O3 using GCC 4.9.3 configured with
"--build=arm-unknown-linux-gnueabihf --with-arch=armv7-a
--with-float=hard --with-mode=thumb --with-fpu=neon", here's the output:

        .syntax unified
        .arch armv7-a
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .thumb
        .file   "foo.c"
        .text
        .align  2
        .global test_neon_load_store_alignment
        .thumb
        .thumb_func
        .type   test_neon_load_store_alignment, %function
test_neon_load_store_alignment:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vld1.64 {d16-d17}, [r0:64]
        vld1.64 {d18-d19}, [r1:64]
        veor    q8, q8, q9
        vst1.64 {d16-d17}, [r2:64]
        bx      lr
        .size   test_neon_load_store_alignment, .-test_neon_load_store_alignment
        .ident  "GCC: (GNU) 4.9.3"
        .section        .note.GNU-stack,"",%progbits

Here, r0 contains ap, r1 contains bp, and r2 contains outp. The
associated operands of the vld1.64 and vst1.64 instructions are
[r<n>:64], which specify that the memory address is known to be aligned
on a 64-bit boundary, although that is not necessarily the case.

See section A8.6.307 (page A8-602) of the ARM v7 Architecture Reference
Manual, which spells out quite clearly that for this instruction:

  if (address MOD alignment) != 0 then GenerateAlignmentException();

and that in this case, alignment == 8.

This may be related to bug #57271.

If needed, I can provide instructions for how to reproduce the exact
same GCC binary I'm using, and all other software that it depends on,
using GNU Guix <http://gnu.org/s/guix>.

FYI, the attached minimal test case was distilled from code in OpenSSL
1.0.2d that is miscompiled by GCC 4.9.3 in our armhf port targeting
NEON, and leads to a Bus Error on the Novena laptop.
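
One way to exercise the alignment assumption at run time is a small harness
that passes the function pointers that are deliberately not 8-byte aligned.
The sketch below is not part of the attached test case: the helper name
xor_mismatches_at_offset and the buffer layout are hypothetical, chosen only
for illustration. It copies the function from the report unchanged, calls it
through a volatile function pointer so that the out-of-line copy is the one
that runs, and counts output bytes that differ from the expected XOR. It
relies on uint64_t being 8-byte aligned, as it is under the ARM EABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The function from the report, unchanged. */
void
test_neon_load_store_alignment (const uint8_t *ap, const uint8_t *bp,
                                uint8_t *outp)
{
  union { uint64_t u[2]; uint8_t c[16]; } a, b;

  memcpy (a.c, ap, 16);
  memcpy (b.c, bp, 16);
  a.u[0] ^= b.u[0];
  a.u[1] ^= b.u[1];
  memcpy (outp, a.c, 16);
}

/* Hypothetical helper (not in the original report): call the function
   with all three pointers placed OFF bytes past an 8-byte boundary and
   return the number of output bytes that differ from a[i] ^ b[i].  */
static size_t
xor_mismatches_at_offset (size_t off)
{
  /* The uint64_t member gives each buffer the alignment of uint64_t
     (8 bytes under the ARM EABI), so bytes + off is misaligned for
     0 < off < 8.  The extra 8 bytes leave room for the offset.  */
  static union { uint64_t force_align; uint8_t bytes[16 + 8]; }
    abuf, bbuf, obuf;

  /* Calling through a volatile pointer keeps the compiler from
     inlining the call and reasoning about the actual alignment.  */
  void (*volatile fn) (const uint8_t *, const uint8_t *, uint8_t *)
    = test_neon_load_store_alignment;
  size_t i, bad = 0;

  assert (off < 8);

  /* Fill the inputs with arbitrary, distinct byte patterns.  */
  for (i = 0; i < 16; i++)
    {
      abuf.bytes[off + i] = (uint8_t) (i * 7 + 1);
      bbuf.bytes[off + i] = (uint8_t) (0xA5 ^ i);
    }

  fn (abuf.bytes + off, bbuf.bytes + off, obuf.bytes + off);

  for (i = 0; i < 16; i++)
    if (obuf.bytes[off + i]
        != (uint8_t) (abuf.bytes[off + i] ^ bbuf.bytes[off + i]))
      bad++;

  return bad;
}
```

Calling xor_mismatches_at_offset (1) from main should return 0 when the
generated code makes no alignment assumption. With the code shown above
([r<n>:64] operands), the expectation is that the call would instead raise
an alignment fault on hardware that enforces the check, consistent with the
Bus Error seen on the Novena; that outcome is inferred from the report
rather than verified with this harness.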