https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66917

            Bug ID: 66917
           Summary: ARM: NEON: memcpy compiles to vld1 and vst1 with
                    incorrect alignment
           Product: gcc
           Version: 4.9.3
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mhw at netris dot org
  Target Milestone: ---

Created attachment 36009
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36009&action=edit
Minimal example code that is miscompiled

The attached C source code is miscompiled by GCC 4.9.3 with -O3 and
-mfpu=neon.  Calls to memcpy between a uint8_t* parameter and a local variable
are compiled into vld1.64 and vst1.64 instructions with an alignment field that
specifies that the memory address (the uint8_t* parameter) is aligned on a
64-bit boundary, although there is no basis for assuming such an alignment.

Here's the C code:

  #include <stdint.h>
  #include <string.h>

  void
  test_neon_load_store_alignment (const uint8_t *ap,
                                  const uint8_t *bp,
                                  uint8_t *outp)
  {
    union {
      uint64_t u[2];
      uint8_t c[16];
    } a, b;

    memcpy (a.c, ap, 16);
    memcpy (b.c, bp, 16);
    a.u[0] ^= b.u[0];
    a.u[1] ^= b.u[1];
    memcpy (outp, a.c, 16);
  }
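The attachment only defines the function. As an illustration that is not part of attachment 36009, a hypothetical driver like the one below calls it with `ap`, `bp` and `outp` each placed one byte past an 8-byte boundary. Because `uint8_t` has an alignment of 1, such a call is valid C. On a correct build it simply XORs the two 16-byte inputs, whereas code containing `vld1.64 ... [rN:64]` would be expected to take an alignment fault. The name `run_misaligned` and the buffer layout are assumptions made for this sketch. It also assumes the ABI gives `uint64_t` 8-byte alignment, which holds for AAPCS and x86-64. The function is repeated unchanged so that the block compiles on its own.

```c
#include <stdint.h>
#include <string.h>

/* Same function as in the report, repeated so this block is self-contained. */
void
test_neon_load_store_alignment (const uint8_t *ap,
                                const uint8_t *bp,
                                uint8_t *outp)
{
  union {
    uint64_t u[2];
    uint8_t c[16];
  } a, b;

  memcpy (a.c, ap, 16);
  memcpy (b.c, bp, 16);
  a.u[0] ^= b.u[0];
  a.u[1] ^= b.u[1];
  memcpy (outp, a.c, 16);
}

/* Hypothetical driver (not from the report).  The union with uint64_t makes
   the storage 8-byte aligned, so each pointer below is 1 mod 8.  Returns 1
   if every output byte equals ap[i] ^ bp[i], 0 otherwise.  */
int
run_misaligned (void)
{
  static union {
    uint64_t force_alignment;   /* only here to align the byte storage */
    uint8_t bytes[64];
  } storage;

  uint8_t *ap = storage.bytes + 1;     /* bytes  1..16, address 1 mod 8 */
  uint8_t *bp = storage.bytes + 17;    /* bytes 17..32, address 1 mod 8 */
  uint8_t *outp = storage.bytes + 33;  /* bytes 33..48, address 1 mod 8 */

  for (int i = 0; i < 16; i++)
    {
      ap[i] = (uint8_t) i;
      bp[i] = (uint8_t) (0xF0 | i);
    }

  test_neon_load_store_alignment (ap, bp, outp);

  for (int i = 0; i < 16; i++)
    if (outp[i] != (uint8_t) (ap[i] ^ bp[i]))
      return 0;
  return 1;
}
```

On a build without the bug, `run_misaligned ()` should return 1. If the driver and the function share a translation unit, GCC may inline the call and see the real offsets, so reproducing the fault on the affected armhf configuration may require compiling the function in a separate file.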

When natively compiled with -S -O3 using GCC 4.9.3 configured with
"--build=arm-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard
--with-mode=thumb --with-fpu=neon", here's the output:

        .syntax unified
        .arch armv7-a
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .thumb
        .file   "foo.c"
        .text
        .align  2
        .global test_neon_load_store_alignment
        .thumb
        .thumb_func
        .type   test_neon_load_store_alignment, %function
test_neon_load_store_alignment:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vld1.64 {d16-d17}, [r0:64]
        vld1.64 {d18-d19}, [r1:64]
        veor    q8, q8, q9
        vst1.64 {d16-d17}, [r2:64]
        bx      lr
        .size   test_neon_load_store_alignment, .-test_neon_load_store_alignment
        .ident  "GCC: (GNU) 4.9.3"
        .section        .note.GNU-stack,"",%progbits

Here, r0 contains ap, r1 contains bp, and r2 contains outp.  The associated
operands of the vld1.64 and vst1.64 instructions are [r<n>:64], which specify
that the memory address is known to be aligned on a 64-bit boundary, although
that is not necessarily the case.

See section A8.6.307 (page A8-602) of the ARM v7 Architecture Reference Manual,
which spells out quite clearly that for this instruction:

  if (address MOD alignment) != 0 then GenerateAlignmentException();

and that in this case, alignment == 8.
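The quoted check can be modelled in a few lines of C. This is a sketch rather than ARM's pseudocode: the function names are hypothetical, and it assumes, as the report states, that the `:64` qualifier in `[rN:64]` denotes an alignment of 64 bits, i.e. 8 bytes. Any address that is not a multiple of 8 makes the check true, and a `uint8_t *` parameter can hold such an address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert an alignment qualifier, written in bits in the assembly syntax
   (e.g. the ":64" in "[r0:64]"), to the byte alignment used by the check.  */
static unsigned
qualifier_to_alignment_bytes (unsigned qualifier_bits)
{
  return qualifier_bits / 8;   /* :64 -> 8 bytes, :128 -> 16 bytes */
}

/* Sketch of: if (address MOD alignment) != 0 then GenerateAlignmentException();
   Returns true when the access would raise the alignment exception.  */
bool
would_raise_alignment_exception (uintptr_t address, unsigned qualifier_bits)
{
  unsigned alignment = qualifier_to_alignment_bytes (qualifier_bits);
  return (address % alignment) != 0;
}
```

For example, with a `:64` qualifier the address `0x1008` passes, while `0x1001` or `0x1004` would fault.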

This may be related to bug #57271.

If needed, I can provide instructions for how to reproduce the exact same GCC
binary I'm using, and all other software that it depends on, using GNU Guix
<http://gnu.org/s/guix>.

FYI, the attached minimal test case was distilled from code in OpenSSL 1.0.2d
that is miscompiled by GCC 4.9.3 in our armhf port targeting NEON, and which
leads to a Bus Error on the Novena laptop.
