http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55634



             Bug #: 55634
           Summary: ARM: gcc vector extensions: storing vector to
                    unaligned memory location does not use VST1.8 NEON
                    instruction
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: siarhei.siamas...@gmail.com





The following test program uses GCC vector extensions to add two vectors
together and store the result to an unaligned memory location in a
"portable" way with memcpy:



/***********************************************/

#include <string.h>

typedef unsigned int T __attribute__ ((vector_size (16)));

void foo (void *result, T *a, T *b)
{
  T tmp = *a + *b;
  memcpy (result, &tmp, sizeof(tmp));
}

/***********************************************/
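
To confirm that the memcpy form is functionally correct for a misaligned destination (only the generated code is at issue here), a small check along the following lines could be used. This is a minimal sketch: check_unaligned_store and its test values are hypothetical and not part of the original test case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned int T __attribute__ ((vector_size (16)));

/* Same function as in the test program above. */
void foo (void *result, T *a, T *b)
{
  T tmp = *a + *b;
  memcpy (result, &tmp, sizeof(tmp));
}

/* Hypothetical helper: returns 1 if foo() stores the correct sum at a
   destination that is `offset` bytes into a byte buffer, so the
   destination is generally not 16-byte aligned. */
int check_unaligned_store (size_t offset)
{
  T a = { 1, 2, 3, 4 };
  T b = { 10, 20, 30, 40 };
  unsigned char buf[sizeof (T) + 16];
  unsigned int out[4];

  assert (offset <= 16);               /* stay inside buf */
  foo (buf + offset, &a, &b);
  memcpy (out, buf + offset, sizeof (out));
  return out[0] == 11 && out[1] == 22 && out[2] == 33 && out[3] == 44;
}
```

Calling check_unaligned_store with offsets such as 1 or 3 exercises the unaligned path. It only checks the stored values and says nothing about which store instruction the compiler picked.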



Compiling with gcc 4.7.2:



$ arm-none-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfpu=neon -c test.c
$ objdump -d test.o



00000000 <foo>:
   0:    e52d4004     push    {r4}        ; (str r4, [sp, #-4]!)
   4:    ecd12b04     vldmia    r1, {d18-d19}
   8:    e24dd014     sub    sp, sp, #20
   c:    ecd20b04     vldmia    r2, {d16-d17}
  10:    e28dc010     add    ip, sp, #16
  14:    f26208e0     vadd.i32    q8, q9, q8
  18:    ed6c0b04     vstmdb    ip!, {d16-d17}
  1c:    e1a0c00d     mov    ip, sp
  20:    e1a04000     mov    r4, r0
  24:    e8bc000f     ldm    ip!, {r0, r1, r2, r3}
  28:    e5840000     str    r0, [r4]
  2c:    e5841004     str    r1, [r4, #4]
  30:    e5842008     str    r2, [r4, #8]
  34:    e584300c     str    r3, [r4, #12]
  38:    e28dd014     add    sp, sp, #20
  3c:    e8bd0010     pop    {r4}
  40:    e12fff1e     bx    lr



The same test program results in the following code if compiled for x86-64:



0000000000000000 <foo>:
   0:    66 0f 6f 06              movdqa (%rsi),%xmm0
   4:    66 0f fe 02              paddd  (%rdx),%xmm0
   8:    f3 0f 7f 07              movdqu %xmm0,(%rdi)
   c:    c3                       retq



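So the x86-64 target is able to use the MOVDQU instruction for the unaligned
store. The ARM target should likewise be able to use VST1.8, instead of
spilling the vector to the stack and copying it out with four word stores.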
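
Until the memcpy form is optimized, one possible workaround is to request the byte-element store explicitly through the NEON intrinsics in arm_neon.h. The following is a sketch only: foo_neon and foo_neon_check are hypothetical names, the inputs are taken as plain uint32_t pointers instead of the vector type T, and the non-NEON branch exists only so that the file also builds on other targets. vst1q_u8 is expected to be emitted as a VST1.8 store, but the actual output should be checked with objdump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical workaround: add two vectors of four 32-bit integers and
   store the result to a destination with no alignment guarantee. */
void foo_neon (void *result, const uint32_t *a, const uint32_t *b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  uint32x4_t sum = vaddq_u32 (vld1q_u32 (a), vld1q_u32 (b));
  /* Store with byte-sized elements, which needs only 1-byte alignment. */
  vst1q_u8 ((uint8_t *) result, vreinterpretq_u8_u32 (sum));
#else
  /* Fallback for targets without NEON: scalar add plus memcpy. */
  uint32_t tmp[4];
  int i;
  for (i = 0; i < 4; i++)
    tmp[i] = a[i] + b[i];
  memcpy (result, tmp, sizeof (tmp));
#endif
}

/* Hypothetical self-check: store at an odd address and read it back. */
void foo_neon_check (void)
{
  uint32_t a[4] = { 1, 2, 3, 4 };
  uint32_t b[4] = { 10, 20, 30, 40 };
  unsigned char buf[17];
  uint32_t out[4];

  foo_neon (buf + 1, a, b);
  memcpy (out, buf + 1, sizeof (out));
  assert (out[0] == 11 && out[1] == 22 && out[2] == 33 && out[3] == 44);
}
```

This gives up the portability of the vector-extension version, which is why the report asks for the memcpy form itself to be compiled to VST1.8.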
