http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55634
Bug #: 55634
Summary: ARM: gcc vector extensions: storing vector to unaligned memory location does not use VST1.8 NEON instruction
Classification: Unclassified
Product: gcc
Version: 4.7.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: siarhei.siamas...@gmail.com

The following test program tries to use GCC vector extensions to add two
vectors together and store the result to an unaligned memory location in a
"portable" way with memcpy:

/***********************************************/
#include <string.h>

typedef unsigned int T __attribute__ ((vector_size (16)));

void foo (void *result, T *a, T *b)
{
    T tmp = *a + *b;
    memcpy (result, &tmp, sizeof(tmp));
}
/***********************************************/

Compiling with gcc 4.7.2:

$ arm-none-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfpu=neon -c test.c
$ objdump -d test.o

00000000 <foo>:
   0:   e52d4004    push    {r4}        ; (str r4, [sp, #-4]!)
   4:   ecd12b04    vldmia  r1, {d18-d19}
   8:   e24dd014    sub     sp, sp, #20
   c:   ecd20b04    vldmia  r2, {d16-d17}
  10:   e28dc010    add     ip, sp, #16
  14:   f26208e0    vadd.i32 q8, q9, q8
  18:   ed6c0b04    vstmdb  ip!, {d16-d17}
  1c:   e1a0c00d    mov     ip, sp
  20:   e1a04000    mov     r4, r0
  24:   e8bc000f    ldm     ip!, {r0, r1, r2, r3}
  28:   e5840000    str     r0, [r4]
  2c:   e5841004    str     r1, [r4, #4]
  30:   e5842008    str     r2, [r4, #8]
  34:   e584300c    str     r3, [r4, #12]
  38:   e28dd014    add     sp, sp, #20
  3c:   e8bd0010    pop     {r4}
  40:   e12fff1e    bx      lr

The same test program results in the following code if compiled for x86-64:

0000000000000000 <foo>:
   0:   66 0f 6f 06     movdqa (%rsi),%xmm0
   4:   66 0f fe 02     paddd  (%rdx),%xmm0
   8:   f3 0f 7f 07     movdqu %xmm0,(%rdi)
   c:   c3              retq

So the x86-64 target is able to use the MOVDQU instruction instead of
spilling the vector to the stack and copying it out word by word. Hence the
ARM target should be able to use VST1.8 as well.
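
For anyone who wants to confirm that the memcpy-based store in the test program is functionally correct regardless of which instructions the compiler picks, here is a minimal sketch of a self-check. It is not part of the original report: the helper name check_unaligned_store and the test values are hypothetical, and it assumes a GCC-compatible compiler that supports the vector_size and aligned attributes.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <string.h>

typedef unsigned int T __attribute__ ((vector_size (16)));

/* Same function as in the report: add two vectors and store the sum
   through memcpy, so that the destination may be unaligned.  */
void foo (void *result, T *a, T *b)
{
    T tmp = *a + *b;
    memcpy (result, &tmp, sizeof (tmp));
}

/* Hypothetical helper (not from the report): store the sum at a
   deliberately misaligned address and read it back.
   Returns 1 if all four lanes hold the expected sums, 0 otherwise.  */
int check_unaligned_store (void)
{
    T a = { 1, 2, 3, 4 };
    T b = { 10, 20, 30, 40 };
    /* The buffer is 16-byte aligned, so buf + 1 is guaranteed misaligned.  */
    unsigned char buf[32] __attribute__ ((aligned (16)));
    unsigned int out[4];

    foo (buf + 1, &a, &b);
    memcpy (out, buf + 1, sizeof (out));

    return out[0] == 11 && out[1] == 22 && out[2] == 33 && out[3] == 44;
}
```

Calling check_unaligned_store () from a test driver and disassembling foo with objdump -d, as in the report, shows both that the result is correct and which store sequence the target selected.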