[Bug target/55454] [PPC] unaligned memory accesses do not work correctly for vector extensions when using altivec
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55454

Siarhei Siamashka changed:

           What    |Removed      |Added
----------------------------------------------------------------
             Status|UNCONFIRMED  |RESOLVED
         Resolution|             |DUPLICATE

--- Comment #5 from Siarhei Siamashka 2012-12-09 22:25:17 UTC ---
It appears that this is a duplicate of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55614

As for memcpy, it does look like the preferable "portable" way of storing vectors to unaligned memory (albeit somewhat buggy at the moment). ARM just happens to have a performance issue related to memcpy, but that can be tracked separately: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55634

*** This bug has been marked as a duplicate of bug 55614 ***
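A minimal sketch of the memcpy approach, assuming GCC's generic vector extension and a hypothetical 16-byte type "v16u8" (the testcase's actual vector type is not shown in this thread). Because memcpy makes no promise about the alignment of its destination, the compiler has to pick a sequence that is valid for any address. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte vector type (GCC generic vector extension);
   the real testcase type is an assumption here. */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Store a vector to a possibly unaligned address. memcpy assumes
   nothing about dst alignment, so any correct lowering must work
   for every address. */
static void store_v16_unaligned(void *dst, v16u8 v)
{
    assert(dst != NULL);
    memcpy(dst, &v, sizeof v);
}

/* Matching unaligned load. */
static v16u8 load_v16_unaligned(const void *src)
{
    v16u8 v;
    assert(src != NULL);
    memcpy(&v, src, sizeof v);
    return v;
}
```

On targets with unaligned vector instructions (SSE2, NEON) a compiler can lower these calls to a single load or store; on AltiVec it has to do more work, which is exactly where the bug shows up.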
--- Comment #4 from Siarhei Siamashka 2012-11-25 21:16:53 UTC ---
(In reply to comment #3)
> Also fails with GCC trunk (gcc version 4.8.0 20120518 (experimental))
                                               ^^

Sorry, I accidentally compiled GCC from a stale old directory. The recent trunk, 4.8.0 20121120 (experimental), has the memcpy issue fixed. The STVX problem is still there, though:

   0:   7c 00 18 ce     lvx     v0,r0,r3
   4:   3d 40 00 00     lis     r10,0
   8:   39 20 00 0a     li      r9,10
   c:   39 4a 00 00     addi    r10,r10,0
  10:   7c 0a 49 ce     stvx    v0,r10,r9
  14:   4e 80 00 20     blr
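Since STVX can only write a whole 16-byte aligned block, a correct unaligned 16-byte store on AltiVec has to update both aligned blocks that the destination overlaps. The sketch below models that idea in portable C as a read-modify-write of two blocks over a byte array. It is an illustration under stated assumptions (offsets are measured from a 16-byte aligned base, and the buffer covers both blocks), not the sequence GCC emits; real AltiVec code would typically build the merge with lvsr/vperm instead of a byte loop. All helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of an aligned-only 16-byte store such as STVX: the low four
   bits of the offset are ignored. Assumption: offset 0 is 16-byte aligned. */
static void aligned_store16(uint8_t *mem, size_t off, const uint8_t blk[16])
{
    memcpy(mem + (off & ~(size_t)15), blk, 16);
}

/* Matching aligned-only 16-byte load (like LVX). */
static void aligned_load16(const uint8_t *mem, size_t off, uint8_t blk[16])
{
    memcpy(blk, mem + (off & ~(size_t)15), 16);
}

/* Emulate an unaligned 16-byte store using only aligned block accesses:
   load the two blocks the target range overlaps, merge v into them,
   and write both blocks back. */
static void emulated_unaligned_store16(uint8_t *mem, size_t len, size_t off,
                                       const uint8_t v[16])
{
    size_t base = off & ~(size_t)15;  /* first aligned block */
    size_t shift = off - base;        /* 0..15 */
    uint8_t lo[16], hi[16];

    if (shift == 0) {                 /* already aligned: one plain store */
        assert(base + 16 <= len);
        aligned_store16(mem, base, v);
        return;
    }
    assert(base + 32 <= len);         /* both aligned blocks must exist */
    aligned_load16(mem, base, lo);
    aligned_load16(mem, base + 16, hi);
    for (size_t i = 0; i < 16; i++) {
        size_t pos = shift + i;       /* position inside the 32-byte window */
        if (pos < 16)
            lo[pos] = v[i];
        else
            hi[pos - 16] = v[i];
    }
    aligned_store16(mem, base, lo);
    aligned_store16(mem, base + 16, hi);
}
```

This emulation also rewrites the untouched neighbouring bytes of both blocks, so it is not safe if another thread may be writing those bytes concurrently.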
--- Comment #3 from Siarhei Siamashka 2012-11-25 19:32:02 UTC ---
Also fails with GCC trunk (gcc version 4.8.0 20120518 (experimental)).

The disassembly listing for the "init_buffer" function:

<init_buffer>:
   0:   7d 80 42 a6     mfvrsave r12
   4:   94 21 ff e0     stwu    r1,-32(r1)
   8:   91 81 00 1c     stw     r12,28(r1)
   c:   65 8c 80 00     oris    r12,r12,32768
  10:   7d 80 43 a6     mtvrsave r12
  14:   3d 40 00 00     lis     r10,0
  18:   7c 00 18 ce     lvx     v0,r0,r3
  1c:   39 20 00 0a     li      r9,10
  20:   39 4a 00 00     addi    r10,r10,0
  24:   7c 0a 49 ce     stvx    v0,r10,r9

Here it happily uses the STVX instruction, which silently aligns the address down to a 16-byte boundary, so the write effectively goes to &buffer[0] instead of &buffer[10].

  28:   81 81 00 1c     lwz     r12,28(r1)
  2c:   7d 80 43 a6     mtvrsave r12
  30:   38 21 00 20     addi    r1,r1,32
  34:   4e 80 00 20     blr

By the way, the memcpy workaround mentioned in comment #2 is also broken in GCC 4.8: GCC tries to be clever and generates exactly the same STVX-based code :)

With GCC 4.7.2, at least the memcpy variant used to work correctly:

   0:   3d 40 00 00     lis     r10,0
   4:   80 a3 00 00     lwz     r5,0(r3)
   8:   80 c3 00 04     lwz     r6,4(r3)
   c:   80 e3 00 08     lwz     r7,8(r3)
  10:   39 2a 00 0a     addi    r9,r10,10
  14:   81 03 00 0c     lwz     r8,12(r3)
  18:   90 aa 00 0a     stw     r5,10(r10)
  1c:   90 c9 00 04     stw     r6,4(r9)
  20:   90 e9 00 08     stw     r7,8(r9)
  24:   91 09 00 0c     stw     r8,12(r9)
  28:   4e 80 00 20     blr
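The address arithmetic behind the STVX misbehaviour can be written out as a small model of the instruction's definition: the effective address is (rA|0) + rB with the low four bits cleared. In the listing, r10 receives the address of buffer (the zero immediates in lis/addi are relocation placeholders in the unlinked object) and r9 holds 10. Assuming buffer is 16-byte aligned, the computed address collapses back to the start of buffer. The function name below is hypothetical, and a register operand of r0 in the rA slot is modelled by passing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the STVX/LVX effective address: EA = (rA|0) + rB, with the
   low 4 bits forced to zero. Pass ra = 0 when the rA field is r0. */
static uintptr_t stvx_effective_address(uintptr_t ra, uintptr_t rb)
{
    return (ra + rb) & ~(uintptr_t)0xF;
}
```

For example, with buffer at a hypothetical address 0x1000, stvx_effective_address(0x1000, 10) yields 0x1000 rather than 0x100a.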
--- Comment #2 from Siarhei Siamashka 2012-11-25 18:18:16 UTC ---
(In reply to comment #1)
> Besides from whether the testcase is valid

According to http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html:

  "packed - This attribute, attached to struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used."

Is it safe to assume that the size of this "foo" struct is always expected to be 17 bytes in the testcase? If so, then it must be safe to use any alignment for this struct, because in an array of "foo" the elements start at every possible alignment. As such, any memory location can be safely cast to foo* and used. Is there anything wrong with these assumptions?

But in fact, all I want is to somehow tell gcc that I'm going to write this vector data type to an unaligned memory location. For example, x86 SSE2 and ARM NEON have unaligned load/store instructions. PPC AltiVec can't do it easily, but that's a headache for GCC; the application developer (me) should not have to care. After all, if running out of options, one can always use

    memcpy(buffer + 10, a, sizeof(*a));

instead of

    ((foo *)(buffer + 9))->data = *a;

The performance goes down the toilet, though. That would in fact be an acceptable solution for PPC, but x86 and ARM can definitely do much better.

> 4.8 should do a better job here.

Thanks, I'll check GCC 4.8 a bit later.
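The size and alignment argument above can be checked directly. The sketch below is a hypothetical reconstruction of the testcase (the real one is not quoted in this thread): a packed struct with one leading byte followed by a 16-byte GCC generic vector. It confirms that the struct is 17 bytes with alignment 1, so it may legitimately sit at any address, and puts the packed-member store and the memcpy store side by side. On a correct compiler both must write the same bytes starting at buffer + 10. The names "v16u8", "pad", "store_via_packed" and "store_via_memcpy" are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vector type; the testcase's actual type is an assumption. */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* One leading byte, then the vector: with "packed" there is no padding. */
struct foo {
    uint8_t pad;
    v16u8 data;
} __attribute__((packed));

/* 17 bytes means consecutive array elements hit every address modulo 16,
   which is only possible if the struct's alignment is 1. */
_Static_assert(sizeof(struct foo) == 17, "packed struct should be 17 bytes");

/* Store through the packed member, as in the testcase: the compiler
   knows data may be misaligned and must emit an alignment-safe store. */
static void store_via_packed(uint8_t *buffer, const v16u8 *a)
{
    ((struct foo *)(buffer + 9))->data = *a;
}

/* The memcpy alternative from the comment above. */
static void store_via_memcpy(uint8_t *buffer, const v16u8 *a)
{
    memcpy(buffer + 10, a, sizeof(*a));
}
```

The buffers in the usage check come from calloc so that the struct-typed store does not conflict with a declared char array type under C's effective-type rules.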