[Bug target/55454] [PPC] unaligned memory accesses do not work correctly for vector extensions when using altivec

2012-12-09 Thread siarhei.siamashka at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55454



Siarhei Siamashka  changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE



--- Comment #5 from Siarhei Siamashka  
2012-12-09 22:25:17 UTC ---

It appears that this is a duplicate of

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55614



As for memcpy, it looks like this is indeed the preferable "portable" way of

storing vectors to unaligned memory (albeit somewhat buggy at the moment).



And ARM just happens to have a performance issue related to memcpy, but it can

be tracked elsewhere: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55634



*** This bug has been marked as a duplicate of bug 55614 ***


[Bug target/55454] [PPC] unaligned memory accesses do not work correctly for vector extensions when using altivec

2012-11-25 Thread siarhei.siamashka at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55454



--- Comment #4 from Siarhei Siamashka  
2012-11-25 21:16:53 UTC ---

(In reply to comment #3)

> Also fails with GCC trunk (gcc version 4.8.0 20120518 (experimental))

 ^^

Sorry, I accidentally compiled GCC from a stale old directory. The recent

trunk, 4.8.0 20121120 (experimental), has the memcpy issue fixed. The STVX

problem is still there, though:



 :

   0:  7c 00 18 ce  lvx     v0,r0,r3

   4:  3d 40 00 00  lis     r10,0

   8:  39 20 00 0a  li      r9,10

   c:  39 4a 00 00  addi    r10,r10,0

  10:  7c 0a 49 ce  stvx    v0,r10,r9

  14:  4e 80 00 20  blr


[Bug target/55454] [PPC] unaligned memory accesses do not work correctly for vector extensions when using altivec

2012-11-25 Thread siarhei.siamashka at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55454



--- Comment #3 from Siarhei Siamashka  
2012-11-25 19:32:02 UTC ---

Also fails with GCC trunk (gcc version 4.8.0 20120518 (experimental))



The disassembly listing for "init_buffer" function:



<init_buffer>:

   0:  7d 80 42 a6  mfvrsave r12

   4:  94 21 ff e0  stwu    r1,-32(r1)

   8:  91 81 00 1c  stw     r12,28(r1)

   c:  65 8c 80 00  oris    r12,r12,32768

  10:  7d 80 43 a6  mtvrsave r12

  14:  3d 40 00 00  lis     r10,0

  18:  7c 00 18 ce  lvx     v0,r0,r3

  1c:  39 20 00 0a  li      r9,10

  20:  39 4a 00 00  addi    r10,r10,0

  24:  7c 0a 49 ce  stvx    v0,r10,r9



Here it happily uses the STVX instruction, which ignores the low four bits of

the effective address. The store is silently aligned down to a 16-byte boundary,

so the write effectively lands at &buffer[0] instead of &buffer[10].



  28:  81 81 00 1c  lwz     r12,28(r1)

  2c:  7d 80 43 a6  mtvrsave r12

  30:  38 21 00 20  addi    r1,r1,32

  34:  4e 80 00 20  blr
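
The address truncation done by STVX can be modelled in plain C. This is only a sketch: the buffer below is hypothetical and is assumed to be 16-byte aligned (the original testcase is not shown here), and stvx_effective_address and stvx_store_offset are made-up helper names. The mask itself reflects how STVX forms its effective address, ignoring the low four bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the testcase buffer; 16-byte alignment is an
 * assumption made for this sketch. */
static char buffer[48] __attribute__((aligned(16)));

/* Model of the address STVX really uses: the low four bits are dropped. */
static uintptr_t stvx_effective_address(uintptr_t ea)
{
    return ea & ~(uintptr_t)0xF;
}

/* Byte offset inside buffer where a 16-byte STVX store aimed at
 * &buffer[index] would actually land. */
size_t stvx_store_offset(size_t index)
{
    assert(((uintptr_t)buffer & 0xF) == 0); /* check the alignment assumption */
    assert(index < sizeof buffer);

    uintptr_t requested = (uintptr_t)&buffer[index];
    return (size_t)(stvx_effective_address(requested) - (uintptr_t)buffer);
}
```

Under these assumptions stvx_store_offset(10) gives 0, which matches the store ending up at &buffer[0] rather than &buffer[10].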





And by the way, the memcpy workaround mentioned above is also broken in GCC

4.8, because it tries to be clever and generates exactly the same code relying

on STVX :)





With GCC 4.7.2, at least the memcpy variant used to work correctly:



 :

   0:  3d 40 00 00  lis     r10,0

   4:  80 a3 00 00  lwz     r5,0(r3)

   8:  80 c3 00 04  lwz     r6,4(r3)

   c:  80 e3 00 08  lwz     r7,8(r3)

  10:  39 2a 00 0a  addi    r9,r10,10

  14:  81 03 00 0c  lwz     r8,12(r3)

  18:  90 aa 00 0a  stw     r5,10(r10)

  1c:  90 c9 00 04  stw     r6,4(r9)

  20:  90 e9 00 08  stw     r7,8(r9)

  24:  91 09 00 0c  stw     r8,12(r9)

  28:  4e 80 00 20  blr


[Bug target/55454] [PPC] unaligned memory accesses do not work correctly for vector extensions when using altivec

2012-11-25 Thread siarhei.siamashka at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55454



--- Comment #2 from Siarhei Siamashka  
2012-11-25 18:18:16 UTC ---

(In reply to comment #1)

> Besides from whether the testcase is valid



According to http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html



"packed - This attribute, attached to struct or union type definition,

specifies that each member (other than zero-width bit-fields) of the structure

or union is placed to minimize the memory required. When attached to an enum

definition, it indicates that the smallest integral type should be used."



Is it safe to assume that the size of this "foo" struct is always expected to

be 17 bytes in the testcase? If yes, then it must be safe to use any alignment

for this struct, because an array of "foo" will have elements at every

possible alignment. As such, any memory location can be safely cast to

foo* and used. Is there anything wrong with these assumptions?
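
The size and alignment assumptions can be checked at compile time. The sketch below assumes a layout of one char followed by a 16-byte vector; the actual definition of "foo" in the testcase is not quoted in this thread, so struct foo, its member names and data_alignments_seen are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical reconstruction of the testcase struct: 1 + 16 bytes. */
struct foo {
    char pad;
    v4si data;
} __attribute__((packed));

static_assert(sizeof(struct foo) == 17, "packed foo should be 17 bytes");
static_assert(_Alignof(struct foo) == 1, "packed foo needs no alignment");
static_assert(offsetof(struct foo, data) == 1, "data follows the char");

/* Returns a bitmask with bit k set when some arr[i].data sits at an
 * address congruent to k modulo 16. */
unsigned data_alignments_seen(void)
{
    static struct foo arr[16];
    unsigned seen = 0;

    for (size_t i = 0; i < 16; i++) {
        uintptr_t addr = (uintptr_t)&arr[i] + offsetof(struct foo, data);
        seen |= 1u << (addr % 16);
    }
    return seen;
}
```

Because consecutive elements are 17 bytes apart, sixteen of them cover all sixteen residues modulo 16, so the returned mask is 0xFFFF.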





But in fact what I want is just to somehow tell gcc that I'm going to write

this vector data type at an unaligned memory location. For example, x86 SSE2

and ARM NEON have unaligned load/store instructions. PPC Altivec can't do it

easily, but that's a headache for GCC, and the application developer (me) should

not have to care. After all, if running out of options, one can always use



memcpy(buffer + 10, a, sizeof(*a));



instead of



((foo *)(buffer + 9))->data = *a;



The performance goes down the toilet, though. That would in fact be an

acceptable solution for PPC, but x86 and ARM can definitely do much better.
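
For comparison, here is a self-contained sketch of both store variants side by side. It is not the original testcase: struct foo, the buffer size and the function names are assumptions, and the may_alias attribute is added here only to keep the cast clear of strict-aliasing problems.

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical reconstruction: one char, then the vector, packed. */
struct foo {
    char pad;
    v4si data;
} __attribute__((packed, may_alias));

/* Assumed to be 16-byte aligned, so buffer + 10 is misaligned. */
static char buffer[64] __attribute__((aligned(16)));

/* Variant 1: plain memcpy to the unaligned address. */
void store_with_memcpy(const v4si *a)
{
    memcpy(buffer + 10, a, sizeof(*a));
}

/* Variant 2: packed struct cast; data starts at buffer + 9 + 1. */
void store_with_packed_cast(const v4si *a)
{
    ((struct foo *)(buffer + 9))->data = *a;
}

/* Checks that exactly bytes 10..25 of buffer hold the vector. */
static int stored_at_offset_10(const v4si *a)
{
    return memcmp(buffer + 10, a, sizeof(*a)) == 0
        && buffer[9] == 0 && buffer[26] == 0;
}

/* Returns 1 when both variants write the 16 bytes at buffer + 10. */
int both_stores_agree(void)
{
    v4si a = {1, 2, 3, 4};
    int ok;

    assert(sizeof(struct foo) == 17);

    memset(buffer, 0, sizeof buffer);
    store_with_memcpy(&a);
    ok = stored_at_offset_10(&a);

    memset(buffer, 0, sizeof buffer);
    store_with_packed_cast(&a);
    return ok && stored_at_offset_10(&a);
}
```

On a compiler that handles unaligned vector stores correctly, both_stores_agree() should return 1; on a build affected by the STVX problem described in this report, the check would be expected to fail.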



> 4.8 should do a better job here.



Thanks, I'll check GCC 4.8 a bit later.