In AltiVec load/store instructions (lvx, stvx, ...) and in lvsl/lvsr, when the address is supplied as a pointer plus a compile-time constant, gcc always computes the final address in the scalar unit and passes 0 as the index operand, instead of letting the instruction itself add the index register to the base. This slows down some simple AltiVec loops.
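To make the claim concrete, here is a minimal portable C sketch that models only the address arithmetic of the two instruction forms. It assumes the usual lvx/stvx rule that the effective address is the sum of the two address operands with the low four bits cleared. The function names (altivec_ea, ea_scalar_add, ea_indexed, check_loop) are hypothetical and exist only for this illustration; nothing here is gcc or AltiVec API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of the lvx/stvx effective address:
 * (index + base) with the low 4 bits cleared (16-byte alignment). */
uintptr_t altivec_ea(uintptr_t index, uintptr_t base)
{
    return (index + base) & ~(uintptr_t)15;
}

/* Form emitted by gcc 4.x: the offset is added in the scalar unit,
 * then the store uses 0 as its index operand. */
uintptr_t ea_scalar_add(uintptr_t base, uintptr_t offset)
{
    uintptr_t tmp = base + offset;   /* addi %r11,%r9,16 */
    return altivec_ea(0, tmp);       /* stvx %v0,0,%r11  */
}

/* Preferred form: the constant offset lives in an index register
 * that is loaded once, outside the loop. */
uintptr_t ea_indexed(uintptr_t base, uintptr_t offset)
{
    return altivec_ea(offset, base); /* stvx %v0,%r11,%r9 */
}

/* Walk the addresses of a loop that advances by 32 bytes per iteration
 * and check that both forms target the same location every time. */
void check_loop(uintptr_t base, unsigned len)
{
    while (len--) {
        assert(ea_scalar_add(base, 16) == ea_indexed(base, 16));
        base += 32;
    }
}
```

Both forms reach the same address, so under this model the indexed form is a drop-in replacement that removes one addi from each loop iteration.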
Sample code:

    vector unsigned char *vDst = dst;
    vector unsigned char vSetTo = {}; /* zero */
    do {
        vec_st( vSetTo, 0, vDst );
        vec_st( vSetTo, 16, vDst );
        vDst += 2;
    } while (--len);

gcc 4.1.2, 4.2.0, 4.3-20070615 produces:

    .L3:
        addi %r11,%r9,16
        stvx %v0,0,%r9
        addi %r9,%r9,32
        stvx %v0,0,%r11
        bdnz .L3

while, ideally, it should be:

        li %r11,16
    .L3:
        stvx %v0,0,%r9
        stvx %v0,%r11,%r9
        addi %r9,%r9,32
        bdnz .L3

gcc 3.3, with -O2, behaves quite well in this case (though it should use 0 instead of r10):

        li %r10,0
        li %r11,16
    .L13:
        stvx %v0,%r10,%r9
        stvx %v0,%r11,%r9
        addi %r9,%r9,32
        bdnz .L13

--
           Summary: [PPC/Altivec, regression?] gcc uses 0 as altivec load/store index
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sparky at pld-linux dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32396