The SPU's load and store instructions ({l,st}q{a,d,r,x}) zero the lower four bits of the computed effective address before performing the requested load or store. In certain cases (particularly when indexing arrays of vector data), this can be exploited to avoid unnecessary shifts and masks.
Consider rottest.c:

    #include <spu_intrinsics.h>

    vector unsigned int a[1024];

    vector unsigned int f(int i)
    {
        return a[i>>ROTN];
    }

Compiled with "spu-elf-gcc rottest.c -c -S -O3 -DROTN=8"
(using gcc 4.4.0 20080404):

    f:
            rotmai  $4,$3,-8
            ila     $2,a
            shli    $3,$4,4
            lqx     $3,$2,$3
            bi      $lr

The rotmai and shli may be legitimately combined to yield something like:

    f:
            rotmai  $4,$3,-4
            ila     $2,a
            lqx     $3,$2,$3
            bi      $lr

Compiled with "spu-elf-gcc rottest.c -c -S -O3 -DROTN=4":

    f:
            ila     $2,a
            andi    $3,$3,-16
            lqx     $3,$2,$3
            bi      $lr

The andi is redundant.

--
Summary: Take advantage of lower bit zeroing of load/store insns on SPU
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jadamcze at utas dot edu dot au
GCC target triplet: spu-elf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36829