The SPU's load and store instructions ({l,st}q{a,d,r,x}) zero the lower four
bits of the computed address before performing the requested load or store.  In
certain cases (particularly array loads of vector data), this may be exploited
to avoid unnecessary shifts and masks.


Consider rottest.c:

#include <spu_intrinsics.h>
vector unsigned int a[1024];
vector unsigned int f(int i) {
        return a[i>>ROTN];
}


Compiled with "spu-elf-gcc rottest.c -c -S -O3 -DROTN=8" :
(Using gcc 4.4.0 20080404)

f:
        rotmai  $4,$3,-8
        ila     $2,a
        shli    $3,$4,4
        lqx     $3,$2,$3
        bi      $lr


The rotmai and shli may legitimately be combined to yield something like:

f:
        rotmai  $4,$3,-4
        ila     $2,a
        lqx     $3,$2,$4
        bi      $lr


Compiled with "spu-elf-gcc rottest.c -c -S -O3 -DROTN=4" :

f:
        ila     $2,a
        andi    $3,$3,-16
        lqx     $3,$2,$3
        bi      $lr

The andi is redundant: lqx clears the low four bits of the address anyway, so
masking them off beforehand has no effect.


-- 
           Summary: Take advantage of lower bit zeroing of load/store insns
                    on SPU
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jadamcze at utas dot edu dot au
GCC target triplet: spu-elf


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36829
