In AltiVec load/store instructions (lvx, stvx, ...) and in lvsl/lvsr, when the
address is supplied as pointer + a constant known at compile time, gcc always
calculates the final address in the scalar unit instead of passing the pointer
and the constant to the instruction as base and index (it puts 0 as the
index). This slows down some simple AltiVec loops.

Sample code:
        vector unsigned char *vDst = dst;
        vector unsigned char vSetTo = {}; /* zero */

        do {
                vec_st( vSetTo,  0, vDst );
                vec_st( vSetTo, 16, vDst );
                vDst += 2;
        } while (--len);
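
For anyone who wants to compile the fragment above, it could be wrapped into a
complete function roughly like this. This is only a sketch: the name
clear_blocks, the alignment/length checks and the non-AltiVec fallback are my
additions for illustration and are not part of the original test case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical wrapper around the loop from the report: zero `len`
 * blocks of 32 bytes starting at `dst`.
 * Assumptions: `dst` is 16-byte aligned and `len` is non-zero. */
void clear_blocks(void *dst, unsigned int len)
{
        assert(len != 0);
        assert(((uintptr_t)dst & 15) == 0);
#ifdef __ALTIVEC__
        vector unsigned char *vDst = dst;
        vector unsigned char vSetTo = {0}; /* zero */

        do {
                vec_st( vSetTo,  0, vDst );  /* offset 0  */
                vec_st( vSetTo, 16, vDst );  /* offset 16 */
                vDst += 2;                   /* advance 32 bytes */
        } while (--len);
#else
        /* Portable fallback so the sketch also builds without AltiVec. */
        unsigned char *p = dst;

        do {
                memset(p, 0, 32);
                p += 32;
        } while (--len);
#endif
}
```

On a PowerPC target this should be built with -maltivec and optimization
enabled (e.g. -O2) to look at the generated loop.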

gcc 4.1.2, 4.2.0 and 4.3-20070615 all produce:

.L3:
        addi %r11,%r9,16
        stvx %v0,0,%r9
        addi %r9,%r9,32
        stvx %v0,0,%r11
        bdnz .L3

while, ideally, it should be:
        li %r11,16
.L3:
        stvx %v0,0,%r9
        stvx %v0,%r11,%r9
        addi %r9,%r9,32
        bdnz .L3
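
The two loops store to the same addresses because stvx forms its effective
address as (RA|0) + RB with the low four bits ignored. A small scalar model of
that rule, just to illustrate the equivalence (the helper name stvx_ea is made
up here, and a literal 0 in the RA slot is modelled by passing 0):

```c
#include <stdint.h>

/* Hypothetical model of the stvx/lvx effective address:
 * EA = (RA|0) + RB, truncated to a 16-byte boundary.
 * Pass ra = 0 to model a literal 0 in the RA operand. */
uintptr_t stvx_ea(uintptr_t ra, uintptr_t rb)
{
        return (ra + rb) & ~(uintptr_t)15;
}

/* Second store as currently generated: addi r11,r9,16 ; stvx v0,0,r11 */
uintptr_t second_store_current(uintptr_t r9)
{
        uintptr_t r11 = r9 + 16;   /* extra scalar add inside the loop */
        return stvx_ea(0, r11);
}

/* Second store as desired: li r11,16 (hoisted) ; stvx v0,r11,r9 */
uintptr_t second_store_ideal(uintptr_t r9)
{
        uintptr_t r11 = 16;        /* loop-invariant index */
        return stvx_ea(r11, r9);
}
```

Both helpers return the same address for any r9, so the only difference is the
addi that the current code executes on every iteration.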

gcc 3.3 with -O2 does much better in this case (although it should use a
literal 0 as the index instead of loading 0 into r10):
        li %r10,0
        li %r11,16
.L13:
        stvx %v0,%r10,%r9
        stvx %v0,%r11,%r9
        addi %r9,%r9,32
        bdnz .L13


-- 
           Summary: [PPC/Altivec, regression?] gcc uses 0 as altivec
                    load/store index
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sparky at pld-linux dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32396
