It turns out that the code we generate for vectorized induction on ppc is quite
terrible. The vector induction variable is advanced by a constant step in the
loop (e.g., {4,4,4,4} as in the testcase below). This is the sequence gcc
currently creates for altivec in order to generate the {4,4,4,4} vector:

        li 0,4
        stw 0,-48(1)
        lvewx 0,0,9
        vspltw 0,0,0

So, one thing to figure out is why we don't use the immediate form of the splat
(vspltisw). The other is why this sequence ends up getting generated not only
before the loop (see insns marked with "<<<1" below), but also inside the
loop (see insns marked with "<<<2" below), even though the step vector is
loop-invariant.

This is the testcase (it is basically the testcase
gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c with a larger loop count to avoid
complete unrolling):

int main1 (int X)
{
  int s = X;
  int i;
  for (i = 0; i < 96; i++)
    s += i;
  return s;
}

compiled as follows:
gcc -O2 -ftree-vectorize -maltivec -fno-tree-scev-cprop -S t.c


        li 0,4          <<<1
        stw 0,-48(1)    <<<1
        ld 9,[EMAIL PROTECTED](2)
        li 0,23
        mr 11,3
        mtctr 0
        lvx 1,0,9
        addi 9,1,-48
        vor 13,1,1
        lvewx 0,0,9     <<<1
        vspltw 0,0,0    <<<1    
        vadduwm 1,1,0
        .p2align 4,,15
.L2:
        li 0,4          <<<2
        addi 9,1,-48
        vadduwm 13,13,1
        stw 0,-48(1)    <<<2
        lvewx 0,0,9     <<<2
        vspltw 0,0,0    <<<2
        vadduwm 1,1,0
        bdnz .L2
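
For reference, here is a rough scalar C sketch of what the vectorized loop is
meant to compute, with the {4,4,4,4} step built once before the loop, not
on every iteration. It is only a lane-by-lane emulation under assumptions: the
names main1_scalar, main1_emulated, N and VF are made up for illustration, and
the exact shape of the vectorizer's output (peeling, final reduction) may differ.

```c
#define N  96   /* loop count from the testcase */
#define VF 4    /* assumed vectorization factor: four 32-bit lanes */

/* Scalar reference: the loop from the testcase. */
static int main1_scalar (int X)
{
  int s = X;
  int i;
  for (i = 0; i < N; i++)
    s += i;
  return s;
}

/* Lane-by-lane emulation of the vectorized loop (hypothetical helper). */
static int main1_emulated (int X)
{
  int iv[VF] = {0, 1, 2, 3};    /* initial vector induction variable */
  int acc[VF] = {0, 0, 0, 0};   /* vector accumulator */
  int step[VF];                 /* loop-invariant step vector */
  int i, lane, s;

  /* Build {4,4,4,4} once, outside the loop.  */
  for (lane = 0; lane < VF; lane++)
    step[lane] = VF;

  for (i = 0; i < N / VF; i++)
    for (lane = 0; lane < VF; lane++)
      {
        acc[lane] += iv[lane];    /* accumulate the current iv values */
        iv[lane] += step[lane];   /* advance the iv by the constant step */
      }

  /* Final reduction of the accumulator lanes into the scalar result.  */
  s = X;
  for (lane = 0; lane < VF; lane++)
    s += acc[lane];
  return s;
}
```

Both functions should return the same value for any X, e.g. 4565 for X = 5
(the sum 0..95 is 4560). The only point of the sketch is that step[] never
changes inside the loop, so nothing in the loop body should need to rebuild it.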


-- 
           Summary: Bad codegen for vectorized induction with altivec
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31334
