https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100076

            Bug ID: 100076
           Summary: eembc/automotive/basefp01 regresses by 30.3% with
                    -O2 -ftree-vectorize compared to -O2 on SKX/CLX
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: hjl.tools at gmail dot com
  Target Milestone: ---

Refer to https://godbolt.org/z/e3nfz3xvW

cat testcase.c

int
t_run_test(double a)
{

        static double P1, Q1;
        static varsize polyX1[9];
        polyX1[1] = a;
        P1 = (varsize)constantP[0];
        polyX1[1] = a;

// Loop 1
        for( int i1 = 2 ; i1 <= 8 ; i1++ )
        {
            polyX1[i1] = polyX1[i1 - 1] * polyX1[1] ;
        }


        P1 = (varsize)constantP[0] ;
// Loop 2
        for( int i1 = 1 ; i1 <= 8 ; i1++ )
        {
            P1 += (varsize)constantP[i1] * polyX1[i1] ;
        }


        Q1 = (varsize)constantQ[0] ;
// Loop 3
        for( int i1 = 1 ; i1 <= 8 ; i1++ )
        {
            Q1 += (varsize)constantQ[i1] * polyX1[i1] ;
        }


        return a = a * P1 / Q1 ;

}

Loop 1 writes the array polyX1, which is then read by Loop 2 and Loop 3. With
-O2 -ftree-vectorize, Loop 2 and Loop 3 are vectorized, but Loop 1 is not,
because it has a loop-carried dependence (each element is computed from the
previous one). As a result, polyX1 is written with 64-bit stores in Loop 1 and
read back with 128-bit loads in Loop 2 and Loop 3. Each 128-bit load spans two
separate 64-bit stores, which causes store-forwarding stalls that hurt
performance.
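
The access pattern can be sketched in plain C by writing Loop 2 out the way a
2-lane (128-bit) vectorizer would see it. This is only an illustration, not the
code GCC generates: the typedef of varsize, the coefficient values, and the
helper names fill_powers, sum_scalar and sum_pairs are assumptions made for
this sketch, and the two lanes are added in source order so that the result
matches the scalar loop.

```c
typedef double varsize;   /* assumption: the typedef is not shown in the report */

/* Hypothetical coefficients, chosen only so the result is easy to check. */
static const varsize constantP[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};

static varsize polyX1[9];

/* Loop 1 as in the testcase: one 8-byte store per iteration, each one
   depending on the value stored by the previous iteration. */
void fill_powers(double a)
{
    polyX1[1] = a;
    for (int i = 2; i <= 8; i++)
        polyX1[i] = polyX1[i - 1] * polyX1[1];
}

/* Loop 2 as written in the testcase: one 8-byte load per iteration. */
varsize sum_scalar(void)
{
    varsize p = constantP[0];
    for (int i = 1; i <= 8; i++)
        p += constantP[i] * polyX1[i];
    return p;
}

/* Loop 2 in 2-lane form: each iteration consumes polyX1[i] and polyX1[i + 1]
   together. A vectorized version would fetch this pair with a single 16-byte
   load, which covers two of the 8-byte stores made by fill_powers. */
varsize sum_pairs(void)
{
    varsize p = constantP[0];
    for (int i = 1; i <= 8; i += 2) {
        varsize lane0 = constantP[i] * polyX1[i];
        varsize lane1 = constantP[i + 1] * polyX1[i + 1];
        p += lane0;   /* lanes are accumulated in the original order */
        p += lane1;
    }
    return p;
}
```

Calling fill_powers(2.0) and then either sum function gives the same value;
the two differ only in how polyX1 is read back after it has been written.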
