https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89371

            Bug ID: 89371
           Summary: missed vectorisation with "#pragma omp simd
                    collapse(2)"
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arnaud02 at users dot sourceforge.net
  Target Milestone: ---

void ff(double* res, double const* a, double const* b, int ncell, int neq)
{
#pragma omp simd collapse(2)
  for(int icell=0; icell < ncell; ++icell)
  {
      for(int ieq=0; ieq<neq; ++ieq)
      {
          res[icell*neq+ieq] = a[icell*neq+ieq]-b[icell*neq+ieq];
      }
  }
}
Built with gcc 8.2 on x86_64 using "-std=c++14 -O3 -mavx -fopenmp-simd", the
function above results in no SIMD instructions being emitted. Run-time tests
with, for instance, ncell=100'000 and neq=3 confirm that the code is slower
with "#pragma omp simd collapse(2)" than without it.

Am I missing something?

Ideally, I would like to be able to flatten the loop:
void ff(double* res, double const* a, double const* b, int ncell, int neq)
{
  for(int j=0; j < ncell*neq; ++j)
    res[j] = a[j]-b[j];
}
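
For reference, a minimal sketch of how the hand-flattened form could be
checked against the nested one. The names ff_nested, ff_flat and same_result
are hypothetical and not part of the report. Computing the trip count in
std::ptrdiff_t is an added assumption, so that ncell*neq cannot overflow int.
The sketch assumes non-negative sizes and contiguous arrays.

```cpp
#include <cstddef>
#include <vector>

// Nested form from the report, here without any pragma.
void ff_nested(double* res, double const* a, double const* b, int ncell, int neq)
{
  for (int icell = 0; icell < ncell; ++icell)
    for (int ieq = 0; ieq < neq; ++ieq)
      res[icell*neq+ieq] = a[icell*neq+ieq] - b[icell*neq+ieq];
}

// Hand-flattened form. The wide trip count is an assumption of this sketch;
// the report uses int throughout.
void ff_flat(double* res, double const* a, double const* b, int ncell, int neq)
{
  std::ptrdiff_t const n = static_cast<std::ptrdiff_t>(ncell) * neq;
#pragma omp simd
  for (std::ptrdiff_t j = 0; j < n; ++j)
    res[j] = a[j] - b[j];
}

// Hypothetical helper: returns true when both forms produce identical output
// for the given (non-negative) sizes.
bool same_result(int ncell, int neq)
{
  std::size_t const n =
      static_cast<std::size_t>(ncell) * static_cast<std::size_t>(neq);
  std::vector<double> a(n), b(n), r1(n), r2(n);
  for (std::size_t j = 0; j < n; ++j)
  {
    a[j] = 0.5 * static_cast<double>(j);
    b[j] = 0.25 * static_cast<double>(j);
  }
  ff_nested(r1.data(), a.data(), b.data(), ncell, neq);
  ff_flat(r2.data(), a.data(), b.data(), ncell, neq);
  return r1 == r2;
}
```

Calling same_result(100'000, 3) exercises the sizes quoted above. Building
with the same flags plus "-fopt-info-vec" should make GCC report which loops
it vectorised, which may help compare the two forms.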
