https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

            Bug ID: 66285
           Summary: failure to vectorize parallelized loop
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Another example inspired by PR46032.

Consider par-2.c:
...
#define nEvents 1000

int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data)
{
  double coeff = 12.2;

  for (INDEX_TYPE idx = 0; idx < nEvents; idx++)
    results[idx] = coeff * data[idx];

  return !(results[argc] == 0.0);
}

#if defined (MAIN)
int
main (int argc)
{
  double results[nEvents] = {0};
  double data[nEvents] = {0};

  return f (argc, results, data);
}
#endif
...

And investigate.sh:
...
#!/bin/bash

src=par-2.c

for parloops_factor in 0 2; do
    for index_type in "int" "unsigned int" "long" "unsigned long"; do
        rm -f *.c.*;

        ./lean-c/install/bin/gcc -O2 $src -S \
            -ftree-parallelize-loops=$parloops_factor \
            -ftree-vectorize \
            -fdump-tree-all-all \
            "-DINDEX_TYPE=$index_type"

        vectdump=$src.132t.vect
        pardump=$src.129t.parloops

        vectorized=$(grep -c "LOOP VECTORIZED" $vectdump)

        if [ ! -f $pardump ]; then 
            parallelized=0
        else
            parallelized=$(grep -c "parallelizing inner loop" $pardump)
        fi

        echo "parloops_factor: $parloops_factor, index_type: $index_type:"
        echo "  vectorized: $vectorized, parallelized: $parallelized"
    done
done
...
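
The pass numbers embedded in the dump file names (132t, 129t) depend on the compiler build, so the hard-coded names above may not match on another tree. A minimal sketch of looking a dump up by pass name instead; find_dump is a hypothetical helper, not part of the original script:

```shell
#!/bin/bash

# Hypothetical helper: print the first dump file for source file $1 whose
# pass name is $2, whatever its pass number is
# (e.g. par-2.c.132t.vect for "par-2.c" and "vect").
find_dump () {
    for f in "$1".[0-9]*t."$2"; do
        # If nothing matches, the glob stays literal and the -f test fails.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

In investigate.sh this could be used as vectdump=$(find_dump "$src" vect), with a nonzero exit status meaning that the pass produced no dump.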

If we're not parallelizing (-ftree-parallelize-loops=0), vectorization succeeds for all four index types:
...
parloops_factor: 0, index_type: int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
  vectorized: 1, parallelized: 0
...

If we're parallelizing (-ftree-parallelize-loops=2), vectorization still succeeds for (unsigned) long:
...
parloops_factor: 2, index_type: long:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
  vectorized: 1, parallelized: 1
...

but fails for (unsigned) int, although the loop is still parallelized:
...
parloops_factor: 2, index_type: int:
  vectorized: 0, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
  vectorized: 0, parallelized: 1
...
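
To pick out the failing configurations without reading every pair of lines, the script's output can be filtered. A minimal sketch, assuming the exact two-line output format shown above; report_missed_vectorization is a hypothetical helper, not part of the original report:

```shell
#!/bin/bash

# Hypothetical helper: read the output of investigate.sh on stdin and print
# each configuration that was parallelized but not vectorized.
report_missed_vectorization () {
    awk '
        /^parloops_factor:/ {
            # Remember the configuration line, minus its trailing colon.
            config = $0
            sub(/:$/, "", config)
            next
        }
        /vectorized:/ {
            # Line looks like "  vectorized: 0, parallelized: 1".
            v = $2
            sub(/,/, "", v)
            p = $4
            if (v + 0 == 0 && p + 0 > 0)
                print "missed: " config
        }
    '
}
```

Usage would be along the lines of ./investigate.sh | report_missed_vectorization, which for the results above should list only the int and unsigned int runs with parloops_factor 2.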
