https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #7 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #6)
> I thought that parallelizing vectorized loops is harder (you eventually get
> extra prologue and epilogue loops, etc).

Another example, par-4.c:
...
int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data,
   INDEX_TYPE n)
{
  double coeff = 12.2;

  for (INDEX_TYPE idx = 0; idx < n; idx++)
    results[idx] = coeff * data[idx];

  return !(results[argc] == 0.0);
}

#define nEvents 1000

#if defined (MAIN)
int
main (int argc)
{
  double results[nEvents] = {0};
  double data[nEvents] = {0};

  return f (argc, results, data, nEvents);
}
#endif
...

When not parallelizing, we vectorize without problems:
...
parloops_factor: 0, index_type: int: vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int: vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long: vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long: vectorized: 1, parallelized: 0
...

When parallelizing, we generate both a low iteration count loop and a
split-off parallelized loop. The vectorizer vectorizes both loops (each of
which contains an epilogue):
...
parloops_factor: 2, index_type: int: vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: long: vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: unsigned long: vectorized: 2, parallelized: 1
...

The exception is unsigned int, for which only the low iteration count loop
is vectorized:
...
parloops_factor: 2, index_type: unsigned int: vectorized: 1, parallelized: 1
...

The other loop fails to vectorize in a fashion similar to that described for
par-2.c with INDEX_TYPE (unsigned) int.
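
For anyone who wants to compare the four index types side by side, here is a rough single-file sketch. It is not part of par-4.c: the macro DEFINE_SCALE, the scale_* functions and count_mismatches are hypothetical names introduced only for illustration, and the kernel drops the argc/return-value part of f. It only checks that the four variants compute the same values; I have not verified that it reproduces the vectorized/parallelized counts listed above.

```c
#include <assert.h>

/* Hypothetical helper macro (not in par-4.c): stamps out one copy of the
   scaling loop per index type, so all variants live in one file.  */
#define DEFINE_SCALE(NAME, INDEX_TYPE)                                 \
  static void __attribute__((noinline,noclone))                        \
  NAME (double *__restrict results, const double *__restrict data,     \
        INDEX_TYPE n)                                                  \
  {                                                                    \
    double coeff = 12.2;                                               \
    for (INDEX_TYPE idx = 0; idx < n; idx++)                           \
      results[idx] = coeff * data[idx];                                \
  }

DEFINE_SCALE (scale_int, int)
DEFINE_SCALE (scale_uint, unsigned int)
DEFINE_SCALE (scale_long, long)
DEFINE_SCALE (scale_ulong, unsigned long)

#define N_EVENTS 1000

/* Run all four variants on the same input and return the number of
   elements on which any variant disagrees with the int variant.  */
int
count_mismatches (void)
{
  static double data[N_EVENTS];
  static double r_int[N_EVENTS], r_uint[N_EVENTS];
  static double r_long[N_EVENTS], r_ulong[N_EVENTS];
  int bad = 0;

  for (int i = 0; i < N_EVENTS; i++)
    data[i] = 0.5 * i;

  scale_int (r_int, data, N_EVENTS);
  scale_uint (r_uint, data, N_EVENTS);
  scale_long (r_long, data, N_EVENTS);
  scale_ulong (r_ulong, data, N_EVENTS);

  for (int i = 0; i < N_EVENTS; i++)
    if (r_int[i] != r_uint[i]
        || r_int[i] != r_long[i]
        || r_int[i] != r_ulong[i])
      bad++;

  return bad;
}

#if defined (MAIN)
int
main (void)
{
  assert (count_mismatches () == 0);
  return 0;
}
#endif
```

Assuming parloops_factor corresponds to the value passed to -ftree-parallelize-loops, building this with something like -O2 -ftree-vectorize -ftree-parallelize-loops=2 -fopt-info-vec -DMAIN should report which of the four loops the vectorizer handles; the exact output depends on the GCC version and target.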