https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #8 from vries at gcc dot gnu.org ---
For example par-4.c, if we use the same patch to interchange the passes, we get:

When not parallelizing, all loops get vectorized:
...
parloops_factor: 0, index_type: int: vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int: vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long: vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long: vectorized: 1, parallelized: 0
...

When parallelizing, we parallelize one loop.
...
parloops_factor: 2, index_type: int: vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int: vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long: vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long: vectorized: 1, parallelized: 1
...

The loop that is parallelized is the vectorized loop, not the epilogue.

So AFAIU:
- with this patch the epilogue is only performed by the main thread, after
  all the threads are done. Each thread handles one slice of the vectorized
  loop.
- without the patch, the epilogue is potentially executed by each thread.
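
To make the "with this patch" case concrete, here is a minimal hand-written sketch in C of the loop shape described above. It is not GCC output and not the par-4.c source: the loop body, the names add_one_sliced and vector_slice, and the constants VF and NTHREADS are all hypothetical, and the threads are emulated by a sequential loop so the example stays deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4        /* hypothetical vectorization factor */
#define NTHREADS 2  /* mirrors parloops_factor: 2 from the dump */

/* One thread's slice of the vectorized main loop.  The slice always
   covers a whole number of VF-wide groups, so no thread needs an
   epilogue of its own.  */
static void
vector_slice (unsigned *a, const unsigned *b, size_t begin, size_t end)
{
  assert ((end - begin) % VF == 0);
  for (size_t i = begin; i < end; i += VF)
    for (size_t j = 0; j < VF; j++)  /* stands in for one vector op */
      a[i + j] = b[i + j] + 1;
}

/* Vectorize first, then parallelize: the vector iterations are split
   across NTHREADS slices, and the scalar epilogue runs exactly once
   afterwards.  Returns the number of elements the epilogue handled.  */
static size_t
add_one_sliced (unsigned *a, const unsigned *b, size_t n)
{
  size_t nvec = n / VF;  /* number of full vector iterations */
  size_t per_thread = (nvec + NTHREADS - 1) / NTHREADS;

  /* Sequential stand-in for the worker threads (assumption: a real
     implementation would run these slices concurrently).  */
  for (size_t t = 0; t < NTHREADS; t++)
    {
      size_t lo = t * per_thread;
      size_t hi = lo + per_thread;
      if (lo > nvec)
        lo = nvec;
      if (hi > nvec)
        hi = nvec;
      vector_slice (a, b, lo * VF, hi * VF);
    }

  /* Epilogue: scalar remainder, executed by the main thread only,
     after all slices are done.  */
  size_t tail = 0;
  for (size_t i = nvec * VF; i < n; i++, tail++)
    a[i] = b[i] + 1;
  return tail;
}
```

With n = 11 and VF = 4, each of the two slices covers one vector iteration (elements 0-3 and 4-7) and the single epilogue handles the remaining three elements. In the "without the patch" ordering, each thread would instead receive its own scalar range and vectorize it separately, so each thread could end up with its own remainder loop.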