https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #8 from vries at gcc dot gnu.org ---
For the example par-4.c, if we use the same patch to interchange the passes,
we get:

When not parallelizing, all loops get vectorized:
...
parloops_factor: 0, index_type: int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
  vectorized: 1, parallelized: 0
...

When parallelizing, we parallelize one loop:
...
parloops_factor: 2, index_type: int:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
  vectorized: 1, parallelized: 1
...
The loop that is parallelized is the vectorized loop, not the epilogue.


So AFAIU:
- with this patch the epilogue is only performed by the main thread, after all
  the threads are done. Each thread handles one slice of the vectorized loop.
- without the patch, the epilogue is potentially executed by each thread.
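
To make the difference concrete, here is a rough model in C. It is a sketch, not GCC code. The names (epilogue_stats, vect_then_par, par_then_vect) are hypothetical, and it assumes that parloops splits the iteration space into near-equal contiguous slices and that the vectorizer leaves n % vf scalar iterations for an epilogue. It only counts how many epilogue loops run and how many scalar iterations they execute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for this illustration only.  */
struct epilogue_stats {
    size_t epilogue_loops;  /* epilogues that run at least one iteration */
    size_t epilogue_iters;  /* total scalar iterations run in epilogues */
};

/* With the patch: vectorize first, then parallelize the vector loop.
   The threads only split the n / vf vector iterations, so a single
   epilogue of n % vf iterations remains (assumed to run on the main
   thread after the parallel region).  */
struct epilogue_stats vect_then_par(size_t n, size_t vf, size_t threads)
{
    assert(vf > 0 && threads > 0);
    struct epilogue_stats s = {0, 0};
    size_t rem = n % vf;
    if (rem != 0) {
        s.epilogue_loops = 1;
        s.epilogue_iters = rem;
    }
    return s;
}

/* Without the patch: parallelize first, then vectorize each thread's
   slice.  Every slice whose length is not a multiple of vf gets its own
   epilogue.  The near-equal split below is an assumption.  */
struct epilogue_stats par_then_vect(size_t n, size_t vf, size_t threads)
{
    assert(vf > 0 && threads > 0);
    struct epilogue_stats s = {0, 0};
    size_t chunk = n / threads;
    size_t extra = n % threads;
    for (size_t t = 0; t < threads; t++) {
        /* The first 'extra' threads take one additional iteration.  */
        size_t len = chunk + (t < extra ? 1 : 0);
        size_t rem = len % vf;
        if (rem != 0) {
            s.epilogue_loops++;
            s.epilogue_iters += rem;
        }
    }
    return s;
}
```

Under these assumptions, for n = 1000, vf = 4 and 3 threads, vect_then_par reports no epilogue work at all, while par_then_vect reports an epilogue in each of the three threads.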
