This patch enables SIMD vectorization on non-SIMT targets in acc vector loops. It does so by setting the force_vectorize flag on the loop, in a similar manner to OpenMP SIMD loops. Unlike OpenMP, OpenACC gives the compiler the flexibility to assign gang, worker and vector parallelism to independent acc loops. At present, automatic parallelism is assigned during the oacc device lower pass, specifically inside oacc_loop_process. Consequently, this patch sets force_vectorize late, when the GOACC_LOOP internal functions are expanded into target-specific code.
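For illustration only, here is the kind of loop this is aimed at. This is a sketch and not a test case from the patch: the function name saxpy and the data clauses are made up for the example. The loop is explicitly vector partitioned, so when built with -fopenacc for a non-SIMT target it should be a candidate for the forced vectorization described above. Without -fopenacc the pragma is ignored and the loop runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not part of the patch or the GCC testsuite.
   Computes y[i] = a * x[i] + y[i] for i in [0, n).  */
void
saxpy (size_t n, float a, const float *x, float *y)
{
  assert (x != NULL && y != NULL);

  /* An acc loop with vector partitioning only.  The iterations are
     independent, so no loop-carried dependence stands in the way of
     SIMD code.  */
#pragma acc parallel loop vector copyin(x[0:n]) copy(y[0:n])
  for (size_t i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```

Whether the vectorizer actually emits SIMD code for such a loop still depends on the target and the usual vectorizer options; -fopt-info-vec is one way to check.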
Note that expand_oacc_for may construct two loops for each acc loop; the outer loop represents the "chunking" factor, whereas the inner loop iterates over the individual gang, worker and vector threads. Also note that OpenACC permits the user to apply any combination of gang, worker and vector level parallelism to each loop, e.g., acc loop gang vector. However, oacc_xform_loop does not strip-mine the acc loops to take advantage of this on non-SIMT targets as it does for SIMT targets. Therefore, the force_vectorize flag is only set when the acc loop has been assigned vector partitioning and nothing else.

Is this patch OK for trunk?

Cesar

2017-09-13  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-offload.c (oacc_xform_loop): Enable SIMD vectorization on
	non-SIMT targets in acc vector loops.

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2d4fd411680..9d5b8bef649 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "cfgloop.h"
 
 /* Describe the OpenACC looping structure of a function.  The entire
    function is held in a 'NULL' loop.  */
@@ -370,6 +371,30 @@ oacc_xform_loop (gcall *call)
       break;
 
     case IFN_GOACC_LOOP_OFFSET:
+      /* Enable vectorization on non-SIMT targets.  */
+      if (!targetm.simt.vf
+          && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+          /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
+             the loop.  */
+          && (flag_tree_loop_vectorize
+              || !global_options_set.x_flag_tree_loop_vectorize))
+        {
+          basic_block bb = gsi_bb (gsi);
+          struct loop *parent = bb->loop_father;
+          struct loop *body = parent->inner;
+
+          parent->force_vectorize = true;
+          parent->safelen = INT_MAX;
+
+          /* "Chunking loops" may have inner loops.  */
+          if (parent->inner)
+            {
+              body->force_vectorize = true;
+              body->safelen = INT_MAX;
+            }
+
+          cfun->has_force_vectorize_loops = true;
+        }
       if (striding)
         {
           r = oacc_thread_numbers (true, mask, &seq);