https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70895
--- Comment #3 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to cesar from comment #2)
> Furthermore, as Thomas mentioned, gcc-6 does not automatically assign
> parallelism to loops inside parallel regions.

:-) GCC 6.1 does do that.

> Consequently, you need to
> explicitly use num_gangs, num_workers and vector_length to determine the
> amount of parallelism and gang, worker and vector to partition the acc loops
> accordingly.

GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers, and a vector length of 32 (so you don't need to specify "num_gangs() num_workers() vector_length()" clauses). What it will not do (and this was the point of my earlier note in #c1) is assign more than one of OpenACC's parallelism levels (gang, worker, vector) to a one-level loop construct, which is why you'll want to specify the "gang worker vector" clauses on the loop construct.
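For illustration, a minimal sketch of what "gang worker vector" on a one-level loop looks like. The function name `saxpy` and the `copyin`/`copy` data clauses are assumptions chosen for the example, not taken from the bug report. No `num_gangs()`/`num_workers()`/`vector_length()` clauses are given, relying on the nvptx defaults described above. Built without `-fopenacc`, the pragmas are ignored and the loop simply runs on the host.

```c
/* Hypothetical example: y[i] = a * x[i] + y[i].
   With -fopenacc (and nvptx offloading) the parallel region is offloaded;
   without it, the pragmas are ignored and this is a plain serial loop. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* No num_gangs/num_workers/vector_length clauses: per the comment
       above, GCC 6.1 defaults to 32 gangs, 32 workers, vector length 32
       for nvptx. The data clauses here are an assumption of this sketch. */
#pragma acc parallel copyin(x[0:n]) copy(y[0:n])
    {
        /* A one-level loop: naming all three levels lets it be partitioned
           across gangs, workers, and vector lanes, instead of using only
           one level of parallelism. */
#pragma acc loop gang worker vector
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```

Dropping the `gang worker vector` clauses still yields a correct result. The point of the comment is that the loop would then use only one of the three parallelism levels.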