https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70895
--- Comment #4 from cesar at gcc dot gnu.org ---
(In reply to Thomas Schwinge from comment #3)
> (In reply to cesar from comment #2)
> > Consequently, you need to
> > explicitly use num_gangs, num_workers and vector_length to determine the
> > amount of parallelism and gang, worker and vector to partition the acc loops
> > accordingly.
>
> GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers,
> and a vector length of 32 (so, you don't need to specify "num_gangs()
> num_workers() vector_length()" clauses). What it will not do (and what was
> the point of my earlier note in #c1), is assign more than one of OpenACC's
> parallelism levels (gang, worker, vector) to a one-level loop constructs,
> which is why you'll want to specify "gang worker vector" clauses for the
> loop construct.

Thomas is correct. I've been focusing too much on the front ends and not the loop partitioning infrastructure. Sorry for the noise.
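
For anyone landing on this PR later, here is a minimal sketch of what Thomas describes: a single-level loop with explicit "gang worker vector" clauses and no num_gangs/num_workers/vector_length clauses. The function name vec_add and the data clauses are my own illustration and are not taken from the reporter's test case. When built without -fopenacc the pragma is ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical example, not the reporter's code: element-wise vector add
   over a one-level loop.  */
void
vec_add (const float *a, const float *b, float *c, int n)
{
  /* The "gang worker vector" clauses ask for all three OpenACC parallelism
     levels to be applied to this single loop.  No num_gangs, num_workers or
     vector_length clauses are given, so the compiler's defaults are used.  */
#pragma acc parallel loop gang worker vector \
  copyin (a[0:n], b[0:n]) copyout (c[0:n])
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Something like "gcc -fopenacc -O2 file.c" should build it, assuming the compiler was configured with nvptx offloading; the offloading setup itself is outside what this sketch shows.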