https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70895

--- Comment #3 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to cesar from comment #2)
> Furthermore, as Thomas mentioned, gcc-6 does not automatically assign
> parallelism to loops inside parallel regions.

:-) GCC 6.1 does do that.

> Consequently, you need to
> explicitly use num_gangs, num_workers and vector_length to determine the
> amount of parallelism and gang, worker and vector to partition the acc loops
> accordingly.

GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers,
and a vector length of 32 (so, you don't need to specify "num_gangs()
num_workers() vector_length()" clauses).  What it will not do (and this was the
point of my earlier note in comment #1) is assign more than one of OpenACC's
parallelism levels (gang, worker, vector) to a one-level loop construct, which
is why you'll want to specify "gang worker vector" clauses on the loop
construct.
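
To illustrate, here is a minimal sketch of a one-level loop that explicitly
requests all three parallelism levels.  The function name "saxpy", its
parameters, and the data clauses are made up for this example and are not
taken from the test case in this PR.  No num_gangs, num_workers, or
vector_length clauses are given, so the defaults described above would apply.

```c
/* Hypothetical example; the names and data clauses are assumptions made for
   illustration, not code from the bug report.  */
void saxpy(int n, float a, const float *x, float *y)
{
    /* A single (one-level) loop: name gang, worker and vector explicitly so
       that all three levels are used to partition it.  */
#pragma acc parallel loop gang worker vector copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

If this is compiled without -fopenacc, the pragma is ignored and the loop
simply runs sequentially on the host, computing the same result.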
