https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70895

--- Comment #4 from cesar at gcc dot gnu.org ---
(In reply to Thomas Schwinge from comment #3)
> (In reply to cesar from comment #2)

> > Consequently, you need to
> > explicitly use num_gangs, num_workers and vector_length to determine the
> > amount of parallelism and gang, worker and vector to partition the acc loops
> > accordingly.
> 
> GCC 6.1 by default will configure nvptx offloading for 32 gangs, 32 workers,
> and a vector length of 32 (so, you don't need to specify "num_gangs()
> num_workers() vector_length()" clauses).  What it will not do (and what was
> the point of my earlier note in #c1), is assign more than one of OpenACC's
> parallelism levels (gang, worker, vector) to a one-level loop construct,
> which is why you'll want to specify "gang worker vector" clauses for the
> loop construct.

Thomas is correct. I've been focusing too much on the front ends and not the
loop partitioning infrastructure. Sorry for the noise.
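For reference, here is a minimal sketch of the annotation Thomas describes. It puts all three parallelism levels on a single one-level loop via "gang worker vector". The function name saxpy and its arguments are hypothetical and are not taken from the testcase in this PR. The num_gangs/num_workers/vector_length clauses only spell out the 32/32/32 that GCC 6.1 already uses by default for nvptx, so they are optional. When compiled without -fopenacc, the pragmas are ignored and the loop runs serially on the host.

```c
/* Hypothetical example: y = a*x + y over n elements. */
void saxpy(int n, float a, const float *x, float *y)
{
  /* Launch dimensions written out explicitly; per comment #3 these match
     GCC 6.1's nvptx defaults, so this line is optional. */
#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
    copyin(x[0:n]) copy(y[0:n])
  {
    /* Without "gang worker vector" here, only one parallelism level
       would be assigned to this single loop. */
#pragma acc loop gang worker vector
    for (int i = 0; i < n; i++)
      y[i] = a * x[i] + y[i];
  }
}
```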
