On Thu, May 28, 2015 at 03:06:35PM +0100, Julian Brown wrote:
> For NVPTX, it is vitally important that the divergence of threads
> within a warp can be controlled: in particular we must be able to
> generate code that we know "reconverges" at a particular point.
> Unfortunately GCC's middle-end optimisers can cause this property to
> be violated, which causes problems for the OpenACC execution model
> we're planning to use for NVPTX.
> 
> As a brief example: code running in vector-single mode runs on a
> single thread of a warp, and must broadcast condition results to other
> threads of the warp so that they can "follow along" and be ready for
> vector-partitioned execution when necessary.
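
(To make the quoted scenario concrete: below is a minimal invented
OpenACC example, not taken from the original mail.  The code outside
the inner loop executes in vector-single mode, while the "acc loop
vector" loop is vector-partitioned across the warp on NVPTX.)

void
scale_rows (int n, int m, float *a, float threshold)
{
  #pragma acc parallel loop gang copy(a[0:n*m])
  for (int i = 0; i < n; i++)
    {
      /* Vector-single: on NVPTX a single thread of the warp
         evaluates this condition, and its result must be broadcast
         so the whole warp takes the same branch and is reconverged
         when the vector loop starts.  */
      if (a[i * m] > threshold)
        {
          /* Vector-partitioned: all threads of the warp execute
             iterations of this loop.  */
          #pragma acc loop vector
          for (int j = 0; j < m; j++)
            a[i * m + j] *= 0.5f;
        }
    }
}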

I think lowering this as early as ompexp time is premature.  A much
better approach would be a function attribute (or cgraph flag) that
would be set on the functions you want to compile this way (plus a
targetm flag for the targets that want to support it that way), plus
a flag in the loop structure for the acc loop vector loops (perhaps
the current OpenMP simd loop flags are good enough for that; see the
example below), and then lowering somewhere around the vectorization
pass or so.
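
For reference, the existing OpenMP simd mechanism mentioned above
looks like this at the source level (a minimal invented example):

void
saxpy (int n, float a, float *x, float *y)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}

As far as I know, GCC records the simd property as flags on the loop
structure (e.g. safelen and the simduid) and only consumes them near
the vectorization pass, which is the kind of late, per-target
lowering suggested here.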

Or, what exactly do you emit for the fallback code, or for other
GPGPUs or XeonPhi?  To me, e.g. for XeonPhi or HSA, this sounds like
you want to implement acc loop gang as a work-sharing loop among
threads (like #pragma omp for) and #pragma acc loop vector as a loop
that should be vectorized if at all possible (like #pragma omp simd);
a source-level sketch of that correspondence follows below.  I really
think it is important that GCC's OpenACC support is not tied so
strongly to one specific GPGPU, and similarly OpenMP should be usable
for all offloading targets GCC supports.
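
As a rough source-level sketch of that correspondence (an invented
example; the actual expansion would of course be done by the compiler
separately per offloading target, not in user source):

/* OpenACC: gang-partitioned outer loop, vector-partitioned inner.  */
void
scale_acc (int n, int m, float *a, float c)
{
  #pragma acc parallel loop gang copy(a[0:n*m])
  for (int i = 0; i < n; i++)
    {
      #pragma acc loop vector
      for (int j = 0; j < m; j++)
        a[i * m + j] *= c;
    }
}

/* Suggested mapping for XeonPhi/HSA-like targets: work-sharing plus
   a vectorizable loop.  */
void
scale_omp (int n, int m, float *a, float c)
{
  #pragma omp parallel for
  for (int i = 0; i < n; i++)
    {
      #pragma omp simd
      for (int j = 0; j < m; j++)
        a[i * m + j] *= c;
    }
}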

That way it is also possible to auto-vectorize the code, the decision
on how to expand the offloaded function's code is made separately for
each offloading target, there is room for optimizations on a much
simpler CFG, etc.

        Jakub
