[PATCH] D101976: [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL

Jon Chesterfield via Phabricator via cfe-commits Thu, 06 May 2021 12:52:45 -0700

JonChesterfield added a comment.

In D101976#2742188 <https://reviews.llvm.org/D101976#2742188>, @jdoerfert wrote:

> In D101976#2742166 <https://reviews.llvm.org/D101976#2742166>, 
> @JonChesterfield wrote:
>
>> What are the required semantics of the barrier operations? Amdgcn builds 
>> them on shared memory, so probably needs a change to the corresponding 
>> target_impl to match
>
> I have *not* tested AMDGCN but I was not expecting a problem. The semantics I 
> need here is: 
>  warp N, thread     0 hits a barrier instruction I0
>  warp N, threads 1-31 hit  a barrier instruction I1
>  the entire warp synchronizes and moves on.

One hazard is that the amdgpu devicertl only has one barrier. D102016 
<https://reviews.llvm.org/D102016> makes it simpler to add a second. I'd guess 
we want named_sync to call one barrier and syncthreads to call a different one, 
so we should probably rename those functions. The LDS barrier implementation 
needs to know how many threads to wait for; we may be OK passing 'all the 
threads' down from the __syncthreads entry point.
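
To illustrate why a barrier built on shared memory needs the thread count, here
is a minimal host-side sketch in C++. It is not the devicertl code: the names
CountingBarrier and run_barrier_demo are hypothetical, std::thread stands in
for GPU threads, and an ordinary atomic stands in for LDS. The only point is
that the last arriver can be recognised only if every caller passes the same
expected count.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a barrier kept in shared (LDS) memory.
// Assumption: every participant passes the same num_threads.
struct CountingBarrier {
  std::atomic<uint32_t> arrived{0};    // threads that reached the barrier
  std::atomic<uint32_t> generation{0}; // bumped once per completed barrier

  void wait(uint32_t num_threads) {
    // Read the generation before announcing arrival, so the release by the
    // last arriver cannot be missed.
    uint32_t gen = generation.load();
    if (arrived.fetch_add(1) + 1 == num_threads) {
      // Last arriver: reset the counter for reuse, then release the waiters.
      arrived.store(0);
      generation.fetch_add(1);
    } else {
      // Everyone else spins until the generation changes.
      while (generation.load() == gen)
        std::this_thread::yield();
    }
  }
};

// Returns how many threads observed all pre-barrier increments after the
// barrier. With a correct barrier this equals num_threads.
uint32_t run_barrier_demo(uint32_t num_threads) {
  assert(num_threads > 0);
  CountingBarrier barrier;
  std::atomic<uint32_t> before{0};
  std::atomic<uint32_t> saw_all{0};
  std::vector<std::thread> workers;
  for (uint32_t i = 0; i < num_threads; ++i)
    workers.emplace_back([&] {
      before.fetch_add(1);
      barrier.wait(num_threads); // 'all the threads' passed down by the caller
      if (before.load() == num_threads)
        saw_all.fetch_add(1);
    });
  for (auto &w : workers)
    w.join();
  return saw_all.load();
}
```

Calling run_barrier_demo(4) starts four threads and reports how many of them
saw the complete count once past the barrier. A second, independent barrier in
this sketch would simply be a second CountingBarrier object with its own
counter.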

The other hazard is the single instruction pointer per wavefront, as on 
pre-Volta NVIDIA cards (which I believe we also expect to work). I'm not sure 
whether totally independent barriers will work, or whether we'll need to arrange 
for thread 0 and threads 1-31 to call the two different barriers at the same 
point in control flow.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101976/new/

https://reviews.llvm.org/D101976

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
