jdoerfert wrote:

There seems to be some trouble with NVIDIA offload (I tested mainly on AMDGPU) 
and f128; I'll make sure that works too.
The nits are easy to address; I just copied the existing style around.
I'll also add an IR test to match the new runtime calls and kernel argument 
passing.

> Will kernels in TUs compiled with `-foffload-via-llvm` be interoperable with 
> code that wants to launch them from another TU compiled w/o 
> `-foffload-via-llvm` ?
> 
> E.g.:
> 
> * a.cu: `__global__ void kernel() { ... }`
> * b.cu: `extern __global__ void kernel(); void func() { kernel<<<1,1>>>();}`
> 
> This could use a test in the testsuite to actually check whether it works.

I'll look into this. Intuitively, the TU containing the kernel launch needs 
-foffload-via-llvm (which implies -foffload-new-driver), and the TU containing 
the kernel definition needs -foffload-new-driver. Similarly, with the new 
driver flag, device code should link fine. Right now, -foffload-via-llvm 
defaults to gpu-rdc, as OpenMP does, but we can change that. On that note, 
non-rdc should actually internalize everything except the kernels and thereby 
help the middle end as well.

https://github.com/llvm/llvm-project/pull/94549
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
