[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); hahnjo wrote: I have the same fear as @Artem-B, higher than default priorities are also sometimes reserved. We really need to see what `nvcc` does here, but what I could imagine (at least how I would solve it) is putting the constructor with the same priority before all other constructors. https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); vgvassilev wrote: @argentite ping. https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
vgvassilev wrote: cc: @hahnjo https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); Artem-B wrote: I'd start with checking what NVCC generates for the initializers. Considering that ultimately we need to conform to CUDA runtime expectations and given lack of documentation, NVCC-generated code is the only reference we have. Compile your example with -keep and see what NVCC-generated registration code looks like. https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
https://github.com/vgvassilev edited https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); vgvassilev wrote: @Artem-B, I don’t think @argentite is pushing particularly for this solution of the problem. It seems we agree that is a problem and the behavior of clang diverges from the reference implementation. I believe we should figure out how to fix it. Rather than changing the priority we can book a slot for the kernel launch declaration respecting the init order. https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); Artem-B wrote: This is a very contrived example. While I agree that it currently does not work with CUDA, I am still not convinced that it is a problem that needs to be solved in clang. Let's assume you've set the priority at X. Launching kernels from dynamic initializers with higher priority will still be broken, so the patch does not solve the problem conceptually. If you set the priority of CUDA kernel initializers at the highest level (is that the ntent of priority=0?), can you guarantee that kernel registration never depends on anything else that was expected to get initialized before it? We also no longer have *any* wiggle room to run anything before kernel registration when we need to. @MaskRay Fangrui, WDYT about bumping dynamic initializer priority in principle? Is there anything else we need to worry about? https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); argentite wrote: The underlying issues is not actually clang-repl specific, it also affects clang. For example, this seems to succeed in `nvcc` but fails with `clang`: ```cpp #include __global__ void kernel() {} class C { public: C() { kernel<<<1, 1>>>(); printf("Error: %d\n", cudaGetLastError()); } }; C c; int main() {} ``` This is fixed by this patch. Maybe we can look for a proper solution to this? https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); Artem-B wrote: > User code in Clang interpreter, is also executed through global_ctors. This > patch ensures kernels can be launched in the same iteration it is defined in > by making the registration first in the list. This sounds like an application-specific problem that may be addressable by lowering priority of user code initializers. In general, I'm very reluctant to change the initialization order to be different from what NVCC generates. We do need to interoperate with NVIDIA's libraries and the change in initialization order is potentially risky. Considering that we have no practical way to test it, and that it appears to address something that affects only one application (and may be dealt with on the app level), I do not think we should change the priority for the clang-generated kernel registration code. https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
https://github.com/argentite updated https://github.com/llvm/llvm-project/pull/66658 >From bed2919f781c5ef71e268c95b31a6b9af5392730 Mon Sep 17 00:00:00 2001 From: Anubhab Ghosh Date: Mon, 18 Sep 2023 20:33:19 +0530 Subject: [PATCH] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors CUDA device code needs to be registered to the runtime before kernels can be launched. This is done through a global constructor. User code in Clang interpreter, is also executed through global_ctors. This patch ensures kernels can be launched in the same iteration it is defined in by making the registration first in the list. --- clang/lib/CodeGen/CodeGenModule.cpp | 2 +- .../test/Interpreter/CUDA/launch-same-ptu.cu | 21 +++ 2 files changed, 22 insertions(+), 1 deletion(-) create mode 100644 clang/test/Interpreter/CUDA/launch-same-ptu.cu diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index 8b0c9340775cbe9..647c8922f27a00f 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, /*Priority=*/0); } if (OpenMPRuntime) { if (llvm::Function *OpenMPRequiresDirectiveRegFun = diff --git a/clang/test/Interpreter/CUDA/launch-same-ptu.cu b/clang/test/Interpreter/CUDA/launch-same-ptu.cu new file mode 100644 index 000..93e203a47212fbf --- /dev/null +++ b/clang/test/Interpreter/CUDA/launch-same-ptu.cu @@ -0,0 +1,21 @@ +// Tests __device__ function calls +// RUN: cat %s | clang-repl --cuda | FileCheck %s + +extern "C" int printf(const char*, ...); + +int var; +int* devptr = nullptr; +printf("cudaMalloc: %d\n", cudaMalloc((void **) , sizeof(int))); +// CHECK: cudaMalloc: 0 + +__device__ inline void test_device(int* value) { *value = 42; } __global__ void test_kernel(int* value) { test_device(value); } test_kernel<<<1,1>>>(devptr); +printf("CUDA Error: %d\n", cudaGetLastError()); +// CHECK-NEXT: CUDA Error: 0 + +printf("cudaMemcpy: %d\n", cudaMemcpy(, devptr, sizeof(int), cudaMemcpyDeviceToHost)); +// CHECK-NEXT: cudaMemcpy: 0 + +printf("Value: %d\n", var); +// CHECK-NEXT: Value: 42 + +%quit ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
llvmbot wrote: @llvm/pr-subscribers-clang-codegen Changes CUDA device code needs to be registered to the runtime before kernels can be launched. This is done through a global constructor. User code in Clang interpreter, is also executed through `global_ctors`. This patch ensures kernels can be launched in the same iteration it is defined in by making the registration first in the list. This allows `#include`-ing a large portion of code that defines device functions and also launches kernels in clang-repl. --- Full diff: https://github.com/llvm/llvm-project/pull/66658.diff 2 Files Affected: - (modified) clang/lib/CodeGen/CodeGenModule.cpp (+1-1) - (added) clang/test/Interpreter/CUDA/launch-same-ptu.cu (+21) ``diff diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index 8b0c9340775cbe9..783865409c778f5 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, 0); } if (OpenMPRuntime) { if (llvm::Function *OpenMPRequiresDirectiveRegFun = diff --git a/clang/test/Interpreter/CUDA/launch-same-ptu.cu b/clang/test/Interpreter/CUDA/launch-same-ptu.cu new file mode 100644 index 000..93e203a47212fbf --- /dev/null +++ b/clang/test/Interpreter/CUDA/launch-same-ptu.cu @@ -0,0 +1,21 @@ +// Tests __device__ function calls +// RUN: cat %s | clang-repl --cuda | FileCheck %s + +extern "C" int printf(const char*, ...); + +int var; +int* devptr = nullptr; +printf("cudaMalloc: %d\n", cudaMalloc((void **) , sizeof(int))); +// CHECK: cudaMalloc: 0 + +__device__ inline void test_device(int* value) { *value = 42; } __global__ void test_kernel(int* value) { test_device(value); } test_kernel<<<1,1>>>(devptr); +printf("CUDA Error: %d\n", cudaGetLastError()); +// CHECK-NEXT: CUDA Error: 0 + +printf("cudaMemcpy: %d\n", cudaMemcpy(, devptr, sizeof(int), cudaMemcpyDeviceToHost)); +// CHECK-NEXT: cudaMemcpy: 0 + +printf("Value: %d\n", var); +// CHECK-NEXT: Value: 42 + +%quit `` https://github.com/llvm/llvm-project/pull/66658 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)
https://github.com/argentite created https://github.com/llvm/llvm-project/pull/66658 CUDA device code needs to be registered to the runtime before kernels can be launched. This is done through a global constructor. User code in Clang interpreter, is also executed through `global_ctors`. This patch ensures kernels can be launched in the same iteration it is defined in by making the registration first in the list. This allows `#include`-ing a large portion of code that defines device functions and also launches kernels in clang-repl. >From fb806d7c7d357f1769538df0ba7729e4b328da79 Mon Sep 17 00:00:00 2001 From: Anubhab Ghosh Date: Mon, 18 Sep 2023 20:33:19 +0530 Subject: [PATCH] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors CUDA device code needs to be registered to the runtime before kernels can be launched. This is done through a global constructor. User code in Clang interpreter, is also executed through global_ctors. This patch ensures kernels can be launched in the same iteration it is defined in by making the registration first in the list. --- clang/lib/CodeGen/CodeGenModule.cpp | 2 +- .../test/Interpreter/CUDA/launch-same-ptu.cu | 21 +++ 2 files changed, 22 insertions(+), 1 deletion(-) create mode 100644 clang/test/Interpreter/CUDA/launch-same-ptu.cu diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index 8b0c9340775cbe9..783865409c778f5 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); + AddGlobalCtor(CudaCtorFunction, 0); } if (OpenMPRuntime) { if (llvm::Function *OpenMPRequiresDirectiveRegFun = diff --git a/clang/test/Interpreter/CUDA/launch-same-ptu.cu b/clang/test/Interpreter/CUDA/launch-same-ptu.cu new file mode 100644 index 000..93e203a47212fbf --- /dev/null +++ b/clang/test/Interpreter/CUDA/launch-same-ptu.cu @@ -0,0 +1,21 @@ +// Tests __device__ function calls +// RUN: cat %s | clang-repl --cuda | FileCheck %s + +extern "C" int printf(const char*, ...); + +int var; +int* devptr = nullptr; +printf("cudaMalloc: %d\n", cudaMalloc((void **) , sizeof(int))); +// CHECK: cudaMalloc: 0 + +__device__ inline void test_device(int* value) { *value = 42; } __global__ void test_kernel(int* value) { test_device(value); } test_kernel<<<1,1>>>(devptr); +printf("CUDA Error: %d\n", cudaGetLastError()); +// CHECK-NEXT: CUDA Error: 0 + +printf("cudaMemcpy: %d\n", cudaMemcpy(, devptr, sizeof(int), cudaMemcpyDeviceToHost)); +// CHECK-NEXT: cudaMemcpy: 0 + +printf("Value: %d\n", var); +// CHECK-NEXT: Value: 42 + +%quit ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits