subject:"\[clang\] \[clang\-repl\]\[CUDA\] Move CUDA module registration to beginning of global

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-10-03 Thread Jonas Hahnfeld via cfe-commits



@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);

hahnjo wrote:

I have the same fear as @Artem-B, higher than default priorities are also 
sometimes reserved. We really need to see what `nvcc` does here, but what I 
could imagine (at least how I would solve it) is putting the constructor with 
the same priority before all other constructors.

https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-30 Thread Vassil Vassilev via cfe-commits



@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);

vgvassilev wrote:

@argentite ping.

https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-30 Thread Vassil Vassilev via cfe-commits


vgvassilev wrote:

cc: @hahnjo

https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Artem Belevich via cfe-commits



@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);

Artem-B wrote:

I'd start with checking what NVCC generates for the initializers. Considering 
that ultimately we need to conform to CUDA runtime expectations and given lack 
of documentation, NVCC-generated code is the only reference we have.
Compile your example with -keep and see what NVCC-generated registration code 
looks like.

https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Vassil Vassilev via cfe-commits


https://github.com/vgvassilev edited 
https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Vassil Vassilev via cfe-commits



@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);

vgvassilev wrote:

@Artem-B, I don’t think @argentite is pushing particularly for this solution of 
the problem. It seems we agree that is a problem and the behavior of clang 
diverges from the reference implementation. I believe we should figure out how 
to fix it. 

Rather than changing the priority we can book a slot for the kernel launch 
declaration respecting the init order.

https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Artem Belevich via cfe-commits



@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);

Artem-B wrote:

This is a very contrived example. While I agree that it currently does not work 
with CUDA, I am still not convinced that it is a problem that needs to be 
solved in clang.

Let's assume you've set the priority at X. Launching kernels from dynamic 
initializers with higher priority will still be broken, so the patch does not 
solve the problem conceptually.

If you set the priority of CUDA kernel initializers at the highest level (is 
that the ntent of priority=0?), can you guarantee that kernel registration 
never depends on anything else that was expected to get initialized before it? 
We also no longer have *any* wiggle room to run anything before kernel 
registration when we need to.

@MaskRay Fangrui, WDYT about bumping dynamic initializer priority in principle? 
Is there anything else we need to worry about?


https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Anubhab Ghosh via cfe-commits



@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);

argentite wrote:

The underlying issues is not actually clang-repl specific, it also affects 
clang. For example, this seems to succeed in `nvcc` but fails with `clang`:
```cpp
#include 

__global__ void kernel() {}

class C {
public:
  C() {
kernel<<<1, 1>>>();
printf("Error: %d\n", cudaGetLastError());
  }
};

C c;

int main() {}
```

This is fixed by this patch. Maybe we can look for a proper solution to this?

https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Artem Belevich via cfe-commits

@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);

Artem-B wrote:

> User code in Clang interpreter, is also executed through global_ctors. This 
> patch ensures kernels can be launched in the same iteration it is defined in 
> by making the registration first in the list.

This sounds like an application-specific problem that may be addressable by 
lowering priority of user code initializers.

In general, I'm very reluctant to change the initialization order to be 
different from what NVCC generates. We do need to interoperate with NVIDIA's 
libraries and the change in initialization order is potentially risky. 
Considering that we have no practical way to test it, and that it appears to 
address something that affects only one application (and may be dealt with on 
the app level), I do not think we should change the priority for the 
clang-generated kernel registration code.

https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Anubhab Ghosh via cfe-commits


https://github.com/argentite updated 
https://github.com/llvm/llvm-project/pull/66658

>From bed2919f781c5ef71e268c95b31a6b9af5392730 Mon Sep 17 00:00:00 2001
From: Anubhab Ghosh 
Date: Mon, 18 Sep 2023 20:33:19 +0530
Subject: [PATCH] [clang-repl][CUDA] Move CUDA module registration to beginning
 of global_ctors

CUDA device code needs to be registered to the runtime before kernels
can be launched. This is done through a global constructor.
User code in Clang interpreter, is also executed through global_ctors.
This patch ensures kernels can be launched in the same iteration it is
defined in by making the registration first in the list.
---
 clang/lib/CodeGen/CodeGenModule.cpp   |  2 +-
 .../test/Interpreter/CUDA/launch-same-ptu.cu  | 21 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/Interpreter/CUDA/launch-same-ptu.cu

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 8b0c9340775cbe9..647c8922f27a00f 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);
   }
   if (OpenMPRuntime) {
 if (llvm::Function *OpenMPRequiresDirectiveRegFun =
diff --git a/clang/test/Interpreter/CUDA/launch-same-ptu.cu 
b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
new file mode 100644
index 000..93e203a47212fbf
--- /dev/null
+++ b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
@@ -0,0 +1,21 @@
+// Tests __device__ function calls
+// RUN: cat %s | clang-repl --cuda | FileCheck %s
+
+extern "C" int printf(const char*, ...);
+
+int var;
+int* devptr = nullptr;
+printf("cudaMalloc: %d\n", cudaMalloc((void **) , sizeof(int)));
+// CHECK: cudaMalloc: 0
+
+__device__ inline void test_device(int* value) { *value = 42; } __global__ 
void test_kernel(int* value) { test_device(value); } 
test_kernel<<<1,1>>>(devptr);
+printf("CUDA Error: %d\n", cudaGetLastError());
+// CHECK-NEXT: CUDA Error: 0
+
+printf("cudaMemcpy: %d\n", cudaMemcpy(, devptr, sizeof(int), 
cudaMemcpyDeviceToHost));
+// CHECK-NEXT: cudaMemcpy: 0
+
+printf("Value: %d\n", var);
+// CHECK-NEXT: Value: 42
+
+%quit

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread via cfe-commits


llvmbot wrote:




@llvm/pr-subscribers-clang-codegen


Changes

CUDA device code needs to be registered to the runtime before kernels can be 
launched. This is done through a global constructor. User code in Clang 
interpreter, is also executed through `global_ctors`. This patch ensures 
kernels can be launched in the same iteration it is defined in by making the 
registration first in the list.

This allows `#include`-ing a large portion of code that defines device 
functions and also launches kernels in clang-repl. 

---
Full diff: https://github.com/llvm/llvm-project/pull/66658.diff


2 Files Affected:

- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+1-1) 
- (added) clang/test/Interpreter/CUDA/launch-same-ptu.cu (+21) 


``diff
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 8b0c9340775cbe9..783865409c778f5 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, 0);
   }
   if (OpenMPRuntime) {
 if (llvm::Function *OpenMPRequiresDirectiveRegFun =
diff --git a/clang/test/Interpreter/CUDA/launch-same-ptu.cu 
b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
new file mode 100644
index 000..93e203a47212fbf
--- /dev/null
+++ b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
@@ -0,0 +1,21 @@
+// Tests __device__ function calls
+// RUN: cat %s | clang-repl --cuda | FileCheck %s
+
+extern "C" int printf(const char*, ...);
+
+int var;
+int* devptr = nullptr;
+printf("cudaMalloc: %d\n", cudaMalloc((void **) , sizeof(int)));
+// CHECK: cudaMalloc: 0
+
+__device__ inline void test_device(int* value) { *value = 42; } __global__ 
void test_kernel(int* value) { test_device(value); } 
test_kernel<<<1,1>>>(devptr);
+printf("CUDA Error: %d\n", cudaGetLastError());
+// CHECK-NEXT: CUDA Error: 0
+
+printf("cudaMemcpy: %d\n", cudaMemcpy(, devptr, sizeof(int), 
cudaMemcpyDeviceToHost));
+// CHECK-NEXT: cudaMemcpy: 0
+
+printf("Value: %d\n", var);
+// CHECK-NEXT: Value: 42
+
+%quit

``




https://github.com/llvm/llvm-project/pull/66658
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Anubhab Ghosh via cfe-commits


https://github.com/argentite created 
https://github.com/llvm/llvm-project/pull/66658

CUDA device code needs to be registered to the runtime before kernels can be 
launched. This is done through a global constructor. User code in Clang 
interpreter, is also executed through `global_ctors`. This patch ensures 
kernels can be launched in the same iteration it is defined in by making the 
registration first in the list.

This allows `#include`-ing a large portion of code that defines device 
functions and also launches kernels in clang-repl. 

>From fb806d7c7d357f1769538df0ba7729e4b328da79 Mon Sep 17 00:00:00 2001
From: Anubhab Ghosh 
Date: Mon, 18 Sep 2023 20:33:19 +0530
Subject: [PATCH] [clang-repl][CUDA] Move CUDA module registration to beginning
 of global_ctors

CUDA device code needs to be registered to the runtime before kernels
can be launched. This is done through a global constructor.
User code in Clang interpreter, is also executed through global_ctors.
This patch ensures kernels can be launched in the same iteration it is
defined in by making the registration first in the list.
---
 clang/lib/CodeGen/CodeGenModule.cpp   |  2 +-
 .../test/Interpreter/CUDA/launch-same-ptu.cu  | 21 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/Interpreter/CUDA/launch-same-ptu.cu

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 8b0c9340775cbe9..783865409c778f5 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
   AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
 if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-  AddGlobalCtor(CudaCtorFunction);
+  AddGlobalCtor(CudaCtorFunction, 0);
   }
   if (OpenMPRuntime) {
 if (llvm::Function *OpenMPRequiresDirectiveRegFun =
diff --git a/clang/test/Interpreter/CUDA/launch-same-ptu.cu 
b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
new file mode 100644
index 000..93e203a47212fbf
--- /dev/null
+++ b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
@@ -0,0 +1,21 @@
+// Tests __device__ function calls
+// RUN: cat %s | clang-repl --cuda | FileCheck %s
+
+extern "C" int printf(const char*, ...);
+
+int var;
+int* devptr = nullptr;
+printf("cudaMalloc: %d\n", cudaMalloc((void **) , sizeof(int)));
+// CHECK: cudaMalloc: 0
+
+__device__ inline void test_device(int* value) { *value = 42; } __global__ 
void test_kernel(int* value) { test_device(value); } 
test_kernel<<<1,1>>>(devptr);
+printf("CUDA Error: %d\n", cudaGetLastError());
+// CHECK-NEXT: CUDA Error: 0
+
+printf("cudaMemcpy: %d\n", cudaMemcpy(, devptr, sizeof(int), 
cudaMemcpyDeviceToHost));
+// CHECK-NEXT: cudaMemcpy: 0
+
+printf("Value: %d\n", var);
+// CHECK-NEXT: Value: 42
+
+%quit

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

12 matches

Site Navigation

Mail list logo

Footer information