[PATCH] D152164: [CUDA][HIP] Externalize device var in anonymous namespace
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. yaxunl marked an inline comment as done. Closed by commit rGf2677afe9159: [CUDA][HIP] Externalize device var in anonymous namespace (authored by yaxunl). Herald added a project: clang. Changed prior to commit: https://reviews.llvm.org/D152164?vs=528447&id=528896#toc Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D152164/new/ https://reviews.llvm.org/D152164 Files: clang/lib/AST/ASTContext.cpp clang/test/CodeGenCUDA/anon-ns.cu clang/test/CodeGenCUDA/host-used-device-var.cu clang/test/CodeGenCUDA/kernel-in-anon-ns.cu Index: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu === --- clang/test/CodeGenCUDA/kernel-in-anon-ns.cu +++ /dev/null @@ -1,58 +0,0 @@ -// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -cuid=abc \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \ -// RUN: -emit-llvm -o - -x hip %s > %t.dev - -// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ -// RUN: -aux-triple amdgcn-amd-amdhsa -std=c++11 -fgpu-rdc \ -// RUN: -emit-llvm -o - -x hip %s > %t.host - -// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=HIP,COMMON %s - -// RUN: echo "GPU binary" > %t.fatbin - -// RUN: %clang_cc1 -triple nvptx -fcuda-is-device -cuid=abc \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \ -// RUN: -emit-llvm -o - %s > %t.dev - -// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ -// RUN: -aux-triple nvptx -std=c++11 -fgpu-rdc -fcuda-include-gpubinary %t.fatbin \ -// RUN: -emit-llvm -o - %s > %t.host - -// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=CUDA,COMMON %s - -#include "Inputs/cuda.h" - -// HIP-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv\.intern\.b04fd23c98500190]]( -// HIP-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT_\.intern\.b04fd23c98500190]]( -// HIP-DAG: define weak_odr {{.*}}void @[[KERN3:_Z8tempKernIN12_GLOBAL__N_1UlvE_EEvT_\.intern\.b04fd23c98500190]]( - -// CUDA-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv__intern__b04fd23c98500190]]( -// CUDA-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT___intern__b04fd23c98500190]]( -// CUDA-DAG: define weak_odr {{.*}}void @[[KERN3:_Z8tempKernIN12_GLOBAL__N_1UlvE_EEvT___intern__b04fd23c98500190]]( - -// COMMON-DAG: @[[STR1:.*]] = {{.*}} c"[[KERN1]]\00" -// COMMON-DAG: @[[STR2:.*]] = {{.*}} c"[[KERN2]]\00" -// COMMON-DAG: @[[STR3:.*]] = {{.*}} c"[[KERN3]]\00" - -// COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR1]] -// COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR2]] -// COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR3]] - - -template -__global__ void tempKern(T x) {} - -namespace { - __global__ void kernel() {} - struct X {}; - X x; - auto lambda = [](){}; -} - -void test() { - kernel<<<1, 1>>>(); - - tempKern<<<1, 1>>>(x); - - tempKern<<<1, 1>>>(lambda); -} Index: clang/test/CodeGenCUDA/host-used-device-var.cu === --- clang/test/CodeGenCUDA/host-used-device-var.cu +++ clang/test/CodeGenCUDA/host-used-device-var.cu @@ -73,9 +73,8 @@ inline constexpr int constexpr_var1b = 1; // Check constant constexpr variables ODR-used by host code only. -// Non-inline constexpr variable has internal linkage, therefore it is not accessible by host and not kept. -// Inline constexpr variable has linkonce_ord linkage, therefore it can be accessed by host and kept. -// DEV-NEG-NOT: constexpr_var2a +// Device-side constexpr variables accessed by host code should be externalized and kept. +// DEV-DAG: @_ZL15constexpr_var2a = addrspace(4) externally_initialized constant i32 2 // DEV-DAG: @constexpr_var2b = linkonce_odr addrspace(4) externally_initialized constant i32 2 __constant__ constexpr int constexpr_var2a = 2; inline __constant__ constexpr int constexpr_var2b = 2; @@ -184,6 +183,7 @@ // Check the exact list of variables to ensure @_ZL2u4 is not among them. // DEV: @llvm.compiler.used = {{[^@]*}} @_Z10p_add_funcIiE +// DEV-SAME: {{^[^@]*}} @_ZL15constexpr_var2a // DEV-SAME: {{^[^@]*}} @_ZL2u3 // DEV-SAME: {{^[^@]*}} @_ZZ4fun1vE11static_var1 // DEV-SAME: {{^[^@]*}} @_ZZZN21TestStaticVarInLambda3funEvENKUlPcE_clES0_E4var2 Index: clang/test/CodeGenCUDA/anon-ns.cu === --- /dev/null +++ clang/test/CodeGenCUDA/anon-ns.cu @@ -0,0 +1,97 @@ +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -cuid=abc \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++17 -fgpu-rdc \ +// RUN: -emit-llvm -o - -x hip %s > %t.dev + +// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ +// RUN: -aux-triple amdgcn-amd-amdhsa -std=c++17 -fgpu-rdc \ +// RUN: -emit-l
[PATCH] D152164: [CUDA][HIP] Externalize device var in anonymous namespace
yaxunl marked an inline comment as done. yaxunl added inline comments. Comment at: clang/test/CodeGenCUDA/anon-ns.cu:46 + +// COMMON-DAG: @[[STR1:.*]] = {{.*}} c"[[KERN1]]\00" +// COMMON-DAG: @[[STR2:.*]] = {{.*}} c"[[KERN2]]\00" tra wrote: > Nit: I'd rename the patterns to reflect the names of the source entities they > track, so we don't have to dig through multiple dependent matches in order to > figure out what the test does. > E.g. for `tempKern` : `KERN3`, `STR3` -> `TKERN`, `TKERNSTR`. > > Maybe give kernels/variables more distinct names as well. My brain keeps > trying to interpret `temp` as `temporary`. > A common naming scheme would be nice. E.g. `tk`, `tv` for the template kernel > and variable, `a*` for anonymous entities. > will do. thanks. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D152164/new/ https://reviews.llvm.org/D152164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D152164: [CUDA][HIP] Externalize device var in anonymous namespace
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/CodeGenCUDA/anon-ns.cu:46 + +// COMMON-DAG: @[[STR1:.*]] = {{.*}} c"[[KERN1]]\00" +// COMMON-DAG: @[[STR2:.*]] = {{.*}} c"[[KERN2]]\00" Nit: I'd rename the patterns to reflect the names of the source entities they track, so we don't have to dig through multiple dependent matches in order to figure out what the test does. E.g. for `tempKern` : `KERN3`, `STR3` -> `TKERN`, `TKERNSTR`. Maybe give kernels/variables more distinct names as well. My brain keeps trying to interpret `temp` as `temporary`. A common naming scheme would be nice. E.g. `tk`, `tv` for the template kernel and variable, `a*` for anonymous entities. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D152164/new/ https://reviews.llvm.org/D152164 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D152164: [CUDA][HIP] Externalize device var in anonymous namespace
yaxunl updated this revision to Diff 528447. yaxunl added a comment. add a test to make sure device var in an anonymous namespace is not externalized if used by device code only. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D152164/new/ https://reviews.llvm.org/D152164 Files: clang/lib/AST/ASTContext.cpp clang/test/CodeGenCUDA/anon-ns.cu clang/test/CodeGenCUDA/kernel-in-anon-ns.cu Index: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu === --- clang/test/CodeGenCUDA/kernel-in-anon-ns.cu +++ /dev/null @@ -1,58 +0,0 @@ -// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -cuid=abc \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \ -// RUN: -emit-llvm -o - -x hip %s > %t.dev - -// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ -// RUN: -aux-triple amdgcn-amd-amdhsa -std=c++11 -fgpu-rdc \ -// RUN: -emit-llvm -o - -x hip %s > %t.host - -// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=HIP,COMMON %s - -// RUN: echo "GPU binary" > %t.fatbin - -// RUN: %clang_cc1 -triple nvptx -fcuda-is-device -cuid=abc \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \ -// RUN: -emit-llvm -o - %s > %t.dev - -// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ -// RUN: -aux-triple nvptx -std=c++11 -fgpu-rdc -fcuda-include-gpubinary %t.fatbin \ -// RUN: -emit-llvm -o - %s > %t.host - -// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=CUDA,COMMON %s - -#include "Inputs/cuda.h" - -// HIP-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv\.intern\.b04fd23c98500190]]( -// HIP-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT_\.intern\.b04fd23c98500190]]( -// HIP-DAG: define weak_odr {{.*}}void @[[KERN3:_Z8tempKernIN12_GLOBAL__N_1UlvE_EEvT_\.intern\.b04fd23c98500190]]( - -// CUDA-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv__intern__b04fd23c98500190]]( -// CUDA-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT___intern__b04fd23c98500190]]( -// CUDA-DAG: define weak_odr {{.*}}void @[[KERN3:_Z8tempKernIN12_GLOBAL__N_1UlvE_EEvT___intern__b04fd23c98500190]]( - -// COMMON-DAG: @[[STR1:.*]] = {{.*}} c"[[KERN1]]\00" -// COMMON-DAG: @[[STR2:.*]] = {{.*}} c"[[KERN2]]\00" -// COMMON-DAG: @[[STR3:.*]] = {{.*}} c"[[KERN3]]\00" - -// COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR1]] -// COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR2]] -// COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR3]] - - -template -__global__ void tempKern(T x) {} - -namespace { - __global__ void kernel() {} - struct X {}; - X x; - auto lambda = [](){}; -} - -void test() { - kernel<<<1, 1>>>(); - - tempKern<<<1, 1>>>(x); - - tempKern<<<1, 1>>>(lambda); -} Index: clang/test/CodeGenCUDA/anon-ns.cu === --- /dev/null +++ clang/test/CodeGenCUDA/anon-ns.cu @@ -0,0 +1,97 @@ +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -cuid=abc \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++17 -fgpu-rdc \ +// RUN: -emit-llvm -o - -x hip %s > %t.dev + +// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ +// RUN: -aux-triple amdgcn-amd-amdhsa -std=c++17 -fgpu-rdc \ +// RUN: -emit-llvm -o - -x hip %s > %t.host + +// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=HIP,COMMON %s +// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=COMNEG %s + +// RUN: echo "GPU binary" > %t.fatbin + +// RUN: %clang_cc1 -triple nvptx -fcuda-is-device -cuid=abc \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++17 -fgpu-rdc \ +// RUN: -emit-llvm -o - %s > %t.dev + +// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ +// RUN: -aux-triple nvptx -std=c++17 -fgpu-rdc -fcuda-include-gpubinary %t.fatbin \ +// RUN: -emit-llvm -o - %s > %t.host + +// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=CUDA,COMMON %s +// RUN: cat %t.dev %t.host | FileCheck -check-prefixes=COMNEG %s + +#include "Inputs/cuda.h" + +// HIP-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv\.intern\.b04fd23c98500190]]( +// HIP-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT_\.intern\.b04fd23c98500190]]( +// HIP-DAG: define weak_odr {{.*}}void @[[KERN3:_Z8tempKernIN12_GLOBAL__N_1UlvE_EEvT_\.intern\.b04fd23c98500190]]( +// HIP-DAG: @[[VAR1:_ZN12_GLOBAL__N_11AE\.static\.b04fd23c98500190]] = addrspace(1) externally_initialized global +// HIP-DAG: @[[VAR2:_ZN12_GLOBAL__N_11BE\.static\.b04fd23c98500190]] = addrspace(4) externally_initialized global +// HIP-DAG: @[[VAR3:_Z7tempVarIN12_GLOBAL__N_11XEE\.static\.b04fd23c98500190]] = addrspace(1) externally_initialized global + +// CUDA-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv__intern__b04fd23c98500190]]( +// CUDA-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT___intern__b04fd23c98500190]]( +// CU
[PATCH] D152164: [CUDA][HIP] Externalize device var in anonymous namespace
yaxunl created this revision. yaxunl added a reviewer: tra. Herald added a subscriber: mattd. Herald added a project: All. yaxunl requested review of this revision. Device variables in an anonymous namespace may be referenced by host code, therefore they need to be externalized in a similar way as a static device variables or kernels in an anonymous namespace. Fixes: https://github.com/ROCm-Developer-Tools/HIP/issues/3246 https://reviews.llvm.org/D152164 Files: clang/lib/AST/ASTContext.cpp clang/test/CodeGenCUDA/anon-ns.cu clang/test/CodeGenCUDA/kernel-in-anon-ns.cu Index: clang/test/CodeGenCUDA/anon-ns.cu === --- clang/test/CodeGenCUDA/anon-ns.cu +++ clang/test/CodeGenCUDA/anon-ns.cu @@ -1,9 +1,9 @@ // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -cuid=abc \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++17 -fgpu-rdc \ // RUN: -emit-llvm -o - -x hip %s > %t.dev // RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ -// RUN: -aux-triple amdgcn-amd-amdhsa -std=c++11 -fgpu-rdc \ +// RUN: -aux-triple amdgcn-amd-amdhsa -std=c++17 -fgpu-rdc \ // RUN: -emit-llvm -o - -x hip %s > %t.host // RUN: cat %t.dev %t.host | FileCheck -check-prefixes=HIP,COMMON %s @@ -11,11 +11,11 @@ // RUN: echo "GPU binary" > %t.fatbin // RUN: %clang_cc1 -triple nvptx -fcuda-is-device -cuid=abc \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++17 -fgpu-rdc \ // RUN: -emit-llvm -o - %s > %t.dev // RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \ -// RUN: -aux-triple nvptx -std=c++11 -fgpu-rdc -fcuda-include-gpubinary %t.fatbin \ +// RUN: -aux-triple nvptx -std=c++17 -fgpu-rdc -fcuda-include-gpubinary %t.fatbin \ // RUN: -emit-llvm -o - %s > %t.host // RUN: cat %t.dev %t.host | FileCheck -check-prefixes=CUDA,COMMON %s @@ -25,34 +25,62 @@ // HIP-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv\.intern\.b04fd23c98500190]]( // HIP-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT_\.intern\.b04fd23c98500190]]( // HIP-DAG: define weak_odr {{.*}}void @[[KERN3:_Z8tempKernIN12_GLOBAL__N_1UlvE_EEvT_\.intern\.b04fd23c98500190]]( +// HIP-DAG: @[[VAR1:_ZN12_GLOBAL__N_11AE\.static\.b04fd23c98500190]] = addrspace(1) externally_initialized global +// HIP-DAG: @[[VAR2:_ZN12_GLOBAL__N_11BE\.static\.b04fd23c98500190]] = addrspace(4) externally_initialized global +// HIP-DAG: @[[VAR3:_Z7tempVarIN12_GLOBAL__N_11XEE\.static\.b04fd23c98500190]] = addrspace(1) externally_initialized global // CUDA-DAG: define weak_odr {{.*}}void @[[KERN1:_ZN12_GLOBAL__N_16kernelEv__intern__b04fd23c98500190]]( // CUDA-DAG: define weak_odr {{.*}}void @[[KERN2:_Z8tempKernIN12_GLOBAL__N_11XEEvT___intern__b04fd23c98500190]]( // CUDA-DAG: define weak_odr {{.*}}void @[[KERN3:_Z8tempKernIN12_GLOBAL__N_1UlvE_EEvT___intern__b04fd23c98500190]]( +// CUDA-DAG: @[[VAR2:_ZN12_GLOBAL__N_11BE__static__b04fd23c98500190]] = addrspace(4) externally_initialized global +// CUDA-DAG: @[[VAR3:_Z7tempVarIN12_GLOBAL__N_11XEE__static__b04fd23c98500190]] = addrspace(1) externally_initialized global + +// HIP-DAG: @llvm.compiler.used = {{.*}}@[[VAR1]]{{.*}}@[[VAR3]]{{.*}}@[[VAR2]] +// CUDA-DAG: @llvm.compiler.used = {{.*}}@[[VAR3]]{{.*}}@[[VAR2]] // COMMON-DAG: @[[STR1:.*]] = {{.*}} c"[[KERN1]]\00" // COMMON-DAG: @[[STR2:.*]] = {{.*}} c"[[KERN2]]\00" // COMMON-DAG: @[[STR3:.*]] = {{.*}} c"[[KERN3]]\00" +// HIP-DAG: @[[STR4:.*]] = {{.*}} c"[[VAR1]]\00" +// COMMON-DAG: @[[STR5:.*]] = {{.*}} c"[[VAR2]]\00" +// COMMON-DAG: @[[STR6:.*]] = {{.*}} c"[[VAR3]]\00" // COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR1]] // COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR2]] // COMMON-DAG: call i32 @__{{.*}}RegisterFunction({{.*}}@[[STR3]] - +// HIP-DAG: call void @__{{.*}}RegisterManagedVar({{.*}}@[[STR4]] +// COMMON-DAG: call void @__{{.*}}RegisterVar({{.*}}@[[STR5]] +// COMMON-DAG: call void @__{{.*}}RegisterVar({{.*}}@[[STR6]] template __global__ void tempKern(T x) {} +template +__device__ T tempVar; + namespace { __global__ void kernel() {} struct X {}; X x; auto lambda = [](){}; +#if __HIP__ + __managed__ int A = 1; +#endif + __constant__ int B = 2; } +template +void getSymbol(T *x) {} + void test() { kernel<<<1, 1>>>(); tempKern<<<1, 1>>>(x); tempKern<<<1, 1>>>(lambda); +#if __HIP__ + getSymbol(&A); +#endif + getSymbol(&B); + getSymbol(&tempVar); } Index: clang/lib/AST/ASTContext.cpp === --- clang/lib/AST/ASTContext.cpp +++ clang/lib/AST/ASTContext.cpp @@ -13602,16 +13602,17 @@ } bool ASTContext::mayExternalize(const Decl *D) const { - bool IsStaticVar = - isa(D) && cast(D)->getStorageClass() == SC_Static; + bool IsInternalVar