saiislam added inline comments.

================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
yaxunl wrote:
> saiislam wrote:
> > yaxunl wrote:
> > > need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link
> > > the device lib and verify the load of llvm.amdgcn.abi.version being
> > > eliminated after optimization.
> > >
> > > I think currently it cannot do that since llvm.amdgcn.abi.version is not
> > > internalized by the internalization pass. This can cause some significant
> > > perf drops since loading is expensive. Need to tweak the function
> > > controlling what variables can be internalized for amdgpu so that this
> > > variable gets internalized, or having a generic way to tell that function
> > > which variables should be internalized, e.g. by adding a metadata
> > > amdgcn.internalize
> > load of llvm.amdgcn.abi.version is being eliminated with cc1, -O3, and
> > mlink-builtin-bitcode of device lib.
> It seems being eliminated by IPSCCP. It makes sense since it is constant
> weak_odr without externally_initialized. Either changing it to weak or adding
> externally_initialized will keep the load. Normal `__constant__` var in
> device code may be changed by host code, therefore they are emitted with
> externally_initialized and do not have the load eliminated.
Thank you @yaxunl ! I have added these observations as comments in the code at load emit and global emit locations.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
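
For illustration only, the linkage distinction yaxunl describes can be sketched in LLVM IR. This is not taken from the patch: the `i32` type, the value `500`, `addrspace(4)`, `hidden` visibility, and the `.weak` / `.ext_init` variant names are assumptions made for the example. Only the name llvm.amdgcn.abi.version and the weak_odr / weak / externally_initialized contrast come from the discussion above.

```llvm
; Sketch under stated assumptions: type, value, address space and
; visibility are illustrative, not copied from D139730.

; Case 1: constant + weak_odr, not externally_initialized.
; The initializer is definitive, so an optimizer such as IPSCCP may
; replace loads of this global with the constant 500.
@llvm.amdgcn.abi.version = weak_odr hidden addrspace(4) constant i32 500

; Case 2 (hypothetical variant): plain weak linkage is interposable,
; so the initializer cannot be trusted and the load is kept.
@abi.version.weak = weak hidden addrspace(4) constant i32 500

; Case 3 (hypothetical variant): externally_initialized says the value
; may be written from outside the module (e.g. by host code), so the
; load is kept as well.
@abi.version.ext_init = weak_odr hidden addrspace(4) externally_initialized constant i32 500

define i32 @read_abi_version() {
  ; With case 1 this load is a candidate for folding to `ret i32 500`.
  %v = load i32, ptr addrspace(4) @llvm.amdgcn.abi.version, align 4
  ret i32 %v
}
```

Pointing the load at either of the two hypothetical variants instead should leave it in place after optimization, which matches the behavior described for ordinary `__constant__` variables.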