saiislam added inline comments.

================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
yaxunl wrote:
> saiislam wrote:
> > yaxunl wrote:
> > > We need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link
> > > the device lib and verify that the load of llvm.amdgcn.abi.version is
> > > eliminated after optimization.
> > > 
> > > I think it currently cannot do that, since llvm.amdgcn.abi.version is not
> > > internalized by the internalization pass. This can cause significant
> > > performance drops, since loads are expensive. We need to tweak the function
> > > that controls which variables can be internalized for amdgpu so that this
> > > variable gets internalized, or have a generic way to tell that function
> > > which variables should be internalized, e.g. by adding amdgcn.internalize
> > > metadata.
> > The load of llvm.amdgcn.abi.version is being eliminated with -cc1, -O3, and
> > -mlink-builtin-bitcode of the device lib.
> It seems it is being eliminated by IPSCCP. That makes sense, since it is a
> constant weak_odr global without externally_initialized. Either changing it to
> weak or adding externally_initialized will keep the load. Normal `__constant__`
> variables in device code may be changed by host code; therefore they are
> emitted with externally_initialized and their loads are not eliminated.
Thank you, @yaxunl!
I have added these observations as comments in the code at the locations where
the load and the global variable are emitted.
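The folding behavior described in the thread can be summarized as a small, self-contained C++ model. This is a simplified sketch, not LLVM code: the names `Linkage`, `GlobalInfo`, `isInterposable`, and `loadCanBeFolded` are hypothetical, and the conditions are modeled loosely on LLVM's `GlobalVariable::isConstant()` and `GlobalVariable::hasDefinitiveInitializer()`, which rejects interposable linkage (such as weak) and `externally_initialized`. It ignores other factors such as semantic interposition.

```cpp
// Simplified, hypothetical model of when a load from a global variable may be
// replaced by its initializer. Names here are illustrative, not LLVM APIs.

// Subset of LLVM IR linkage kinds relevant to this discussion.
enum class Linkage { External, Internal, WeakODR, LinkOnceODR, Weak, LinkOnce, Common };

// Properties of a global variable that matter for folding its loads.
struct GlobalInfo {
  Linkage linkage;
  bool isConstant;            // declared `constant` in IR
  bool hasInitializer;        // defined with an initializer in this module
  bool externallyInitialized; // carries the `externally_initialized` marker
};

// Weak, linkonce (non-ODR), and common definitions may be replaced by a
// different definition at link time, so their initializer cannot be trusted.
constexpr bool isInterposable(Linkage l) {
  return l == Linkage::Weak || l == Linkage::LinkOnce || l == Linkage::Common;
}

// A load may be folded only if the global is constant, its initializer is
// definitive (not interposable), and nothing may change it before use.
constexpr bool loadCanBeFolded(const GlobalInfo &g) {
  return g.isConstant && g.hasInitializer && !isInterposable(g.linkage) &&
         !g.externallyInitialized;
}

// The three cases discussed for llvm.amdgcn.abi.version.
constexpr GlobalInfo abiVersionWeakODR{Linkage::WeakODR, true, true, false};
constexpr GlobalInfo abiVersionWeak{Linkage::Weak, true, true, false};
constexpr GlobalInfo abiVersionExtInit{Linkage::WeakODR, true, true, true};

static_assert(loadCanBeFolded(abiVersionWeakODR), "weak_odr constant: load folded");
static_assert(!loadCanBeFolded(abiVersionWeak), "weak: load kept");
static_assert(!loadCanBeFolded(abiVersionExtInit), "externally_initialized: load kept");
```

The checks are `constexpr`, so the three cases are verified at compile time; this mirrors why a `weak_odr` constant is folded while a weak or `externally_initialized` global keeps its load.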


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
