[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-22 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu closed 
https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-22 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/81700

>From 1e9c29c4794ed6d60cf54c902cd0f854afd1aace Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Tue, 13 Feb 2024 10:00:21 -0500
Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc`

`-fgpu-rdc` mode allows device functions call device functions
in different TU. However, currently all device objects
have to be linked together since only one fat binary
is supported. This is time consuming for AMDGPU backend
since it only supports LTO.

There are use cases that objects can be divided into groups
in which device functions are self-contained but host functions
are not. It is desirable to link/optimize/codegen the device
code and generate a fatbin for each group, whereas partially
link the host code with `ld -r` or generate a static library
by using the `-emit-static-lib` option of clang. This avoids
linking all device code together, therefore decreases
the linking time for `-fgpu-rdc`.

Previously, clang emits an external symbol `__hip_fatbin`
for all objects for `-fgpu-rdc`. With this patch, clang
emits an unique external symbol `__hip_fatbin_{cuid}`
for the fat binary for each object. When a group of objects
are linked together to generate a fatbin, the symbols
are merged by alias and point to the same fat binary.
Each group has its own fat binary. One executable or
shared library can have multiple fat binaries. Device
linking is done for undefined fab binary symbols only
to avoid repeated linking. `__hip_gpubin_handle` is also
uniquefied and merged to avoid repeated registering.
Symbol `__hip_cuid_{cuid}` is introduced to facilitate
debugging and tooling.

Fixes: https://github.com/llvm/llvm-project/issues/77018
---
 clang/lib/CodeGen/CGCUDANV.cpp|  22 +-
 clang/lib/CodeGen/CodeGenModule.cpp   |  10 +-
 clang/lib/Driver/OffloadBundler.cpp   |  40 ++-
 clang/lib/Driver/ToolChains/HIPUtility.cpp| 258 +-
 clang/test/CMakeLists.txt |   1 +
 clang/test/CodeGenCUDA/device-stub.cu |  10 +-
 .../test/CodeGenCUDA/host-used-device-var.cu  |   5 +-
 clang/test/Driver/Inputs/hip.h|  25 ++
 clang/test/Driver/clang-offload-bundler.c |  13 +-
 clang/test/Driver/hip-partial-link.hip|  97 +++
 clang/test/Driver/hip-toolchain-rdc.hip   |  38 ++-
 11 files changed, 469 insertions(+), 50 deletions(-)
 create mode 100644 clang/test/Driver/Inputs/hip.h
 create mode 100644 clang/test/Driver/hip-partial-link.hip

diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 5b43272bfa62f4..49f93451db7bbb 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
   // to contain the fat binary but will be populated somewhere else,
   // e.g. by lld through link script.
   FatBinStr = new llvm::GlobalVariable(
-CGM.getModule(), CGM.Int8Ty,
-/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
-"__hip_fatbin", nullptr,
-llvm::GlobalVariable::NotThreadLocal);
+  CGM.getModule(), CGM.Int8Ty,
+  /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
+  "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr,
+  llvm::GlobalVariable::NotThreadLocal);
   cast(FatBinStr)->setSection(FatbinConstantName);
 }
 
@@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
   // thread safety of the loaded program. Therefore we can assume sequential
   // execution of constructor functions here.
   if (IsHIP) {
-auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage :
-llvm::GlobalValue::LinkOnceAnyLinkage;
+auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage
+ : llvm::GlobalValue::ExternalLinkage;
 llvm::BasicBlock *IfBlock =
 llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc);
 llvm::BasicBlock *ExitBlock =
@@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
 // of HIP ABI.
 GpuBinaryHandle = new llvm::GlobalVariable(
 TheModule, PtrTy, /*isConstant=*/false, Linkage,
-/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy),
-"__hip_gpubin_handle");
-if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage)
-  GpuBinaryHandle->setComdat(
-  CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName()));
+/*Initializer=*/
+CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,
+CudaGpuBinary
+? "__hip_gpubin_handle"
+: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash());
 GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign());
 // Prevent the weak symbol in different shared libraries being merged.
 if (Linkage != 

[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-21 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

> This makes sense overall, though it's very complicated. Generally we just 
> need to make sure these things are private to one group of files. There's a 
> lot more to parse here compared to the `linker-wrapper`.
> 
> Do any of these tests check when called with `-r`? I'm assuming that's the 
> difference here when we decide whether or not to ghoup files by cuid.

Added a test for `-r`

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-21 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/81700

>From d84645f035609e7ece76c1f2eb06637826b7b22a Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Tue, 13 Feb 2024 10:00:21 -0500
Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc`

`-fgpu-rdc` mode allows device functions call device functions
in different TU. However, currently all device objects
have to be linked together since only one fat binary
is supported. This is time consuming for AMDGPU backend
since it only supports LTO.

There are use cases that objects can be divided into groups
in which device functions are self-contained but host functions
are not. It is desirable to link/optimize/codegen the device
code and generate a fatbin for each group, whereas partially
link the host code with `ld -r` or generate a static library
by using the `-emit-static-lib` option of clang. This avoids
linking all device code together, therefore decreases
the linking time for `-fgpu-rdc`.

Previously, clang emits an external symbol `__hip_fatbin`
for all objects for `-fgpu-rdc`. With this patch, clang
emits an unique external symbol `__hip_fatbin_{cuid}`
for the fat binary for each object. When a group of objects
are linked together to generate a fatbin, the symbols
are merged by alias and point to the same fat binary.
Each group has its own fat binary. One executable or
shared library can have multiple fat binaries. Device
linking is done for undefined fab binary symbols only
to avoid repeated linking. `__hip_gpubin_handle` is also
uniquefied and merged to avoid repeated registering.
Symbol `__hip_cuid_{cuid}` is introduced to facilitate
debugging and tooling.

Fixes: https://github.com/llvm/llvm-project/issues/77018
---
 clang/lib/CodeGen/CGCUDANV.cpp|  22 +-
 clang/lib/CodeGen/CodeGenModule.cpp   |  10 +-
 clang/lib/Driver/OffloadBundler.cpp   |  40 ++-
 clang/lib/Driver/ToolChains/HIPUtility.cpp| 258 +-
 clang/test/CodeGenCUDA/device-stub.cu |  10 +-
 .../test/CodeGenCUDA/host-used-device-var.cu  |   5 +-
 clang/test/Driver/Inputs/hip.h|  25 ++
 clang/test/Driver/clang-offload-bundler.c |   4 +-
 clang/test/Driver/hip-partial-link.hip|  97 +++
 clang/test/Driver/hip-toolchain-rdc.hip   |  38 ++-
 10 files changed, 461 insertions(+), 48 deletions(-)
 create mode 100644 clang/test/Driver/Inputs/hip.h
 create mode 100644 clang/test/Driver/hip-partial-link.hip

diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 5b43272bfa62f4..49f93451db7bbb 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
   // to contain the fat binary but will be populated somewhere else,
   // e.g. by lld through link script.
   FatBinStr = new llvm::GlobalVariable(
-CGM.getModule(), CGM.Int8Ty,
-/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
-"__hip_fatbin", nullptr,
-llvm::GlobalVariable::NotThreadLocal);
+  CGM.getModule(), CGM.Int8Ty,
+  /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
+  "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr,
+  llvm::GlobalVariable::NotThreadLocal);
   cast(FatBinStr)->setSection(FatbinConstantName);
 }
 
@@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
   // thread safety of the loaded program. Therefore we can assume sequential
   // execution of constructor functions here.
   if (IsHIP) {
-auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage :
-llvm::GlobalValue::LinkOnceAnyLinkage;
+auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage
+ : llvm::GlobalValue::ExternalLinkage;
 llvm::BasicBlock *IfBlock =
 llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc);
 llvm::BasicBlock *ExitBlock =
@@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
 // of HIP ABI.
 GpuBinaryHandle = new llvm::GlobalVariable(
 TheModule, PtrTy, /*isConstant=*/false, Linkage,
-/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy),
-"__hip_gpubin_handle");
-if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage)
-  GpuBinaryHandle->setComdat(
-  CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName()));
+/*Initializer=*/
+CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,
+CudaGpuBinary
+? "__hip_gpubin_handle"
+: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash());
 GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign());
 // Prevent the weak symbol in different shared libraries being merged.
 if (Linkage != llvm::GlobalValue::InternalLinkage)
diff --git 

[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-15 Thread Yaxun Liu via cfe-commits


@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple 
,
  : T.normalize();
 }
 
+// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all
+// input object or archive files.
+class HIPUndefinedFatBinSymbols {
+public:
+  HIPUndefinedFatBinSymbols(const Compilation )
+  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
+  DiagnosticsEngine::Error,
+  "Error collecting HIP undefined fatbin symbols: %0")),
+Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
+Verbose(C.getArgs().hasArg(options::OPT_v)) {
+populateSymbols();
+if (Verbose) {
+  for (auto Name : FatBinSymbols)
+llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
+  for (auto Name : GPUBinHandleSymbols)
+llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name
+ << "\n";
+}
+  }
+
+  const std::set () const {
+return FatBinSymbols;
+  }
+
+  const std::set () const {
+return GPUBinHandleSymbols;
+  }
+
+private:
+  const Compilation 
+  unsigned DiagID;
+  bool Quiet;
+  bool Verbose;
+  std::set FatBinSymbols;
+  std::set GPUBinHandleSymbols;
+  const std::string FatBinPrefix = "__hip_fatbin";
+  const std::string GPUBinHandlePrefix = "__hip_gpubin_handle";
+
+  void populateSymbols() {
+std::deque WorkList;
+std::set Visited;
+
+for (const auto  : C.getActions()) {
+  WorkList.push_back(Action);
+}
+
+while (!WorkList.empty()) {
+  const Action *CurrentAction = WorkList.front();
+  WorkList.pop_front();
+
+  if (!CurrentAction || !Visited.insert(CurrentAction).second)
+continue;
+
+  if (const auto *IA = dyn_cast(CurrentAction)) {
+std::string ID = IA->getId().str();
+if (!ID.empty()) {
+  ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true);
+  FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str());
+  GPUBinHandleSymbols.insert(
+  Twine(GPUBinHandlePrefix + "_" + ID).str());
+  continue;
+}
+const char *Filename = IA->getInputArg().getValue();
+auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename);
+// Input action could be options to linker, therefore ignore it
+// if cannot read it.
+if (!BufferOrErr)
+  continue;
+
+processInput(BufferOrErr.get()->getMemBufferRef());
+  } else
+WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(),
+CurrentAction->getInputs().end());
+}
+  }
+
+  void processInput(const llvm::MemoryBufferRef ) {
+// Try processing as object file first.
+auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer);
+if (ObjFileOrErr) {
+  processSymbols(**ObjFileOrErr);
+  return;
+}
+
+// Then try processing as archive files.
+llvm::consumeError(ObjFileOrErr.takeError());
+auto ArchiveOrErr = llvm::object::Archive::create(Buffer);
+if (ArchiveOrErr) {
+  llvm::Error Err = llvm::Error::success();
+  llvm::object::Archive  = *ArchiveOrErr.get();
+  for (auto  : Archive.children(Err)) {
+auto ChildBufOrErr = Child.getMemoryBufferRef();
+if (ChildBufOrErr)
+  processInput(*ChildBufOrErr);
+else
+  errorHandler(ChildBufOrErr.takeError());
+  }
+
+  if (Err)
+errorHandler(std::move(Err));
+  return;
+}
+
+// Ignore other files.
+llvm::consumeError(ArchiveOrErr.takeError());
+  }
+  void processSymbols(const llvm::object::ObjectFile ) {
+for (const auto  : Obj.symbols()) {
+  auto FlagOrErr = Symbol.getFlags();
+  if (!FlagOrErr) {
+errorHandler(FlagOrErr.takeError());
+continue;
+  }
+
+  // Filter only undefined symbols
+  if (!(FlagOrErr.get() & llvm::object::SymbolRef::SF_Undefined)) {

yxsamliu wrote:

will do

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-15 Thread Yaxun Liu via cfe-commits


@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple 
,
  : T.normalize();
 }
 
+// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all
+// input object or archive files.
+class HIPUndefinedFatBinSymbols {
+public:
+  HIPUndefinedFatBinSymbols(const Compilation )
+  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
+  DiagnosticsEngine::Error,
+  "Error collecting HIP undefined fatbin symbols: %0")),
+Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
+Verbose(C.getArgs().hasArg(options::OPT_v)) {
+populateSymbols();
+if (Verbose) {
+  for (auto Name : FatBinSymbols)
+llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
+  for (auto Name : GPUBinHandleSymbols)
+llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name
+ << "\n";
+}
+  }
+
+  const std::set () const {
+return FatBinSymbols;
+  }
+
+  const std::set () const {
+return GPUBinHandleSymbols;
+  }
+
+private:
+  const Compilation 
+  unsigned DiagID;
+  bool Quiet;
+  bool Verbose;
+  std::set FatBinSymbols;
+  std::set GPUBinHandleSymbols;
+  const std::string FatBinPrefix = "__hip_fatbin";
+  const std::string GPUBinHandlePrefix = "__hip_gpubin_handle";
+
+  void populateSymbols() {
+std::deque WorkList;
+std::set Visited;
+
+for (const auto  : C.getActions()) {
+  WorkList.push_back(Action);
+}
+
+while (!WorkList.empty()) {
+  const Action *CurrentAction = WorkList.front();
+  WorkList.pop_front();
+
+  if (!CurrentAction || !Visited.insert(CurrentAction).second)
+continue;
+
+  if (const auto *IA = dyn_cast(CurrentAction)) {
+std::string ID = IA->getId().str();
+if (!ID.empty()) {
+  ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true);
+  FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str());
+  GPUBinHandleSymbols.insert(
+  Twine(GPUBinHandlePrefix + "_" + ID).str());
+  continue;
+}
+const char *Filename = IA->getInputArg().getValue();
+auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename);
+// Input action could be options to linker, therefore ignore it
+// if cannot read it.
+if (!BufferOrErr)
+  continue;
+
+processInput(BufferOrErr.get()->getMemBufferRef());
+  } else
+WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(),
+CurrentAction->getInputs().end());
+}
+  }
+
+  void processInput(const llvm::MemoryBufferRef ) {
+// Try processing as object file first.
+auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer);
+if (ObjFileOrErr) {
+  processSymbols(**ObjFileOrErr);
+  return;
+}
+
+// Then try processing as archive files.
+llvm::consumeError(ObjFileOrErr.takeError());
+auto ArchiveOrErr = llvm::object::Archive::create(Buffer);
+if (ArchiveOrErr) {
+  llvm::Error Err = llvm::Error::success();
+  llvm::object::Archive  = *ArchiveOrErr.get();
+  for (auto  : Archive.children(Err)) {
+auto ChildBufOrErr = Child.getMemoryBufferRef();
+if (ChildBufOrErr)
+  processInput(*ChildBufOrErr);
+else
+  errorHandler(ChildBufOrErr.takeError());
+  }
+
+  if (Err)
+errorHandler(std::move(Err));
+  return;
+}
+
+// Ignore other files.
+llvm::consumeError(ArchiveOrErr.takeError());
+  }
+  void processSymbols(const llvm::object::ObjectFile ) {

yxsamliu wrote:

will do

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-15 Thread Yaxun Liu via cfe-commits


@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple 
,
  : T.normalize();
 }
 
+// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all
+// input object or archive files.
+class HIPUndefinedFatBinSymbols {
+public:
+  HIPUndefinedFatBinSymbols(const Compilation )
+  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
+  DiagnosticsEngine::Error,
+  "Error collecting HIP undefined fatbin symbols: %0")),
+Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
+Verbose(C.getArgs().hasArg(options::OPT_v)) {
+populateSymbols();
+if (Verbose) {
+  for (auto Name : FatBinSymbols)
+llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
+  for (auto Name : GPUBinHandleSymbols)
+llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name
+ << "\n";
+}
+  }
+
+  const std::set () const {
+return FatBinSymbols;
+  }
+
+  const std::set () const {
+return GPUBinHandleSymbols;
+  }
+
+private:
+  const Compilation 
+  unsigned DiagID;
+  bool Quiet;
+  bool Verbose;
+  std::set FatBinSymbols;
+  std::set GPUBinHandleSymbols;
+  const std::string FatBinPrefix = "__hip_fatbin";
+  const std::string GPUBinHandlePrefix = "__hip_gpubin_handle";
+
+  void populateSymbols() {
+std::deque WorkList;
+std::set Visited;
+
+for (const auto  : C.getActions()) {
+  WorkList.push_back(Action);
+}
+
+while (!WorkList.empty()) {
+  const Action *CurrentAction = WorkList.front();
+  WorkList.pop_front();
+
+  if (!CurrentAction || !Visited.insert(CurrentAction).second)
+continue;
+
+  if (const auto *IA = dyn_cast(CurrentAction)) {
+std::string ID = IA->getId().str();
+if (!ID.empty()) {
+  ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true);
+  FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str());
+  GPUBinHandleSymbols.insert(
+  Twine(GPUBinHandlePrefix + "_" + ID).str());
+  continue;
+}
+const char *Filename = IA->getInputArg().getValue();
+auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename);
+// Input action could be options to linker, therefore ignore it
+// if cannot read it.

yxsamliu wrote:

If the argument is an input file and it fails to be read, the error will be 
captured by the linker and error will be emitted there.

will fix the comment.

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-14 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

This makes sense overall, though it's very complicated. Generally we just need 
to make sure these things are private to one group of files. There's a lot more 
to parse here compared to the `linker-wrapper`.

Do any of these tests check when called with `-r`? I'm assuming that's the 
difference here when we decide whether or not to ghoup files by cuid.

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-14 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B approved this pull request.

Overall LGTM. Please wait for @jhuber6's to double check the partial linking 
mechanics details.

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-14 Thread Artem Belevich via cfe-commits


@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple 
,
  : T.normalize();
 }
 
+// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all
+// input object or archive files.
+class HIPUndefinedFatBinSymbols {
+public:
+  HIPUndefinedFatBinSymbols(const Compilation )
+  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
+  DiagnosticsEngine::Error,
+  "Error collecting HIP undefined fatbin symbols: %0")),
+Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
+Verbose(C.getArgs().hasArg(options::OPT_v)) {
+populateSymbols();
+if (Verbose) {
+  for (auto Name : FatBinSymbols)
+llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
+  for (auto Name : GPUBinHandleSymbols)
+llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name
+ << "\n";
+}
+  }
+
+  const std::set () const {
+return FatBinSymbols;
+  }
+
+  const std::set () const {
+return GPUBinHandleSymbols;
+  }
+
+private:
+  const Compilation 
+  unsigned DiagID;
+  bool Quiet;
+  bool Verbose;
+  std::set FatBinSymbols;
+  std::set GPUBinHandleSymbols;
+  const std::string FatBinPrefix = "__hip_fatbin";
+  const std::string GPUBinHandlePrefix = "__hip_gpubin_handle";
+
+  void populateSymbols() {
+std::deque WorkList;
+std::set Visited;
+
+for (const auto  : C.getActions()) {
+  WorkList.push_back(Action);
+}
+
+while (!WorkList.empty()) {
+  const Action *CurrentAction = WorkList.front();
+  WorkList.pop_front();
+
+  if (!CurrentAction || !Visited.insert(CurrentAction).second)
+continue;
+
+  if (const auto *IA = dyn_cast(CurrentAction)) {
+std::string ID = IA->getId().str();
+if (!ID.empty()) {
+  ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true);
+  FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str());
+  GPUBinHandleSymbols.insert(
+  Twine(GPUBinHandlePrefix + "_" + ID).str());
+  continue;
+}
+const char *Filename = IA->getInputArg().getValue();
+auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename);
+// Input action could be options to linker, therefore ignore it
+// if cannot read it.
+if (!BufferOrErr)
+  continue;
+
+processInput(BufferOrErr.get()->getMemBufferRef());
+  } else
+WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(),
+CurrentAction->getInputs().end());
+}
+  }
+
+  void processInput(const llvm::MemoryBufferRef ) {
+// Try processing as object file first.
+auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer);
+if (ObjFileOrErr) {
+  processSymbols(**ObjFileOrErr);
+  return;
+}
+
+// Then try processing as archive files.
+llvm::consumeError(ObjFileOrErr.takeError());
+auto ArchiveOrErr = llvm::object::Archive::create(Buffer);
+if (ArchiveOrErr) {
+  llvm::Error Err = llvm::Error::success();
+  llvm::object::Archive  = *ArchiveOrErr.get();
+  for (auto  : Archive.children(Err)) {
+auto ChildBufOrErr = Child.getMemoryBufferRef();
+if (ChildBufOrErr)
+  processInput(*ChildBufOrErr);
+else
+  errorHandler(ChildBufOrErr.takeError());
+  }
+
+  if (Err)
+errorHandler(std::move(Err));
+  return;
+}
+
+// Ignore other files.
+llvm::consumeError(ArchiveOrErr.takeError());
+  }
+  void processSymbols(const llvm::object::ObjectFile ) {
+for (const auto  : Obj.symbols()) {
+  auto FlagOrErr = Symbol.getFlags();
+  if (!FlagOrErr) {
+errorHandler(FlagOrErr.takeError());
+continue;
+  }
+
+  // Filter only undefined symbols
+  if (!(FlagOrErr.get() & llvm::object::SymbolRef::SF_Undefined)) {

Artem-B wrote:

style nit: remove `{}` around single-statement body. 

Applies here and in a handful of other places throughout the patch.

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-14 Thread Artem Belevich via cfe-commits


@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple 
,
  : T.normalize();
 }
 
+// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all
+// input object or archive files.
+class HIPUndefinedFatBinSymbols {
+public:
+  HIPUndefinedFatBinSymbols(const Compilation )
+  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
+  DiagnosticsEngine::Error,
+  "Error collecting HIP undefined fatbin symbols: %0")),
+Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
+Verbose(C.getArgs().hasArg(options::OPT_v)) {
+populateSymbols();
+if (Verbose) {
+  for (auto Name : FatBinSymbols)
+llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
+  for (auto Name : GPUBinHandleSymbols)
+llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name
+ << "\n";
+}
+  }
+
+  const std::set () const {
+return FatBinSymbols;
+  }
+
+  const std::set () const {
+return GPUBinHandleSymbols;
+  }
+
+private:
+  const Compilation 
+  unsigned DiagID;
+  bool Quiet;
+  bool Verbose;
+  std::set FatBinSymbols;
+  std::set GPUBinHandleSymbols;
+  const std::string FatBinPrefix = "__hip_fatbin";
+  const std::string GPUBinHandlePrefix = "__hip_gpubin_handle";
+
+  void populateSymbols() {
+std::deque WorkList;
+std::set Visited;
+
+for (const auto  : C.getActions()) {
+  WorkList.push_back(Action);
+}
+
+while (!WorkList.empty()) {
+  const Action *CurrentAction = WorkList.front();
+  WorkList.pop_front();
+
+  if (!CurrentAction || !Visited.insert(CurrentAction).second)
+continue;
+
+  if (const auto *IA = dyn_cast(CurrentAction)) {
+std::string ID = IA->getId().str();
+if (!ID.empty()) {
+  ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true);
+  FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str());
+  GPUBinHandleSymbols.insert(
+  Twine(GPUBinHandlePrefix + "_" + ID).str());
+  continue;
+}
+const char *Filename = IA->getInputArg().getValue();
+auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename);
+// Input action could be options to linker, therefore ignore it
+// if cannot read it.

Artem-B wrote:

Comment could use some editing. `therefore, ignore an error if we fail to read 
the file`.

This makes me ask -- what if the argument *is* an input file, and we do fail to 
read it. How do we tell apart the linker options from the input file? Relying 
on a failure to read it does not seem to be a good way to handle it.

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-14 Thread Artem Belevich via cfe-commits


@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple 
,
  : T.normalize();
 }
 
+// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all
+// input object or archive files.
+class HIPUndefinedFatBinSymbols {
+public:
+  HIPUndefinedFatBinSymbols(const Compilation )
+  : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
+  DiagnosticsEngine::Error,
+  "Error collecting HIP undefined fatbin symbols: %0")),
+Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
+Verbose(C.getArgs().hasArg(options::OPT_v)) {
+populateSymbols();
+if (Verbose) {
+  for (auto Name : FatBinSymbols)
+llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
+  for (auto Name : GPUBinHandleSymbols)
+llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name
+ << "\n";
+}
+  }
+
+  const std::set () const {
+return FatBinSymbols;
+  }
+
+  const std::set () const {
+return GPUBinHandleSymbols;
+  }
+
+private:
+  const Compilation 
+  unsigned DiagID;
+  bool Quiet;
+  bool Verbose;
+  std::set FatBinSymbols;
+  std::set GPUBinHandleSymbols;
+  const std::string FatBinPrefix = "__hip_fatbin";
+  const std::string GPUBinHandlePrefix = "__hip_gpubin_handle";
+
+  void populateSymbols() {
+std::deque WorkList;
+std::set Visited;
+
+for (const auto  : C.getActions()) {
+  WorkList.push_back(Action);
+}
+
+while (!WorkList.empty()) {
+  const Action *CurrentAction = WorkList.front();
+  WorkList.pop_front();
+
+  if (!CurrentAction || !Visited.insert(CurrentAction).second)
+continue;
+
+  if (const auto *IA = dyn_cast(CurrentAction)) {
+std::string ID = IA->getId().str();
+if (!ID.empty()) {
+  ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true);
+  FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str());
+  GPUBinHandleSymbols.insert(
+  Twine(GPUBinHandlePrefix + "_" + ID).str());
+  continue;
+}
+const char *Filename = IA->getInputArg().getValue();
+auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename);
+// Input action could be options to linker, therefore ignore it
+// if cannot read it.
+if (!BufferOrErr)
+  continue;
+
+processInput(BufferOrErr.get()->getMemBufferRef());
+  } else
+WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(),
+CurrentAction->getInputs().end());
+}
+  }
+
+  void processInput(const llvm::MemoryBufferRef ) {
+// Try processing as object file first.
+auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer);
+if (ObjFileOrErr) {
+  processSymbols(**ObjFileOrErr);
+  return;
+}
+
+// Then try processing as archive files.
+llvm::consumeError(ObjFileOrErr.takeError());
+auto ArchiveOrErr = llvm::object::Archive::create(Buffer);
+if (ArchiveOrErr) {
+  llvm::Error Err = llvm::Error::success();
+  llvm::object::Archive  = *ArchiveOrErr.get();
+  for (auto  : Archive.children(Err)) {
+auto ChildBufOrErr = Child.getMemoryBufferRef();
+if (ChildBufOrErr)
+  processInput(*ChildBufOrErr);
+else
+  errorHandler(ChildBufOrErr.takeError());
+  }
+
+  if (Err)
+errorHandler(std::move(Err));
+  return;
+}
+
+// Ignore other files.
+llvm::consumeError(ArchiveOrErr.takeError());
+  }
+  void processSymbols(const llvm::object::ObjectFile ) {

Artem-B wrote:

Nit -- add an empty line to separate functions.

https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-14 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B edited 
https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-14 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/81700

>From 6006975bbff9fc0f6fb9b8e24a52d4963ceb774c Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Tue, 13 Feb 2024 10:00:21 -0500
Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc`

`-fgpu-rdc` mode allows device functions call device functions
in different TU. However, currently all device objects
have to be linked together since only one fat binary
is supported. This is time consuming for AMDGPU backend
since it only supports LTO.

There are use cases that objects can be divided into groups
in which device functions are self-contained but host functions
are not. It is desirable to link/optimize/codegen the device
code and generate a fatbin for each group, whereas partially
link the host code with `ld -r` or generate a static library
by using the `-emit-static-lib` option of clang. This avoids
linking all device code together, therefore decreases
the linking time for `-fgpu-rdc`.

Previously, clang emits an external symbol `__hip_fatbin`
for all objects for `-fgpu-rdc`. With this patch, clang
emits an unique external symbol `__hip_fatbin_{cuid}`
for the fat binary for each object. When a group of objects
are linked together to generate a fatbin, the symbols
are merged by alias and point to the same fat binary.
Each group has its own fat binary. One executable or
shared library can have multiple fat binaries. Device
linking is done for undefined fab binary symbols only
to avoid repeated linking. `__hip_gpubin_handle` is also
uniquefied and merged to avoid repeated registering.
Symbol `__hip_cuid_{cuid}` is introduced to facilitate
debugging and tooling.

Fixes: https://github.com/llvm/llvm-project/issues/77018
Change-Id: Ia16ac3ddb05b66e6149288aacb0ba4a80120ad8c
---
 clang/lib/CodeGen/CGCUDANV.cpp|  22 +-
 clang/lib/CodeGen/CodeGenModule.cpp   |  10 +-
 clang/lib/Driver/ToolChains/HIPUtility.cpp| 237 +-
 clang/test/CodeGenCUDA/device-stub.cu |  10 +-
 .../test/CodeGenCUDA/host-used-device-var.cu  |   5 +-
 clang/test/Driver/hip-toolchain-rdc.hip   |  38 ++-
 6 files changed, 278 insertions(+), 44 deletions(-)

diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 5b43272bfa62f4..49f93451db7bbb 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
   // to contain the fat binary but will be populated somewhere else,
   // e.g. by lld through link script.
   FatBinStr = new llvm::GlobalVariable(
-CGM.getModule(), CGM.Int8Ty,
-/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
-"__hip_fatbin", nullptr,
-llvm::GlobalVariable::NotThreadLocal);
+  CGM.getModule(), CGM.Int8Ty,
+  /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
+  "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr,
+  llvm::GlobalVariable::NotThreadLocal);
   cast(FatBinStr)->setSection(FatbinConstantName);
 }
 
@@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
   // thread safety of the loaded program. Therefore we can assume sequential
   // execution of constructor functions here.
   if (IsHIP) {
-auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage :
-llvm::GlobalValue::LinkOnceAnyLinkage;
+auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage
+ : llvm::GlobalValue::ExternalLinkage;
 llvm::BasicBlock *IfBlock =
 llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc);
 llvm::BasicBlock *ExitBlock =
@@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
 // of HIP ABI.
 GpuBinaryHandle = new llvm::GlobalVariable(
 TheModule, PtrTy, /*isConstant=*/false, Linkage,
-/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy),
-"__hip_gpubin_handle");
-if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage)
-  GpuBinaryHandle->setComdat(
-  CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName()));
+/*Initializer=*/
+CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,
+CudaGpuBinary
+? "__hip_gpubin_handle"
+: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash());
 GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign());
 // Prevent the weak symbol in different shared libraries being merged.
 if (Linkage != llvm::GlobalValue::InternalLinkage)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index c984260b082cd1..218066bced6c19 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -919,7 +919,15 @@ void CodeGenModule::Release() {
 llvm::ConstantArray::get(ATy, UsedArray), 

[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-13 Thread via cfe-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


``bash
git-clang-format --diff ec5f4a4bc6f27b044bc73668414ecefe9690d283 
318117089831345caa13d8b4eeea23d0aa2c8588 -- clang/lib/CodeGen/CGCUDANV.cpp 
clang/lib/CodeGen/CodeGenModule.cpp clang/lib/Driver/ToolChains/HIPUtility.cpp
``





View the diff from clang-format here.


``diff
diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 7d23f94473..49f93451db 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -827,7 +827,7 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
 GpuBinaryHandle = new llvm::GlobalVariable(
 TheModule, PtrTy, /*isConstant=*/false, Linkage,
 /*Initializer=*/
-CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,
+CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,
 CudaGpuBinary
 ? "__hip_gpubin_handle"
 : "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash());
diff --git a/clang/lib/Driver/ToolChains/HIPUtility.cpp 
b/clang/lib/Driver/ToolChains/HIPUtility.cpp
index 4bd6926ec6..67519a8a54 100644
--- a/clang/lib/Driver/ToolChains/HIPUtility.cpp
+++ b/clang/lib/Driver/ToolChains/HIPUtility.cpp
@@ -56,7 +56,7 @@ public:
   DiagnosticsEngine::Error,
   "Error collecting HIP undefined fatbin symbols: %0")),
 Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
-Verbose(C.getArgs().hasArg(options::OPT_v)){
+Verbose(C.getArgs().hasArg(options::OPT_v)) {
 populateSymbols();
 if (Verbose) {
   for (auto Name : FatBinSymbols)
@@ -284,17 +284,12 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
   HIPUndefinedFatBinSymbols Symbols(C);
 
   std::string PrimaryHipFatbinSymbol;
-  std::string
-  PrimaryGpuBinHandleSymbol;
-  bool FoundPrimaryHipFatbinSymbol =
-  false;
-  bool FoundPrimaryGpuBinHandleSymbol =
-  false;
-
-  std::vector
-  AliasHipFatbinSymbols;
-  std::vector
-  AliasGpuBinHandleSymbols;
+  std::string PrimaryGpuBinHandleSymbol;
+  bool FoundPrimaryHipFatbinSymbol = false;
+  bool FoundPrimaryGpuBinHandleSymbol = false;
+
+  std::vector AliasHipFatbinSymbols;
+  std::vector AliasGpuBinHandleSymbols;
 
   // Iterate through symbols to find the primary ones and collect others for
   // aliasing

``




https://github.com/llvm/llvm-project/pull/81700
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-13 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)


Changes

`-fgpu-rdc` mode allows device functions call device functions in different TU. 
However, currently all device objects have to be linked together since only one 
fat binary is supported. This is time consuming for AMDGPU backend since it 
only supports LTO.

There are use cases that objects can be divided into groups in which device 
functions are self-contained but host functions are not. It is desirable to 
link/optimize/codegen the device code and generate a fatbin for each group, 
whereas partially link the host code with `ld -r` or generate a static library 
by using the `-emit-static-lib` option of clang. This avoids linking all device 
code together, therefore decreases the linking time for `-fgpu-rdc`.

Previously, clang emits an external symbol `__hip_fatbin` for all objects for 
`-fgpu-rdc`. With this patch, clang emits an unique external symbol 
`__hip_fatbin_{cuid}` for the fat binary for each object. When a group of 
objects are linked together to generate a fatbin, the symbols are merged by 
alias and point to the same fat binary. Each group has its own fat binary. One 
executable or shared library can have multiple fat binaries. Device linking is 
done for undefined fab binary symbols only to avoid repeated linking. 
`__hip_gpubin_handle` is also uniquefied and merged to avoid repeated 
registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging 
and tooling.

Fixes: https://github.com/llvm/llvm-project/issues/77018

---
Full diff: https://github.com/llvm/llvm-project/pull/81700.diff


6 Files Affected:

- (modified) clang/lib/CodeGen/CGCUDANV.cpp (+11-11) 
- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+9-1) 
- (modified) clang/lib/Driver/ToolChains/HIPUtility.cpp (+229-13) 
- (modified) clang/test/CodeGenCUDA/device-stub.cu (+4-6) 
- (modified) clang/test/CodeGenCUDA/host-used-device-var.cu (+3-2) 
- (modified) clang/test/Driver/hip-toolchain-rdc.hip (+27-11) 


``diff
diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 5b43272bfa62f4..7d23f944732dbf 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
   // to contain the fat binary but will be populated somewhere else,
   // e.g. by lld through link script.
   FatBinStr = new llvm::GlobalVariable(
-CGM.getModule(), CGM.Int8Ty,
-/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
-"__hip_fatbin", nullptr,
-llvm::GlobalVariable::NotThreadLocal);
+  CGM.getModule(), CGM.Int8Ty,
+  /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
+  "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr,
+  llvm::GlobalVariable::NotThreadLocal);
   cast(FatBinStr)->setSection(FatbinConstantName);
 }
 
@@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
   // thread safety of the loaded program. Therefore we can assume sequential
   // execution of constructor functions here.
   if (IsHIP) {
-auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage :
-llvm::GlobalValue::LinkOnceAnyLinkage;
+auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage
+ : llvm::GlobalValue::ExternalLinkage;
 llvm::BasicBlock *IfBlock =
 llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc);
 llvm::BasicBlock *ExitBlock =
@@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
 // of HIP ABI.
 GpuBinaryHandle = new llvm::GlobalVariable(
 TheModule, PtrTy, /*isConstant=*/false, Linkage,
-/*Initializer=*/llvm::ConstantPointerNull::get(PtrTy),
-"__hip_gpubin_handle");
-if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage)
-  GpuBinaryHandle->setComdat(
-  CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName()));
+/*Initializer=*/
+CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,
+CudaGpuBinary
+? "__hip_gpubin_handle"
+: "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash());
 GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign());
 // Prevent the weak symbol in different shared libraries being merged.
 if (Linkage != llvm::GlobalValue::InternalLinkage)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index c984260b082cd1..218066bced6c19 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -919,7 +919,15 @@ void CodeGenModule::Release() {
 llvm::ConstantArray::get(ATy, UsedArray), "__clang_gpu_used_external");
 addCompilerUsedGlobal(GV);
   }
-
+  if (LangOpts.HIP) {
+// Emit a unique ID so that host and device binaries from the same

[clang] [HIP] Allow partial linking for `-fgpu-rdc` (PR #81700)

2024-02-13 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu created 
https://github.com/llvm/llvm-project/pull/81700

`-fgpu-rdc` mode allows device functions call device functions in different TU. 
However, currently all device objects have to be linked together since only one 
fat binary is supported. This is time consuming for AMDGPU backend since it 
only supports LTO.

There are use cases that objects can be divided into groups in which device 
functions are self-contained but host functions are not. It is desirable to 
link/optimize/codegen the device code and generate a fatbin for each group, 
whereas partially link the host code with `ld -r` or generate a static library 
by using the `-emit-static-lib` option of clang. This avoids linking all device 
code together, therefore decreases the linking time for `-fgpu-rdc`.

Previously, clang emits an external symbol `__hip_fatbin` for all objects for 
`-fgpu-rdc`. With this patch, clang emits an unique external symbol 
`__hip_fatbin_{cuid}` for the fat binary for each object. When a group of 
objects are linked together to generate a fatbin, the symbols are merged by 
alias and point to the same fat binary. Each group has its own fat binary. One 
executable or shared library can have multiple fat binaries. Device linking is 
done for undefined fab binary symbols only to avoid repeated linking. 
`__hip_gpubin_handle` is also uniquefied and merged to avoid repeated 
registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging 
and tooling.

Fixes: https://github.com/llvm/llvm-project/issues/77018

>From 318117089831345caa13d8b4eeea23d0aa2c8588 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" 
Date: Tue, 13 Feb 2024 10:00:21 -0500
Subject: [PATCH] [HIP] Allow partial linking for `-fgpu-rdc`

`-fgpu-rdc` mode allows device functions call device functions
in different TU. However, currently all device objects
have to be linked together since only one fat binary
is supported. This is time consuming for AMDGPU backend
since it only supports LTO.

There are use cases that objects can be divided into groups
in which device functions are self-contained but host functions
are not. It is desirable to link/optimize/codegen the device
code and generate a fatbin for each group, whereas partially
link the host code with `ld -r` or generate a static library
by using the `-emit-static-lib` option of clang. This avoids
linking all device code together, therefore decreases
the linking time for `-fgpu-rdc`.

Previously, clang emits an external symbol `__hip_fatbin`
for all objects for `-fgpu-rdc`. With this patch, clang
emits an unique external symbol `__hip_fatbin_{cuid}`
for the fat binary for each object. When a group of objects
are linked together to generate a fatbin, the symbols
are merged by alias and point to the same fat binary.
Each group has its own fat binary. One executable or
shared library can have multiple fat binaries. Device
linking is done for undefined fab binary symbols only
to avoid repeated linking. `__hip_gpubin_handle` is also
uniquefied and merged to avoid repeated registering.
Symbol `__hip_cuid_{cuid}` is introduced to facilitate
debugging and tooling.

Fixes: https://github.com/llvm/llvm-project/issues/77018
---
 clang/lib/CodeGen/CGCUDANV.cpp|  22 +-
 clang/lib/CodeGen/CodeGenModule.cpp   |  10 +-
 clang/lib/Driver/ToolChains/HIPUtility.cpp| 242 +-
 clang/test/CodeGenCUDA/device-stub.cu |  10 +-
 .../test/CodeGenCUDA/host-used-device-var.cu  |   5 +-
 clang/test/Driver/hip-toolchain-rdc.hip   |  38 ++-
 6 files changed, 283 insertions(+), 44 deletions(-)

diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 5b43272bfa62f4..7d23f944732dbf 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() 
{
   // to contain the fat binary but will be populated somewhere else,
   // e.g. by lld through link script.
   FatBinStr = new llvm::GlobalVariable(
-CGM.getModule(), CGM.Int8Ty,
-/*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
-"__hip_fatbin", nullptr,
-llvm::GlobalVariable::NotThreadLocal);
+  CGM.getModule(), CGM.Int8Ty,
+  /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
+  "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr,
+  llvm::GlobalVariable::NotThreadLocal);
   cast(FatBinStr)->setSection(FatbinConstantName);
 }
 
@@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
   // thread safety of the loaded program. Therefore we can assume sequential
   // execution of constructor functions here.
   if (IsHIP) {
-auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage :
-llvm::GlobalValue::LinkOnceAnyLinkage;
+auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage
+