jhuber6 added a comment.

In D128914#3643802 <https://reviews.llvm.org/D128914#3643802>, @tra wrote:

> For what it's worth, NCCL <https://developer.nvidia.com/nccl> is the only 
> nontrivial library that needs RDC compilation that I'm aware of.
> It's also self-contained for RDC purposes: we only need to use RDC on the 
> library TUs and do not need to propagate it to all CUDA TUs in the build.
>
> I believe such 'constrained' RDC compilation will likely be the reasonable 
> practical trade-off. It may not become the default compilation mode, but we 
> should be able to control where the "fully linked GPU executable" boundary is 
> and it's not necessarily going to match the fully-linked host executable.

Theoretically we could do this with a relocatable link using the 
linker-wrapper. The only problem with this approach is the `__start/__stop` 
linker-defined variables that we use to iterate the globals to be registered, 
as these are tied to the section name specifically. Potentially, we could move 
the entries to a unique section so they don't interfere with anything. So it 
would be something like this:

  # Contains RTL calls to register all globals at section 'cuda_offloading_entries_<ID>'
  clang-linker-wrapper -r a.o b.o c.o -o registered.o
  # Remove embedded IR so no other files will link against it
  llvm-strip --remove-section .llvm.offloading registered.o
  # Change the registration section to something unique
  llvm-objcopy --rename-section cuda_offloading_entries=cuda_offloading_entries_<ID> registered.o

Think this would work?
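
For anyone not familiar with the `__start/__stop` issue, here is a minimal sketch of how those symbols behave, assuming an ELF target linked with GNU ld or lld. The section name `demo_offloading_entries`, the `struct demo_entry` layout, and the `walk_entries` / `print_entry` helpers are made up for illustration and are not the real offloading entry format. The bounds are derived purely from the section name, so every object that contributes to a section of that name ends up in the same range, which is why a pre-linked object would want a unique name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an offloading entry; the real layout differs. */
struct demo_entry {
  const char *name;
  void *addr;
};

static int g_a, g_b;

/* The section name is a valid C identifier, so ELF linkers such as GNU ld
   and lld synthesize __start_<name> and __stop_<name> for it. */
#define DEMO_ENTRY __attribute__((section("demo_offloading_entries"), used))

DEMO_ENTRY static const struct demo_entry entry_a = {"g_a", &g_a};
DEMO_ENTRY static const struct demo_entry entry_b = {"g_b", &g_b};

/* Linker-defined bounds of the merged section across all linked objects. */
extern const struct demo_entry __start_demo_offloading_entries[];
extern const struct demo_entry __stop_demo_offloading_entries[];

/* Visit every entry between the bounds and return how many were seen. */
static size_t walk_entries(void (*visit)(const struct demo_entry *)) {
  const struct demo_entry *begin = __start_demo_offloading_entries;
  const struct demo_entry *end = __stop_demo_offloading_entries;
  size_t n = 0;
  assert(begin <= end);
  for (const struct demo_entry *e = begin; e != end; ++e, ++n)
    if (visit)
      visit(e);
  return n;
}

/* Example visitor: a real runtime would register the global here. */
static void print_entry(const struct demo_entry *e) {
  printf("register %s at %p\n", e->name, e->addr);
}
```

Calling `walk_entries(print_entry)` from `main` visits both entries defined above, plus any entries that other objects in the same link placed in a section with the same name.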


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128914/new/

https://reviews.llvm.org/D128914
