jhuber6 added a comment.

In D128914#3643802 <https://reviews.llvm.org/D128914#3643802>, @tra wrote:

> For what it's worth, NCCL <https://developer.nvidia.com/nccl> is the only
> nontrivial library that needs RDC compilation that I'm aware of.
> It's also self-contained for RDC purposes we only need to use RDC on the
> library TUs and do not need to propagate it to all CUDA TUs in the build.
>
> I believe such 'constrained' RDC compilation will likely be the reasonable
> practical trade-off. It may not become the default compilation mode, but we
> should be able to control where the "fully linked GPU executable" boundary is
> and it's not necessarily going to match the fully-linked host executable.

Theoretically we could do this with a relocatable link using the linker-wrapper. The only problem with this approach is the `__start/__stop` linker-defined variables that we use to iterate over the globals to be registered, as these are tied to the section specifically. Potentially, we could move these to a unique section so they don't interfere with anything. So it would be something like this:

  clang-linker-wrapper -r a.o b.o c.o -o registered.o // Contains RTL calls to register all globals at section 'cuda_offloading_entries_<ID>'
  llvm-strip --remove-section .llvm.offloading registered.o // Remove embedded IR so no other files will link against it
  llvm-objcopy --rename-section cuda_offloading_entries=cuda_offloading_entries_<ID> registered.o // Change the registration section to something unique

Think this would work?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128914/new/

https://reviews.llvm.org/D128914

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
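
For readers unfamiliar with the `__start/__stop` mechanism mentioned in the comment above, here is a minimal sketch of the general idiom, assuming an ELF target linked with GNU ld or lld. The names `demo_entry`, `demo_entries`, `DEMO_ENTRY`, `demo_register`, and `demo_register_all` are hypothetical and exist only for this illustration; they are not the entry layout or the registration calls the offloading runtime actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry record; the real offload entry layout is different. */
struct demo_entry {
  const char *name; /* symbol name */
  void *addr;       /* address of the global */
};

/* Emit one record into the "demo_entries" section. Because the section name
 * is a valid C identifier, GNU ld and lld define __start_demo_entries and
 * __stop_demo_entries at the bounds of that section. */
#define DEMO_ENTRY(sym)                                           \
  static const struct demo_entry demo_entry_##sym                 \
      __attribute__((used, section("demo_entries"))) = {#sym, (void *)&sym}

int demo_global_a;
int demo_global_b;
DEMO_ENTRY(demo_global_a);
DEMO_ENTRY(demo_global_b);

/* Linker-defined bounds. Their names are derived from the section name. */
extern const struct demo_entry __start_demo_entries[];
extern const struct demo_entry __stop_demo_entries[];

/* Stand-in for a runtime registration call; it only counts entries. */
static size_t demo_registered;

static void demo_register(const struct demo_entry *e) {
  assert(e->name != NULL && e->addr != NULL);
  ++demo_registered;
}

/* Walk every record the linker gathered into the section. */
size_t demo_register_all(void) {
  demo_registered = 0;
  for (const struct demo_entry *e = __start_demo_entries;
       e < __stop_demo_entries; ++e)
    demo_register(e);
  return demo_registered;
}
```

Since the bounds symbols are named after the section, every object that places records in `demo_entries` ends up in the same iteration range at final link. That is the interference the rename step in the proposal is meant to avoid.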