yaxunl added a comment.

In D128914#3643270 <https://reviews.llvm.org/D128914#3643270>, @jhuber6 wrote:
>> There is only one fatbin for -fgpu-rdc mode but the fatbin unregister
>> function is called multiple times in each TU. HIP runtime expects each
>> fatbin is unregistered only once. The old embedding scheme introduced a weak
>> symbol to track whether the fatbin has been unregistered and to make sure it
>> is only unregistered once.
>
> I see, this wrapping will only happen in RDC-mode so it's probably safe to
> ignore here? When I support non-RDC mode in the new driver it will most
> likely rely on the old code generation. Although it's entirely feasible to
> make RDC-mode the default. There's no runtime overhead when using LTO.

If you unregister the fatbin only once for the whole program, then it should
be safe for -fgpu-rdc. I am not sure whether that is the case.

My experience with -fgpu-rdc is that it causes much longer link times for
large applications like PyTorch or TensorFlow, and LTO does not help. This is
because the compiler runs many inter-procedural optimization passes that take
more than linear time. Because of that, those applications need to be compiled
with -fno-gpu-rdc. In fact, most CUDA/HIP applications use -fno-gpu-rdc.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128914/new/

https://reviews.llvm.org/D128914

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
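For illustration only, the "unregister only once" guard discussed in this comment could be sketched roughly as below. This is not the code Clang emits and not the HIP runtime API: `fake_runtime_unregister_fatbin`, `module_dtor`, `simulate_teardown`, and both globals are hypothetical names, and a single global flag stands in for the weak symbol that all TUs would share in a real -fgpu-rdc build.

```cpp
// Hypothetical stand-in for the runtime's fatbin unregister entry point.
// It only counts how many times it was called.
static int g_runtime_unregister_calls = 0;
static void fake_runtime_unregister_fatbin() { ++g_runtime_unregister_calls; }

// Assumption: in a real multi-TU build this flag would be one shared symbol
// (for example a weak symbol) so that every TU sees the same copy. In this
// single-file sketch a plain global plays that role.
static bool g_fatbin_unregistered = false;

// What each TU's module destructor would do: unregister the fatbin only if
// no other TU has already done so.
void module_dtor() {
    if (g_fatbin_unregistered)
        return;                       // another TU already unregistered it
    g_fatbin_unregistered = true;
    fake_runtime_unregister_fatbin();
}

// Simulates program teardown with `num_tus` translation units, each running
// its own destructor. Returns the total number of runtime unregister calls.
int simulate_teardown(int num_tus) {
    for (int i = 0; i < num_tus; ++i)
        module_dtor();
    return g_runtime_unregister_calls;
}
```

However many TUs run their destructor, the stand-in runtime function is reached at most once. The sketch is not thread-safe and says nothing about how the new driver wraps the fatbin.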