yaxunl added a comment.

In D128914#3643270 <https://reviews.llvm.org/D128914#3643270>, @jhuber6 wrote:

>> There is only one fatbin for -fgpu-rdc mode but the fatbin unregister 
>> function is called multiple times in each TU. HIP runtime expects each 
>> fatbin is unregistered only once. The old embedding scheme introduced a weak 
>> symbol to track whether the fatbin has been unregistered and to make sure it 
>> is only unregistered once.
>
> I see, this wrapping will only happen in RDC-mode so it's probably safe to 
> ignore here? When I support non-RDC mode in the new driver it will most 
> likely rely on the old code generation. Although it's entirely feasible to 
> make RDC-mode the default. There's no runtime overhead when using LTO.

If you unregister the fatbin only once for the whole program, then it should be 
safe for -fgpu-rdc. I am not sure whether that is the case.
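
For reference, the unregister-once guard I described can be sketched roughly as 
below. This is only an illustration, not the actual generated code: all names 
(fake_register_fatbin, fake_unregister_fatbin, shared_fatbin_handle, 
tu_module_ctor, tu_module_dtor, simulate_program) are hypothetical, and the 
runtime calls are replaced by counters. It assumes a GCC/Clang-style 
`__attribute__((weak))`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the runtime entry points; they only count calls.
static int register_calls = 0;
static int unregister_calls = 0;
static int dummy_fatbin = 0;

static void **fake_register_fatbin() {
  ++register_calls;
  static void *handle = &dummy_fatbin;
  return &handle;
}

static void fake_unregister_fatbin(void **) { ++unregister_calls; }

// Weak definition: if several TUs emit it, the linker keeps a single copy,
// so every TU sees the same handle.
__attribute__((weak)) void **shared_fatbin_handle = nullptr;

// What each TU's module constructor would do in this sketch.
void tu_module_ctor() {
  if (!shared_fatbin_handle)
    shared_fatbin_handle = fake_register_fatbin();
}

// What each TU's module destructor would do in this sketch: only the first
// caller sees a non-null handle, so the fatbin is unregistered once.
void tu_module_dtor() {
  if (shared_fatbin_handle) {
    fake_unregister_fatbin(shared_fatbin_handle);
    shared_fatbin_handle = nullptr;
  }
}

// Simulates a program linked from `num_tus` TUs and returns how many times
// the unregister function was actually invoked.
int simulate_program(int num_tus) {
  for (int i = 0; i < num_tus; ++i) tu_module_ctor();
  for (int i = 0; i < num_tus; ++i) tu_module_dtor();
  return unregister_calls;
}
```

Calling simulate_program with any number of TUs runs each constructor and 
destructor, but only the first destructor reaches the unregister call, because 
it clears the shared handle.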

My experience with -fgpu-rdc is that it causes much longer link times for 
large applications like PyTorch or TensorFlow, and LTO does not help. This is 
because the compiler has many inter-procedural optimization passes that 
take more than linear time. Because of that, those apps need to be compiled with 
-fno-gpu-rdc. In fact, most CUDA/HIP applications use -fno-gpu-rdc.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128914/new/

https://reviews.llvm.org/D128914
