[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/81033 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
Artem-B wrote: > Okay, `__nvvm_reflect` doesn't work fully here because the `nanosleep` > builtin I added requires `sm_70` at the clang level. Either means I'd need to > go back to inline assembly or remove that requirement at least from clang so > it's a backend failure. The question is -- who's going to provide a fallback implementation for the nanosleepbuiltin for the older GPUs. I do not think it's LLVM's job, so constraining the builtin is appropriate. However, nothing stops you from providing your own implementation in libc using inline asm. Something along these lines: ``` __device__ void my_nanosleep(int N) { if (__nvvm_reflect(SM_70)) { asm volatile("nanosleep") } else { while(N--) { volatile asm("something unoptimizable") } } } ``` https://github.com/llvm/llvm-project/pull/81033 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
jhuber6 wrote: Okay, `__nvvm_reflect` doesn't work fully here because the `nanosleep` builtin I added requires `sm_70` at the clang level. Either means I'd need to go back to inline assembly or remove that requirement at least from clang so it's a backend failure. https://github.com/llvm/llvm-project/pull/81033 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
jhuber6 wrote: > > This patch, which simply makes it legal on all architectures but do nothing > > is it's older than sm_70. > > I do not think this is the right thing to do. "do nothing" is not what one > would expect from a `nanosleep`. Thanks, I made this a draft because I figured it wasn't the correct thing to do but wanted to pose the question. > Let's unpack your problem a bit. > > __nvvm_reflect() is probably closest to what you would need. However, IIUIC, > if you use it to provide nanosleep-based variant and an alternative for the > older GPUs, the `nanosleep` variant code will still hang off the dead branch > of if(__nvvm_reflect()) and if it's not eliminated by DCE (which it would not > if optimizations are off), the resulting PTX will be invalid for the older > GPUs. > > In other words, pushing nanosleep implementation into an intrinsic makes > things compile everywhere at the expense of doing a wrong thing on the older > GPUs. I do not think it's a good trade-off. > > Perhaps a better approach would be to incorporate dead branch elimination > onto NVVMReflect pass itself. We do know that it is the explicit intent of > `__nvvm_reflect()`. If NVVMReflect explicitly guarantees that the dead branch > will be gone, it should allow you to use approach `#1` w/o concerns for > whether optimizations are enabled and you should be able to provide whatever > alternative implementation you need (even if it's a null one), without > affecting correctness of LLVM itself. I think that would be a good solution if possible. Would this simply mean scheduling a global DCE pass right after the NVVM reflect pass? Since that seems to be run at `O0` that seems like the easiest solution, though it somewhat breaks `O0` semantics. Or, maybe we just have a really shallow implementation in the NVVM reflect pass that collapses the branch? https://github.com/llvm/llvm-project/pull/81033 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
Artem-B wrote: > This patch, which simply makes it legal on all architectures but do nothing > is it's older than sm_70. I do not think this is the right thing to do. "do nothing" is not what one would expect from a `nanosleep`. Let's unpack your problem a bit. __nvvm_reflect() is probably closest to what you would need. However, IIUIC, if you use it to provide nanosleep-based variant and an alternative for the older GPUs, the `nanosleep` variant code will still hang off the dead branch of if(__nvvm_reflect()) and if it's not eliminated by DCE (which it would not if optimizations are off), the resulting PTX will be invalid for the older GPUs. In other words, pushing nanosleep implementation into an intrinsic makes things compile everywhere at the expense of doing a wrong thing on the older GPUs. I do not think it's a good trade-off. Perhaps a better approach would be to incorporate dead branch elimination onto NVVMReflect pass itself. We do know that it is the explicit intent of `__nvvm_reflect()`. If NVVMReflect explicitly guarantees that the dead branch will be gone, it should allow you to use approach `#1` w/o concerns for whether optimizations are enabled and you should be able to provide whatever alternative implementation you need (even if it's a null one), without affecting correctness of LLVM itself. https://github.com/llvm/llvm-project/pull/81033 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported (PR #81033)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/81033 Summary; The LLVM C library currently uses `nanosleep` in the RPC interface and for the C library `nanosleep` function. We build the LLVM C library for every single NVPTX architecture individually currently, which is not ideal. The goal is to make the LLVM-IR target independent, unfortunately the one snag is the `nanosleep` function which will crash if used on a GPU older than sm_70. There are three possible solutions to this. 1. Use `__nvvm_reflect(__CUDA_ARCH__)` like the libdevice functions. This will work as long as optimizations are on, not ideal. 2. Get rid of the use of nanosleep in `libc`. This isn't ideal as sleeping during the busy-wait loops is helpful for thread scheduling and it prevents us from providing `nanosleep` as a C library function. 3. This patch, which simply makes it legal on all architectures but do nothing is it's older than sm_70. This is a draft to question if this is an acceptable hack, as an intrinsic silently doing nothing is not always a good idea. Potentially a new intrinsic could be added instead, but there is also a desire to have intrinsics map 1-to-1 with hardware. >From 10447352c68c666c51cfba7d84a06cb23327bc8a Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Wed, 7 Feb 2024 14:03:00 -0600 Subject: [PATCH] [NVPTX][Draft] Make `__nvvm_nanosleep` a no-op if unsupported Summary; The LLVM C library currently uses `nanosleep` in the RPC interface and for the C library `nanosleep` function. We build the LLVM C library for every single NVPTX architecture individually currently, which is not ideal. The goal is to make the LLVM-IR target independent, unfortunately the one snag is the `nanosleep` function which will crash if used on a GPU older than sm_70. There are three possible solutions to this. 1. Use `__nvvm_reflect(__CUDA_ARCH__)` like the libdevice functions. This will work as long as optimizations are on, not ideal. 2. Get rid of the use of nanosleep in `libc`. This isn't ideal as sleeping during the busy-wait loops is helpful for thread scheduling and it prevents us from providing `nanosleep` as a C library function. 3. This patch, which simply makes it legal on all architectures but do nothing is it's older than sm_70. This is a draft to question if this is an acceptable hack, as an intrinsic silently doing nothing is not always a good idea. Potentially a new intrinsic could be added instead, but there is also a desire to have intrinsics map 1-to-1 with hardware. --- clang/include/clang/Basic/BuiltinsNVPTX.def | 2 +- llvm/lib/Target/NVPTX/NVPTXIntrinsics.td| 9 + 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def index 7819e71d7fe2aa..5fd17a1f5b8552 100644 --- a/clang/include/clang/Basic/BuiltinsNVPTX.def +++ b/clang/include/clang/Basic/BuiltinsNVPTX.def @@ -159,7 +159,7 @@ BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n") BUILTIN(__nvvm_prmt, "UiUiUiUi", "") BUILTIN(__nvvm_exit, "v", "r") -TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", AND(SM_70, PTX63)) +TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", PTX63) // Min Max diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td index 2330d7213c26dc..fd786a12c78eba 100644 --- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td +++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td @@ -646,6 +646,15 @@ def INT_NVVM_NANOSLEEP_I : NVPTXInst<(outs), (ins i32imm:$i), "nanosleep.u32 \t$ def INT_NVVM_NANOSLEEP_R : NVPTXInst<(outs), (ins Int32Regs:$i), "nanosleep.u32 \t$i;", [(int_nvvm_nanosleep Int32Regs:$i)]>, Requires<[hasPTX<63>, hasSM<70>]>; + +// Make 'nanosleep' a no-op on older architectures. +def INT_NVVM_NANOSLEEP_I_NOOP : NVPTXInst<(outs), (ins i32imm:$i), "/* no-op */", + [(int_nvvm_nanosleep imm:$i)]>, +Requires<[hasPTX<63>]>; +def INT_NVVM_NANOSLEEP_R_NOOP : NVPTXInst<(outs), (ins Int32Regs:$i), "/* no-op */", + [(int_nvvm_nanosleep Int32Regs:$i)]>, +Requires<[hasPTX<63>]>; + // // Min Max // ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits