llvmbot wrote:

Author: Joseph Huber (jhuber6)

<details>
<summary>Changes</summary>

Summary:
Currently, the GPU targets get their math functions from wrapper headers
that eagerly replace libcalls with calls to the vendor's math library.
For example:
```
// __clang_cuda_math.h
[[gnu::always_inline]] double sin(double __x) { return __nv_sin(__x); }
```

However, we want to move away from including these headers. When they
are not included, the lack of `errno` on the GPU targets allows these
libcalls to be lowered to LLVM intrinsic calls (e.g. `llvm.sin.f64`),
which the backend may not support; see https://godbolt.org/z/oKvTevaE1.

Even when the backend does support these intrinsics, we still want to
emit regular libcalls for now so that LTO linking can replace these
calls before they reach the backend.

This patch changes the logic so that, on NVPTX and AMDGPU, the standard
math library functions are no longer emitted as intrinsics. This means
that `sin` stays a libcall, while `__builtin_sin` still becomes an
intrinsic. A better long-term solution would be a pass that performs
custom lowering of all of these before LTO linking, if possible.


---
Full diff: https://github.com/llvm/llvm-project/pull/98209.diff


2 Files Affected:

- (modified) clang/lib/CodeGen/CGBuiltin.cpp (+6) 
- (added) clang/test/CodeGen/gpu-math-libcalls.c (+51) 


``````````diff
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 6cc0d9485720c..89c27147a2bd9 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2637,6 +2637,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       GenerateIntrinsics =
           ConstWithoutErrnoOrExceptions && ErrnoOverridenToFalseWithOpt;
   }
+  // The GPU targets do not want math intrinsics to reach the backend.
+  // TODO: We should add a custom pass to lower these early enough for LTO.
+  if (getTarget().getTriple().isNVPTX() || getTarget().getTriple().isAMDGPU())
+    GenerateIntrinsics = !getContext().BuiltinInfo.isPredefinedLibFunction(
+        BuiltinIDIfNoAsmLabel);
+
   if (GenerateIntrinsics) {
     switch (BuiltinIDIfNoAsmLabel) {
     case Builtin::BIceil:
diff --git a/clang/test/CodeGen/gpu-math-libcalls.c b/clang/test/CodeGen/gpu-math-libcalls.c
new file mode 100644
index 0000000000000..436ad0384ee2d
--- /dev/null
+++ b/clang/test/CodeGen/gpu-math-libcalls.c
@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa %s -emit-llvm -o - | FileCheck %s --check-prefix AMDGPU
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda %s -emit-llvm -o - | FileCheck %s --check-prefix NVPTX
+
+double sin(double);
+double cos(double);
+double sqrt(double);
+
+// AMDGPU-LABEL: define dso_local void @libcalls(
+// AMDGPU-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// AMDGPU-NEXT:    [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT:    [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT:    ret void
+//
+// NVPTX-LABEL: define dso_local void @libcalls(
+// NVPTX-SAME: ) #[[ATTR0:[0-9]+]] {
+// NVPTX-NEXT:  [[ENTRY:.*:]]
+// NVPTX-NEXT:    [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// NVPTX-NEXT:    [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT:    [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT:    ret void
+//
+void libcalls() {
+  (void)sin(0.);
+  (void)cos(0.);
+  (void)sqrt(0.);
+}
+
+// AMDGPU-LABEL: define dso_local void @builtins(
+// AMDGPU-SAME: ) #[[ATTR0]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    ret void
+//
+// NVPTX-LABEL: define dso_local void @builtins(
+// NVPTX-SAME: ) #[[ATTR0]] {
+// NVPTX-NEXT:  [[ENTRY:.*:]]
+// NVPTX-NEXT:    [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// NVPTX-NEXT:    [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// NVPTX-NEXT:    [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// NVPTX-NEXT:    ret void
+//
+void builtins() {
+  (void)__builtin_sin(0.);
+  (void)__builtin_cos(0.);
+  (void)__builtin_sqrt(0.);
+}

``````````

</details>


https://github.com/llvm/llvm-project/pull/98209
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
