llvmbot wrote:

<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen

Author: Joseph Huber (jhuber6)

<details>
<summary>Changes</summary>

Summary:
Currently, the GPU gets its math by using wrapper headers that eagerly
replace libcalls with calls to the vendor's math library. e.g.

```
// __clang_cuda_math.h
[[gnu::always_inline]] double sin(double __x) { return __nv_sin(__x); }
```

However, we want to be able to move away from including these headers.
When these headers are not included, the lack of `errno` on the GPU
target enables these to be transformed into intrinsic calls. These
intrinsic calls will then potentially not be supported by the backend,
see https://godbolt.org/z/oKvTevaE1. Even in the case that these
functions are supported, we still want to use regular libcalls now so
that the LTO linking will replace these calls before they reach the
backend.

This patch simply changes the logic to prevent emitting intrinsic
functions for the standard math library functions. This means that `sin`
will not be an intrinsic, but `__builtin_sin` will. A better solution
long-term would be to have a pass that does custom lowering of all of
these before LTO linking if possible.

---
Full diff: https://github.com/llvm/llvm-project/pull/98209.diff

2 Files Affected:

- (modified) clang/lib/CodeGen/CGBuiltin.cpp (+6)
- (added) clang/test/CodeGen/gpu-math-libcalls.c (+51)

``````````diff
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 6cc0d9485720c..89c27147a2bd9 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2637,6 +2637,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       GenerateIntrinsics =
           ConstWithoutErrnoOrExceptions && ErrnoOverridenToFalseWithOpt;
   }
+  // The GPU targets do not want math intrinsics to reach the backend.
+  // TODO: We should add a custom pass to lower these early enough for LTO.
+  if (getTarget().getTriple().isNVPTX() || getTarget().getTriple().isAMDGPU())
+    GenerateIntrinsics = !getContext().BuiltinInfo.isPredefinedLibFunction(
+        BuiltinIDIfNoAsmLabel);
+
   if (GenerateIntrinsics) {
     switch (BuiltinIDIfNoAsmLabel) {
     case Builtin::BIceil:
diff --git a/clang/test/CodeGen/gpu-math-libcalls.c b/clang/test/CodeGen/gpu-math-libcalls.c
new file mode 100644
index 0000000000000..436ad0384ee2d
--- /dev/null
+++ b/clang/test/CodeGen/gpu-math-libcalls.c
@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa %s -emit-llvm -o - | FileCheck %s --check-prefix AMDGPU
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda %s -emit-llvm -o - | FileCheck %s --check-prefix NVPTX
+
+double sin(double);
+double cos(double);
+double sqrt(double);
+
+// AMDGPU-LABEL: define dso_local void @libcalls(
+// AMDGPU-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// AMDGPU-NEXT: [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT: [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT: ret void
+//
+// NVPTX-LABEL: define dso_local void @libcalls(
+// NVPTX-SAME: ) #[[ATTR0:[0-9]+]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// NVPTX-NEXT: [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT: [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT: ret void
+//
+void libcalls() {
+  (void)sin(0.);
+  (void)cos(0.);
+  (void)sqrt(0.);
+}
+
+// AMDGPU-LABEL: define dso_local void @builtins(
+// AMDGPU-SAME: ) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// AMDGPU-NEXT: [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// AMDGPU-NEXT: [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// AMDGPU-NEXT: ret void
+//
+// NVPTX-LABEL: define dso_local void @builtins(
+// NVPTX-SAME: ) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// NVPTX-NEXT: [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// NVPTX-NEXT: [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// NVPTX-NEXT: ret void
+//
+void builtins() {
+  (void)__builtin_sin(0.);
+  (void)__builtin_cos(0.);
+  (void)__builtin_sqrt(0.);
+}

``````````

</details>


https://github.com/llvm/llvm-project/pull/98209
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits