[PATCH] D85236: [CUDA] Work around a bug in rint() caused by a broken implementation provided by CUDA.

Artem Belevich via Phabricator via cfe-commits Tue, 04 Aug 2020 12:06:08 -0700

tra created this revision.
tra added a reviewer: jlebar.
Herald added subscribers: sanjoy.google, bixia, yaxunl.
Herald added a project: clang.
tra requested review of this revision.


Normally math functions are forwarded to __nv_* counterparts provided by CUDA's
libdevice bitcode. However, __nv_rint*() functions there have a bug -- they use
round() which rounds *up* instead of rounding towards the nearest integer, so we
end up with rint(2.5f) producing 3.0 instead of expected 2.0. The broken bitcode
is not actually used by NVCC itself, which has both a work-around in CUDA
headers and, in recent versions, uses correct implementations in NVCC's 
built-ins.

This patch implements equivalent workaround and directs rint/rintf to
__builtin_rint/rintf that produce correct results.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D85236

Files:
  clang/lib/Headers/__clang_cuda_math.h


Index: clang/lib/Headers/__clang_cuda_math.h
===================================================================
--- clang/lib/Headers/__clang_cuda_math.h
+++ clang/lib/Headers/__clang_cuda_math.h
@@ -249,8 +249,9 @@
 __DEVICE__ float rhypotf(float __a, float __b) {
   return __nv_rhypotf(__a, __b);
 }
-__DEVICE__ double rint(double __a) { return __nv_rint(__a); }
-__DEVICE__ float rintf(float __a) { return __nv_rintf(__a); }
+// __nv_rint* in libdevice is buggy and produces incorrect results.
+__DEVICE__ double rint(double __a) { return __builtin_rint(__a); }
+__DEVICE__ float rintf(float __a) { return __builtin_rintf(__a); }
 __DEVICE__ double rnorm(int __a, const double *__b) {
   return __nv_rnorm(__a, __b);
 }

Index: clang/lib/Headers/__clang_cuda_math.h
===================================================================
--- clang/lib/Headers/__clang_cuda_math.h
+++ clang/lib/Headers/__clang_cuda_math.h
@@ -249,8 +249,9 @@
 __DEVICE__ float rhypotf(float __a, float __b) {
   return __nv_rhypotf(__a, __b);
 }
-__DEVICE__ double rint(double __a) { return __nv_rint(__a); }
-__DEVICE__ float rintf(float __a) { return __nv_rintf(__a); }
+// __nv_rint* in libdevice is buggy and produces incorrect results.
+__DEVICE__ double rint(double __a) { return __builtin_rint(__a); }
+__DEVICE__ float rintf(float __a) { return __builtin_rintf(__a); }
 __DEVICE__ double rnorm(int __a, const double *__b) {
   return __nv_rnorm(__a, __b);
 }

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D85236: [CUDA] Work around a bug in rint() caused by a broken implementation provided by CUDA.

Reply via email to