tra created this revision.
tra added a reviewer: jlebar.
Herald added subscribers: sanjoy.google, bixia, yaxunl.
Herald added a project: clang.
tra requested review of this revision.
Normally, math functions are forwarded to the __nv_* counterparts provided by CUDA's libdevice bitcode. However, the __nv_rint*() functions there have a bug: they use round(), which rounds halfway cases away from zero, instead of rounding to the nearest integer with ties going to even. As a result, rint(2.5f) produces 3.0 instead of the expected 2.0.

The broken bitcode is not actually used by NVCC itself, which has a work-around in the CUDA headers and, in recent versions, uses correct implementations in NVCC's built-ins.

This patch implements an equivalent workaround and directs rint/rintf to __builtin_rint/__builtin_rintf, which produce correct results.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D85236

Files:
  clang/lib/Headers/__clang_cuda_math.h

Index: clang/lib/Headers/__clang_cuda_math.h
===================================================================
--- clang/lib/Headers/__clang_cuda_math.h
+++ clang/lib/Headers/__clang_cuda_math.h
@@ -249,8 +249,9 @@
 __DEVICE__ float rhypotf(float __a, float __b) {
   return __nv_rhypotf(__a, __b);
 }
-__DEVICE__ double rint(double __a) { return __nv_rint(__a); }
-__DEVICE__ float rintf(float __a) { return __nv_rintf(__a); }
+// __nv_rint* in libdevice is buggy and produces incorrect results.
+__DEVICE__ double rint(double __a) { return __builtin_rint(__a); }
+__DEVICE__ float rintf(float __a) { return __builtin_rintf(__a); }
 __DEVICE__ double rnorm(int __a, const double *__b) {
   return __nv_rnorm(__a, __b);
 }
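For reference, the difference between the two rounding behaviours described in this revision can be reproduced on the host with the standard library. This is only a sketch: the helper names below are hypothetical and not part of the patch, and it assumes the default floating-point rounding mode (FE_TONEAREST) and that the host std::rint/std::round follow the C standard semantics.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper names for illustration; they are not part of the patch.

// rint() rounds to an integer using the current rounding mode. In the default
// mode (FE_TONEAREST), halfway cases go to the nearest even integer.
inline float rint_ties_to_even(float x) { return std::rint(x); }

// round() always rounds halfway cases away from zero.
inline float round_ties_away(float x) { return std::round(x); }

// Checks the tie cases mentioned in the description, on the host.
inline void check_tie_cases() {
  assert(rint_ties_to_even(2.5f) == 2.0f);   // tie goes to even
  assert(round_ties_away(2.5f) == 3.0f);     // tie goes away from zero
  assert(rint_ties_to_even(3.5f) == 4.0f);   // tie goes to even
  assert(rint_ties_to_even(-2.5f) == -2.0f); // tie goes to even
  // Away from halfway cases, both functions agree.
  assert(rint_ties_to_even(2.4f) == round_ties_away(2.4f));
}
```

The two functions only disagree on exact halfway inputs, which is why the bug shows up for values such as 2.5f but not for most other arguments.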
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits