----- Original Message ----- > From: "Justin Lebar" <jle...@google.com> > To: "Hal Finkel" <hfin...@anl.gov> > Cc: "Clang Commits" <cfe-commits@lists.llvm.org> > Sent: Saturday, October 8, 2016 10:56:37 PM > Subject: Re: r283680 - [CUDA] Support <complex> and std::min/max on the > device. > > > > The fix is not as simple as simply changing our implementation of > > e.g. > > std::isnan to call __builtin_isnanf, because we also would want to > > fix > > ::isnanf, > > > > No, if I understand what you're saying, you specifically wouldn't. > > I understand how this is feasible on the CPU, because ::isnan is a > library function that can never be inlined. But on the GPU, these > library functions are (at the moment) always declared inline. That > seems to complicate this idea. > > Right now ::isnan(x) is going to call __nv_isnan(x), which computes > abs(x). If we pass -ffast-math, the compiler will be able to assume > that abs(x) is not nan. I guess you're saying that we would need to > special-case __nv_isnan so that -ffast-math is always off > (essentially). But, what if it gets inlined? > > It looks like libstdc++'s std::isnan calls __builtin_isnan (same for > its std::isinf), and its ::isnan is an alias for std::isnan. So > libstdc++'s isnan is going to return false with -ffast-math (or > anyway > it will do the same thing as the builtin functions, which aiui is > what > you're proposing libc++'s isnan *not* do).
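The hazard described above can be illustrated on the host side without any GPU machinery. The sketch below (illustrative only, not libc++'s or libdevice's actual code) uses the classic self-comparison NaN test: IEEE-754 says a NaN compares unequal to everything including itself, but under -ffast-math (-ffinite-math-only) the compiler is allowed to assume no NaNs exist and may fold the whole function to `return false` — the same failure mode as an inlined __nv_isnan built on abs(x):

```cpp
#include <cmath>

// Illustrative only: an arithmetic-based NaN check. Compiled normally this
// behaves as expected, but with -ffast-math the compiler may assume the
// comparison can never be true and fold the function body to `return false`.
inline bool isnan_via_compare(double x) { return x != x; }
```

Compiled without fast-math flags, `isnan_via_compare(std::nan(""))` is true; the point is that nothing in the source prevents a fast-math build from optimizing it away.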
This was not my first choice, but was the direction that Marshall preferred based on our conversations up to that point. I had not noticed this aspect of libstdc++'s behavior. It is indeed the case that, with libstdc++, std::isnan gets optimized away with -ffast-math, but ::isnan does not. That might be desirable, or it might just be weird given that I'd expect std::isnan and ::isnan to essentially do the same thing for POD FP types. > > > This is important for use cases where, for example, even though the > > user might want fast math, they still need to check their inputs > > for NaNs. > > Since this isn't going to work with libstdc++, and it relies on not > doing anything that the compiler might construe as "arithmetic" on > the > value, this seems pretty dicey to me. One could instead compile a > separate TU without -ffast-math and do all their validation there? > I'd have a lot more confidence in that working today, continuing to > work tomorrow, and being portable across compilers and standard > libraries. I certainly agree that I have a higher confidence in the multiple TU approach. > > I don't mean to relitigate https://reviews.llvm.org/D18639, but I am > not convinced that libc++'s isnan should have a path that returns > true > with -ffast-math, given that > > * libstdc++'s isnan will always return false with -ffast-math, > * it's at best complicated for us to make this work if you can > inline > the body of isnan (as we can on the GPU), > * it's at best complicated for users to write "correct" C++ that > calls isnan with -ffast-math, especially if they want their code to > continue to work in the future in the face of changing compilers > (-ffast-math is not specified anywhere, so who knows what it means), > and > * there's a relatively simple workaround (use a separate TU) that > sidesteps all these problems. > > I'm not saying we should go in and change libc++'s CPU implementation > of isnan to call the builtin. 
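For what it's worth, the separate-TU validation approach can be made immune to fast-math entirely by avoiding floating-point arithmetic and inspecting the bit pattern instead. A minimal sketch (the helper name is hypothetical, not any library's API; the intent is that it lives in a TU compiled without -ffast-math, though with no FP arithmetic there is nothing for fast-math to assume away):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical validation helper for a TU compiled without -ffast-math.
// A binary64 value is NaN iff its exponent bits are all ones and its
// mantissa is nonzero; checking the bits via memcpy involves no FP
// arithmetic the optimizer could reason about under fast-math.
inline bool is_nan_checked(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const std::uint64_t exp_mask  = 0x7ff0000000000000ULL;
  const std::uint64_t frac_mask = 0x000fffffffffffffULL;
  return (bits & exp_mask) == exp_mask && (bits & frac_mask) != 0;
}
```

Note that infinities (exponent all ones, mantissa zero) correctly report false.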
I'll leave that up to people who care > about CPU code. But at least on the GPU, it still makes sense to me > to fix the problem you originally identified by making > std::/::isnan/isinf always return false/true with -ffast-math. Which > I think we should be able to do with the intrinsic upgrade I > originally suggested. > > On a separate note: Can we make __libcpp_isnan and __libcpp_isinf > constexpr? This will make them implicitly host+device functions, > solving the problem on the GPU. Otherwise I may have to reimplement > these functions in a header, and that's lame. Although I am clearly > not above that. :) I think this makes sense ;) We should check with Eric or Marshall. -Hal > > On Sat, Oct 8, 2016 at 6:50 PM, Hal Finkel <hfin...@anl.gov> wrote: > > ----- Original Message ----- > >> From: "Justin Lebar" <jle...@google.com> > >> To: "Hal Finkel" <hfin...@anl.gov> > >> Cc: "Clang Commits" <cfe-commits@lists.llvm.org> > >> Sent: Saturday, October 8, 2016 6:16:12 PM > >> Subject: Re: r283680 - [CUDA] Support <complex> and std::min/max > >> on the device. > >> > >> Hal, > >> > >> On NVPTX, these functions eventually get resolved to function > >> calls > >> in > >> libdevice, e.g. __nv_isinff and __nv_isnanf. > >> > >> llvm does not do a good job understanding the body of e.g. > >> __nvvm_isnanf, because it uses nvptx-specific intrinsic functions, > >> notably @llvm.nvvm.fabs.f. These are opaque to the LLVM > >> optimizer. > >> > >> The fix is not as simple as simply changing our implementation of > >> e.g. > >> std::isnan to call __builtin_isnanf, because we also would want to > >> fix > >> ::isnanf, > > > > No, if I understand what you're saying, you specifically wouldn't. > > We had a discussion about this on the review thread(s) that led to > > r283051, and while we want to elide the checks inside the > > mathematical functions, we don't want to replace isnan itself with > > something that will get optimized away. 
We want to keep the > > ability for the user to explicitly check for NaNs, etc. even if we > > don't want those checks to appear inside of mathematical > > operations. This is important for use cases where, for example, > > even though the user might want fast math, they still need to > > check their inputs for NaNs. > > > > -Hal > > > >> but we can't override that implementation without some > >> major > >> surgery on the nvptx headers. > >> > >> David Majnemer and I talked about one way to fix this, namely by > >> using > >> IR intrinsic upgrades to replace the opaque nvptx intrinsics with > >> LLVM > >> intrinsics. LLVM would then be able to understand these > >> intrinsics > >> and optimize them. We would reap benefits not just for > >> std::isnan, > >> but also e.g. constant-folding calls like std::abs that also > >> eventually end up in libnvvm. > >> > >> I did the first half of this work, by adding lowerings for the > >> various > >> LLVM intrinsics to the NVPTX backend [1]. But David is now busy > >> with > >> other things and hasn't been able to help with the second half, > >> namely > >> using IR upgrades to replace the nvptx target-specific intrinsics > >> with > >> generalized LLVM intrinsics. Perhaps this is something you'd be > >> able > >> to help with? > >> > >> In any case, using builtins here without fixing std::isnan and > >> ::isnan > >> feels to me to be the wrong solution. It seems to me that we > >> should > >> be able to rely on std::isnan and friends being fast, and if > >> they're > >> not, we should fix that. Using builtins here would be "cheating" > >> to > >> make our implementation faster than user code. > >> > >> I'll note, separately, that on x86, clang does not seem to > >> constant-fold std::isinf or __builtin_isinff to false with > >> -ffast-math > >> -ffinite-math-only. GCC can do it. Clang gets std::isnan. 
> >> https://godbolt.org/g/vZB55a > >> > >> By the way, the changes you made to libc++ unfortunately break > >> this > >> patch with libc++, because e.g. __libcpp_isnan is not a device > >> function. I'll have to think about how to fix that -- I may send > >> you > >> a patch. > >> > >> Regards, > >> -Justin > >> > >> [1] https://reviews.llvm.org/D24300 > >> > >> On Sat, Oct 8, 2016 at 3:36 PM, Hal Finkel <hfin...@anl.gov> > >> wrote: > >> > Hi Justin, > >> > > >> > This is neat! > >> > > >> > I see a bunch of uses of std::isinf, etc. here. It tends to be > >> > important that, when using -ffast-math (or -ffinite-math-only) > >> > these checks get optimized away. Can you please check that they > >> > do? If not, you might mirror what I've done in r283051 for > >> > libc++, > >> > which is similar to what libstdc++ ends up doing, so that we use > >> > __builtin_isnan/isinf/isfinite. > >> > > >> > Thanks again, > >> > Hal > >> > > >> > ----- Original Message ----- > >> >> From: "Justin Lebar via cfe-commits" > >> >> <cfe-commits@lists.llvm.org> > >> >> To: cfe-commits@lists.llvm.org > >> >> Sent: Saturday, October 8, 2016 5:16:13 PM > >> >> Subject: r283680 - [CUDA] Support <complex> and std::min/max on > >> >> the device. > >> >> > >> >> Author: jlebar > >> >> Date: Sat Oct 8 17:16:12 2016 > >> >> New Revision: 283680 > >> >> > >> >> URL: http://llvm.org/viewvc/llvm-project?rev=283680&view=rev > >> >> Log: > >> >> [CUDA] Support <complex> and std::min/max on the device. > >> >> > >> >> Summary: > >> >> We do this by wrapping <complex> and <algorithm>. > >> >> > >> >> Tests are in the test-suite. 
> >> >> > >> >> Reviewers: tra > >> >> > >> >> Subscribers: jhen, beanz, cfe-commits, mgorny > >> >> > >> >> Differential Revision: https://reviews.llvm.org/D24979 > >> >> > >> >> Added: > >> >> cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h > >> >> cfe/trunk/lib/Headers/cuda_wrappers/ > >> >> cfe/trunk/lib/Headers/cuda_wrappers/algorithm > >> >> cfe/trunk/lib/Headers/cuda_wrappers/complex > >> >> Modified: > >> >> cfe/trunk/lib/Driver/ToolChains.cpp > >> >> cfe/trunk/lib/Headers/CMakeLists.txt > >> >> cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h > >> >> > >> >> Modified: cfe/trunk/lib/Driver/ToolChains.cpp > >> >> URL: > >> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChains.cpp?rev=283680&r1=283679&r2=283680&view=diff > >> >> ============================================================================== > >> >> --- cfe/trunk/lib/Driver/ToolChains.cpp (original) > >> >> +++ cfe/trunk/lib/Driver/ToolChains.cpp Sat Oct 8 17:16:12 > >> >> 2016 > >> >> @@ -4694,6 +4694,15 @@ void Linux::AddClangCXXStdlibIncludeArgs > >> >> > >> >> void Linux::AddCudaIncludeArgs(const ArgList &DriverArgs, > >> >> ArgStringList &CC1Args) const { > >> >> + if (!DriverArgs.hasArg(options::OPT_nobuiltininc)) { > >> >> + // Add cuda_wrappers/* to our system include path. This > >> >> lets > >> >> us > >> >> wrap > >> >> + // standard library headers. 
> >> >> + SmallString<128> P(getDriver().ResourceDir); > >> >> + llvm::sys::path::append(P, "include"); > >> >> + llvm::sys::path::append(P, "cuda_wrappers"); > >> >> + addSystemInclude(DriverArgs, CC1Args, P); > >> >> + } > >> >> + > >> >> if (DriverArgs.hasArg(options::OPT_nocudainc)) > >> >> return; > >> >> > >> >> > >> >> Modified: cfe/trunk/lib/Headers/CMakeLists.txt > >> >> URL: > >> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/CMakeLists.txt?rev=283680&r1=283679&r2=283680&view=diff > >> >> ============================================================================== > >> >> --- cfe/trunk/lib/Headers/CMakeLists.txt (original) > >> >> +++ cfe/trunk/lib/Headers/CMakeLists.txt Sat Oct 8 17:16:12 > >> >> 2016 > >> >> @@ -24,10 +24,13 @@ set(files > >> >> bmiintrin.h > >> >> __clang_cuda_builtin_vars.h > >> >> __clang_cuda_cmath.h > >> >> + __clang_cuda_complex_builtins.h > >> >> __clang_cuda_intrinsics.h > >> >> __clang_cuda_math_forward_declares.h > >> >> __clang_cuda_runtime_wrapper.h > >> >> cpuid.h > >> >> + cuda_wrappers/algorithm > >> >> + cuda_wrappers/complex > >> >> clflushoptintrin.h > >> >> emmintrin.h > >> >> f16cintrin.h > >> >> > >> >> Added: cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h > >> >> URL: > >> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h?rev=283680&view=auto > >> >> ============================================================================== > >> >> --- cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h > >> >> (added) > >> >> +++ cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h Sat > >> >> Oct > >> >> 8 > >> >> 17:16:12 2016 > >> >> @@ -0,0 +1,203 @@ > >> >> +/*===-- __clang_cuda_complex_builtins - CUDA impls of runtime > >> >> complex fns ---=== > >> >> + * > >> >> + * Permission is hereby granted, free of charge, to any person > >> >> obtaining a copy > >> >> + * of this software and associated documentation files (the > >> >> "Software"), to deal > >> 
>> + * in the Software without restriction, including without > >> >> limitation > >> >> the rights > >> >> + * to use, copy, modify, merge, publish, distribute, > >> >> sublicense, > >> >> and/or sell > >> >> + * copies of the Software, and to permit persons to whom the > >> >> Software is > >> >> + * furnished to do so, subject to the following conditions: > >> >> + * > >> >> + * The above copyright notice and this permission notice shall > >> >> be > >> >> included in > >> >> + * all copies or substantial portions of the Software. > >> >> + * > >> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY > >> >> KIND, > >> >> EXPRESS OR > >> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > >> >> MERCHANTABILITY, > >> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO > >> >> EVENT > >> >> SHALL THE > >> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, > >> >> DAMAGES > >> >> OR > >> >> OTHER > >> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR > >> >> OTHERWISE, > >> >> ARISING FROM, > >> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > >> >> OTHER > >> >> DEALINGS IN > >> >> + * THE SOFTWARE. > >> >> + * > >> >> + > >> >> *===-----------------------------------------------------------------------=== > >> >> + */ > >> >> + > >> >> +#ifndef __CLANG_CUDA_COMPLEX_BUILTINS > >> >> +#define __CLANG_CUDA_COMPLEX_BUILTINS > >> >> + > >> >> +// This header defines __muldc3, __mulsc3, __divdc3, and > >> >> __divsc3. > >> >> These are > >> >> +// libgcc functions that clang assumes are available when > >> >> compiling > >> >> c99 complex > >> >> +// operations. (These implementations come from libc++, and > >> >> have > >> >> been modified > >> >> +// to work with CUDA.) 
> >> >> + > >> >> +extern "C" inline __device__ double _Complex __muldc3(double > >> >> __a, > >> >> double __b, > >> >> + double > >> >> __c, > >> >> double __d) { > >> >> + double __ac = __a * __c; > >> >> + double __bd = __b * __d; > >> >> + double __ad = __a * __d; > >> >> + double __bc = __b * __c; > >> >> + double _Complex z; > >> >> + __real__(z) = __ac - __bd; > >> >> + __imag__(z) = __ad + __bc; > >> >> + if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) { > >> >> + int __recalc = 0; > >> >> + if (std::isinf(__a) || std::isinf(__b)) { > >> >> + __a = std::copysign(std::isinf(__a) ? 1 : 0, __a); > >> >> + __b = std::copysign(std::isinf(__b) ? 1 : 0, __b); > >> >> + if (std::isnan(__c)) > >> >> + __c = std::copysign(0, __c); > >> >> + if (std::isnan(__d)) > >> >> + __d = std::copysign(0, __d); > >> >> + __recalc = 1; > >> >> + } > >> >> + if (std::isinf(__c) || std::isinf(__d)) { > >> >> + __c = std::copysign(std::isinf(__c) ? 1 : 0, __c); > >> >> + __d = std::copysign(std::isinf(__d) ? 1 : 0, __d); > >> >> + if (std::isnan(__a)) > >> >> + __a = std::copysign(0, __a); > >> >> + if (std::isnan(__b)) > >> >> + __b = std::copysign(0, __b); > >> >> + __recalc = 1; > >> >> + } > >> >> + if (!__recalc && (std::isinf(__ac) || std::isinf(__bd) || > >> >> + std::isinf(__ad) || std::isinf(__bc))) { > >> >> + if (std::isnan(__a)) > >> >> + __a = std::copysign(0, __a); > >> >> + if (std::isnan(__b)) > >> >> + __b = std::copysign(0, __b); > >> >> + if (std::isnan(__c)) > >> >> + __c = std::copysign(0, __c); > >> >> + if (std::isnan(__d)) > >> >> + __d = std::copysign(0, __d); > >> >> + __recalc = 1; > >> >> + } > >> >> + if (__recalc) { > >> >> + // Can't use std::numeric_limits<double>::infinity() -- > >> >> that > >> >> doesn't have > >> >> + // a device overload (and isn't constexpr before C++11, > >> >> naturally). 
> >> >> + __real__(z) = __builtin_huge_valf() * (__a * __c - __b * > >> >> __d); > >> >> + __imag__(z) = __builtin_huge_valf() * (__a * __d + __b * > >> >> __c); > >> >> + } > >> >> + } > >> >> + return z; > >> >> +} > >> >> + > >> >> +extern "C" inline __device__ float _Complex __mulsc3(float > >> >> __a, > >> >> float __b, > >> >> + float > >> >> __c, > >> >> float __d) { > >> >> + float __ac = __a * __c; > >> >> + float __bd = __b * __d; > >> >> + float __ad = __a * __d; > >> >> + float __bc = __b * __c; > >> >> + float _Complex z; > >> >> + __real__(z) = __ac - __bd; > >> >> + __imag__(z) = __ad + __bc; > >> >> + if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) { > >> >> + int __recalc = 0; > >> >> + if (std::isinf(__a) || std::isinf(__b)) { > >> >> + __a = std::copysign(std::isinf(__a) ? 1 : 0, __a); > >> >> + __b = std::copysign(std::isinf(__b) ? 1 : 0, __b); > >> >> + if (std::isnan(__c)) > >> >> + __c = std::copysign(0, __c); > >> >> + if (std::isnan(__d)) > >> >> + __d = std::copysign(0, __d); > >> >> + __recalc = 1; > >> >> + } > >> >> + if (std::isinf(__c) || std::isinf(__d)) { > >> >> + __c = std::copysign(std::isinf(__c) ? 1 : 0, __c); > >> >> + __d = std::copysign(std::isinf(__d) ? 
1 : 0, __d); > >> >> + if (std::isnan(__a)) > >> >> + __a = std::copysign(0, __a); > >> >> + if (std::isnan(__b)) > >> >> + __b = std::copysign(0, __b); > >> >> + __recalc = 1; > >> >> + } > >> >> + if (!__recalc && (std::isinf(__ac) || std::isinf(__bd) || > >> >> + std::isinf(__ad) || std::isinf(__bc))) { > >> >> + if (std::isnan(__a)) > >> >> + __a = std::copysign(0, __a); > >> >> + if (std::isnan(__b)) > >> >> + __b = std::copysign(0, __b); > >> >> + if (std::isnan(__c)) > >> >> + __c = std::copysign(0, __c); > >> >> + if (std::isnan(__d)) > >> >> + __d = std::copysign(0, __d); > >> >> + __recalc = 1; > >> >> + } > >> >> + if (__recalc) { > >> >> + __real__(z) = __builtin_huge_valf() * (__a * __c - __b * > >> >> __d); > >> >> + __imag__(z) = __builtin_huge_valf() * (__a * __d + __b * > >> >> __c); > >> >> + } > >> >> + } > >> >> + return z; > >> >> +} > >> >> + > >> >> +extern "C" inline __device__ double _Complex __divdc3(double > >> >> __a, > >> >> double __b, > >> >> + double > >> >> __c, > >> >> double __d) { > >> >> + int __ilogbw = 0; > >> >> + // Can't use std::max, because that's defined in > >> >> <algorithm>, > >> >> and > >> >> we don't > >> >> + // want to pull that in for every compile. The CUDA headers > >> >> define > >> >> + // ::max(float, float) and ::max(double, double), which is > >> >> sufficient for us. 
> >> >> + double __logbw = std::logb(max(std::abs(__c), > >> >> std::abs(__d))); > >> >> + if (std::isfinite(__logbw)) { > >> >> + __ilogbw = (int)__logbw; > >> >> + __c = std::scalbn(__c, -__ilogbw); > >> >> + __d = std::scalbn(__d, -__ilogbw); > >> >> + } > >> >> + double __denom = __c * __c + __d * __d; > >> >> + double _Complex z; > >> >> + __real__(z) = std::scalbn((__a * __c + __b * __d) / __denom, > >> >> -__ilogbw); > >> >> + __imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom, > >> >> -__ilogbw); > >> >> + if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) { > >> >> + if ((__denom == 0.0) && (!std::isnan(__a) || > >> >> !std::isnan(__b))) > >> >> { > >> >> + __real__(z) = std::copysign(__builtin_huge_valf(), __c) > >> >> * > >> >> __a; > >> >> + __imag__(z) = std::copysign(__builtin_huge_valf(), __c) > >> >> * > >> >> __b; > >> >> + } else if ((std::isinf(__a) || std::isinf(__b)) && > >> >> std::isfinite(__c) && > >> >> + std::isfinite(__d)) { > >> >> + __a = std::copysign(std::isinf(__a) ? 1.0 : 0.0, __a); > >> >> + __b = std::copysign(std::isinf(__b) ? 1.0 : 0.0, __b); > >> >> + __real__(z) = __builtin_huge_valf() * (__a * __c + __b * > >> >> __d); > >> >> + __imag__(z) = __builtin_huge_valf() * (__b * __c - __a * > >> >> __d); > >> >> + } else if (std::isinf(__logbw) && __logbw > 0.0 && > >> >> std::isfinite(__a) && > >> >> + std::isfinite(__b)) { > >> >> + __c = std::copysign(std::isinf(__c) ? 1.0 : 0.0, __c); > >> >> + __d = std::copysign(std::isinf(__d) ? 
1.0 : 0.0, __d); > >> >> + __real__(z) = 0.0 * (__a * __c + __b * __d); > >> >> + __imag__(z) = 0.0 * (__b * __c - __a * __d); > >> >> + } > >> >> + } > >> >> + return z; > >> >> +} > >> >> + > >> >> +extern "C" inline __device__ float _Complex __divsc3(float > >> >> __a, > >> >> float __b, > >> >> + float > >> >> __c, > >> >> float __d) { > >> >> + int __ilogbw = 0; > >> >> + float __logbw = std::logb(max(std::abs(__c), > >> >> std::abs(__d))); > >> >> + if (std::isfinite(__logbw)) { > >> >> + __ilogbw = (int)__logbw; > >> >> + __c = std::scalbn(__c, -__ilogbw); > >> >> + __d = std::scalbn(__d, -__ilogbw); > >> >> + } > >> >> + float __denom = __c * __c + __d * __d; > >> >> + float _Complex z; > >> >> + __real__(z) = std::scalbn((__a * __c + __b * __d) / __denom, > >> >> -__ilogbw); > >> >> + __imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom, > >> >> -__ilogbw); > >> >> + if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) { > >> >> + if ((__denom == 0) && (!std::isnan(__a) || > >> >> !std::isnan(__b))) > >> >> { > >> >> + __real__(z) = std::copysign(__builtin_huge_valf(), __c) > >> >> * > >> >> __a; > >> >> + __imag__(z) = std::copysign(__builtin_huge_valf(), __c) > >> >> * > >> >> __b; > >> >> + } else if ((std::isinf(__a) || std::isinf(__b)) && > >> >> std::isfinite(__c) && > >> >> + std::isfinite(__d)) { > >> >> + __a = std::copysign(std::isinf(__a) ? 1 : 0, __a); > >> >> + __b = std::copysign(std::isinf(__b) ? 1 : 0, __b); > >> >> + __real__(z) = __builtin_huge_valf() * (__a * __c + __b * > >> >> __d); > >> >> + __imag__(z) = __builtin_huge_valf() * (__b * __c - __a * > >> >> __d); > >> >> + } else if (std::isinf(__logbw) && __logbw > 0 && > >> >> std::isfinite(__a) && > >> >> + std::isfinite(__b)) { > >> >> + __c = std::copysign(std::isinf(__c) ? 1 : 0, __c); > >> >> + __d = std::copysign(std::isinf(__d) ? 
1 : 0, __d); > >> >> + __real__(z) = 0 * (__a * __c + __b * __d); > >> >> + __imag__(z) = 0 * (__b * __c - __a * __d); > >> >> + } > >> >> + } > >> >> + return z; > >> >> +} > >> >> + > >> >> +#endif // __CLANG_CUDA_COMPLEX_BUILTINS > >> >> > >> >> Modified: cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h > >> >> URL: > >> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h?rev=283680&r1=283679&r2=283680&view=diff > >> >> ============================================================================== > >> >> --- cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h > >> >> (original) > >> >> +++ cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h Sat > >> >> Oct > >> >> 8 > >> >> 17:16:12 2016 > >> >> @@ -312,6 +312,7 @@ __device__ inline __cuda_builtin_gridDim > >> >> > >> >> #include <__clang_cuda_cmath.h> > >> >> #include <__clang_cuda_intrinsics.h> > >> >> +#include <__clang_cuda_complex_builtins.h> > >> >> > >> >> // curand_mtgp32_kernel helpfully redeclares blockDim and > >> >> threadIdx > >> >> in host > >> >> // mode, giving them their "proper" types of dim3 and uint3. 
> >> >> This > >> >> is > >> >> > >> >> Added: cfe/trunk/lib/Headers/cuda_wrappers/algorithm > >> >> URL: > >> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/cuda_wrappers/algorithm?rev=283680&view=auto > >> >> ============================================================================== > >> >> --- cfe/trunk/lib/Headers/cuda_wrappers/algorithm (added) > >> >> +++ cfe/trunk/lib/Headers/cuda_wrappers/algorithm Sat Oct 8 > >> >> 17:16:12 > >> >> 2016 > >> >> @@ -0,0 +1,96 @@ > >> >> +/*===---- complex - CUDA wrapper for <algorithm> > >> >> ----------------------------=== > >> >> + * > >> >> + * Permission is hereby granted, free of charge, to any person > >> >> obtaining a copy > >> >> + * of this software and associated documentation files (the > >> >> "Software"), to deal > >> >> + * in the Software without restriction, including without > >> >> limitation > >> >> the rights > >> >> + * to use, copy, modify, merge, publish, distribute, > >> >> sublicense, > >> >> and/or sell > >> >> + * copies of the Software, and to permit persons to whom the > >> >> Software is > >> >> + * furnished to do so, subject to the following conditions: > >> >> + * > >> >> + * The above copyright notice and this permission notice shall > >> >> be > >> >> included in > >> >> + * all copies or substantial portions of the Software. > >> >> + * > >> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY > >> >> KIND, > >> >> EXPRESS OR > >> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > >> >> MERCHANTABILITY, > >> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO > >> >> EVENT > >> >> SHALL THE > >> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, > >> >> DAMAGES > >> >> OR > >> >> OTHER > >> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR > >> >> OTHERWISE, > >> >> ARISING FROM, > >> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > >> >> OTHER > >> >> DEALINGS IN > >> >> + * THE SOFTWARE. 
> >> >> + * > >> >> + > >> >> *===-----------------------------------------------------------------------=== > >> >> + */ > >> >> + > >> >> +#ifndef __CLANG_CUDA_WRAPPERS_ALGORITHM > >> >> +#define __CLANG_CUDA_WRAPPERS_ALGORITHM > >> >> + > >> >> +// This header defines __device__ overloads of std::min/max, > >> >> but > >> >> only if we're > >> >> +// <= C++11. In C++14, these functions are constexpr, and so > >> >> are > >> >> implicitly > >> >> +// __host__ __device__. > >> >> +// > >> >> +// We don't support the initializer_list overloads because > >> >> +// initializer_list::begin() and end() are not __host__ > >> >> __device__ > >> >> functions. > >> >> +// > >> >> +// When compiling in C++14 mode, we could force std::min/max > >> >> to > >> >> have > >> >> different > >> >> +// implementations for host and device, by declaring the > >> >> device > >> >> overloads > >> >> +// before the constexpr overloads appear. We choose not to do > >> >> this > >> >> because > >> >> + > >> >> +// a) why write our own implementation when we can use one > >> >> from > >> >> the > >> >> standard > >> >> +// library? and > >> >> +// b) libstdc++ is evil and declares min/max inside a header > >> >> that > >> >> is included > >> >> +// *before* we include <algorithm>. So we'd have to > >> >> unconditionally > >> >> +// declare our __device__ overloads of min/max, but that > >> >> would > >> >> pollute > >> >> +// things for people who choose not to include > >> >> <algorithm>. > >> >> + > >> >> +#include_next <algorithm> > >> >> + > >> >> +#if __cplusplus <= 201103L > >> >> + > >> >> +// We need to define these overloads in exactly the namespace > >> >> our > >> >> standard > >> >> +// library uses (including the right inline namespace), > >> >> otherwise > >> >> they won't be > >> >> +// picked up by other functions in the standard library (e.g. > >> >> functions in > >> >> +// <complex>). Thus the ugliness below. 
> >> >> +#ifdef _LIBCPP_BEGIN_NAMESPACE_STD > >> >> +_LIBCPP_BEGIN_NAMESPACE_STD > >> >> +#else > >> >> +namespace std { > >> >> +#ifdef _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> +_GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> +#endif > >> >> +#endif > >> >> + > >> >> +template <class __T, class __Cmp> > >> >> +inline __device__ const __T & > >> >> +max(const __T &__a, const __T &__b, __Cmp __cmp) { > >> >> + return __cmp(__a, __b) ? __b : __a; > >> >> +} > >> >> + > >> >> +template <class __T> > >> >> +inline __device__ const __T & > >> >> +max(const __T &__a, const __T &__b) { > >> >> + return __a < __b ? __b : __a; > >> >> +} > >> >> + > >> >> +template <class __T, class __Cmp> > >> >> +inline __device__ const __T & > >> >> +min(const __T &__a, const __T &__b, __Cmp __cmp) { > >> >> + return __cmp(__b, __a) ? __b : __a; > >> >> +} > >> >> + > >> >> +template <class __T> > >> >> +inline __device__ const __T & > >> >> +min(const __T &__a, const __T &__b) { > >> >> + return __a < __b ? __b : __a; > >> >> +} > >> >> + > >> >> +#ifdef _LIBCPP_END_NAMESPACE_STD > >> >> +_LIBCPP_END_NAMESPACE_STD > >> >> +#else > >> >> +#ifdef _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> >> +_GLIBCXX_END_NAMESPACE_VERSION > >> >> +#endif > >> >> +} // namespace std > >> >> +#endif > >> >> + > >> >> +#endif // __cplusplus <= 201103L > >> >> +#endif // __CLANG_CUDA_WRAPPERS_ALGORITHM > >> >> > >> >> Added: cfe/trunk/lib/Headers/cuda_wrappers/complex > >> >> URL: > >> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/cuda_wrappers/complex?rev=283680&view=auto > >> >> ============================================================================== > >> >> --- cfe/trunk/lib/Headers/cuda_wrappers/complex (added) > >> >> +++ cfe/trunk/lib/Headers/cuda_wrappers/complex Sat Oct 8 > >> >> 17:16:12 > >> >> 2016 > >> >> @@ -0,0 +1,79 @@ > >> >> +/*===---- complex - CUDA wrapper for <complex> > >> >> ------------------------------=== > >> >> + * > >> >> + * Permission is hereby granted, free of charge, 
to any person > >> >> obtaining a copy > >> >> + * of this software and associated documentation files (the > >> >> "Software"), to deal > >> >> + * in the Software without restriction, including without > >> >> limitation > >> >> the rights > >> >> + * to use, copy, modify, merge, publish, distribute, > >> >> sublicense, > >> >> and/or sell > >> >> + * copies of the Software, and to permit persons to whom the > >> >> Software is > >> >> + * furnished to do so, subject to the following conditions: > >> >> + * > >> >> + * The above copyright notice and this permission notice shall > >> >> be > >> >> included in > >> >> + * all copies or substantial portions of the Software. > >> >> + * > >> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY > >> >> KIND, > >> >> EXPRESS OR > >> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > >> >> MERCHANTABILITY, > >> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO > >> >> EVENT > >> >> SHALL THE > >> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, > >> >> DAMAGES > >> >> OR > >> >> OTHER > >> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR > >> >> OTHERWISE, > >> >> ARISING FROM, > >> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > >> >> OTHER > >> >> DEALINGS IN > >> >> + * THE SOFTWARE. > >> >> + * > >> >> + > >> >> *===-----------------------------------------------------------------------=== > >> >> + */ > >> >> + > >> >> +#pragma once > >> >> + > >> >> +// Wrapper around <complex> that forces its functions to be > >> >> __host__ > >> >> +// __device__. > >> >> + > >> >> +// First, include host-only headers we think are likely to be > >> >> included by > >> >> +// <complex>, so that the pragma below only applies to > >> >> <complex> > >> >> itself. 
> >> >> +#if __cplusplus >= 201103L > >> >> +#include <type_traits> > >> >> +#endif > >> >> +#include <stdexcept> > >> >> +#include <cmath> > >> >> +#include <sstream> > >> >> + > >> >> +// Next, include our <algorithm> wrapper, to ensure that > >> >> device > >> >> overloads of > >> >> +// std::min/max are available. > >> >> +#include <algorithm> > >> >> + > >> >> +#pragma clang force_cuda_host_device begin > >> >> + > >> >> +// When compiling for device, ask libstdc++ to use its own > >> >> implements of > >> >> +// complex functions, rather than calling builtins (which > >> >> resolve > >> >> to > >> >> library > >> >> +// functions that don't exist when compiling CUDA device > >> >> code). > >> >> +// > >> >> +// This is a little dicey, because it causes libstdc++ to > >> >> define > >> >> a > >> >> different > >> >> +// set of overloads on host and device. > >> >> +// > >> >> +// // Present only when compiling for host. > >> >> +// __host__ __device__ void complex<float> sin(const > >> >> complex<float>& x) { > >> >> +// return __builtin_csinf(x); > >> >> +// } > >> >> +// > >> >> +// // Present when compiling for host and for device. > >> >> +// template <typename T> > >> >> +// void __host__ __device__ complex<T> sin(const complex<T>& > >> >> x) > >> >> { > >> >> +// return complex<T>(sin(x.real()) * cosh(x.imag()), > >> >> +// cos(x.real()), sinh(x.imag())); > >> >> +// } > >> >> +// > >> >> +// This is safe because when compiling for device, all > >> >> function > >> >> calls in > >> >> +// __host__ code to sin() will still resolve to *something*, > >> >> even > >> >> if > >> >> they don't > >> >> +// resolve to the same function as they resolve to when > >> >> compiling > >> >> for host. We > >> >> +// don't care that they don't resolve to the right function > >> >> because > >> >> we won't > >> >> +// codegen this host code when compiling for device. 
> >> >> + > >> >> +#pragma push_macro("_GLIBCXX_USE_C99_COMPLEX") > >> >> +#pragma push_macro("_GLIBCXX_USE_C99_COMPLEX_TR1") > >> >> +#define _GLIBCXX_USE_C99_COMPLEX 0 > >> >> +#define _GLIBCXX_USE_C99_COMPLEX_TR1 0 > >> >> + > >> >> +#include_next <complex> > >> >> + > >> >> +#pragma pop_macro("_GLIBCXX_USE_C99_COMPLEX_TR1") > >> >> +#pragma pop_macro("_GLIBCXX_USE_C99_COMPLEX") > >> >> + > >> >> +#pragma clang force_cuda_host_device end > >> >> > >> >> > >> >> _______________________________________________ > >> >> cfe-commits mailing list > >> >> cfe-commits@lists.llvm.org > >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits > >> >> > >> > > >> > -- > >> > Hal Finkel > >> > Lead, Compiler Technology and Programming Languages > >> > Leadership Computing Facility > >> > Argonne National Laboratory > >> > > > > -- > > Hal Finkel > > Lead, Compiler Technology and Programming Languages > > Leadership Computing Facility > > Argonne National Laboratory > -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits