gtbercea added a comment.

I just stumbled upon a very interesting situation.

I noticed that, for OpenMP, device math functions are used as I expected at -O0. At -O1 or higher, math functions such as "sqrt" resolve to LLVM builtins/intrinsics:

  call double @llvm.sqrt.f64(double %1)

instead of the nvvm variant. The surprising part (at least to me) is that the same LLVM intrinsic is emitted when I use Clang to compile CUDA kernel code calling the "sqrt" function. I would have expected the NVVM variant to be called for CUDA code.

Interestingly, for the "pow" function the expected device version of the function, i.e.:

  @__internal_accurate_pow(double %14, double %4)

is used for both the CUDA and OpenMP NVPTX targets (with this patch applied, of course).

Is it OK for CUDA kernels to call LLVM intrinsics instead of the device-specific math library functions? If it's OK for CUDA, can it also be OK for OpenMP NVPTX? If not, we probably need to fix it for both toolchains.

Repository:
  rC Clang

https://reviews.llvm.org/D47849
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
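For reference, a minimal kernel of the kind described above might look like the following sketch (the kernel name and launch parameters are hypothetical; the IR lines in the comments restate the calls quoted in this message, as observed at -O1 or higher):

```cuda
#include <math.h>

// Sketch of a device kernel calling both math functions discussed above.
// With this patch, at -O1+ the sqrt() call reportedly lowers to the LLVM
// intrinsic for both the CUDA and OpenMP NVPTX paths:
//   call double @llvm.sqrt.f64(double %1)
// while pow() resolves to the device math library on both paths:
//   @__internal_accurate_pow(double %14, double %4)
__global__ void math_kernel(double *out, const double *in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    out[i] = sqrt(in[i]) + pow(in[i], 3.0);
}
```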
I noticed that, for OpenMP, the use of device math functions happens as I expected for -O0. For -O1 or higher math functions such as "sqrt" resolve to llvm builtins/intrinsics: call double @llvm.sqrt.f64(double %1) instead of the nvvm variant. The surprising part (at least to me) is that the same llvm intrinsic is used when I use Clang to compile CUDA kernel code calling the "sqrt" function. I would have expected that the NVVM variant would be called for CUDA code. Interestingly, for the "pow" function the expected device version of the function i.e.: @__internal_accurate_pow(double %14, double %4) is used for both CUDA and OpenMP NVPTX targets (with this patch applied of course). Is it ok for CUDA kernels to call llvm intrinsics instead of the device specific math library functions? If it's ok for CUDA can this be ok for OpenMP NVPTX too? If not we probably need to fix it for both toolchains. Repository: rC Clang https://reviews.llvm.org/D47849 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits