gtbercea added a comment.

I just stumbled upon a very interesting situation.

I noticed that, for OpenMP, device math functions are used as I expected at
-O0. At -O1 or higher, math functions such as "sqrt" resolve to LLVM
builtins/intrinsics:

  call double @llvm.sqrt.f64(double %1)

instead of the NVVM variant.

The surprising part (at least to me) is that the same LLVM intrinsic is used
when I use Clang to compile CUDA kernel code that calls "sqrt". I would have
expected the NVVM variant to be called for CUDA code.
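For reference, here is a minimal reproducer along the lines of what I tested (the kernel body, file name, and GPU arch are my own sketch, not taken from the patch):

  // repro.cu -- hypothetical minimal kernel calling sqrt on the device
  #include <cmath>
  __global__ void k(double *x) { x[0] = sqrt(x[0]); }

Compiling the device side only and emitting IR makes the lowering visible (assuming an sm_60 target):

  clang++ -O1 --cuda-device-only --cuda-gpu-arch=sm_60 -S -emit-llvm repro.cu

At -O1 the resulting IR contains the @llvm.sqrt.f64 call shown above rather than a libdevice call.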

Interestingly, for the "pow" function, the expected device version of the
function, i.e.:

  @__internal_accurate_pow(double %14, double %4)

is used for both the CUDA and OpenMP NVPTX targets (with this patch applied,
of course).

Is it OK for CUDA kernels to call LLVM intrinsics instead of the
device-specific math library functions?
If it is OK for CUDA, can it be OK for OpenMP NVPTX too?
If not, we probably need to fix it for both toolchains.


Repository:
  rC Clang

https://reviews.llvm.org/D47849
