Hahnfeld added a comment.

IMO this goes in the right direction: we should use the fast implementations 
in libdevice. If LLVM doesn't lower these calls in the NVPTX backend, I think 
it's OK to use header wrappers, as CUDA already does.
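
To make the header-wrapper idea concrete, here is a minimal host-only sketch. It is not the code from this patch: `fake_nv_sqrt` and the `device_math` namespace are hypothetical names, and `fake_nv_sqrt` only emulates a libdevice entry point such as `__nv_sqrt` so that the example compiles without a CUDA toolchain.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a libdevice entry point such as __nv_sqrt.
// On a real NVPTX target the symbol would come from the libdevice bitcode
// library; here it is emulated on the host.
static inline double fake_nv_sqrt(double x) { return std::sqrt(x); }

// Hypothetical wrapper layer: the header maps the familiar math name onto
// the device-library function, so no backend lowering is needed.
namespace device_math {
static inline double sqrt(double x) { return fake_nv_sqrt(x); }
} // namespace device_math

// Callers keep using the ordinary function name; only the header decides
// which implementation is reached.
inline void wrapper_self_check() { assert(device_math::sqrt(9.0) == 3.0); }
```

The point of the design is that the forwarding lives entirely in a header, so user code does not change when the underlying implementation does.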

Two questions:

1. Can you explain where this is important for "correctness"? Yesterday I 
compiled some code using `sqrt` and it seemed to produce the correct results. 
Maybe that's relevant for other functions?
2. Incidentally, I ran into a closely related problem: I can't 
`#include <math.h>` in translation units compiled for offloading; Clang 
complains about inline assembly for x86 (see below). Does that work for you?

  In file included from /usr/include/math.h:413:
  /usr/include/bits/mathinline.h:131:43: error: invalid input constraint 'x' in asm
    __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
                                            ^
  /usr/include/bits/mathinline.h:143:43: error: invalid input constraint 'x' in asm
    __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
                                            ^
  2 errors generated.
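
For reference, a translation unit of the kind described in question 2 might look like the sketch below. It assumes OpenMP target offloading (the comment only says "compiled for offloading"), and `offloaded_sqrt` is a made-up name. Built without offloading, the pragma is ignored and the code simply runs on the host; the errors quoted above would be expected only when a device compilation (for example with `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`) has to parse the host's glibc `<math.h>`.

```cpp
#include <cassert>
#include <math.h> // on x86 glibc hosts this can pull in bits/mathinline.h

// Hypothetical helper: computes sqrt(x) inside an offloaded region.
double offloaded_sqrt(double x) {
  double result = 0.0;
  // With offloading enabled, this block is also compiled for the device,
  // so the device compilation has to parse <math.h> as well.
  // Without -fopenmp the pragma is ignored and the block runs on the host.
#pragma omp target map(to: x) map(from: result)
  { result = sqrt(x); }
  return result;
}

// Host-side sanity check of the helper.
inline void offload_self_check() { assert(offloaded_sqrt(4.0) == 2.0); }
```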



================
Comment at: lib/Headers/__clang_cuda_device_functions.h:65
 }
+#if defined(__cplusplus)
 __DEVICE__ void __brkpt() { asm volatile("brkpt;"); }
----------------
Why is that only valid for C++?


Repository:
  rC Clang

https://reviews.llvm.org/D47849



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
