[clang] [Clang] Support fp16 in libdevice for CUDA 13.3 (PR #174005)

Yonah Goldberg via cfe-commits Wed, 07 Jan 2026 13:51:32 -0800

================
@@ -458,6 +458,196 @@ __DEVICE__ float __nv_y1f(float __a);
 __DEVICE__ float __nv_ynf(int __a, float __b);
 __DEVICE__ double __nv_yn(int __a, double __b);
 
+#if CUDA_VERSION >= 13030
+typedef _Float16 _Float16x2 __attribute__((ext_vector_type(2)));
----------------
YonahGoldberg wrote:


The `__half2` type is defined as `struct {unsigned short; unsigned short;}`, but
all the ops in `cuda_fp16.hpp` reinterpret it as `unsigned int`, so we are
effectively casting `unsigned int` to `<2 x half>`. I think the generated code
looked fine, but I can look again.
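
For illustration only, here is a minimal host-side sketch of the bit reinterpretation described above. It is not the code from `cuda_fp16.hpp` or from this patch. `Half2Raw`, `Lanes2`, `half2_to_bits`, and `bits_to_lanes` are hypothetical names, and plain 16-bit integer lanes stand in for the `half` elements so that the example builds with any host C++ compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the __half2 layout quoted above:
// two 16-bit fields holding the raw bits of each half value.
struct Half2Raw {
  unsigned short x;
  unsigned short y;
};

// Hypothetical stand-in for <2 x half>: two 16-bit lanes.
struct Lanes2 {
  std::uint16_t lane[2];
};

static_assert(sizeof(Half2Raw) == sizeof(std::uint32_t),
              "sketch assumes the struct is exactly 32 bits");
static_assert(sizeof(Lanes2) == sizeof(std::uint32_t),
              "sketch assumes the lane pair is exactly 32 bits");

// Step 1: view the struct's 32 bits as one unsigned int.
inline std::uint32_t half2_to_bits(Half2Raw h) {
  std::uint32_t bits;
  std::memcpy(&bits, &h, sizeof(bits));
  return bits;
}

// Step 2: view that unsigned int as a two-lane vector.
inline Lanes2 bits_to_lanes(std::uint32_t bits) {
  Lanes2 v;
  std::memcpy(&v, &bits, sizeof(v));
  return v;
}
```

Because both steps only copy bits, the lane order matches the field order of the struct regardless of host endianness. For example, passing `{0x3C00, 0x4000}` (the half encodings of 1.0 and 2.0) through both helpers yields lanes `0x3C00` and `0x4000`.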

https://github.com/llvm/llvm-project/pull/174005
