================
@@ -458,6 +458,196 @@ __DEVICE__ float __nv_y1f(float __a);
__DEVICE__ float __nv_ynf(int __a, float __b);
__DEVICE__ double __nv_yn(int __a, double __b);
+#if CUDA_VERSION >= 13030
+typedef _Float16 _Float16x2 __attribute__((ext_vector_type(2)));
----------------
YonahGoldberg wrote:
The `__half2` type is defined as `struct {unsigned short; unsigned short;}`, but
all the ops in `cuda_fp16.hpp` reinterpret it as `unsigned int`, so we end up
casting `unsigned int` to `<2 x half>`. I think the generated code looked fine,
but I can look again.
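
To make the reinterpretation concrete, here is a minimal host-side sketch of the bit-level round trip described above. `Half2Bits`, `to_word`, `to_lanes`, and `self_check` are hypothetical names used only for illustration. `std::memcpy` stands in for the casts in the header, and plain 16-bit integers stand in for the `half` lanes, so this shows only that the bits are preserved, not the actual code in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the __half2 layout described above:
// two 16-bit fields packed into 32 bits.
struct Half2Bits {
  std::uint16_t x;
  std::uint16_t y;
};

static_assert(sizeof(Half2Bits) == sizeof(std::uint32_t),
              "Half2Bits must be exactly 32 bits wide");

// View the struct as a single 32-bit word (the `unsigned int` view).
std::uint32_t to_word(Half2Bits h) {
  std::uint32_t w;
  std::memcpy(&w, &h, sizeof w);
  return w;
}

// View the 32-bit word as two 16-bit lanes again (the `<2 x half>` view,
// with integer lanes standing in for half-precision values).
Half2Bits to_lanes(std::uint32_t w) {
  Half2Bits h;
  std::memcpy(&h, &w, sizeof h);
  return h;
}

// Self-check: 0x3C00 and 0x4000 are the binary16 encodings of 1.0 and 2.0.
void self_check() {
  Half2Bits h{0x3C00, 0x4000};
  Half2Bits r = to_lanes(to_word(h));
  assert(r.x == h.x && r.y == h.y);
}
```

Both views cover the same 32 bits, so going from the word to the two lanes is a pure bit reinterpretation and no value conversion takes place.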
https://github.com/llvm/llvm-project/pull/174005
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits