================
@@ -2496,6 +2496,26 @@ def int_amdgcn_flat_atomic_fmax_num   : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
 def int_amdgcn_global_atomic_fmin_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
 def int_amdgcn_global_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
 
+class AMDGPUGlobalLoadTr<LLVMType data_ty> :
+  Intrinsic<
+    [data_ty],
+    [global_ptr_ty],
+    [IntrReadMem, IntrWillReturn, IntrConvergent, NoCapture<ArgIndex<0>>, IntrNoCallback, IntrNoFree],
+    "",
+    [SDNPMemOperand]
+  >;
+
+// Wave32
+// <2 x i32>  @llvm.amdgcn.global.load.tr.v2i32(ptr addrspace(1)) -> global_load_tr_b64
+// <8 x i16>  @llvm.amdgcn.global.load.tr.v8i16(ptr addrspace(1)) -> global_load_tr_b128
+// <8 x half> @llvm.amdgcn.global.load.tr.v8f16(ptr addrspace(1)) -> global_load_tr_b128
----------------
changpeng wrote:

global_load_tr_b128 transposes to a vector of b16. Do we really need to enumerate 
every possible type (i16, f16)? In that case, we would also need to consider 
bf16.

https://github.com/llvm/llvm-project/pull/77772
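
For context, a minimal LLVM IR sketch of how one of these overloads would be 
called (the declaration mirrors the v8i16 signature listed in the diff above; 
the function name and the commented-out bf16 variant are illustrative, not 
part of the patch as posted):

declare <8 x i16> @llvm.amdgcn.global.load.tr.v8i16(ptr addrspace(1))

define <8 x i16> @load_tr_example(ptr addrspace(1) %p) {
  ; per the diff comments, expected to select global_load_tr_b128 in wave32
  %v = call <8 x i16> @llvm.amdgcn.global.load.tr.v8i16(ptr addrspace(1) %p)
  ret <8 x i16> %v
}

; hypothetical bf16 overload raised in the review; not in the patch as posted
; declare <8 x bfloat> @llvm.amdgcn.global.load.tr.v8bf16(ptr addrspace(1))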