[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

Matt Arsenault via cfe-commits Wed, 08 May 2024 09:21:01 -0700

================
@@ -1284,7 +1284,29 @@ The AMDGPU backend implements the following LLVM IR 
intrinsics.
                                                    |  ``// 5 MFMA``
                                                    |  
``__builtin_amdgcn_sched_group_barrier(8, 5, 0)``
 
-  llvm.amdgcn.iglp_opt                             An **experimental** 
intrinsic for instruction group level parallelism. The intrinsic
+  llvm.amdgcn.sched.group.barrier.rule             It has the same behavior as 
sched.group.barrier, except the intrinsic includes a fourth argument:
+
+                                                   - RuleMask : The bitmask of 
rules which are applied to the SchedGroup.
+
+                                                   The RuleMask is handled as 
a 64 bit integer, so 64 rules are encodable with a single mask.
+
+                                                   Users can access the 
intrinsic by specifying the optional fourth argument in sched_group_barrier 
builtin
+
+                                                   |  ``// 1 VMEM read 
invoking rules 1 and 2``
+                                                   |  
``__builtin_amdgcn_sched_group_barrier(32, 1, 0, 3)``
+
+                                                   Currently available rules 
are:
+                                                   - 0x0000: No rule.
+                                                   - 0x0001: Instructions in 
the SchedGroup must not write to the same register
+                                                     that a previously 
occuring V_CNDMASK_B32_e64 reads from.
+                                                   - 0x0002: Instructions in 
the SchedGroup must not write to the same register
+                                                     that a previously 
occuring V_PERM_B32_e64 reads from.
+                                                   - 0x0004: Instructions in 
the SchedGroup must require data produced by a
+                                                     V_CNDMASK_B32_e64.
+                                                   - 0x0008: Instructions in 
the SchedGroup must require data produced by a
+                                                     V_PERM_B32_e64.
+
----------------
arsenm wrote:


These scheduling rules seem way too specific, especially since they point out
specific instruction encodings by their internal pseudoinstruction names.

https://github.com/llvm/llvm-project/pull/85304
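
For readers following the quoted hunk: the example mask `3` there is simply the bitwise OR of rules 0x0001 and 0x0002. The sketch below shows that arithmetic in isolation. The enumerator and helper names are hypothetical (the patch documents only the numeric bit values), and the four-argument builtin call is left in a comment because it exists only in the proposed patch.

```cpp
#include <cstdint>

// Hypothetical names for the rule bits listed in the quoted documentation;
// the patch itself only gives the numeric values.
enum SchedGroupRule : std::uint64_t {
  RuleNone                = 0x0, // no rule
  RuleNoWriteToCndmaskSrc = 0x1, // must not write a register read by an earlier V_CNDMASK_B32_e64
  RuleNoWriteToPermSrc    = 0x2, // must not write a register read by an earlier V_PERM_B32_e64
  RuleUsesCndmaskResult   = 0x4, // must require data produced by a V_CNDMASK_B32_e64
  RuleUsesPermResult      = 0x8, // must require data produced by a V_PERM_B32_e64
};

// Combine two rule bits (or partial masks) into one 64-bit RuleMask.
constexpr std::uint64_t combineRules(std::uint64_t A, std::uint64_t B) {
  return A | B;
}

// Check whether a given rule bit is set in a mask.
constexpr bool hasRule(std::uint64_t Mask, std::uint64_t Rule) {
  return (Mask & Rule) != 0;
}

// The mask used in the documented example: rules 0x0001 and 0x0002.
constexpr std::uint64_t ExampleMask =
    combineRules(RuleNoWriteToCndmaskSrc, RuleNoWriteToPermSrc);

// With the proposed patch applied, the documented call would be:
//   __builtin_amdgcn_sched_group_barrier(32, 1, 0, 3);
```

Because each rule occupies one bit of a 64-bit integer, a single mask can select any subset of up to 64 rules.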