[llvm-branch-commits] [llvm] [AMDGPU] Update documentation for wave reduction intrinsics (PR #175132)

via llvm-branch-commits Fri, 09 Jan 2026 00:20:59 -0800

https://github.com/easyonaadit updated 
https://github.com/llvm/llvm-project/pull/175132


From 095464114d38ed793295648f1b9562f516e0cd94 Mon Sep 17 00:00:00 2001
From: Aaditya <[email protected]>
Date: Fri, 9 Jan 2026 12:05:04 +0530
Subject: [PATCH] [AMDGPU] Update documentation for wave reduction intrinsics

---
 llvm/docs/AMDGPUUsage.rst | 120 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 118 insertions(+), 2 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 3e7a5dfc504ae..92d7ba6d1c025 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1380,7 +1380,30 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    2: `DPP`.
                                                    If target does not support the DPP operations (e.g. gfx6/7),
                                                    reduction will be performed using default iterative strategy.
-                                                   Intrinsic is currently only implemented for i32.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.min                      Performs an arithmetic signed min reduction on the signed values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fmin                     Performs a floating-point min reduction on the floating-point values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for float and double types.
+                                                   NaN values are canonicalized.
+                                                   However, if there are two consecutive NaN values and the second value is an sNaN,
+                                                   wave_mode IEEE=False propagates the sNaN, while wave_mode IEEE=True quiets it.
 
   llvm.amdgcn.wave.reduce.umax                     Performs an arithmetic unsigned max reduction on the unsigned values
                                                    provided by each lane in the wavefront.
@@ -1390,7 +1413,100 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    2: `DPP`.
                                                    If target does not support the DPP operations (e.g. gfx6/7),
                                                    reduction will be performed using default iterative strategy.
-                                                   Intrinsic is currently only implemented for i32.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.max                      Performs an arithmetic signed max reduction on the signed values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fmax                     Performs a floating-point max reduction on the floating-point values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for float and double types.
+                                                   NaN values are canonicalized.
+                                                   However, if there are two consecutive NaN values and the second value is an sNaN,
+                                                   wave_mode IEEE=False propagates the sNaN, while wave_mode IEEE=True quiets it.
+
+  llvm.amdgcn.wave.reduce.add                      Performs an arithmetic add reduction on the signed/unsigned values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fadd                     Performs a floating-point add reduction on the floating-point values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for float and double types.
+
+  llvm.amdgcn.wave.reduce.sub                      Performs an arithmetic sub reduction on the signed/unsigned values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fsub                     Performs a floating-point sub reduction on the floating-point values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for float and double types.
+
+  llvm.amdgcn.wave.reduce.and                      Performs a bitwise-and reduction on the values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.or                       Performs a bitwise-or reduction on the values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.xor                      Performs a bitwise-xor reduction on the values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for reduction strategy using second operand
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using default iterative strategy.
+                                                   Intrinsic is implemented for i32 and i64 types.
 
   llvm.amdgcn.permlane16                           Provides direct access to v_permlane16_b32. Performs arbitrary gather-style
                                                    operation within a row (16 contiguous lanes) of the second input operand.
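
For readers who want to check the documented integer semantics by hand, here is a minimal host-side reference model. It is a sketch and not part of the patch. It assumes i32 operands and that every lane in the input list contributes a value. The names `wave_reduce`, `COMBINE`, and `as_signed` are hypothetical. The `strategy` argument stands in for the second operand described above; since that operand is only a lowering hint, the model validates it and otherwise ignores it. The sub and floating-point reductions are left out because the text above does not pin down their operand order or NaN handling closely enough to model them.

```python
from functools import reduce

BITS = 32                      # assumption: model the i32 variants only
MASK = (1 << BITS) - 1


def as_signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


# Each entry combines two 32-bit patterns and returns a 32-bit pattern.
COMBINE = {
    "umin": lambda a, b: min(a, b),                 # unsigned compare
    "umax": lambda a, b: max(a, b),
    "min": lambda a, b: min(a, b, key=as_signed),   # signed compare
    "max": lambda a, b: max(a, b, key=as_signed),
    "add": lambda a, b: (a + b) & MASK,             # wraps modulo 2**32
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def wave_reduce(op, lane_values, strategy=0):
    """Reduce the per-lane values with `op`; `strategy` is a hint only."""
    if strategy not in (0, 1, 2):
        raise ValueError("strategy must be 0 (default), 1 (iterative) or 2 (DPP)")
    values = [v & MASK for v in lane_values]
    return reduce(COMBINE[op], values)


if __name__ == "__main__":
    lanes = [5, 0xFFFFFFFF, 3, 7]   # 0xFFFFFFFF is -1 when read as signed
    for name in COMBINE:
        print(name, hex(wave_reduce(name, lanes)))
```

The signed and unsigned variants differ only in how the same bit patterns are compared, which is why `min` and `umin` disagree on the sample input whenever a lane holds a value with the top bit set.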
