https://github.com/easyonaadit updated https://github.com/llvm/llvm-project/pull/175132
From 8adbac06492521901dcb7a30b31d1290c174596f Mon Sep 17 00:00:00 2001
From: Aaditya <[email protected]>
Date: Fri, 9 Jan 2026 12:05:04 +0530
Subject: [PATCH] [AMDGPU] Update documentation for wave reduction intrinsics

---
 llvm/docs/AMDGPUUsage.rst | 74 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 4 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 3e7a5dfc504ae..03f9823352c5c 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1378,9 +1378,19 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    0: Target default preference,
                                                    1: `Iterative strategy`, and
                                                    2: `DPP`.
-                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
                                                    reduction will be performed using default iterative strategy.
-                                                   Intrinsic is currently only implemented for i32.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.min                      Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a signed min
+                                                   reduction, treating the values as signed integers.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fmin                     Similar to `llvm.amdgcn.wave.reduce.umin`, but performs a floating-point min
+                                                   reduction on floating-point values.
+                                                   Intrinsic is implemented for float and double types.
+                                                   NaN values are not canonicalized.
+                                                   The ordering behaviour of signaling NaNs (sNaNs) is non-deterministic.
 
   llvm.amdgcn.wave.reduce.umax                     Performs an arithmetic unsigned max reduction on the unsigned values
                                                    provided by each lane in the wavefront.
@@ -1388,9 +1398,65 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    0: Target default preference,
                                                    1: `Iterative strategy`, and
                                                    2: `DPP`.
-                                                   If target does not support the DPP operations (e.g. gfx6/7),
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
                                                    reduction will be performed using default iterative strategy.
-                                                   Intrinsic is currently only implemented for i32.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.max                      Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a signed max
+                                                   reduction, treating the values as signed integers.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fmax                     Similar to `llvm.amdgcn.wave.reduce.umax`, but performs a floating-point max
+                                                   reduction on floating-point values.
+                                                   Intrinsic is implemented for float and double types.
+                                                   NaN values are not canonicalized.
+                                                   The ordering behaviour of signaling NaNs (sNaNs) is non-deterministic.
+
+  llvm.amdgcn.wave.reduce.add                      Performs an arithmetic add reduction on the signed/unsigned values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for the reduction strategy as its second operand:
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using the default iterative strategy.
+                                                   Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fadd                     Similar to `llvm.amdgcn.wave.reduce.add`, but performs a floating-point add
+                                                   reduction on floating-point values.
+                                                   Intrinsic is implemented for float and double types.
+
+  llvm.amdgcn.wave.reduce.sub                      Performs an arithmetic subtraction reduction on the signed/unsigned values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for the reduction strategy as its second operand:
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using the default iterative strategy.
+                                                   Intrinsic is implemented for signed/unsigned i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.fsub                     Similar to `llvm.amdgcn.wave.reduce.sub`, but performs a floating-point subtraction
+                                                   reduction on floating-point values.
+                                                   Intrinsic is implemented for float and double types.
+
+  llvm.amdgcn.wave.reduce.and                      Performs a bitwise-and reduction on the values
+                                                   provided by each lane in the wavefront.
+                                                   Intrinsic takes a hint for the reduction strategy as its second operand:
+                                                   0: Target default preference,
+                                                   1: `Iterative strategy`, and
+                                                   2: `DPP`.
+                                                   If the target does not support the DPP operations (e.g. gfx6/7),
+                                                   reduction will be performed using the default iterative strategy.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.or                       Similar to `llvm.amdgcn.wave.reduce.and`, but performs a bitwise-or
+                                                   reduction on the values provided by each lane in the wavefront.
+                                                   Intrinsic is implemented for i32 and i64 types.
+
+  llvm.amdgcn.wave.reduce.xor                      Similar to `llvm.amdgcn.wave.reduce.and`, but performs a bitwise-xor
+                                                   reduction on the values provided by each lane in the wavefront.
+                                                   Intrinsic is implemented for i32 and i64 types.
 
   llvm.amdgcn.permlane16                           Provides direct access to v_permlane16_b32. Performs arbitrary gather-style
                                                    operation within a row (16 contiguous lanes) of the second input operand.

_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
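The patch defines the integer reductions (umin, min, umax, max, add, and, or, xor) in terms of "the values provided by each lane in the wavefront". The following is a minimal Python sketch of what those results should be, offered as a reference model rather than as anything the backend does. The names `wave_reduce`, `to_signed` and `_combiners` are hypothetical. The model assumes that only lanes whose bit is set in an exec mask contribute, and that the iterative strategy folds active lanes in ascending order. It does not model sub, whose exact semantics the patch text does not spell out.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

# Hypothetical reference model of the integer llvm.amdgcn.wave.reduce.*
# intrinsics. Assumptions (not from the patch): only lanes whose bit is set
# in exec_mask contribute, and lane values are i32/i64 bit patterns.


def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def _combiners(bits: int) -> Dict[str, Callable[[int, int], int]]:
    mask = (1 << bits) - 1
    return {
        "umin": min,
        "umax": max,
        # Signed min/max compare the two's-complement interpretations but
        # return the original bit pattern.
        "min": lambda a, b: a if to_signed(a, bits) <= to_signed(b, bits) else b,
        "max": lambda a, b: a if to_signed(a, bits) >= to_signed(b, bits) else b,
        # Add wraps modulo 2**bits, so signed and unsigned inputs agree.
        "add": lambda a, b: (a + b) & mask,
        "and": lambda a, b: a & b,
        "or": lambda a, b: a | b,
        "xor": lambda a, b: a ^ b,
    }


def wave_reduce(op: str, lanes: List[int], bits: int = 32,
                exec_mask: Optional[int] = None) -> int:
    """Fold the active lanes' values in ascending lane order."""
    if bits not in (32, 64):
        raise ValueError("the patch documents only i32 and i64")
    mask = (1 << bits) - 1
    if exec_mask is None:
        exec_mask = (1 << len(lanes)) - 1  # treat every lane as active
    active = [v & mask for i, v in enumerate(lanes) if (exec_mask >> i) & 1]
    if not active:
        raise ValueError("no active lanes")
    return reduce(_combiners(bits)[op], active)


if __name__ == "__main__":
    print(wave_reduce("umin", [7, 3, 0xFFFFFFFF, 9]))  # 3
    print(hex(wave_reduce("min", [7, 3, 0xFFFFFFFF, 9])))  # 0xffffffff, i.e. -1
    print(wave_reduce("add", [0xFFFFFFFF, 2]))  # 1 (wraps around)
    print(wave_reduce("umax", [7, 3, 0xFFFFFFFF, 9], exec_mask=0b1010))  # 9
```

The same bit pattern `0xFFFFFFFF` is the largest value for umin/umax but -1 for min/max, which is the only difference between the unsigned and signed variants in this model.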

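The second operand of these intrinsics is only a strategy hint (target default, iterative, or DPP), so a natural question is whether the choice can change the result. For the associative, commutative integer operations it cannot, because every combination order gives the same value. Floating-point addition is not associative, though, so a different combination order can in principle change the rounded result of fadd. The patch does not say whether the backend's fadd results actually differ between strategies. The sketch below only illustrates the arithmetic point. `iterative` and `pairwise_tree` are hypothetical names, and the tree is a generic stand-in for a cross-lane reduction, not the DPP instruction sequence the backend emits.

```python
import operator
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def iterative(op: Callable[[T, T], T], values: List[T]) -> T:
    """Combine lane values one at a time in ascending lane order."""
    return reduce(op, values)


def pairwise_tree(op: Callable[[T, T], T], values: List[T]) -> T:
    """Combine neighbouring partial results level by level.

    Hypothetical stand-in for a cross-lane (DPP-style) tree; it does not
    reproduce the exact instruction sequence the backend emits.
    """
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
    return level[0]


if __name__ == "__main__":
    ints = list(range(64))  # one value per lane of a wave64
    print(iterative(operator.add, ints), pairwise_tree(operator.add, ints))  # 2016 2016
    floats = [1e16, 1.0, -1e16, 1.0]
    print(iterative(operator.add, floats), pairwise_tree(operator.add, floats))  # 1.0 0.0
```

In the float case, the iterative fold absorbs the first `1.0` into `1e16` but keeps the last one, while the tree absorbs both into the large operands, so the two orders round to different sums.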