[clang] [mlir] [llvm] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-15 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache closed 
https://github.com/llvm/llvm-project/pull/74471


[mlir] [llvm] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-12 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/74471

>From a6e0f1170cc0a9a3c6541d16edd12f2fafbe0da0 Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Tue, 5 Dec 2023 13:45:58 +0100
Subject: [PATCH 1/8] [AMDGPU] - Add address space for strided buffers

This is an experimental address space for strided buffers.
These buffers can have structs as elements and
a stride > 1.
These pointers allow indexed access in units of the stride,
i.e., they point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which
contain a 128-bit descriptor, a 32-bit offset and a 32-bit
index. Essentially, they are fat buffer pointers with
an additional 32-bit index.
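To make the layout concrete, here is a minimal C++ sketch of the pointer
described above (the struct and field names are hypothetical, not part of
the patch):

  #include <cstdint>

  // Hypothetical layout of a 192-bit address-space-9 pointer: a 128-bit
  // buffer descriptor plus a 32-bit offset and a 32-bit index.
  struct StridedBufferPtr {
    uint8_t Desc[16]; // 128-bit buffer descriptor (base address, stride, ...)
    uint32_t Offset;  // byte offset applied after indexing
    uint32_t Index;   // element index; the hardware scales it by the stride
  };

  // A load through such a pointer effectively addresses
  //   base(Desc) + Index * stride(Desc) + Offset,
  // which is why the `idxen` modifier can consume Index directly.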
---
 llvm/docs/AMDGPUUsage.rst | 48 -
 llvm/include/llvm/Support/AMDGPUAddrSpace.h   |  3 +
 llvm/lib/Target/AMDGPU/AMDGPU.h   | 27 +++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |  7 +-
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  2 +-
 .../AMDGPU/AMDGPUTargetTransformInfo.cpp  |  3 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 13 +++-
 .../CodeGen/AMDGPU/amdgpu-alias-analysis.ll   | 70 +++
 .../AMDGPU/vectorize-buffer-fat-pointer.ll| 19 -
 9 files changed, 153 insertions(+), 39 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7fb3d70bbeffe..ff45efac7e848 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -703,23 +703,24 @@ supported for the ``amdgcn`` target.
   .. table:: AMDGPU Address Spaces
  :name: amdgpu-address-spaces-table
 
- ================================= =============== =========== ================ ======= ============================
- ..                                                                             64-Bit Process Address Space
- --------------------------------- ------------------------------------------------------------------------------
- Address Space Name                LLVM IR Address HSA Segment Hardware         Address NULL Value
-                                   Space Number    Name        Name             Size
- ================================= =============== =========== ================ ======= ============================
- Generic                           0               flat        flat             64      0x0000000000000000
- Global                            1               global      global           64      0x0000000000000000
- Region                            2               N/A         GDS              32      *not implemented for AMDHSA*
- Local                             3               group       LDS              32      0xFFFFFFFF
- Constant                          4               constant    *same as global* 64      0x0000000000000000
- Private                           5               private     scratch          32      0xFFFFFFFF
- Constant 32-bit                   6               *TODO*                               0x00000000
- Buffer Fat Pointer (experimental) 7               *TODO*
- Buffer Resource (experimental)    8               *TODO*
- Streamout Registers               128             N/A         GS_REGS
- ================================= =============== =========== ================ ======= ============================
+ ================================= =============== =========== ================ ======= ============================
+ ..                                                                             64-Bit Process Address Space
+ --------------------------------- ------------------------------------------------------------------------------
+ Address Space Name                LLVM IR Address HSA Segment Hardware         Address NULL Value
+                                   Space Number    Name        Name             Size
+ ================================= =============== =========== ================ ======= ============================
+ Generic                           0               flat        flat             64      0x0000000000000000
+ Global                            1               global      global           64      0x0000000000000000
+ Region                            2               N/A         GDS              32      *not implemented for AMDHSA*
+ Local                             3               group       LDS              32      0xFFFFFFFF
+ Constant                          4               constant    *same as global* 64      0x0000000000000000
+ Private                           5               private     scratch          32      0xFFFFFFFF
+ Constant 32-bit                   6               *TODO*                               0x00000000
+ Buffer Fat

[llvm] [clang] [mlir] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-11 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/74471


[mlir] [llvm] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-11 Thread Jessica Del via cfe-commits


@@ -864,6 +865,17 @@ supported for the ``amdgcn`` target.
   (bits `127:96`). The specific interpretation of these fields varies by the
   target architecture and is detailed in the ISA descriptions.
 
+**Buffer Strided Pointer**
+  The buffer index pointer is an experimental address space. It is supposed to
+  model a 128-bit buffer descriptor and a 32-bit offset, like the **Buffer Fat
+  Pointer**. Additionally, it contains an index into the descriptor, which
+  allows the direct addressing of structured elements.
+
+  The buffer descriptor must be *raw*:

OutOfCache wrote:

I changed the wording to avoid confusion.

https://github.com/llvm/llvm-project/pull/74471


[mlir] [llvm] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-11 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/74471


[mlir] [clang] [llvm] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-08 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/74471

>From 94ed734c0d8864a08e3b77600dda811040270bd9 Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Tue, 5 Dec 2023 13:45:58 +0100
Subject: [PATCH 1/5] [AMDGPU] - Add address space for strided buffers

This is an experimental address space for strided buffers.
These buffers can have structs as elements and
a stride > 1.
These pointers allow indexed access in units of the stride,
i.e., they point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which
contain a 128-bit descriptor, a 32-bit offset and a 32-bit
index. Essentially, they are fat buffer pointers with
an additional 32-bit index.
---
 llvm/docs/AMDGPUUsage.rst | 48 -
 llvm/lib/Target/AMDGPU/AMDGPU.h   | 32 +
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |  7 +-
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  2 +-
 .../AMDGPU/AMDGPUTargetTransformInfo.cpp  |  3 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 13 +++-
 .../CodeGen/AMDGPU/amdgpu-alias-analysis.ll   | 70 +++
 .../AMDGPU/vectorize-buffer-fat-pointer.ll| 19 -
 8 files changed, 154 insertions(+), 40 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7fb3d70bbeffeb..ff45efac7e8486 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -703,23 +703,24 @@ supported for the ``amdgcn`` target.
   .. table:: AMDGPU Address Spaces
  :name: amdgpu-address-spaces-table
 
- ================================= =============== =========== ================ ======= ============================
- ..                                                                             64-Bit Process Address Space
- --------------------------------- ------------------------------------------------------------------------------
- Address Space Name                LLVM IR Address HSA Segment Hardware         Address NULL Value
-                                   Space Number    Name        Name             Size
- ================================= =============== =========== ================ ======= ============================
- Generic                           0               flat        flat             64      0x0000000000000000
- Global                            1               global      global           64      0x0000000000000000
- Region                            2               N/A         GDS              32      *not implemented for AMDHSA*
- Local                             3               group       LDS              32      0xFFFFFFFF
- Constant                          4               constant    *same as global* 64      0x0000000000000000
- Private                           5               private     scratch          32      0xFFFFFFFF
- Constant 32-bit                   6               *TODO*                               0x00000000
- Buffer Fat Pointer (experimental) 7               *TODO*
- Buffer Resource (experimental)    8               *TODO*
- Streamout Registers               128             N/A         GS_REGS
- ================================= =============== =========== ================ ======= ============================
+ ================================= =============== =========== ================ ======= ============================
+ ..                                                                             64-Bit Process Address Space
+ --------------------------------- ------------------------------------------------------------------------------
+ Address Space Name                LLVM IR Address HSA Segment Hardware         Address NULL Value
+                                   Space Number    Name        Name             Size
+ ================================= =============== =========== ================ ======= ============================
+ Generic                           0               flat        flat             64      0x0000000000000000
+ Global                            1               global      global           64      0x0000000000000000
+ Region                            2               N/A         GDS              32      *not implemented for AMDHSA*
+ Local                             3               group       LDS              32      0xFFFFFFFF
+ Constant                          4               constant    *same as global* 64      0x0000000000000000
+ Private                           5               private     scratch          32      0xFFFFFFFF
+ Constant 32-bit                   6               *TODO*                               0x00000000
+ Buffer Fat Pointer (experimental) 7               *TODO*
+

[clang] [llvm] [mlir] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-07 Thread Jessica Del via cfe-commits

OutOfCache wrote:

> How do you intend to rewrite these operations down to the underlying 
> instructions?
> 
> That is, what's your planned equivalent to https://reviews.llvm.org/D158463 ?

Thank you for the link to the code review, I was not aware of your changes 
before. Up until now, we intended to use the address space in llpc exclusively 
and lower the pointers to buffer load instructions in `PatchBufferOp`. If we 
also need to lower these pointers like you did for the fat buffers, maybe we 
can extend your new pass to also handle these strided buffer pointers?

https://github.com/llvm/llvm-project/pull/74471


[mlir] [llvm] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-06 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/74471


[llvm] [mlir] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-06 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/74471


[mlir] [llvm] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-05 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/74471


[compiler-rt] [libcxx] [llvm] [clang] [clang-tools-extra] [flang] [libc] [AMDGPU] - Add constant folding to s_wqm intrinsic (PR #72382)

2023-11-21 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache closed 
https://github.com/llvm/llvm-project/pull/72382


[libc] [compiler-rt] [clang] [flang] [clang-tools-extra] [libcxx] [llvm] [AMDGPU] - Add constant folding to s_wqm intrinsic (PR #72382)

2023-11-21 Thread Jessica Del via cfe-commits

OutOfCache wrote:

> LGTM w/ a squash.
> 
> BTW, I'm not a fan of merge commits in PRs. I find it makes it more confusing 
> to review.

Understandable. I just tried using the website UI instead of rebasing for the 
minor merge conflict with the other PRs. Will avoid them in the future.

https://github.com/llvm/llvm-project/pull/72382


[compiler-rt] [clang] [libc] [clang-tools-extra] [flang] [llvm] [libcxx] [AMDGPU] - Add constant folding to s_wqm intrinsic (PR #72382)

2023-11-20 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/72382

>From 1864216324ccc83dd0e77287f00c62e51b07822c Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Wed, 15 Nov 2023 12:58:24 +0100
Subject: [PATCH 1/5] [AMDGPU] - Add constant folding to s_wqm intrinsic

Fold any constant input to the s_wqm intrinsic.
---
 llvm/lib/Analysis/ConstantFolding.cpp   | 16 
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.ll | 13 +
 2 files changed, 21 insertions(+), 8 deletions(-)
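For context, a standalone reference model of the `s_wqm` semantics being
folded (illustrative code, not part of the patch): each 4-bit group
("quad") of the result is all-ones exactly when the corresponding quad of
the input is nonzero.

  #include <cassert>
  #include <cstdint>

  // Reference model of s_wqm: a quad of the result is 0xF iff the
  // corresponding quad of the input has any bit set.
  static uint64_t wqm(uint64_t Val, unsigned BitWidth) {
    uint64_t Result = 0;
    uint64_t Quad = 0xF;
    for (unsigned I = 0; I < BitWidth / 4; ++I, Val >>= 4, Quad <<= 4)
      if (Val & 0xF)
        Result |= Quad;
    return Result;
  }

  int main() {
    // Matches the folded constant in the updated i32 test below.
    assert(wqm(0x85003A02u, 32) == 0xFF00FF0F);
  }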

diff --git a/llvm/lib/Analysis/ConstantFolding.cpp 
b/llvm/lib/Analysis/ConstantFolding.cpp
index 966a65ac26b8017..f3f0d079747e13e 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -1533,6 +1533,7 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, 
const Function *F) {
   case Intrinsic::amdgcn_perm:
   case Intrinsic::amdgcn_wave_reduce_umin:
   case Intrinsic::amdgcn_wave_reduce_umax:
+  case Intrinsic::amdgcn_s_wqm:
   case Intrinsic::arm_mve_vctp8:
   case Intrinsic::arm_mve_vctp16:
   case Intrinsic::arm_mve_vctp32:
@@ -2422,6 +2423,21 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
 
   return ConstantFP::get(Ty->getContext(), Val);
 }
+
+case Intrinsic::amdgcn_s_wqm: {
+  uint64_t Val = Op->getZExtValue();
+  uint64_t WQM = 0;
+  uint64_t Quad = 0xF;
+  for (unsigned i = 0; i < Op->getBitWidth() / 4;
+   ++i, Val >>= 4, Quad <<= 4) {
+if (!(Val & 0xF))
+  continue;
+
+WQM |= Quad;
+  }
+  return ConstantInt::get(Ty, WQM);
+}
+
 default:
   return nullptr;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.ll
index 6676dac19ba797f..e44043ffacc07d1 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.ll
@@ -9,10 +9,9 @@ define i32 @test_s_wqm_constant_i32() {
 ; GFX11-LABEL: test_s_wqm_constant_i32:
 ; GFX11:   ; %bb.0:
 ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:s_wqm_b32 s0, 0x85fe3a92
-; GFX11-NEXT:v_mov_b32_e32 v0, s0
+; GFX11-NEXT:v_mov_b32_e32 v0, 0xff00ff0f
 ; GFX11-NEXT:s_setpc_b64 s[30:31]
-  %br = call i32 @llvm.amdgcn.s.wqm.i32(i32 u0x85FE3A92)
+  %br = call i32 @llvm.amdgcn.s.wqm.i32(i32 u0x85003A02)
   ret i32 %br
 }
 
@@ -48,12 +47,10 @@ define i64 @test_s_wqm_constant_i64() {
 ; GFX11-LABEL: test_s_wqm_constant_i64:
 ; GFX11:   ; %bb.0:
 ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:s_mov_b32 s0, 0x85fe3a92
-; GFX11-NEXT:s_mov_b32 s1, 0x3a9285fe
-; GFX11-NEXT:s_wqm_b64 s[0:1], s[0:1]
-; GFX11-NEXT:v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
+; GFX11-NEXT:v_mov_b32_e32 v0, 0xff00ffff
+; GFX11-NEXT:v_mov_b32_e32 v1, 0xffff0fff
 ; GFX11-NEXT:s_setpc_b64 s[30:31]
-  %br = call i64 @llvm.amdgcn.s.wqm.i64(i64 u0x3A9285FE85FE3A92)
+  %br = call i64 @llvm.amdgcn.s.wqm.i64(i64 u0x12480FDBAC00753E)
   ret i64 %br
 }
 

>From 86da1c3a9370e671d263ee1a0243ed4e466d9f75 Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Wed, 15 Nov 2023 15:50:06 +0100
Subject: [PATCH 2/5] fixup! [AMDGPU] - Add constant folding to s_wqm intrinsic

---
 llvm/lib/Analysis/ConstantFolding.cpp | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Analysis/ConstantFolding.cpp 
b/llvm/lib/Analysis/ConstantFolding.cpp
index f3f0d079747e13e..28cbd4112573ede 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -2426,16 +2426,11 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
 
 case Intrinsic::amdgcn_s_wqm: {
   uint64_t Val = Op->getZExtValue();
-  uint64_t WQM = 0;
-  uint64_t Quad = 0xF;
-  for (unsigned i = 0; i < Op->getBitWidth() / 4;
-   ++i, Val >>= 4, Quad <<= 4) {
-if (!(Val & 0xF))
-  continue;
-
-WQM |= Quad;
-  }
-  return ConstantInt::get(Ty, WQM);
+  Val |= (Val & 0x5555555555555555ULL) << 1 |
+ ((Val >> 1) & 0x5555555555555555ULL);
+  Val |= (Val & 0x3333333333333333ULL) << 2 |
+ ((Val >> 2) & 0x3333333333333333ULL);
+  return ConstantInt::get(Ty, Val);
 }
 
 default:
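For reference, the two OR steps above compute the same whole-quad mask as
the loop they replace: the first step (mask 0x5555555555555555) ORs each
bit into its neighbor within a bit pair, and the second (mask
0x3333333333333333) ORs each pair into the other pair of its quad, so
every 4-bit quad becomes 0xF exactly when it was nonzero.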

>From 08ee504df1ec7a76169b79d68150ae074422bf7d Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Wed, 15 Nov 2023 18:17:03 +0100
Subject: [PATCH 3/5] fixup! [AMDGPU] - Add constant folding to s_wqm intrinsic

---
 llvm/lib/Analysis/ConstantFolding.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Analysis/ConstantFolding.cpp 
b/llvm/lib/Analysis/ConstantFolding.cpp
index 28cbd4112573ede..d8c96b542ebbb61 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -2427,9 +2427,9 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
 case Intrinsic::amdgcn_s_wqm: {
   uint64_t Val = Op->getZExtValue();
   Val 

[compiler-rt] [llvm] [lld] [clang-tools-extra] [libcxx] [clang] [flang] [lldb] [libunwind] [AMDGPU] - Add constant folding for s_quadmask (PR #72381)

2023-11-17 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache closed 
https://github.com/llvm/llvm-project/pull/72381


[lld] [libunwind] [libcxx] [compiler-rt] [llvm] [flang] [clang-tools-extra] [lldb] [clang] [AMDGPU] - Add constant folding for s_quadmask (PR #72381)

2023-11-17 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/72381

>From 00d0f99207242befc8022031ccd8faf573cbf014 Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Tue, 14 Nov 2023 22:17:26 +0100
Subject: [PATCH 1/5] [AMDGPU] - Add constant folding for s_quadmask

If the input is a constant we can constant fold the `s_quadmask`
intrinsic.
---
 llvm/lib/Analysis/ConstantFolding.cpp| 14 ++
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.quadmask.ll | 12 
 2 files changed, 18 insertions(+), 8 deletions(-)
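For context, a standalone reference model of the `s_quadmask` semantics
being folded (illustrative code, not part of the patch): bit I of the
result is set exactly when the I-th 4-bit group of the input is nonzero.

  #include <cassert>
  #include <cstdint>

  // Reference model of s_quadmask: bit I of the result reports whether
  // the I-th quad (4-bit group) of the input has any bit set.
  static uint64_t quadmask(uint64_t Val, unsigned BitWidth) {
    uint64_t Mask = 0;
    for (unsigned I = 0; I < BitWidth / 4; ++I, Val >>= 4)
      if (Val & 0xF)
        Mask |= 1ULL << I;
    return Mask;
  }

  int main() {
    // Matches the folded constant in the updated i32 test below.
    assert(quadmask(0x85003092u, 32) == 0xCB);
  }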

diff --git a/llvm/lib/Analysis/ConstantFolding.cpp 
b/llvm/lib/Analysis/ConstantFolding.cpp
index 966a65ac26b8017..40b5938fcda0c2a 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -1533,6 +1533,7 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, 
const Function *F) {
   case Intrinsic::amdgcn_perm:
   case Intrinsic::amdgcn_wave_reduce_umin:
   case Intrinsic::amdgcn_wave_reduce_umax:
+  case Intrinsic::amdgcn_s_quadmask:
   case Intrinsic::arm_mve_vctp8:
   case Intrinsic::arm_mve_vctp16:
   case Intrinsic::arm_mve_vctp32:
@@ -2422,6 +2423,19 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
 
   return ConstantFP::get(Ty->getContext(), Val);
 }
+
+case Intrinsic::amdgcn_s_quadmask: {
+  uint64_t Val = Op->getZExtValue();
+  uint64_t QuadMask = 0;
+  for (unsigned i = 0; i < Op->getBitWidth() / 4; ++i, Val >>= 4) {
+if (!(Val & 0xF))
+  continue;
+
+QuadMask |= (1 << i);
+  }
+  return ConstantInt::get(Ty, QuadMask);
+}
+
 default:
   return nullptr;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.quadmask.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.quadmask.ll
index 65443a6efa789d9..0f500c0999ad9a8 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.quadmask.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.quadmask.ll
@@ -9,11 +9,10 @@ define i32 @test_quadmask_constant_i32() {
 ; GFX11-LABEL: test_quadmask_constant_i32:
 ; GFX11:   ; %bb.0: ; %entry
 ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:s_quadmask_b32 s0, 0x85fe3a92
-; GFX11-NEXT:v_mov_b32_e32 v0, s0
+; GFX11-NEXT:v_mov_b32_e32 v0, 0xcb
 ; GFX11-NEXT:s_setpc_b64 s[30:31]
 entry:
-  %qm = call i32 @llvm.amdgcn.s.quadmask.i32(i32 u0x85FE3A92)
+  %qm = call i32 @llvm.amdgcn.s.quadmask.i32(i32 u0x85003092)
   ret i32 %qm
 }
 
@@ -50,13 +49,10 @@ define i64 @test_quadmask_constant_i64() {
 ; GFX11-LABEL: test_quadmask_constant_i64:
 ; GFX11:   ; %bb.0: ; %entry
 ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:s_mov_b32 s0, 0x85fe3a92
-; GFX11-NEXT:s_mov_b32 s1, 0x67de48fc
-; GFX11-NEXT:s_quadmask_b64 s[0:1], s[0:1]
-; GFX11-NEXT:v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
+; GFX11-NEXT:v_dual_mov_b32 v0, 0xe3e6 :: v_dual_mov_b32 v1, 0
 ; GFX11-NEXT:s_setpc_b64 s[30:31]
 entry:
-  %qm = call i64 @llvm.amdgcn.s.quadmask.i64(i64 u0x67DE48FC85FE3A92)
+  %qm = call i64 @llvm.amdgcn.s.quadmask.i64(i64 u0x67D000FC85F00A90)
   ret i64 %qm
 }
 

>From 144c4dc164ec137e518cfd647c116373e7a61b8f Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Wed, 15 Nov 2023 15:59:56 +0100
Subject: [PATCH 2/5] fixup! [AMDGPU] - Add constant folding for s_quadmask

---
 llvm/lib/Analysis/ConstantFolding.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Analysis/ConstantFolding.cpp 
b/llvm/lib/Analysis/ConstantFolding.cpp
index 40b5938fcda0c2a..39bbb04fbcf26cc 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -2427,11 +2427,11 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
 case Intrinsic::amdgcn_s_quadmask: {
   uint64_t Val = Op->getZExtValue();
   uint64_t QuadMask = 0;
-  for (unsigned i = 0; i < Op->getBitWidth() / 4; ++i, Val >>= 4) {
+  for (unsigned I = 0; I < Op->getBitWidth() / 4; ++I, Val >>= 4) {
 if (!(Val & 0xF))
   continue;
 
-QuadMask |= (1 << i);
+QuadMask |= (1 << I);
   }
   return ConstantInt::get(Ty, QuadMask);
 }

>From 65bb0b1164ff9b7491589cb88decb1d135504c1b Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Thu, 16 Nov 2023 11:22:52 +0100
Subject: [PATCH 3/5] fixup! Merge branch 'main' into quadmask-folding

---
 llvm/lib/Analysis/ConstantFolding.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/llvm/lib/Analysis/ConstantFolding.cpp 
b/llvm/lib/Analysis/ConstantFolding.cpp
index 64d088ea7a46404..2771a3d574f7799 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -2425,7 +2425,6 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
   return ConstantFP::get(Ty->getContext(), Val);
 }
 
-
 case Intrinsic::amdgcn_s_quadmask: {
   uint64_t Val = Op->getZExtValue();
   uint64_t QuadMask = 0;
@@ -2436,6 +2435,7 @@ static 

[libunwind] [clang-tools-extra] [clang] [llvm] [lldb] [libcxx] [compiler-rt] [flang] [lld] [AMDGPU] - Add constant folding for s_quadmask (PR #72381)

2023-11-17 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/72381


[clang-tools-extra] [libunwind] [libcxx] [llvm] [compiler-rt] [lld] [lldb] [clang] [flang] [AMDGPU] - Add constant folding for s_quadmask (PR #72381)

2023-11-16 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/72381


[libunwind] [lld] [lldb] [clang] [clang-tools-extra] [compiler-rt] [llvm] [flang] [libcxx] [AMDGPU] - Add constant folding for s_quadmask (PR #72381)

2023-11-16 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/72381



[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)

2023-11-13 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache closed 
https://github.com/llvm/llvm-project/pull/70669


[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)

2023-11-10 Thread Jessica Del via cfe-commits


@@ -292,13 +292,17 @@ 
TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", "nc
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")

OutOfCache wrote:

I see, thanks for letting me know! I tried to do so now.

https://github.com/llvm/llvm-project/pull/70669


[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)

2023-11-10 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/70669

>From 75db77fef715fa5aee10a8384fca299b7bf2b7a3 Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Sun, 29 Oct 2023 21:16:52 +0100
Subject: [PATCH 1/2] [AMDGPU] - Add clang builtins for tied WMMA intrinsics

Add clang builtins for the new tied wmma intrinsics.
These variations tie the destination
accumulator matrix to the input
accumulator matrix.

Add negative tests for gfx10, since we do not support
the wmma intrinsics before gfx11.
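For a sense of how the new builtins are called, a hypothetical OpenCL
kernel in the style of the tests added below (the kernel and argument
names are illustrative):

  typedef half v16h __attribute__((ext_vector_type(16)));

  // The tied variant keeps the destination accumulator tied to the input
  // accumulator 'c' (see the commit message above); the final constant
  // argument is the op_sel bit.
  kernel void test_tied_w32(global v16h *out, v16h a, v16h b, v16h c) {
    *out = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32(a, b, c, true);
  }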
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  4 +++
 clang/lib/CodeGen/CGBuiltin.cpp   | 14 
 .../builtins-amdgcn-wmma-w32-gfx10-err.cl | 34 +++
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w32.cl | 30 
 .../builtins-amdgcn-wmma-w64-gfx10-err.cl | 34 +++
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl | 30 
 6 files changed, 146 insertions(+)
 create mode 100644 
clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32-gfx10-err.cl
 create mode 100644 
clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64-gfx10-err.cl

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 532a91fd903e87c..a19c8bd5f219ec6 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -292,6 +292,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, 
"V8fV16hV16hV8f", "nc
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, 
"V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, 
"V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts")
 
@@ -299,6 +301,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, 
"V4fV16hV16hV4f", "nc
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, 
"V8hV16hV16hV8hIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, 
"V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, 
"V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts")
 
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d49c44dbaace3a8..f3c989a76cbc380 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17936,9 +17936,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   }
 
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32:
+  case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64:
+  case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w32:
+  case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w64:
+  case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_f16_w32:
@@ -17976,6 +17980,16 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ArgForMatchingRetType = 2;
   BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16;
   break;
+case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32:
+case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64:
+  ArgForMatchingRetType = 2;
+  BuiltinWMMAOp = Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied;
+  break;
+case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32:
+case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64:
+  ArgForMatchingRetType = 2;
+  BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied;
+  break;
 case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32:
 case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64:
   ArgForMatchingRetType = 4;
diff --git 

[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)

2023-11-09 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/70669


[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)

2023-10-30 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache created 
https://github.com/llvm/llvm-project/pull/70669

Add clang builtins for the new tied wmma intrinsics. 
These variations tie the destination
accumulator matrix to the input
accumulator matrix.

See https://github.com/llvm/llvm-project/pull/69903 for context.

>From 66a0e00cfc4f5b52aff51ba37ca53544fda9ec99 Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Sun, 29 Oct 2023 21:16:52 +0100
Subject: [PATCH] [AMDGPU] - Add clang builtins for tied WMMA intrinsics

Add clang builtins for the new tied wmma intrinsics.
These variations tie the destination
accumulator matrix to the input
accumulator matrix.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  4 +++
 clang/lib/CodeGen/CGBuiltin.cpp   | 14 +
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w32.cl | 30 +++
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl | 30 +++
 4 files changed, 78 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 532a91fd903e87c..a19c8bd5f219ec6 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -292,6 +292,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, 
"V8fV16hV16hV8f", "nc
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, 
"V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, 
"V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts")
 
@@ -299,6 +301,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, 
"V4fV16hV16hV4f", "nc
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, 
"V8hV16hV16hV8hIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, 
"V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, 
"V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts")
 
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index dce5ee5888c458e..d162fdfbbfd8921 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17917,9 +17917,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   }
 
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32:
+  case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64:
+  case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w32:
+  case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w64:
+  case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_f16_w32:
@@ -17957,6 +17961,16 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ArgForMatchingRetType = 2;
   BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16;
   break;
+case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32:
+case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64:
+  ArgForMatchingRetType = 2;
+  BuiltinWMMAOp = Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied;
+  break;
+case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32:
+case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64:
+  ArgForMatchingRetType = 2;
+  BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied;
+  break;
 case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32:
 case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64:
   ArgForMatchingRetType = 4;
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32.cl
index 07e4f71f6dbd32a..4f13c75e5e81f98 100644
---