from:"Jay Foad via cfe\-commits"

[clang] Compiler messages on HIP SDK for Windows (PR #97668)

2024-07-09 Thread Jay Foad via cfe-commits


jayfoad wrote:

> Compiler messages on HIP SDK for Windows

Please rewrite this to say what the patch does or what problem it fixes.

https://github.com/llvm/llvm-project/pull/97668
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-04 Thread Jay Foad via cfe-commits



@@ -14,13 +14,14 @@
 #define LLVM_CODEGEN_MACHINEBRANCHPROBABILITYINFO_H
 
 #include "llvm/CodeGen/MachineBasicBlock.h"
-#include "llvm/CodeGen/MachinePassManager.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/BranchProbability.h"
 
 namespace llvm {
 
-class MachineBranchProbabilityInfo {
+class MachineBranchProbabilityInfo : public ImmutablePass {

jayfoad wrote:

Why does the PR include all this unrelated stuff? Which part am I supposed to 
review? Normally I just look at the "Files changed" tab.

https://github.com/llvm/llvm-project/pull/96442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-26 Thread Jay Foad via cfe-commits



@@ -402,34 +413,30 @@ Value 
*AMDGPUAtomicOptimizerImpl::buildReduction(IRBuilder<> ,
 
   // Reduce within each pair of rows (i.e. 32 lanes).
   assert(ST->hasPermLaneX16());
-  V = B.CreateBitCast(V, IntNTy);

jayfoad wrote:

Please submit an NFC cleanup patch that just removes unnecessary bitcasting, 
before adding support for new atomic operations.

https://github.com/llvm/llvm-project/pull/96473
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Debug Info] Fix debug info ptr to ptr test (PR #95637)

2024-06-15 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/95637
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Debug Info] Fix debug info ptr to ptr test (PR #95637)

2024-06-15 Thread Jay Foad via cfe-commits


jayfoad wrote:

I'll merge to fix the build.

https://github.com/llvm/llvm-project/pull/95637
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)

2024-06-13 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/95373
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)

2024-06-13 Thread Jay Foad via cfe-commits


https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/95373

None

>From 6d326a96d2651f8836b29ff1e3edef022f41549e Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 13 Jun 2024 09:46:48 +0100
Subject: [PATCH] [llvm-project] Fix typo "seperate"

---
 clang-tools-extra/clangd/TidyProvider.cpp | 10 
 .../include/clang/Frontend/FrontendOptions.h  |  2 +-
 .../include/clang/InstallAPI/DylibVerifier.h  |  2 +-
 clang/lib/InstallAPI/Visitor.cpp  |  2 +-
 clang/lib/Serialization/ASTWriterStmt.cpp |  2 +-
 compiler-rt/test/dfsan/custom.cpp |  2 +-
 .../Linux/ppc64/trivial-tls-pwr10.test|  2 +-
 .../FlangOmpReport/yaml_summarizer.py |  2 +-
 flang/lib/Semantics/check-omp-structure.cpp   | 10 
 flang/test/Driver/mllvm_vs_mmlir.f90  |  2 +-
 libc/src/__support/FPUtil/x86_64/FEnvImpl.h   |  2 +-
 .../stdio/printf_core/float_hex_converter.h   | 10 
 .../str_to_float_comparison_test.cpp  |  2 +-
 lld/test/wasm/data-segments.ll|  2 +-
 .../lldb/Expression/DWARFExpressionList.h |  2 +-
 lldb/include/lldb/Target/MemoryTagManager.h   |  2 +-
 .../NativeRegisterContextLinux_arm64.cpp  |  2 +-
 lldb/test/API/CMakeLists.txt  |  2 +-
 .../TestGdbRemoteMemoryTagging.py |  2 +-
 .../DW_AT_data_bit_offset-DW_OP_stack_value.s |  2 +-
 llvm/include/llvm/CodeGen/LiveRegUnits.h  |  2 +-
 llvm/include/llvm/CodeGen/MIRFormatter.h  |  2 +-
 llvm/include/llvm/MC/MCAsmInfo.h  |  2 +-
 llvm/include/llvm/Support/raw_socket_stream.h |  2 +-
 llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.h   |  2 +-
 .../CodeGen/AssignmentTrackingAnalysis.cpp|  6 ++---
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  4 ++--
 llvm/lib/FileCheck/FileCheck.cpp  |  2 +-
 llvm/lib/IR/DebugInfo.cpp |  2 +-
 llvm/lib/MC/MCPseudoProbe.cpp |  2 +-
 llvm/lib/Support/VirtualFileSystem.cpp|  2 +-
 llvm/lib/Support/raw_socket_stream.cpp|  2 +-
 llvm/lib/Target/ARM/ARMISelLowering.cpp   |  2 +-
 .../Target/RISCV/RISCVMachineFunctionInfo.h   |  2 +-
 llvm/lib/TargetParser/RISCVISAInfo.cpp|  2 +-
 llvm/lib/TextAPI/Utils.cpp|  2 +-
 llvm/lib/Transforms/IPO/Attributor.cpp|  4 ++--
 .../lib/Transforms/IPO/SampleProfileProbe.cpp |  2 +-
 .../Scalar/RewriteStatepointsForGC.cpp|  2 +-
 .../Transforms/Utils/LoopUnrollRuntime.cpp|  2 +-
 llvm/test/CodeGen/X86/AMX/amx-greedy-ra.ll|  2 +-
 llvm/test/CodeGen/X86/apx/shift-eflags.ll | 24 +--
 .../X86/merge-consecutive-stores-nt.ll|  4 ++--
 llvm/test/CodeGen/X86/shift-eflags.ll | 24 +--
 .../InstSimplify/constant-fold-fp-denormal.ll |  2 +-
 .../LoopVectorize/LoongArch/defaults.ll   |  2 +-
 .../LoopVectorize/RISCV/defaults.ll   |  2 +-
 .../split-gep-or-as-add.ll|  2 +-
 llvm/test/Verifier/alloc-size-failedparse.ll  |  2 +-
 llvm/test/tools/llvm-ar/windows-path.test |  2 +-
 .../ELF/mirror-permissions-win.test   |  2 +-
 llvm/tools/llvm-cov/CodeCoverage.cpp  |  2 +-
 llvm/tools/llvm-profgen/PerfReader.cpp|  2 +-
 llvm/unittests/Support/Path.cpp   |  4 ++--
 .../Analysis/Presburger/IntegerRelation.h |  2 +-
 .../Analysis/Presburger/PresburgerSpace.h |  2 +-
 .../mlir/Dialect/OpenMP/OpenMPInterfaces.h|  2 +-
 .../Analysis/Presburger/PresburgerSpace.cpp   |  2 +-
 .../lib/Conversion/GPUCommon/GPUOpsLowering.h |  2 +-
 .../LLVMIR/IR/BasicPtxBuilderInterface.cpp|  2 +-
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  |  6 ++---
 .../CPU/sparse_reduce_custom_prod.mlir|  2 +-
 .../omptarget-constant-alloca-raise.mlir  |  2 +-
 openmp/tools/Modules/FindOpenMPTarget.cmake   |  2 +-
 64 files changed, 106 insertions(+), 106 deletions(-)

diff --git a/clang-tools-extra/clangd/TidyProvider.cpp 
b/clang-tools-extra/clangd/TidyProvider.cpp
index a4121df30d3df..a87238e0c0938 100644
--- a/clang-tools-extra/clangd/TidyProvider.cpp
+++ b/clang-tools-extra/clangd/TidyProvider.cpp
@@ -195,10 +195,10 @@ TidyProvider addTidyChecks(llvm::StringRef Checks,
 }
 
 TidyProvider disableUnusableChecks(llvm::ArrayRef ExtraBadChecks) 
{
-  constexpr llvm::StringLiteral Seperator(",");
+  constexpr llvm::StringLiteral Separator(",");
   static const std::string BadChecks = llvm::join_items(
-  Seperator,
-  // We want this list to start with a seperator to
+  Separator,
+  // We want this list to start with a separator to
   // simplify appending in the lambda. So including an
   // empty string here will force that.
   "",
@@ -227,7 +227,7 @@ TidyProvider 
disableUnusableChecks(llvm::ArrayRef ExtraBadChecks) {
   for (const std::string  : ExtraBadChecks) {
 if (Str.empty())
   continue;
-Size += Seperator.size();
+Size += Separator.size();
 if (LLVM_LIKELY(Str.front() !=

[clang] fixup cuda-builtin-vars.cu broken in IntrRange change (PR #94639)

2024-06-06 Thread Jay Foad via cfe-commits


https://github.com/jayfoad approved this pull request.

Works for me.

https://github.com/llvm/llvm-project/pull/94639
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [flang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)

2024-06-06 Thread Jay Foad via cfe-commits



@@ -785,6 +785,7 @@ enum : unsigned {
   EF_AMDGPU_MACH_AMDGCN_GFX1200 = 0x048,
   EF_AMDGPU_MACH_AMDGCN_RESERVED_0X49   = 0x049,
   EF_AMDGPU_MACH_AMDGCN_GFX1151 = 0x04a,
+  EF_AMDGPU_MACH_AMDGCN_GFX1152 = 0x055,

jayfoad wrote:

This table is supposed to be in ELF number order. Can you please move the new 
entry? Consider it pre-approved.

https://github.com/llvm/llvm-project/pull/94534
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-06 Thread Jay Foad via cfe-commits



@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()

jayfoad wrote:

I see now that it fails (deterministically) if the NVPTX target is not being 
built.

https://github.com/llvm/llvm-project/pull/94422
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)

2024-06-06 Thread Jay Foad via cfe-commits



@@ -6,21 +6,21 @@
 __attribute__((global))
 void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 
@llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 
@llvm.nvvm.read.ptx.sreg.tid.x()

jayfoad wrote:

@AlexMaclean I also see this problem on some internal test machines. It seems 
suspicious - is there some nondeterminism? Or is there a good reason why some 
machines would not add the range metadata here???

https://github.com/llvm/llvm-project/pull/94422
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Jay Foad via cfe-commits


jayfoad wrote:

Is there really a good use case for this? Can you use regular stores to 
addrspace(7) instead? @krzysz00

Also, do you really need a separate builtin for every legal type, or is there 
some way they can be type-overloaded?

https://github.com/llvm/llvm-project/pull/94576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)

2024-06-06 Thread Jay Foad via cfe-commits


https://github.com/jayfoad approved this pull request.

LGTM.

Could also update `flang/cmake/modules/AddFlangOffloadRuntime.cmake` but I 
don't really know if it's our responsibility to update Flang.

https://github.com/llvm/llvm-project/pull/94534
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)

2024-06-06 Thread Jay Foad via cfe-commits



@@ -1534,6 +1534,12 @@ def FeatureISAVersion11_5_1 : FeatureSet<
  FeatureVGPRSingleUseHintInsts,
  Feature1_5xVGPRs])>;
 
+def FeatureISAVersion11_5_2 : FeatureSet<

jayfoad wrote:

I don't have a good answer to this except "it's what we normally do". Other 
parts of the software stack (kernel drivers etc) need to distinguish gfx1150 
from gfx1152, and I guess they don't want to map "gfx1152" -> "gfx1150" before 
invoking the compiler. Also, it will make it easier for us to implement 
gfx1152-specific optimizations and workarounds in future if there are any.

https://github.com/llvm/llvm-project/pull/94534
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits


jayfoad wrote:

There is a latent problem to do with convergence. If you add a new test case 
like this:
```diff
diff --git a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll 
b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
index 238f6ab39e83..22995083293d 100644
--- a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
+++ b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
@@ -55,6 +55,21 @@ else:
   ret i32 %p
 }
 
+define i64 @basic_branch_i64(i64 %src, i1 %cond) #0 {
+entry:
+  %t = call token @llvm.experimental.convergence.anchor()
+  %x = add i64 %src, 1
+  br i1 %cond, label %then, label %else
+
+then:
+  %r = call i64 @llvm.amdgcn.readfirstlane.i64(i64 %x) [ 
"convergencectrl"(token %t) ]
+  br label %else
+
+else:
+  %p = phi i64 [%r, %then], [%x, %entry]
+  ret i64 %p
+}
+
 ; CHECK-LABEL: name:basic_loop
 ;   CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR
 ;   CHECK:  bb.[[#]].loop:
```
Then it will fail with:
```
*** Bad machine code: Cannot mix controlled and uncontrolled convergence in the 
same function. ***
```
This is related to #87509. Since the readlane/readfirstlane/writelane 
intrinsics are IntrConvergent, the corresponding ISD nodes should be marked 
with SDNPInGlue or SDNPOptInGlue. @ssahasra FYI

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits


https://github.com/jayfoad commented:

Does this need IR autoupgrade?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits



@@ -5496,6 +5496,9 @@ const char* 
AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)
+  NODE_NAME_CASE(WRITELANE)

jayfoad wrote:

Add this to `SITargetLowering::isSDNodeSourceOfDivergence`

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits



@@ -5496,6 +5496,9 @@ const char* 
AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)

jayfoad wrote:

Add these to `AMDGPUTargetLowering::isSDNodeAlwaysUniform`

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Jay Foad via cfe-commits


https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-23 Thread Jay Foad via cfe-commits


jayfoad wrote:

> 1. What's the proper way to legalize f16 and bf16 for SDAG case without 
> bitcasts ? (I would think  "fp_extend -> LaneOp -> Fptrunc" is wrong)

Bitcast to i16, anyext to i32, laneop, trunc to i16, bitcast to original type.

Why wouldn't you use bitcasts?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [flang] [libc] [libcxx] [llvm] [mlir] Fix typo "indicies" (PR #92232)

2024-05-15 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/92232
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [flang] [libc] [libcxx] [llvm] [mlir] Fix typo "indicies" (PR #92232)

2024-05-15 Thread Jay Foad via cfe-commits


https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/92232

None

>From a02c63497b0d60f55e1846f5a050820082fb5c86 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Wed, 15 May 2024 10:04:57 +0100
Subject: [PATCH] Fix typo "indicies"

---
 clang/include/clang/AST/VTTBuilder.h  |  6 +-
 clang/lib/AST/VTTBuilder.cpp  |  2 +-
 clang/lib/CodeGen/CGVTT.cpp   | 17 ++---
 clang/lib/CodeGen/CGVTables.h |  6 +-
 .../command/commands/DexExpectStepOrder.py|  2 +-
 flang/docs/HighLevelFIR.md|  2 +-
 flang/test/Lower/HLFIR/forall.f90 |  2 +-
 libc/src/stdio/printf_core/parser.h   |  2 +-
 .../views/mdspan/CustomTestLayouts.h  |  2 +-
 llvm/docs/GlobalISel/GenericOpcode.rst|  4 +-
 llvm/include/llvm/Target/Target.td|  4 +-
 llvm/lib/Analysis/DependenceAnalysis.cpp  | 10 +--
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 12 ++--
 llvm/lib/Bitcode/Writer/ValueEnumerator.cpp   |  2 +-
 llvm/lib/Bitcode/Writer/ValueEnumerator.h |  2 +-
 .../LiveDebugValues/VarLocBasedImpl.cpp   |  2 +-
 llvm/lib/CodeGen/MLRegAllocEvictAdvisor.cpp   |  2 +-
 llvm/lib/CodeGen/PrologEpilogInserter.cpp |  2 +-
 llvm/lib/Support/ELFAttributeParser.cpp   | 10 +--
 .../Target/AArch64/AArch64ISelLowering.cpp|  2 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  2 +-
 .../DirectX/DXILWriter/DXILBitcodeWriter.cpp  | 10 +--
 .../DXILWriter/DXILValueEnumerator.cpp|  2 +-
 .../DirectX/DXILWriter/DXILValueEnumerator.h  |  2 +-
 llvm/lib/Target/PowerPC/PPCISelLowering.cpp   |  2 +-
 .../Transforms/InstCombine/InstCombinePHI.cpp |  4 +-
 .../Scalar/SeparateConstOffsetFromGEP.cpp |  2 +-
 .../Utils/SampleProfileInference.cpp  |  2 +-
 .../Transforms/Vectorize/SLPVectorizer.cpp| 64 +--
 llvm/test/CodeGen/X86/avx-vperm2x128.ll   |  2 +-
 .../test/DebugInfo/PDB/Inputs/every-type.yaml |  4 +-
 ...h-directive-personalityindex-diagnostics.s |  6 +-
 .../InstCombine/phi-extractvalue.ll   |  8 +--
 .../InstCombine/phi-of-insertvalues.ll|  6 +-
 .../VectorCombine/X86/scalarize-vector-gep.ll | 12 ++--
 .../Linalg/Transforms/Vectorization.cpp   |  6 +-
 36 files changed, 114 insertions(+), 113 deletions(-)

diff --git a/clang/include/clang/AST/VTTBuilder.h 
b/clang/include/clang/AST/VTTBuilder.h
index 4acbc1f9e96b2..3c19e61a8701c 100644
--- a/clang/include/clang/AST/VTTBuilder.h
+++ b/clang/include/clang/AST/VTTBuilder.h
@@ -92,7 +92,7 @@ class VTTBuilder {
   using AddressPointsMapTy = llvm::DenseMap;
 
   /// The sub-VTT indices for the bases of the most derived class.
-  llvm::DenseMap SubVTTIndicies;
+  llvm::DenseMap SubVTTIndices;
 
   /// The secondary virtual pointer indices of all subobjects of
   /// the most derived class.
@@ -148,8 +148,8 @@ class VTTBuilder {
   }
 
   /// Returns a reference to the sub-VTT indices.
-  const llvm::DenseMap () const {
-return SubVTTIndicies;
+  const llvm::DenseMap () const {
+return SubVTTIndices;
   }
 
   /// Returns a reference to the secondary virtual pointer indices.
diff --git a/clang/lib/AST/VTTBuilder.cpp b/clang/lib/AST/VTTBuilder.cpp
index d58e875177852..464a2014c430a 100644
--- a/clang/lib/AST/VTTBuilder.cpp
+++ b/clang/lib/AST/VTTBuilder.cpp
@@ -189,7 +189,7 @@ void VTTBuilder::LayoutVTT(BaseSubobject Base, bool 
BaseIsVirtual) {
 
   if (!IsPrimaryVTT) {
 // Remember the sub-VTT index.
-SubVTTIndicies[Base] = VTTComponents.size();
+SubVTTIndices[Base] = VTTComponents.size();
   }
 
   uint64_t VTableIndex = VTTVTables.size();
diff --git a/clang/lib/CodeGen/CGVTT.cpp b/clang/lib/CodeGen/CGVTT.cpp
index d2376b14dd582..4cebb750c89e8 100644
--- a/clang/lib/CodeGen/CGVTT.cpp
+++ b/clang/lib/CodeGen/CGVTT.cpp
@@ -138,23 +138,24 @@ uint64_t CodeGenVTables::getSubVTTIndex(const 
CXXRecordDecl *RD,
 BaseSubobject Base) {
   BaseSubobjectPairTy ClassSubobjectPair(RD, Base);
 
-  SubVTTIndiciesMapTy::iterator I = SubVTTIndicies.find(ClassSubobjectPair);
-  if (I != SubVTTIndicies.end())
+  SubVTTIndicesMapTy::iterator I = SubVTTIndices.find(ClassSubobjectPair);
+  if (I != SubVTTIndices.end())
 return I->second;
 
   VTTBuilder Builder(CGM.getContext(), RD, /*GenerateDefinition=*/false);
 
-  for (llvm::DenseMap::const_iterator I =
-   Builder.getSubVTTIndicies().begin(),
-   E = Builder.getSubVTTIndicies().end(); I != E; ++I) {
+  for (llvm::DenseMap::const_iterator
+   I = Builder.getSubVTTIndices().begin(),
+   E = Builder.getSubVTTIndices().end();
+   I != E; ++I) {
 // Insert all indices.
 BaseSubobjectPairTy ClassSubobjectPair(RD, I->first);
 
-SubVTTIndicies.insert(std::make_pair(ClassSubobjectPair, I->second));
+SubVTTIndices.insert(std::make_pair(ClassSubobjectPair, I->second));
   }
 
-  I = SubVTTIndicies.find(ClassSubobjectPair);
-  assert(I !=

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Jay Foad via cfe-commits



@@ -493,8 +493,8 @@ Value *AMDGPUAtomicOptimizerImpl::buildScan(IRBuilder<> ,
 if (!ST->isWave32()) {
   // Combine lane 31 into lanes 32..63.
   V = B.CreateBitCast(V, IntNTy);
-  Value *const Lane31 = B.CreateIntrinsic(Intrinsic::amdgcn_readlane, {},
-  {V, B.getInt32(31)});
+  Value *const Lane31 = B.CreateIntrinsic(
+  Intrinsic::amdgcn_readlane, B.getInt32Ty(), {V, B.getInt32(31)});

jayfoad wrote:

Changes like this should disappear if we merge #91583 first.

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Jay Foad via cfe-commits


https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Jay Foad via cfe-commits



@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);

jayfoad wrote:

I think you can just reuse Src0 instead of declaring a new Src0Valid. Same for 
Src2, and same for the SDAG code.

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Jay Foad via cfe-commits



@@ -5386,6 +5386,130 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register , Register ,
+  Register ) -> Register {
+auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+if (Src2.isValid())
+  return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0);
+if (Src1.isValid())
+  return (LaneOpDst.addUse(Src1)).getReg(0);
+return LaneOpDst.getReg(0);
+  };
+
+  Register Src1, Src2, Src0Valid, Src2Valid;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+if (Src2.isValid())
+  Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+B.buildBitcast(DstReg, LaneOp);
+MI.eraseFromParent();
+return true;
+  }
+
+  if (Size < 32) {
+Register Src0Cast = MRI.getType(Src0).isScalar()
+? Src0
+: B.buildBitcast(LLT::scalar(Size), 
Src0).getReg(0);
+Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+if (Src2.isValid()) {
+  Register Src2Cast =
+  MRI.getType(Src2).isScalar()
+  ? Src2
+  : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+  Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+}
+Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+if (Ty.isScalar())
+  B.buildTrunc(DstReg, LaneOp);
+else {
+  auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOp);
+  B.buildBitcast(DstReg, Trunc);
+}
+
+MI.eraseFromParent();
+return true;
+  }
+
+  if ((Size % 32) == 0) {
+SmallVector PartialRes;
+unsigned NumParts = Size / 32;
+auto Src0Parts = B.buildUnmerge(S32, Src0);
+
+switch (IID) {
+case Intrinsic::amdgcn_readlane: {
+  Register Src1 = MI.getOperand(3).getReg();
+  for (unsigned i = 0; i < NumParts; ++i)
+PartialRes.push_back(
+(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+ .addUse(Src0Parts.getReg(i))
+ .addUse(Src1))
+.getReg(0));

jayfoad wrote:

Yes, separate patch

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Jay Foad via cfe-commits



@@ -5386,6 +5386,153 @@ bool 
AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper ,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper ,
+ MachineInstr ,
+ Intrinsic::ID IID) const {
+
+  MachineIRBuilder  = Helper.MIRBuilder;
+  MachineRegisterInfo  = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) 
{
+Src1 = MI.getOperand(3).getReg();
+if (IID == Intrinsic::amdgcn_writelane) {
+  Src2 = MI.getOperand(4).getReg();
+}
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+if (Ty.isScalar())
+  // Already legal
+  return true;
+
+Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+MachineInstrBuilder LaneOpDst;
+switch (IID) {

jayfoad wrote:

Can you use a `createLaneOp` helper to build the intrinsic, like you do in the 
SDAG path?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-09 Thread Jay Foad via cfe-commits


https://github.com/jayfoad commented:

LGTM overall.

> add f32 pattern to select read/writelane operations

Why would you need this? Don't you legalize f32 to i32?

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [compiler-rt] [libc] [libclc] [libcxxabi] [lld] [lldb] [llvm] [mlir] Add clarifying parenthesis around non-trivial conditions in ternary expressions. (PR #90391)

2024-04-30 Thread Jay Foad via cfe-commits


jayfoad wrote:

AMDGPU changes are fine.

https://github.com/llvm/llvm-project/pull/90391
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)

2024-04-22 Thread Jay Foad via cfe-commits


jayfoad wrote:

Previous attempts:
* https://reviews.llvm.org/D84639
* https://reviews.llvm.org/D86154
* https://reviews.llvm.org/D147732
* #87334

https://github.com/llvm/llvm-project/pull/89217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-04-10 Thread Jay Foad via cfe-commits


jayfoad wrote:

No further comments.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-04-05 Thread Jay Foad via cfe-commits


jayfoad wrote:

Can you add at least one test for a VMEM (flat or scratch or global or buffer 
or image) atomic without return? That should use vscnt on GFX10.

Apart from that the SIInsertWaitcnts.cpp and tests look good to me. I have not 
reviewed the clang parts but it looks like @Pierre-vh approved them previously?

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-04-02 Thread Jay Foad via cfe-commits



@@ -0,0 +1,1406 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX10
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 
-mattr=+enable-flat-scratch,+precise-memory < %s | FileCheck 
--check-prefixes=GFX9-FLATSCR %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX11
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX12
+
+; from atomicrmw-expand.ll
+; covers flat_load, flat_atomic (atomic with return)
+;
+define void @syncscope_workgroup_nortn(ptr %addr, float %val) {
+; GFX9-LABEL: syncscope_workgroup_nortn:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:flat_load_dword v4, v[0:1]
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:s_mov_b64 s[4:5], 0
+; GFX9-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX9-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX9-NEXT:v_add_f32_e32 v3, v4, v2
+; GFX9-NEXT:flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, v3, v4
+; GFX9-NEXT:s_or_b64 s[4:5], vcc, s[4:5]
+; GFX9-NEXT:v_mov_b32_e32 v4, v3
+; GFX9-NEXT:s_andn2_b64 exec, exec, s[4:5]
+; GFX9-NEXT:s_cbranch_execnz .LBB0_1
+; GFX9-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX9-NEXT:s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: syncscope_workgroup_nortn:
+; GFX90A:   ; %bb.0:
+; GFX90A-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:flat_load_dword v5, v[0:1]
+; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX90A-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX90A-NEXT:v_add_f32_e32 v4, v5, v2
+; GFX90A-NEXT:flat_atomic_cmpswap v3, v[0:1], v[4:5] glc
+; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:v_cmp_eq_u32_e32 vcc, v3, v5
+; GFX90A-NEXT:s_or_b64 s[4:5], vcc, s[4:5]
+; GFX90A-NEXT:v_mov_b32_e32 v5, v3
+; GFX90A-NEXT:s_andn2_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:s_cbranch_execnz .LBB0_1
+; GFX90A-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX90A-NEXT:s_or_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: syncscope_workgroup_nortn:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:flat_load_dword v4, v[0:1]
+; GFX10-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX10-NEXT:s_mov_b32 s4, 0
+; GFX10-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX10-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX10-NEXT:v_add_f32_e32 v3, v4, v2
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX10-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX10-NEXT:buffer_gl0_inv
+; GFX10-NEXT:v_cmp_eq_u32_e32 vcc_lo, v3, v4
+; GFX10-NEXT:v_mov_b32_e32 v4, v3
+; GFX10-NEXT:s_or_b32 s4, vcc_lo, s4
+; GFX10-NEXT:s_andn2_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:s_cbranch_execnz .LBB0_1
+; GFX10-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX10-NEXT:s_or_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX9-FLATSCR-LABEL: syncscope_workgroup_nortn:
+; GFX9-FLATSCR:   ; %bb.0:
+; GFX9-FLATSCR-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:flat_load_dword v4, v[0:1]
+; GFX9-FLATSCR-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:s_mov_b64 s[0:1], 0
+; GFX9-FLATSCR-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX9-FLATSCR-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX9-FLATSCR-NEXT:v_add_f32_e32 v3, v4, v2
+; GFX9-FLATSCR-NEXT:flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX9-FLATSCR-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:v_cmp_eq_u32_e32 vcc, v3, v4
+; GFX9-FLATSCR-NEXT:s_or_b64 s[0:1], vcc, s[0:1]
+; GFX9-FLATSCR-NEXT:v_mov_b32_e32 v4, v3
+; GFX9-FLATSCR-NEXT:s_andn2_b64 exec, exec, s[0:1]
+; GFX9-FLATSCR-NEXT:s_cbranch_execnz .LBB0_1
+; GFX9-FLATSCR-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX9-FLATSCR-NEXT:s_or_b64 exec, exec, s[0:1]
+; GFX9-FLATSCR-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: syncscope_workgroup_nortn:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:flat_load_b32 v4, v[0:1]
+; GFX11-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX11-NEXT:s_mov_b32 s0, 0
+; GFX11-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX11-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX11-NEXT:

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-04-02 Thread Jay Foad via cfe-commits



@@ -0,0 +1,1406 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX10
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 
-mattr=+enable-flat-scratch,+precise-memory < %s | FileCheck 
--check-prefixes=GFX9-FLATSCR %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX11
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -mattr=+precise-memory < %s | 
FileCheck %s -check-prefixes=GFX12
+
+; from atomicrmw-expand.ll
+; covers flat_load, flat_atomic (atomic with return)
+;
+define void @syncscope_workgroup_nortn(ptr %addr, float %val) {
+; GFX9-LABEL: syncscope_workgroup_nortn:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:flat_load_dword v4, v[0:1]
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:s_mov_b64 s[4:5], 0
+; GFX9-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX9-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX9-NEXT:v_add_f32_e32 v3, v4, v2
+; GFX9-NEXT:flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, v3, v4
+; GFX9-NEXT:s_or_b64 s[4:5], vcc, s[4:5]
+; GFX9-NEXT:v_mov_b32_e32 v4, v3
+; GFX9-NEXT:s_andn2_b64 exec, exec, s[4:5]
+; GFX9-NEXT:s_cbranch_execnz .LBB0_1
+; GFX9-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX9-NEXT:s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: syncscope_workgroup_nortn:
+; GFX90A:   ; %bb.0:
+; GFX90A-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:flat_load_dword v5, v[0:1]
+; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX90A-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX90A-NEXT:v_add_f32_e32 v4, v5, v2
+; GFX90A-NEXT:flat_atomic_cmpswap v3, v[0:1], v[4:5] glc
+; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:v_cmp_eq_u32_e32 vcc, v3, v5
+; GFX90A-NEXT:s_or_b64 s[4:5], vcc, s[4:5]
+; GFX90A-NEXT:v_mov_b32_e32 v5, v3
+; GFX90A-NEXT:s_andn2_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:s_cbranch_execnz .LBB0_1
+; GFX90A-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX90A-NEXT:s_or_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: syncscope_workgroup_nortn:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:flat_load_dword v4, v[0:1]
+; GFX10-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX10-NEXT:s_mov_b32 s4, 0
+; GFX10-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX10-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX10-NEXT:v_add_f32_e32 v3, v4, v2
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX10-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX10-NEXT:buffer_gl0_inv
+; GFX10-NEXT:v_cmp_eq_u32_e32 vcc_lo, v3, v4
+; GFX10-NEXT:v_mov_b32_e32 v4, v3
+; GFX10-NEXT:s_or_b32 s4, vcc_lo, s4
+; GFX10-NEXT:s_andn2_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:s_cbranch_execnz .LBB0_1
+; GFX10-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX10-NEXT:s_or_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX9-FLATSCR-LABEL: syncscope_workgroup_nortn:
+; GFX9-FLATSCR:   ; %bb.0:
+; GFX9-FLATSCR-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:flat_load_dword v4, v[0:1]
+; GFX9-FLATSCR-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:s_mov_b64 s[0:1], 0
+; GFX9-FLATSCR-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX9-FLATSCR-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX9-FLATSCR-NEXT:v_add_f32_e32 v3, v4, v2
+; GFX9-FLATSCR-NEXT:flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX9-FLATSCR-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:v_cmp_eq_u32_e32 vcc, v3, v4
+; GFX9-FLATSCR-NEXT:s_or_b64 s[0:1], vcc, s[0:1]
+; GFX9-FLATSCR-NEXT:v_mov_b32_e32 v4, v3
+; GFX9-FLATSCR-NEXT:s_andn2_b64 exec, exec, s[0:1]
+; GFX9-FLATSCR-NEXT:s_cbranch_execnz .LBB0_1
+; GFX9-FLATSCR-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX9-FLATSCR-NEXT:s_or_b64 exec, exec, s[0:1]
+; GFX9-FLATSCR-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: syncscope_workgroup_nortn:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:flat_load_b32 v4, v[0:1]
+; GFX11-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX11-NEXT:s_mov_b32 s0, 0
+; GFX11-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX11-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX11-NEXT:

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-04-02 Thread Jay Foad via cfe-commits



@@ -2594,12 +2594,10 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const 
SIMemOpInfo ,
 MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent ||
 MOI.getFailureOrdering() == AtomicOrdering::Acquire ||
 MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent) {
-  Changed |= CC->insertWait(MI, MOI.getScope(),
-MOI.getInstrAddrSpace(),
-isAtomicRet(*MI) ? SIMemOp::LOAD :
-   SIMemOp::STORE,
-MOI.getIsCrossAddressSpaceOrdering(),
-Position::AFTER);
+  Changed |=

jayfoad wrote:

Remove this.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-29 Thread Jay Foad via cfe-commits



@@ -2326,6 +2326,20 @@ bool 
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction ,
 }
 #endif
 
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+  AMDGPU::Waitcnt Wait;
+  if (ST->hasExtendedWaitCounts())
+Wait = AMDGPU::Waitcnt(0, 0, 0, 0, 0, 0, 0);
+  else
+Wait = AMDGPU::Waitcnt(0, 0, 0, 0);
+
+  if (!Inst.mayStore())
+Wait.StoreCnt = ~0u;

jayfoad wrote:

GFX10 introduced a separate counter for **VMEM** stores with the name VScnt. 
GFX12 just renamed it to STOREcnt. No architecture has a separate store counter 
for DS or SMEM. So `ds_add_u32 v0, v1` followed by `s_waitcnt lgkmcnt(0)` 
(pre-GFX12) or `s_wait_dscnt 0` (GFX12) is fine .

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-26 Thread Jay Foad via cfe-commits



@@ -2326,6 +2326,20 @@ bool 
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction ,
 }
 #endif
 
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+  AMDGPU::Waitcnt Wait;
+  if (ST->hasExtendedWaitCounts())
+Wait = AMDGPU::Waitcnt(0, 0, 0, 0, 0, 0, 0);
+  else
+Wait = AMDGPU::Waitcnt(0, 0, 0, 0);
+
+  if (!Inst.mayStore())
+Wait.StoreCnt = ~0u;

jayfoad wrote:

```suggestion
  AMDGPU::Waitcnt Wait = WCG->getAllZeroWaitcnt(Inst.mayStore());
```
However, as a general rule:
- loads and atomics-with-return update LOADcnt
- stores and atomics-without-return update STOREcnt

so it might be more accurate to use the condition `Inst.mayStore() && 
!SIInstrInfo::isAtomicRet(Inst)`.

Please make sure you have tests for atomics with and without return.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-27 Thread Jay Foad via cfe-commits


https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-27 Thread Jay Foad via cfe-commits



@@ -355,6 +356,18 @@ class SICacheControl {
MachineBasicBlock::iterator ) const {
 return false;
   }
+
+public:
+  // The following is for supporting precise memory mode. When the feature
+  // precise-memory is enabled, an s_waitcnt instruction is inserted
+  // after each memory instruction.
+
+  virtual bool
+  handleNonAtomicForPreciseMemory(MachineBasicBlock::iterator ) = 0;
+  /// Handles atomic instruction \p MI with \p IsAtomicWithRet indicating
+  /// whether \p MI returns a result.
+  virtual bool handleAtomicForPreciseMemory(MachineBasicBlock::iterator ,

jayfoad wrote:

This function is never even called.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-27 Thread Jay Foad via cfe-commits



@@ -2378,6 +2409,215 @@ bool 
SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
   return Changed;
 }
 
+bool SIGfx6CacheControl::handleNonAtomicForPreciseMemory(
+MachineBasicBlock::iterator ) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr  = *MI;
+  AMDGPU::Waitcnt Wait;
+
+  if (TII->isSMRD(Inst)) { // scalar
+if (Inst.mayStore())
+  return false;
+Wait.DsCnt = 0; // LgkmCnt
+  } else {  // vector
+if (Inst.mayLoad()) {   // vector load
+  if (TII->isVMEM(Inst))// VMEM load
+Wait.LoadCnt = 0;   // VmCnt
+  else if (TII->isFLAT(Inst)) { // Flat load
+Wait.LoadCnt = 0;   // VmCnt
+Wait.DsCnt = 0; // LgkmCnt
+  } else// LDS load
+Wait.DsCnt = 0; // LgkmCnt
+} else {// vector store
+  if (TII->isVMEM(Inst))// VMEM store
+Wait.LoadCnt = 0;   // VmCnt
+  else if (TII->isFLAT(Inst)) { // Flat store
+Wait.LoadCnt = 0;   // VmCnt
+Wait.DsCnt = 0; // LgkmCnt
+  } else
+Wait.DsCnt = 0; // LDS store; LgkmCnt
+}
+  }
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock  = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  return true;
+}
+
+bool SIGfx6CacheControl::handleAtomicForPreciseMemory(
+MachineBasicBlock::iterator , bool IsAtomicWithRet) {
+  assert(MI->mayLoadOrStore());
+
+  AMDGPU::Waitcnt Wait;
+
+  Wait.LoadCnt = 0; // VmCnt
+  Wait.DsCnt = 0;   // LgkmCnt
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock  = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  return true;
+}
+
+bool SIGfx10CacheControl::handleNonAtomicForPreciseMemory(
+MachineBasicBlock::iterator ) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr  = *MI;
+  AMDGPU::Waitcnt Wait;
+
+  bool BuildWaitCnt = true;
+  bool BuildVsCnt = false;
+
+  if (TII->isSMRD(Inst)) { // scalar
+if (Inst.mayStore())
+  return false;
+Wait.DsCnt = 0; // LgkmCnt
+  } else {  // vector
+if (Inst.mayLoad()) {   // vector load
+  if (TII->isVMEM(Inst))// VMEM load
+Wait.LoadCnt = 0;   // VmCnt
+  else if (TII->isFLAT(Inst)) { // Flat load
+Wait.LoadCnt = 0;   // VmCnt
+Wait.DsCnt = 0; // LgkmCnt
+  } else// LDS load
+Wait.DsCnt = 0; // LgkmCnt
+}
+
+// For some vector instructions, mayLoad() and mayStore() can be both true.
+if (Inst.mayStore()) { // vector store; an instruction can be both
+   // load/store
+  if (TII->isVMEM(Inst)) { // VMEM store
+if (!Inst.mayLoad())
+  BuildWaitCnt = false;
+BuildVsCnt = true;
+  } else if (TII->isFLAT(Inst)) { // Flat store
+Wait.DsCnt = 0;   // LgkmCnt
+BuildVsCnt = true;
+  } else
+Wait.DsCnt = 0; // LDS store; LgkmCnt
+}
+  }
+
+  MachineBasicBlock  = *MI->getParent();
+  if (BuildWaitCnt) {
+unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+--MI;
+  }
+
+  if (BuildVsCnt) {
+BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT))
+.addReg(AMDGPU::SGPR_NULL, RegState::Undef)
+.addImm(0);
+--MI;
+  }
+  return true;
+}
+
+bool SIGfx10CacheControl ::handleAtomicForPreciseMemory(
+MachineBasicBlock::iterator , bool IsAtomicWithRet) {
+  assert(MI->mayLoadOrStore());
+
+  AMDGPU::Waitcnt Wait;
+
+  Wait.DsCnt = 0; // LgkmCnt
+  if (IsAtomicWithRet)
+Wait.LoadCnt = 0; // VmCnt
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock  = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  if (!IsAtomicWithRet) {
+BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT))
+.addReg(AMDGPU::SGPR_NULL, RegState::Undef)
+.addImm(0);
+--MI;
+  }
+  return true;
+}
+
+bool SIGfx12CacheControl ::handleNonAtomicForPreciseMemory(
+MachineBasicBlock::iterator ) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr  = *MI;
+  unsigned WaitType = 0;
+  // For some vector instructions, mayLoad() and mayStore() can be both true.
+  bool LoadAndStore = false;
+
+  if (TII->isSMRD(Inst)) { // scalar
+if (Inst.mayStore())
+  return false;
+
+WaitType = AMDGPU::S_WAIT_KMCNT;
+  } else { // vector
+if (Inst.mayLoad() && Inst.mayStore()) {
+  WaitType = AMDGPU::S_WAIT_LOADCNT;
+  LoadAndStore = true;
+} else if (Inst.mayLoad()) { // vector load
+  if (TII->isVMEM(Inst)) // VMEM load
+WaitType =

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-27 Thread Jay Foad via cfe-commits



@@ -2378,6 +2409,215 @@ bool 
SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
   return Changed;
 }
 
+bool SIGfx6CacheControl::handleNonAtomicForPreciseMemory(
+MachineBasicBlock::iterator ) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr  = *MI;
+  AMDGPU::Waitcnt Wait;
+
+  if (TII->isSMRD(Inst)) { // scalar
+if (Inst.mayStore())
+  return false;
+Wait.DsCnt = 0; // LgkmCnt
+  } else {  // vector
+if (Inst.mayLoad()) {   // vector load
+  if (TII->isVMEM(Inst))// VMEM load
+Wait.LoadCnt = 0;   // VmCnt
+  else if (TII->isFLAT(Inst)) { // Flat load
+Wait.LoadCnt = 0;   // VmCnt
+Wait.DsCnt = 0; // LgkmCnt
+  } else// LDS load
+Wait.DsCnt = 0; // LgkmCnt
+} else {// vector store
+  if (TII->isVMEM(Inst))// VMEM store
+Wait.LoadCnt = 0;   // VmCnt
+  else if (TII->isFLAT(Inst)) { // Flat store
+Wait.LoadCnt = 0;   // VmCnt
+Wait.DsCnt = 0; // LgkmCnt
+  } else
+Wait.DsCnt = 0; // LDS store; LgkmCnt
+}
+  }
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock  = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  return true;
+}
+
+bool SIGfx6CacheControl::handleAtomicForPreciseMemory(
+MachineBasicBlock::iterator , bool IsAtomicWithRet) {
+  assert(MI->mayLoadOrStore());
+
+  AMDGPU::Waitcnt Wait;
+
+  Wait.LoadCnt = 0; // VmCnt
+  Wait.DsCnt = 0;   // LgkmCnt
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock  = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  return true;
+}
+
+bool SIGfx10CacheControl::handleNonAtomicForPreciseMemory(
+MachineBasicBlock::iterator ) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr  = *MI;
+  AMDGPU::Waitcnt Wait;
+
+  bool BuildWaitCnt = true;
+  bool BuildVsCnt = false;
+
+  if (TII->isSMRD(Inst)) { // scalar
+if (Inst.mayStore())
+  return false;
+Wait.DsCnt = 0; // LgkmCnt
+  } else {  // vector
+if (Inst.mayLoad()) {   // vector load
+  if (TII->isVMEM(Inst))// VMEM load
+Wait.LoadCnt = 0;   // VmCnt
+  else if (TII->isFLAT(Inst)) { // Flat load
+Wait.LoadCnt = 0;   // VmCnt
+Wait.DsCnt = 0; // LgkmCnt
+  } else// LDS load
+Wait.DsCnt = 0; // LgkmCnt
+}
+
+// For some vector instructions, mayLoad() and mayStore() can be both true.
+if (Inst.mayStore()) { // vector store; an instruction can be both
+   // load/store
+  if (TII->isVMEM(Inst)) { // VMEM store
+if (!Inst.mayLoad())
+  BuildWaitCnt = false;
+BuildVsCnt = true;
+  } else if (TII->isFLAT(Inst)) { // Flat store
+Wait.DsCnt = 0;   // LgkmCnt
+BuildVsCnt = true;
+  } else
+Wait.DsCnt = 0; // LDS store; LgkmCnt
+}
+  }
+
+  MachineBasicBlock  = *MI->getParent();
+  if (BuildWaitCnt) {
+unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+--MI;
+  }
+
+  if (BuildVsCnt) {
+BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT))
+.addReg(AMDGPU::SGPR_NULL, RegState::Undef)
+.addImm(0);
+--MI;
+  }
+  return true;
+}
+
+bool SIGfx10CacheControl ::handleAtomicForPreciseMemory(
+MachineBasicBlock::iterator , bool IsAtomicWithRet) {
+  assert(MI->mayLoadOrStore());
+
+  AMDGPU::Waitcnt Wait;
+
+  Wait.DsCnt = 0; // LgkmCnt
+  if (IsAtomicWithRet)
+Wait.LoadCnt = 0; // VmCnt
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock  = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  if (!IsAtomicWithRet) {
+BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT))
+.addReg(AMDGPU::SGPR_NULL, RegState::Undef)
+.addImm(0);
+--MI;
+  }
+  return true;
+}
+
+bool SIGfx12CacheControl ::handleNonAtomicForPreciseMemory(
+MachineBasicBlock::iterator ) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr  = *MI;
+  unsigned WaitType = 0;
+  // For some vector instructions, mayLoad() and mayStore() can be both true.

jayfoad wrote:

What kind of (non-atomic) instructions is this supposed to handle?

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-27 Thread Jay Foad via cfe-commits


https://github.com/jayfoad requested changes to this pull request.

I've added _some_ inline comments, but really I don't want to spend the time to 
review this properly (or maintain it, or extend it for new architectures in 
future). All this logic already exists in SIInsertWaitcnts. Duplicating it here 
is not a good design.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-16 Thread Jay Foad via cfe-commits



@@ -157,6 +157,27 @@ static uint32_t getLit16Encoding(uint16_t Val, const 
MCSubtargetInfo ) {
   return 255;
 }
 
+static uint32_t getLitBF16Encoding(uint16_t Val) {
+  uint16_t IntImm = getIntInlineImmEncoding(static_cast(Val));
+  if (IntImm != 0)
+return IntImm;
+
+  // clang-format off
+  switch (Val) {

jayfoad wrote:

Yeah, I really don't like having 4 different copies of this list of hex values 
(0x3f00, 0xbf00...).

https://github.com/llvm/llvm-project/pull/80908
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-16 Thread Jay Foad via cfe-commits



@@ -157,6 +157,27 @@ static uint32_t getLit16Encoding(uint16_t Val, const 
MCSubtargetInfo ) {
   return 255;
 }
 
+static uint32_t getLitBF16Encoding(uint16_t Val) {
+  uint16_t IntImm = getIntInlineImmEncoding(static_cast(Val));
+  if (IntImm != 0)
+return IntImm;
+
+  // clang-format off
+  switch (Val) {

jayfoad wrote:

Can this call `getInlineEncodingV2BF16`?

https://github.com/llvm/llvm-project/pull/80908
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-07 Thread Jay Foad via cfe-commits


jayfoad wrote:

> This logic would need updating again for GFX12. It seems like it's 
> duplicating a lot of knowledge which is already implemented in 
> SIInsertWaitcnts.

Just to demonstrate, you could implement this feature in SIInsertWaitcnts for 
**all** supported architectures with something like:
```diff
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1..910cd094f8f2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2299,6 +2299,12 @@ bool 
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction ,
 
 updateEventWaitcntAfter(Inst, );
 
+AMDGPU::Waitcnt Wait =
+AMDGPU::Waitcnt::allZeroExceptVsCnt(ST->hasExtendedWaitCounts());
+ScoreBrackets.simplifyWaitcnt(Wait);
+Modified |= generateWaitcnt(Wait, std::next(Inst.getIterator()), Block,
+ScoreBrackets, /*OldWaitcntInstr=*/nullptr);
+
 #if 0 // TODO: implement resource type check controlled by options with ub = 
LB.
 // If this instruction generates a S_SETVSKIP because it is an
 // indexed resource, and we are on Tahiti, then it will also force
```
Handling VSCNT/STORECNT correctly is a little more complicated but not much.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)

2024-02-01 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/79980
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Jay Foad via cfe-commits


jayfoad wrote:

After this change is there any value in having two different builtins? You 
could just have one that always return 64 bits.

https://github.com/llvm/llvm-project/pull/80183
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)

2024-01-30 Thread Jay Foad via cfe-commits


jayfoad wrote:

> Do you think it makes sense to add two gfx11 tests where _w32 variant is now 
> rejected with w64, and _w64 variant rejected with w32?

Maybe, but i didn't have the energy to add yet more tests.

> Maybe what is being printed in *-gfx10-err.cl test is enough, though.

Right, that was my thinking.

https://github.com/llvm/llvm-project/pull/79980
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)

2024-01-30 Thread Jay Foad via cfe-commits



@@ -21,14 +21,14 @@ void test_amdgcn_wmma_f32_16x16x16_bf16_w64(global v4f* 
out4f, v16h a16h, v16h b
 global v8s* out8s, v4i a4i, v4i 
b4i, v8s c8s,
 global v4i* out4i, v2i a2i, v2i 
b2i, v4i c4i)
 {
- *out4f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w64(a16h, b16h, c4f);  // 
expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w64' needs target 
feature gfx11-insts}}
- *out4f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w64(a16s, b16s, c4f);  // 
expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64' needs target 
feature gfx11-insts}}
- *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w64(a16h, b16h, c8h, true); 
// expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w64' needs target 
feature gfx11-insts}}
- *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64(a16s, b16s, c8s, true); 
// expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64' needs target 
feature gfx11-insts}}
- *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64(a16h, b16h, c8h, 
true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64' 
needs target feature gfx11-insts}}
- *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64(a16s, b16s, c8s, 
true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64' 
needs target feature gfx11-insts}}
- *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w64(true, a4i, true, b4i, 
c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64' 
needs target feature gfx11-insts}}
- *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w64(true, a2i, true, b2i, 
c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64' 
needs target feature gfx11-insts}}
+ *out4f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w64(a16h, b16h, c4f);  // 
expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w64' needs target 
feature gfx11-insts,wavefrontsize64}}
+ *out4f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w64(a16s, b16s, c4f);  // 
expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64' needs target 
feature gfx11-insts,wavefrontsize64}}
+ *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w64(a16h, b16h, c8h, true); 
// expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w64' needs target 
feature gfx11-insts,wavefrontsize64}}
+ *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64(a16s, b16s, c8s, true); 
// expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64' needs target 
feature gfx11-insts,wavefrontsize64}}
+ *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64(a16h, b16h, c8h, 
true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64' 
needs target feature gfx11-insts,wavefrontsize64}}
+ *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64(a16s, b16s, c8s, 
true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64' 
needs target feature gfx11-insts,wavefrontsize64}}
+ *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w64(true, a4i, true, b4i, 
c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64' 
needs target feature gfx11-insts,wavefrontsize64}}
+ *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w64(true, a2i, true, b2i, 
c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64' 
needs target feature gfx11-insts,wavefrontsize64}}
 }
 
-#endif
\ No newline at end of file
+#endif

jayfoad wrote:

Yes. My editor did that. Previously there was no newline on the end of the 
`#endif`. Lots of tools flag that as unusual.

https://github.com/llvm/llvm-project/pull/79980
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)

2024-01-30 Thread Jay Foad via cfe-commits


https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/79980

None

>From cace712a8f379df3498dd76bc1f95eb4671e997c Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Tue, 30 Jan 2024 11:04:33 +
Subject: [PATCH] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  | 34 +--
 .../builtins-amdgcn-wmma-w32-gfx10-err.cl | 16 -
 .../builtins-amdgcn-wmma-w64-gfx10-err.cl | 18 +-
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl |  2 +-
 4 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 74dfd1d214e8..e9dd8dcd0b60 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -292,23 +292,23 @@ 
TARGET_BUILTIN(__builtin_amdgcn_s_wait_event_export_ready, "v", "n", "gfx11-inst
 // Postfix w32 indicates the builtin requires wavefront size of 32.
 // Postfix w64 indicates the builtin requires wavefront size of 64.
 
//===--===//
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", 
"nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", 
"nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, 
"V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, 
"V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts")
-
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, "V4fV16hV16hV4f", 
"nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", 
"nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", 
"nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, 
"V8hV16hV16hV8hIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, 
"V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, 
"V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", 
"nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", 
"nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, 
"V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, 
"V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, "V4fV16hV16hV4f", 
"nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", 
"nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", 
"nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, 
"V8hV16hV16hV8hIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, 
"V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, 
"V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts,wavefrontsize64")
 
 TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtn, "UiUIi", "n", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtnl, "UWiUIi", "n", "gfx11-insts")
diff --git

[llvm] [clang] [clang-tools-extra] [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (PR #77438)

2024-01-30 Thread Jay Foad via cfe-commits


jayfoad wrote:

> @jayfoad, can you link to the documentation where these new registers are 
> described? Preferably from a comment in the top of the file(s). It would make 
> it easier to review for correctness.

ISA documentation will be linked from 
https://llvm.org/docs/AMDGPUUsage.html#additional-documentation when it is made 
public.

https://github.com/llvm/llvm-project/pull/77438
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang-tools-extra] [clang] [AMDGPU] Update SITargetLowering::getAddrModeArguments (PR #78740)

2024-01-29 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/78740
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [flang] [llvm] [clang] [compiler-rt] [AMDGPU] Fold operand after shrinking instruction in SIFoldOperands (PR #68426)

2024-01-29 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/68426
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[mlir] [clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-24 Thread Jay Foad via cfe-commits


jayfoad wrote:

> Also need to be updated:
> 
> https://github.com/llvm/llvm-project/blob/bb6a4850553dd4140a5bd63187ec1b14d0b731f9/llvm/lib/Target/AMDGPU/SMInstructions.td#L14

What needs to be updated and why?

https://github.com/llvm/llvm-project/pull/77795
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-tools-extra] [llvm] [AMDGPU] Update SITargetLowering::getAddrModeArguments (PR #78740)

2024-01-24 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/78740

>From c7636536d65a3792223e083dc5bacd0a8e6ff3d7 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Fri, 19 Jan 2024 16:06:00 +
Subject: [PATCH] [AMDGPU] Update SITargetLowering::getAddrModeArguments

Handle every intrinsic for which getTgtMemIntrinsic returns with
Info.ptrVal set to one of the intrinsic's operands. A bunch of these
cases were missing.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 36 +++
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index cc0c4d4e36eaa8e..66ae9222fb50c89 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1406,31 +1406,41 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
 bool SITargetLowering::getAddrModeArguments(IntrinsicInst *II,
 SmallVectorImpl ,
 Type *) const {
+  Value *Ptr = nullptr;
   switch (II->getIntrinsicID()) {
-  case Intrinsic::amdgcn_ds_ordered_add:
-  case Intrinsic::amdgcn_ds_ordered_swap:
+  case Intrinsic::amdgcn_atomic_cond_sub_u32:
   case Intrinsic::amdgcn_ds_append:
   case Intrinsic::amdgcn_ds_consume:
   case Intrinsic::amdgcn_ds_fadd:
-  case Intrinsic::amdgcn_ds_fmin:
   case Intrinsic::amdgcn_ds_fmax:
-  case Intrinsic::amdgcn_global_atomic_fadd:
+  case Intrinsic::amdgcn_ds_fmin:
+  case Intrinsic::amdgcn_ds_ordered_add:
+  case Intrinsic::amdgcn_ds_ordered_swap:
   case Intrinsic::amdgcn_flat_atomic_fadd:
-  case Intrinsic::amdgcn_flat_atomic_fmin:
+  case Intrinsic::amdgcn_flat_atomic_fadd_v2bf16:
   case Intrinsic::amdgcn_flat_atomic_fmax:
-  case Intrinsic::amdgcn_flat_atomic_fmin_num:
   case Intrinsic::amdgcn_flat_atomic_fmax_num:
+  case Intrinsic::amdgcn_flat_atomic_fmin:
+  case Intrinsic::amdgcn_flat_atomic_fmin_num:
+  case Intrinsic::amdgcn_global_atomic_csub:
+  case Intrinsic::amdgcn_global_atomic_fadd:
   case Intrinsic::amdgcn_global_atomic_fadd_v2bf16:
-  case Intrinsic::amdgcn_flat_atomic_fadd_v2bf16:
-  case Intrinsic::amdgcn_global_atomic_csub: {
-Value *Ptr = II->getArgOperand(0);
-AccessTy = II->getType();
-Ops.push_back(Ptr);
-return true;
-  }
+  case Intrinsic::amdgcn_global_atomic_fmax:
+  case Intrinsic::amdgcn_global_atomic_fmax_num:
+  case Intrinsic::amdgcn_global_atomic_fmin:
+  case Intrinsic::amdgcn_global_atomic_fmin_num:
+  case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
+Ptr = II->getArgOperand(0);
+break;
+  case Intrinsic::amdgcn_global_load_lds:
+Ptr = II->getArgOperand(1);
+break;
   default:
 return false;
   }
+  AccessTy = II->getType();
+  Ops.push_back(Ptr);
+  return true;
 }
 
 bool SITargetLowering::isLegalFlatAddressingMode(const AddrMode ,

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)

2024-01-24 Thread Jay Foad via cfe-commits



@@ -4,10 +4,114 @@
 
 typedef unsigned int uint;
 
-kernel void test_builtins_amdgcn_gws_insts(uint a, uint b) {
+#pragma OPENCL EXTENSION cl_khr_fp64:enable
+
+typedef float  v2f   __attribute__((ext_vector_type(2)));
+typedef float  v4f   __attribute__((ext_vector_type(4)));
+typedef float  v16f  __attribute__((ext_vector_type(16)));
+typedef float  v32f  __attribute__((ext_vector_type(32)));
+typedef half   v4h   __attribute__((ext_vector_type(4)));
+typedef half   v8h   __attribute__((ext_vector_type(8)));
+typedef half   v16h  __attribute__((ext_vector_type(16)));
+typedef half   v32h  __attribute__((ext_vector_type(32)));
+typedef intv2i   __attribute__((ext_vector_type(2)));
+typedef intv4i   __attribute__((ext_vector_type(4)));
+typedef intv16i  __attribute__((ext_vector_type(16)));
+typedef intv32i  __attribute__((ext_vector_type(32)));
+typedef short  v2s   __attribute__((ext_vector_type(2)));
+typedef short  v4s   __attribute__((ext_vector_type(4)));
+typedef short  v8s   __attribute__((ext_vector_type(8)));
+typedef short  v16s  __attribute__((ext_vector_type(16)));
+typedef short  v32s  __attribute__((ext_vector_type(32)));
+typedef double v4d   __attribute__((ext_vector_type(4)));
+
+void builtin_test_unsupported(global v32f*out_v32f,
+  global v16f*out_v16f,
+  global v4f* out_v4f,
+  global v32i*out_v32i,
+  global v16i*out_v16i,
+  global v4i* out_v4i,
+  global v4d* out_v4d,
+  global double*  out_double,
+  double a_double , double b_double , double 
c_double,

jayfoad wrote:

Nit: you don't really need separate out/a/b/c versions of all these types. You 
could just test expressions like:
```
x_v32f = __builtin_amdgcn_mfma_f32_32x32x1f32(x_float, x_float, x_v32f, 0, 0, 
0);
```

https://github.com/llvm/llvm-project/pull/78729
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)

2024-01-24 Thread Jay Foad via cfe-commits


https://github.com/jayfoad approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/78729
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)

2024-01-24 Thread Jay Foad via cfe-commits


https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/78729
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-24 Thread Jay Foad via cfe-commits



@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const 
SIMemOpInfo ,
   return Changed;
 }
 
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction ) {
+  const GCNSubtarget  = MF.getSubtarget();
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  IsaVersion IV = getIsaVersion(ST.getCPU());
+
+  bool Changed = false;
+
+  for (auto  : MF) {
+for (auto MI = MBB.begin(); MI != MBB.end();) {
+  MachineInstr  = *MI;
+  ++MI;
+  if (Inst.mayLoadOrStore() == false)
+continue;
+
+  // Todo: if next insn is an s_waitcnt
+  AMDGPU::Waitcnt Wait;
+
+  if (!(Inst.getDesc().TSFlags & SIInstrFlags::maybeAtomic)) {
+if (TII->isSMRD(Inst)) {  // scalar

jayfoad wrote:

This logic would need updating again for GFX12. It seems like it's duplicating 
a lot of knowledge which is already implemented in SIInsertWaitcnts.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[libcxx] [lld] [libc] [lldb] [clang-tools-extra] [clang] [openmp] [compiler-rt] [llvm] [flang] [mlir] AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (PR #79104)

2024-01-23 Thread Jay Foad via cfe-commits


https://github.com/jayfoad approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/79104
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lldb] [llvm] [mlir] [openmp] [libcxx] [flang] [clang] [clang-tools-extra] [compiler-rt] [lld] [libc] AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (PR #79104)

2024-01-23 Thread Jay Foad via cfe-commits



@@ -1348,6 +1348,14 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
   MachineMemOperand::MOVolatile;
 return true;
   }
+  case Intrinsic::amdgcn_global_load_tr: {

jayfoad wrote:

This case should also be handled in getAdrModeArguments below.

https://github.com/llvm/llvm-project/pull/79104
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)

2024-01-19 Thread Jay Foad via cfe-commits



@@ -0,0 +1,105 @@
+// REQUIRES: amdgpu-registered-target

jayfoad wrote:

Maybe just add these to `test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl` 
instead of a new file?

https://github.com/llvm/llvm-project/pull/78729
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Remove gws feature from GFX12 (PR #78711)

2024-01-19 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/78711
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (PR #78709)

2024-01-19 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/78709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[libc] [compiler-rt] [flang] [lld] [llvm] [clang] [lldb] [clang-tools-extra] [libcxx] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-19 Thread Jay Foad via cfe-commits


jayfoad wrote:

Can you add a GFX12 RUN line to 
clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl? That will probably require 
adding "fp8-conversion-insts" to the GFX12 part of TargetParser.cpp. You can do 
this in a separate patch if you want.

https://github.com/llvm/llvm-project/pull/78414
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Remove gws feature from GFX12 (PR #78711)

2024-01-19 Thread Jay Foad via cfe-commits


https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/78711

This was already done for LLVM. This patch just updates the Clang
builtin handling to match.


>From 8ec83bbc08c6a364efda3724d5886dbd568f956f Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Fri, 19 Jan 2024 13:34:37 +
Subject: [PATCH] [AMDGPU] Remove gws feature from GFX12

This was already done for LLVM. This patch just updates the Clang
builtin handling to match.
---
 .../builtins-amdgcn-gfx12-err.cl  | 27 ++-
 .../builtins-amdgcn-gfx12-param-err.cl| 24 +
 llvm/lib/TargetParser/TargetParser.cpp|  1 -
 3 files changed, 32 insertions(+), 20 deletions(-)
 create mode 100644 clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl

diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl
index 5e0153c42825e3..bcaea9a2482d18 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl
@@ -2,23 +2,12 @@
 
 // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1200 -verify 
-S -emit-llvm -o - %s
 
-kernel void builtins_amdgcn_s_barrier_signal_err(global int* in, global int* 
out, int barrier) {
-
-  __builtin_amdgcn_s_barrier_signal(barrier); // expected-error 
{{'__builtin_amdgcn_s_barrier_signal' must be a constant integer}}
-  __builtin_amdgcn_s_barrier_wait(-1);
-  *out = *in;
-}
-
-kernel void builtins_amdgcn_s_barrier_wait_err(global int* in, global int* 
out, int barrier) {
-
-  __builtin_amdgcn_s_barrier_signal(-1);
-  __builtin_amdgcn_s_barrier_wait(barrier); // expected-error 
{{'__builtin_amdgcn_s_barrier_wait' must be a constant integer}}
-  *out = *in;
-}
-
-kernel void builtins_amdgcn_s_barrier_signal_isfirst_err(global int* in, 
global int* out, int barrier) {
-
-  __builtin_amdgcn_s_barrier_signal_isfirst(barrier); // expected-error 
{{'__builtin_amdgcn_s_barrier_signal_isfirst' must be a constant integer}}
-  __builtin_amdgcn_s_barrier_wait(-1);
-  *out = *in;
+typedef unsigned int uint;
+
+kernel void test_builtins_amdgcn_gws_insts(uint a, uint b) {
+  __builtin_amdgcn_ds_gws_init(a, b); // expected-error 
{{'__builtin_amdgcn_ds_gws_init' needs target feature gws}}
+  __builtin_amdgcn_ds_gws_barrier(a, b); // expected-error 
{{'__builtin_amdgcn_ds_gws_barrier' needs target feature gws}}
+  __builtin_amdgcn_ds_gws_sema_v(a); // expected-error 
{{'__builtin_amdgcn_ds_gws_sema_v' needs target feature gws}}
+  __builtin_amdgcn_ds_gws_sema_br(a, b); // expected-error 
{{'__builtin_amdgcn_ds_gws_sema_br' needs target feature gws}}
+  __builtin_amdgcn_ds_gws_sema_p(a); // expected-error 
{{'__builtin_amdgcn_ds_gws_sema_p' needs target feature gws}}
 }
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl
new file mode 100644
index 00..5e0153c42825e3
--- /dev/null
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl
@@ -0,0 +1,24 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1200 -verify 
-S -emit-llvm -o - %s
+
+kernel void builtins_amdgcn_s_barrier_signal_err(global int* in, global int* 
out, int barrier) {
+
+  __builtin_amdgcn_s_barrier_signal(barrier); // expected-error 
{{'__builtin_amdgcn_s_barrier_signal' must be a constant integer}}
+  __builtin_amdgcn_s_barrier_wait(-1);
+  *out = *in;
+}
+
+kernel void builtins_amdgcn_s_barrier_wait_err(global int* in, global int* 
out, int barrier) {
+
+  __builtin_amdgcn_s_barrier_signal(-1);
+  __builtin_amdgcn_s_barrier_wait(barrier); // expected-error 
{{'__builtin_amdgcn_s_barrier_wait' must be a constant integer}}
+  *out = *in;
+}
+
+kernel void builtins_amdgcn_s_barrier_signal_isfirst_err(global int* in, 
global int* out, int barrier) {
+
+  __builtin_amdgcn_s_barrier_signal_isfirst(barrier); // expected-error 
{{'__builtin_amdgcn_s_barrier_signal_isfirst' must be a constant integer}}
+  __builtin_amdgcn_s_barrier_wait(-1);
+  *out = *in;
+}
diff --git a/llvm/lib/TargetParser/TargetParser.cpp 
b/llvm/lib/TargetParser/TargetParser.cpp
index 2cfe23676d20f8..6bd477aed8fa64 100644
--- a/llvm/lib/TargetParser/TargetParser.cpp
+++ b/llvm/lib/TargetParser/TargetParser.cpp
@@ -295,7 +295,6 @@ void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const 
Triple ,
   Features["gfx12-insts"] = true;
   Features["atomic-fadd-rtn-insts"] = true;
   Features["image-insts"] = true;
-  Features["gws"] = true;
   break;
 case GK_GFX1151:
 case GK_GFX1150:

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (PR #78709)

2024-01-19 Thread Jay Foad via cfe-commits


https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/78709

That instruction is not supported on GFX12.
Added a testcase which previously crashed without this change.


>From b212d63828ae87b8e40f9d6de7622bc7a14ce48f Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 30 Oct 2023 08:03:17 +0100
Subject: [PATCH] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12

That instruction is not supported on GFX12.
Added a testcase which previously crashed without this change.
---
 clang/test/CodeGenOpenCL/amdgpu-features.cl   | 4 ++--
 llvm/lib/Target/AMDGPU/AMDGPU.td  | 1 -
 llvm/lib/TargetParser/TargetParser.cpp| 1 -
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 4 
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 7495bca72a9df5..1ba2b129f6895a 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -100,8 +100,8 @@
 // GFX1103: 
"target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1150: 
"target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1151: 
"target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1200: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1201: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
+// GFX1200: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
+// GFX1201: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 
 // GFX1103-W64: 
"target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize64"
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7b7fa906b2b1a3..92985f971f17a7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1487,7 +1487,6 @@ def FeatureISAVersion12 : FeatureSet<
   [FeatureGFX12,
FeatureLDSBankCount32,
FeatureDLInsts,
-   FeatureDot5Insts,
FeatureDot7Insts,
FeatureDot8Insts,
FeatureDot9Insts,
diff --git a/llvm/lib/TargetParser/TargetParser.cpp 
b/llvm/lib/TargetParser/TargetParser.cpp
index 2cfe23676d20f8..f6d5bfe913b419 100644
--- a/llvm/lib/TargetParser/TargetParser.cpp
+++ b/llvm/lib/TargetParser/TargetParser.cpp
@@ -275,7 +275,6 @@ void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const 
Triple ,
 case GK_GFX1201:
 case GK_GFX1200:
   Features["ci-insts"] = true;
-  Features["dot5-insts"] = true;
   Features["dot7-insts"] = true;
   Features["dot8-insts"] = true;
   Features["dot9-insts"] = true;
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
index 240997aeb9a687..26e6bde97f499d 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
@@ -3,12 +3,14 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | 
FileCheck %s --check-prefixes=GCN,GFX10
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | 
FileCheck %s --check-prefixes=GCN,GFX10
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100

[clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-19 Thread Jay Foad via cfe-commits


jayfoad wrote:

Some of the tests in this patch need regenerating now that #77438 has been 
merged.

https://github.com/llvm/llvm-project/pull/77795
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang-tools-extra] [AMDGPU] Update uses of new VOP2 pseudos for GFX12 (PR #78155)

2024-01-18 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/78155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [clang] [llvm] [AMDGPU] Update uses of new VOP2 pseudos for GFX12 (PR #78155)

2024-01-18 Thread Jay Foad via cfe-commits



@@ -1,7 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck 
--check-prefixes=SI %s

jayfoad wrote:

Done as part of a merge from main to fix conflicts.

https://github.com/llvm/llvm-project/pull/78155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [clang] [llvm] [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (PR #77438)

2024-01-18 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/77438
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [clang] [llvm] [AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (PR #78186)

2024-01-18 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/78186
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-tools-extra] [llvm] [AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (PR #78186)

2024-01-18 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/78186

>From d3f4ebf849f6ef1ea373e5c7f93398db6681b2b6 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Mon, 15 Jan 2024 15:02:08 +
Subject: [PATCH 1/4] Add GFX11/12 test coverage

---
 llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll | 103 +-
 1 file changed, 77 insertions(+), 26 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll 
b/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll
index 598d7a8033c2e54..2c1baeeeda21697 100644
--- a/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll
+++ b/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll
@@ -1,32 +1,83 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s
-
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX9
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX11
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX12
 
 define void @test_remat_s_getpc_b64() {
-; CHECK-LABEL: test_remat_s_getpc_b64:
-; CHECK:   ; %bb.0: ; %entry
-; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
-; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:v_writelane_b32 v0, s30, 0
-; CHECK-NEXT:s_getpc_b64 s[4:5]
-; CHECK-NEXT:v_writelane_b32 v0, s31, 1
-; CHECK-NEXT:;;#ASMSTART
-; CHECK-NEXT:;;#ASMEND
-; CHECK-NEXT:;;#ASMSTART
-; CHECK-NEXT:;;#ASMEND
-; CHECK-NEXT:s_getpc_b64 s[4:5]
-; CHECK-NEXT:v_mov_b32_e32 v1, s4
-; CHECK-NEXT:v_mov_b32_e32 v2, s5
-; CHECK-NEXT:global_store_dwordx2 v[1:2], v[1:2], off
-; CHECK-NEXT:v_readlane_b32 s31, v0, 1
-; CHECK-NEXT:v_readlane_b32 s30, v0, 0
-; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
-; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:s_waitcnt vmcnt(0)
-; CHECK-NEXT:s_setpc_b64 s[30:31]
+; GFX9-LABEL: test_remat_s_getpc_b64:
+; GFX9:   ; %bb.0: ; %entry
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:s_xor_saveexec_b64 s[4:5], -1
+; GFX9-NEXT:buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; GFX9-NEXT:s_mov_b64 exec, s[4:5]
+; GFX9-NEXT:v_writelane_b32 v0, s30, 0
+; GFX9-NEXT:s_getpc_b64 s[4:5]
+; GFX9-NEXT:v_writelane_b32 v0, s31, 1
+; GFX9-NEXT:;;#ASMSTART
+; GFX9-NEXT:;;#ASMEND
+; GFX9-NEXT:;;#ASMSTART
+; GFX9-NEXT:;;#ASMEND
+; GFX9-NEXT:s_getpc_b64 s[4:5]
+; GFX9-NEXT:v_mov_b32_e32 v1, s4
+; GFX9-NEXT:v_mov_b32_e32 v2, s5
+; GFX9-NEXT:global_store_dwordx2 v[1:2], v[1:2], off
+; GFX9-NEXT:v_readlane_b32 s31, v0, 1
+; GFX9-NEXT:v_readlane_b32 s30, v0, 0
+; GFX9-NEXT:s_xor_saveexec_b64 s[4:5], -1
+; GFX9-NEXT:buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
+; GFX9-NEXT:s_mov_b64 exec, s[4:5]
+; GFX9-NEXT:s_waitcnt vmcnt(0)
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: test_remat_s_getpc_b64:
+; GFX11:   ; %bb.0: ; %entry
+; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill
+; GFX11-NEXT:s_mov_b32 exec_lo, s0
+; GFX11-NEXT:v_writelane_b32 v0, s30, 0
+; GFX11-NEXT:s_getpc_b64 s[0:1]
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:v_writelane_b32 v0, s31, 1
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:s_getpc_b64 s[0:1]
+; GFX11-NEXT:s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | 
instid1(VALU_DEP_2)
+; GFX11-NEXT:v_dual_mov_b32 v2, s1 :: v_dual_mov_b32 v1, s0
+; GFX11-NEXT:v_readlane_b32 s31, v0, 1
+; GFX11-NEXT:v_readlane_b32 s30, v0, 0
+; GFX11-NEXT:global_store_b64 v[1:2], v[1:2], off
+; GFX11-NEXT:s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload
+; GFX11-NEXT:s_mov_b32 exec_lo, s0
+; GFX11-NEXT:s_waitcnt vmcnt(0)
+; GFX11-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_remat_s_getpc_b64:
+; GFX12:   ; %bb.0: ; %entry
+; GFX12-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT:s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill
+; GFX12-NEXT:s_mov_b32 exec_lo, s0
+; GFX12-NEXT:v_writelane_b32 v0, s30, 0
+; GFX12-NEXT:s_getpc_b64 s[0:1]
+; GFX12-NEXT:;;#ASMSTART
+; GFX12-NEXT:;;#ASMEND
+; GFX12-NEXT:v_writelane_b32 v0, s31, 1
+; GFX12-NEXT:;;#ASMSTART
+;

[clang] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (PR #77926)

2024-01-18 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/77926
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang-tools-extra] [AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (PR #78186)

2024-01-18 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/78186

>From d3f4ebf849f6ef1ea373e5c7f93398db6681b2b6 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Mon, 15 Jan 2024 15:02:08 +
Subject: [PATCH 1/4] Add GFX11/12 test coverage

---
 llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll | 103 +-
 1 file changed, 77 insertions(+), 26 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll 
b/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll
index 598d7a8033c2e54..2c1baeeeda21697 100644
--- a/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll
+++ b/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll
@@ -1,32 +1,83 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s
-
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX9
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX11
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -stress-regalloc=2 
-verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX12
 
 define void @test_remat_s_getpc_b64() {
-; CHECK-LABEL: test_remat_s_getpc_b64:
-; CHECK:   ; %bb.0: ; %entry
-; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
-; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:v_writelane_b32 v0, s30, 0
-; CHECK-NEXT:s_getpc_b64 s[4:5]
-; CHECK-NEXT:v_writelane_b32 v0, s31, 1
-; CHECK-NEXT:;;#ASMSTART
-; CHECK-NEXT:;;#ASMEND
-; CHECK-NEXT:;;#ASMSTART
-; CHECK-NEXT:;;#ASMEND
-; CHECK-NEXT:s_getpc_b64 s[4:5]
-; CHECK-NEXT:v_mov_b32_e32 v1, s4
-; CHECK-NEXT:v_mov_b32_e32 v2, s5
-; CHECK-NEXT:global_store_dwordx2 v[1:2], v[1:2], off
-; CHECK-NEXT:v_readlane_b32 s31, v0, 1
-; CHECK-NEXT:v_readlane_b32 s30, v0, 0
-; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
-; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:s_waitcnt vmcnt(0)
-; CHECK-NEXT:s_setpc_b64 s[30:31]
+; GFX9-LABEL: test_remat_s_getpc_b64:
+; GFX9:   ; %bb.0: ; %entry
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:s_xor_saveexec_b64 s[4:5], -1
+; GFX9-NEXT:buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; GFX9-NEXT:s_mov_b64 exec, s[4:5]
+; GFX9-NEXT:v_writelane_b32 v0, s30, 0
+; GFX9-NEXT:s_getpc_b64 s[4:5]
+; GFX9-NEXT:v_writelane_b32 v0, s31, 1
+; GFX9-NEXT:;;#ASMSTART
+; GFX9-NEXT:;;#ASMEND
+; GFX9-NEXT:;;#ASMSTART
+; GFX9-NEXT:;;#ASMEND
+; GFX9-NEXT:s_getpc_b64 s[4:5]
+; GFX9-NEXT:v_mov_b32_e32 v1, s4
+; GFX9-NEXT:v_mov_b32_e32 v2, s5
+; GFX9-NEXT:global_store_dwordx2 v[1:2], v[1:2], off
+; GFX9-NEXT:v_readlane_b32 s31, v0, 1
+; GFX9-NEXT:v_readlane_b32 s30, v0, 0
+; GFX9-NEXT:s_xor_saveexec_b64 s[4:5], -1
+; GFX9-NEXT:buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
+; GFX9-NEXT:s_mov_b64 exec, s[4:5]
+; GFX9-NEXT:s_waitcnt vmcnt(0)
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: test_remat_s_getpc_b64:
+; GFX11:   ; %bb.0: ; %entry
+; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill
+; GFX11-NEXT:s_mov_b32 exec_lo, s0
+; GFX11-NEXT:v_writelane_b32 v0, s30, 0
+; GFX11-NEXT:s_getpc_b64 s[0:1]
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:v_writelane_b32 v0, s31, 1
+; GFX11-NEXT:;;#ASMSTART
+; GFX11-NEXT:;;#ASMEND
+; GFX11-NEXT:s_getpc_b64 s[0:1]
+; GFX11-NEXT:s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | 
instid1(VALU_DEP_2)
+; GFX11-NEXT:v_dual_mov_b32 v2, s1 :: v_dual_mov_b32 v1, s0
+; GFX11-NEXT:v_readlane_b32 s31, v0, 1
+; GFX11-NEXT:v_readlane_b32 s30, v0, 0
+; GFX11-NEXT:global_store_b64 v[1:2], v[1:2], off
+; GFX11-NEXT:s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload
+; GFX11-NEXT:s_mov_b32 exec_lo, s0
+; GFX11-NEXT:s_waitcnt vmcnt(0)
+; GFX11-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_remat_s_getpc_b64:
+; GFX12:   ; %bb.0: ; %entry
+; GFX12-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX12-NEXT:s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill
+; GFX12-NEXT:s_mov_b32 exec_lo, s0
+; GFX12-NEXT:v_writelane_b32 v0, s30, 0
+; GFX12-NEXT:s_getpc_b64 s[0:1]
+; GFX12-NEXT:;;#ASMSTART
+; GFX12-NEXT:;;#ASMEND
+; GFX12-NEXT:v_writelane_b32 v0, s31, 1
+; GFX12-NEXT:;;#ASMSTART
+;

[clang] [clang-tools-extra] [llvm] [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (PR #77929)

2024-01-17 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/77929
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [llvm] [clang] [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (PR #77438)

2024-01-17 Thread Jay Foad via cfe-commits


jayfoad wrote:

@Pierre-vh @arsen ping! (Sorry, I know it has only been a few days.)

https://github.com/llvm/llvm-project/pull/77438
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-tools-extra] [llvm] [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (PR #77929)

2024-01-17 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/77929

>From 4299ba898449f782c642b0c27f0ec9970aee0a1c Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Fri, 12 Jan 2024 11:34:02 +
Subject: [PATCH 1/2] [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on
 GFX12

---
 llvm/lib/Target/AMDGPU/AMDGPU.td|  3 ++-
 llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir  |  1 +
 llvm/test/MC/AMDGPU/gfx12_asm_features.s| 17 +
 .../Disassembler/AMDGPU/gfx12_dasm_features.txt | 13 +
 4 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index b27edb1e9e14bb..682ca6c57c973b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1502,7 +1502,8 @@ def FeatureISAVersion12 : FeatureSet<
FeatureHasRestrictedSOffset,
FeatureVGPRSingleUseHintInsts,
FeatureMADIntraFwdBug,
-   FeatureScalarDwordx3Loads]>;
+   FeatureScalarDwordx3Loads,
+   FeatureDPPSrc1SGPR]>;
 
 
//===--===//
 
diff --git a/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir 
b/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir
index fe1345e29f133d..7d081a1491da6e 100644
--- a/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir
+++ b/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir
@@ -1,5 +1,6 @@
 # RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=gcn-dpp-combine 
-verify-machineinstrs -o - %s | FileCheck %s -check-prefixes=GCN,GFX1100
 # RUN: llc -march=amdgcn -mcpu=gfx1150 -run-pass=gcn-dpp-combine 
-verify-machineinstrs -o - %s | FileCheck %s -check-prefixes=GCN,GFX1150
+# RUN: llc -march=amdgcn -mcpu=gfx1200 -run-pass=gcn-dpp-combine 
-verify-machineinstrs -o - %s | FileCheck %s -check-prefixes=GCN,GFX1150
 
 ---
 
diff --git a/llvm/test/MC/AMDGPU/gfx12_asm_features.s 
b/llvm/test/MC/AMDGPU/gfx12_asm_features.s
index 7e58bdb3b444e1..da4464c6494dbf 100644
--- a/llvm/test/MC/AMDGPU/gfx12_asm_features.s
+++ b/llvm/test/MC/AMDGPU/gfx12_asm_features.s
@@ -1,5 +1,22 @@
 // RUN: llvm-mc -arch=amdgcn -show-encoding -mcpu=gfx1200 %s | FileCheck 
--check-prefix=GFX12 %s
 
+//
+// Subtargets allow src1 of VOP3 DPP instructions to be SGPR or inlinable
+// constant.
+//
+
+v_add3_u32_e64_dpp v5, v1, s2, v3 quad_perm:[3,2,1,0] row_mask:0xf 
bank_mask:0xf
+// GFX1150: encoding: 
[0x05,0x00,0x55,0xd6,0xfa,0x04,0x0c,0x04,0x01,0x1b,0x00,0xff]
+
+v_add3_u32_e64_dpp v5, v1, 42, v3 quad_perm:[3,2,1,0] row_mask:0xf 
bank_mask:0xf
+// GFX1150: encoding: 
[0x05,0x00,0x55,0xd6,0xfa,0x54,0x0d,0x04,0x01,0x1b,0x00,0xff]
+
+v_add3_u32_e64_dpp v5, v1, s2, v0 dpp8:[7,6,5,4,3,2,1,0]
+// GFX1150: encoding: 
[0x05,0x00,0x55,0xd6,0xe9,0x04,0x00,0x04,0x01,0x77,0x39,0x05]
+
+v_add3_u32_e64_dpp v5, v1, 42, v0 dpp8:[7,6,5,4,3,2,1,0]
+// GFX1150: encoding: 
[0x05,0x00,0x55,0xd6,0xe9,0x54,0x01,0x04,0x01,0x77,0x39,0x05]
+
 //
 // Elements of CPol operand can be given in any order
 //
diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt 
b/llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt
new file mode 100644
index 00..2c64522422ad0d
--- /dev/null
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt
@@ -0,0 +1,13 @@
+# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1200 -disassemble -show-encoding < %s | 
FileCheck -check-prefixes=GFX12 %s
+
+# GFX12: v_add3_u32_e64_dpp v5, v1, s2, v3 quad_perm:[3,2,1,0] row_mask:0xf 
bank_mask:0xf ; encoding: 
[0x05,0x00,0x55,0xd6,0xfa,0x04,0x0c,0x04,0x01,0x1b,0x00,0xff]
+0x05,0x00,0x55,0xd6,0xfa,0x04,0x0c,0x04,0x01,0x1b,0x00,0xff
+
+# GFX12: v_add3_u32_e64_dpp v5, v1, 42, v3 quad_perm:[3,2,1,0] row_mask:0xf 
bank_mask:0xf ; encoding: 
[0x05,0x00,0x55,0xd6,0xfa,0x54,0x0d,0x04,0x01,0x1b,0x00,0xff]
+0x05,0x00,0x55,0xd6,0xfa,0x54,0x0d,0x04,0x01,0x1b,0x00,0xff
+
+# GFX12: v_add3_u32_e64_dpp v5, v1, s2, v0 dpp8:[7,6,5,4,3,2,1,0] ; encoding: 
[0x05,0x00,0x55,0xd6,0xe9,0x04,0x00,0x04,0x01,0x77,0x39,0x05]
+0x05,0x00,0x55,0xd6,0xe9,0x04,0x00,0x04,0x01,0x77,0x39,0x05
+
+# GFX12: v_add3_u32_e64_dpp v5, v1, 42, v0 dpp8:[7,6,5,4,3,2,1,0] ; encoding: 
[0x05,0x00,0x55,0xd6,0xe9,0x54,0x01,0x04,0x01,0x77,0x39,0x05]
+0x05,0x00,0x55,0xd6,0xe9,0x54,0x01,0x04,0x01,0x77,0x39,0x05

>From a65834ad3d8aed3e9cb1414d7576d5244a31f8a2 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Wed, 17 Jan 2024 14:39:09 +
Subject: [PATCH 2/2] More tests

---
 llvm/test/MC/AMDGPU/gfx1150_asm_features.s | 6 ++
 llvm/test/MC/AMDGPU/gfx12_asm_features.s   | 6 ++
 llvm/test/MC/Disassembler/AMDGPU/gfx1150_dasm_features.txt | 6 ++
 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt   | 6 ++
 4 files changed, 24 insertions(+)

diff --git a/llvm/test/MC/AMDGPU/gfx1150_asm_features.s 
b/llvm/test/MC/AMDGPU/gfx1150_asm_features.s
index a4904c40b40ae7..55c855175a89e0 100644
---

[flang] [libc] [llvm] [clang-tools-extra] [clang] [compiler-rt] [libcxx] [AMDGPU] Fix llvm.amdgcn.s.wait.event.export.ready for GFX12 (PR #78191)

2024-01-17 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/78191
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [clang] [llvm] [AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (PR #77927)

2024-01-17 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/77927
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[libcxx] [clang] [libc] [llvm] [clang-tools-extra] [flang] [compiler-rt] [AMDGPU] Fix llvm.amdgcn.s.wait.event.export.ready for GFX12 (PR #78191)

2024-01-17 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/78191

>From 9990fbc26ed3dc245a5127345326050acac49d66 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Fri, 21 Apr 2023 10:46:43 +0100
Subject: [PATCH] [AMDGPU] Fix llvm.amdgcn.s.wait.event.export.ready for GFX12

The meaning of bit 0 of the immediate operand of S_WAIT_EVENT has been
flipped from GFX11.
---
 llvm/lib/Target/AMDGPU/SOPInstructions.td| 8 
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll | 9 ++---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td 
b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 46fa3d57a21cb2..b78d900c9bbf42 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1768,10 +1768,10 @@ def : GCNPat<
   (S_SEXT_I32_I16 $src)
 >;
 
-def : GCNPat <
-  (int_amdgcn_s_wait_event_export_ready),
-(S_WAIT_EVENT (i16 0))
->;
+let SubtargetPredicate = isNotGFX12Plus in
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
0))>;
+let SubtargetPredicate = isGFX12Plus in
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
1))>;
 
 // The first 10 bits of the mode register are the core FP mode on all
 // subtargets.
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
index 3e95e4dec67a2b..25b5ddcf946b35 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -1,8 +1,11 @@
-; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < 
%s | FileCheck -check-prefix=GCN %s
-; RUN: llc -global-isel -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s 
| FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < 
%s | FileCheck -check-prefixes=GCN,GFX11 %s
+; RUN: llc -global-isel=1 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < 
%s | FileCheck -check-prefixes=GCN,GFX11 %s
+; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1200 < 
%s | FileCheck -check-prefixes=GCN,GFX12 %s
+; RUN: llc -global-isel=1 -march=amdgcn -verify-machineinstrs -mcpu=gfx1200 < 
%s | FileCheck -check-prefixes=GCN,GFX12 %s
 
 ; GCN-LABEL: {{^}}test_wait_event:
-; GCN: s_wait_event 0x0
+; GFX11: s_wait_event 0x0
+; GFX12: s_wait_event 0x1
 
 define amdgpu_ps void @test_wait_event() #0 {
 entry:

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [clang-tools-extra] [AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (PR #77927)

2024-01-17 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/77927

>From 3f3bcdb89adf032e26c95807abf5e3b23ff50e4a Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Fri, 12 Jan 2024 12:24:28 +
Subject: [PATCH 1/3] Precommit extra GFX12 test coverage

---
 .../GlobalISel/inst-select-mad_64_32.mir  |  21 ++
 llvm/test/CodeGen/AMDGPU/llvm.mulo.ll | 163 ++
 llvm/test/CodeGen/AMDGPU/mad_64_32.ll | 211 ++
 3 files changed, 395 insertions(+)

diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir
index 698281caca245e..6e33ef37397d6b 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir
@@ -1,6 +1,7 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -march=amdgcn -mcpu=gfx1030 -run-pass=instruction-select 
-global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o 
- 2>%t | FileCheck -check-prefix=GFX10 %s
 # RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=instruction-select 
-global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o 
- 2>%t | FileCheck -check-prefix=GFX11 %s
+# RUN: llc -march=amdgcn -mcpu=gfx1200 -run-pass=instruction-select 
-global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o 
- 2>%t | FileCheck -check-prefix=GFX12 %s
 
 ---
 name: mad_u64_u32_vvv
@@ -18,6 +19,7 @@ body: |
 ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX10-NEXT: [[V_MAD_U64_U32_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_U64_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_e64 [[COPY]], 
[[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_e64_]], implicit 
[[V_MAD_U64_U32_e64_1]]
+;
 ; GFX11-LABEL: name: mad_u64_u32_vvv
 ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
 ; GFX11-NEXT: {{  $}}
@@ -26,6 +28,15 @@ body: |
 ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX11-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit 
[[V_MAD_U64_U32_gfx11_e64_1]]
+;
+; GFX12-LABEL: name: mad_u64_u32_vvv
+; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+; GFX12-NEXT: {{  $}}
+; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
+; GFX12-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
+; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit 
[[V_MAD_U64_U32_gfx11_e64_1]]
 %0:vgpr(s32) = COPY $vgpr0
 %1:vgpr(s32) = COPY $vgpr1
 %2:vgpr(s32) = COPY $vgpr2
@@ -51,6 +62,7 @@ body: |
 ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX10-NEXT: [[V_MAD_I64_I32_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_I64_I32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_e64 [[COPY]], 
[[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_e64_]], implicit 
[[V_MAD_I64_I32_e64_1]]
+;
 ; GFX11-LABEL: name: mad_i64_i32_vvv
 ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
 ; GFX11-NEXT: {{  $}}
@@ -59,6 +71,15 @@ body: |
 ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX11-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit 
[[V_MAD_I64_I32_gfx11_e64_1]]
+;
+; GFX12-LABEL: name: mad_i64_i32_vvv
+; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+; GFX12-NEXT: {{  $}}
+; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
+; GFX12-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
+; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit 
[[V_MAD_I64_I32_gfx11_e64_1]]
 %0:vgpr(s32) = COPY $vgpr0
 %1:vgpr(s32) = COPY $vgpr1
 %2:vgpr(s32) = COPY $vgpr2
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
index 249acec639540b..b9b03e52ec865c 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
@@ -3,6 +3,7 @@
 ; RUN: llc -march=amdgcn

[llvm] [clang] [clang-tools-extra] [AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (PR #77927)

2024-01-17 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/77927

>From 3f3bcdb89adf032e26c95807abf5e3b23ff50e4a Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Fri, 12 Jan 2024 12:24:28 +
Subject: [PATCH 1/2] Precommit extra GFX12 test coverage

---
 .../GlobalISel/inst-select-mad_64_32.mir  |  21 ++
 llvm/test/CodeGen/AMDGPU/llvm.mulo.ll | 163 ++
 llvm/test/CodeGen/AMDGPU/mad_64_32.ll | 211 ++
 3 files changed, 395 insertions(+)

diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir
index 698281caca245e9..6e33ef37397d6b4 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir
@@ -1,6 +1,7 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -march=amdgcn -mcpu=gfx1030 -run-pass=instruction-select 
-global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o 
- 2>%t | FileCheck -check-prefix=GFX10 %s
 # RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=instruction-select 
-global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o 
- 2>%t | FileCheck -check-prefix=GFX11 %s
+# RUN: llc -march=amdgcn -mcpu=gfx1200 -run-pass=instruction-select 
-global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o 
- 2>%t | FileCheck -check-prefix=GFX12 %s
 
 ---
 name: mad_u64_u32_vvv
@@ -18,6 +19,7 @@ body: |
 ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX10-NEXT: [[V_MAD_U64_U32_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_U64_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_e64 [[COPY]], 
[[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_e64_]], implicit 
[[V_MAD_U64_U32_e64_1]]
+;
 ; GFX11-LABEL: name: mad_u64_u32_vvv
 ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
 ; GFX11-NEXT: {{  $}}
@@ -26,6 +28,15 @@ body: |
 ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX11-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit 
[[V_MAD_U64_U32_gfx11_e64_1]]
+;
+; GFX12-LABEL: name: mad_u64_u32_vvv
+; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+; GFX12-NEXT: {{  $}}
+; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
+; GFX12-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
+; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit 
[[V_MAD_U64_U32_gfx11_e64_1]]
 %0:vgpr(s32) = COPY $vgpr0
 %1:vgpr(s32) = COPY $vgpr1
 %2:vgpr(s32) = COPY $vgpr2
@@ -51,6 +62,7 @@ body: |
 ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX10-NEXT: [[V_MAD_I64_I32_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_I64_I32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_e64 [[COPY]], 
[[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_e64_]], implicit 
[[V_MAD_I64_I32_e64_1]]
+;
 ; GFX11-LABEL: name: mad_i64_i32_vvv
 ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
 ; GFX11-NEXT: {{  $}}
@@ -59,6 +71,15 @@ body: |
 ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
 ; GFX11-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
 ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit 
[[V_MAD_I64_I32_gfx11_e64_1]]
+;
+; GFX12-LABEL: name: mad_i64_i32_vvv
+; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+; GFX12-NEXT: {{  $}}
+; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3
+; GFX12-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, 
[[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = 
V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec
+; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit 
[[V_MAD_I64_I32_gfx11_e64_1]]
 %0:vgpr(s32) = COPY $vgpr0
 %1:vgpr(s32) = COPY $vgpr1
 %2:vgpr(s32) = COPY $vgpr2
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
index 249acec639540b3..b9b03e52ec865c0 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
@@ -3,6 +3,7 @@
 ; RUN: llc -march=amdgcn

[clang-tools-extra] [llvm] [clang] [AMDGPU][GFX12] Add Atomic cond_sub_u32 (PR #76224)

2024-01-15 Thread Jay Foad via cfe-commits

jayfoad wrote:

> Adding support in atomicrmw. This will require to add new operation to 
> aromicrmw "cond_sub"

Yes, and we have (Matt has) done this in the past, but it will require a wider 
consensus. I think it's fine to add AMDGPU intrinsics for this in the mean time.

https://github.com/llvm/llvm-project/pull/76224
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (PR #77926)

2024-01-12 Thread Jay Foad via cfe-commits


https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/77926

None

>From 3d4b8547514f2315130599230e769a8c73be01c3 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Fri, 12 Jan 2024 12:43:16 +
Subject: [PATCH] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  1 +
 clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e562ef04a30194..d0c4b664bf0313 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -410,6 +410,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", 
"nc", "fp8-insts")
 // GFX12+ only builtins.
 
//===--===//
 
+TARGET_BUILTIN(__builtin_amdgcn_s_sleep_var, "vUi", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_permlane16_var,  "UiUiUiUiIbIb", "nc", 
"gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_permlanex16_var, "UiUiUiUiIbIb", "nc", 
"gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_signal, "vIi", "n", "gfx12-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl
index 2899d9e5c28898..ebd367bba0cdc1 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl
@@ -5,6 +5,21 @@
 
 typedef unsigned int uint;
 
+// CHECK-LABEL: @test_s_sleep_var(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[D_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT:store i32 [[D:%.*]], ptr addrspace(5) [[D_ADDR]], align 4
+// CHECK-NEXT:[[TMP0:%.*]] = load i32, ptr addrspace(5) [[D_ADDR]], align 4
+// CHECK-NEXT:call void @llvm.amdgcn.s.sleep.var(i32 [[TMP0]])
+// CHECK-NEXT:call void @llvm.amdgcn.s.sleep.var(i32 15)
+// CHECK-NEXT:ret void
+//
+void test_s_sleep_var(int d)
+{
+  __builtin_amdgcn_s_sleep_var(d);
+  __builtin_amdgcn_s_sleep_var(15);
+}
+
 // CHECK-LABEL: @test_permlane16_var(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, 
addrspace(5)

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [libc] [mlir] [lld] [libcxx] [libclc] [llvm] [clang] [flang] [libunwind] [lldb] [compiler-rt] [AMDGPU] Fix broken sign-extended subword buffer load combine (PR #77470)

2024-01-10 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/77470
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [libclc] [lld] [flang] [mlir] [libcxx] [libunwind] [clang] [lldb] [libc] [llvm] [compiler-rt] [AMDGPU] Fix broken sign-extended subword buffer load combine (PR #77470)

2024-01-09 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/77470

>From ae231d88c5b5e2e0996edefd45389992f8e97d05 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Tue, 9 Jan 2024 13:16:24 +
Subject: [PATCH 1/3] [AMDGPU] Precommit tests for broken combine

Add tests for sign-extending the result of an unsigned subword buffer
load from the wrong width.
---
 .../llvm.amdgcn.struct.buffer.load.ll | 82 +++
 1 file changed, 82 insertions(+)

diff --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll
index 81c0f7557e6417..fcd7821a86897e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll
@@ -500,6 +500,47 @@ define amdgpu_ps float 
@struct_buffer_load_i8_sext__sgpr_rsrc__vgpr_vindex__vgpr
   ret float %cast
 }
 
+define amdgpu_ps float @struct_buffer_load_i8_sext_wrong_width(<4 x i32> inreg 
%rsrc, i32 %vindex, i32 %voffset, i32 inreg %soffset) {
+  ; GFX8-LABEL: name: struct_buffer_load_i8_sext_wrong_width
+  ; GFX8: bb.1 (%ir-block.0):
+  ; GFX8-NEXT:   liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, 
$vgpr1
+  ; GFX8-NEXT: {{  $}}
+  ; GFX8-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr2
+  ; GFX8-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr3
+  ; GFX8-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr4
+  ; GFX8-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr5
+  ; GFX8-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], 
%subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], 
%subreg.sub3
+  ; GFX8-NEXT:   [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX8-NEXT:   [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+  ; GFX8-NEXT:   [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX8-NEXT:   [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY4]], 
%subreg.sub0, [[COPY5]], %subreg.sub1
+  ; GFX8-NEXT:   [[BUFFER_LOAD_SBYTE_BOTHEN:%[0-9]+]]:vgpr_32 = 
BUFFER_LOAD_SBYTE_BOTHEN [[REG_SEQUENCE1]], [[REG_SEQUENCE]], [[COPY6]], 0, 0, 
0, implicit $exec :: (dereferenceable load (s8), addrspace 8)
+  ; GFX8-NEXT:   $vgpr0 = COPY [[BUFFER_LOAD_SBYTE_BOTHEN]]
+  ; GFX8-NEXT:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  ;
+  ; GFX12-LABEL: name: struct_buffer_load_i8_sext_wrong_width
+  ; GFX12: bb.1 (%ir-block.0):
+  ; GFX12-NEXT:   liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, 
$vgpr1
+  ; GFX12-NEXT: {{  $}}
+  ; GFX12-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr2
+  ; GFX12-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr3
+  ; GFX12-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr4
+  ; GFX12-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr5
+  ; GFX12-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], 
%subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], 
%subreg.sub3
+  ; GFX12-NEXT:   [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX12-NEXT:   [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+  ; GFX12-NEXT:   [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6
+  ; GFX12-NEXT:   [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY4]], 
%subreg.sub0, [[COPY5]], %subreg.sub1
+  ; GFX12-NEXT:   [[BUFFER_LOAD_SBYTE_VBUFFER_BOTHEN:%[0-9]+]]:vgpr_32 = 
BUFFER_LOAD_SBYTE_VBUFFER_BOTHEN [[REG_SEQUENCE1]], [[REG_SEQUENCE]], 
[[COPY6]], 0, 0, 0, implicit $exec :: (dereferenceable load (s8), addrspace 8)
+  ; GFX12-NEXT:   $vgpr0 = COPY [[BUFFER_LOAD_SBYTE_VBUFFER_BOTHEN]]
+  ; GFX12-NEXT:   SI_RETURN_TO_EPILOG implicit $vgpr0
+  %val = call i8 @llvm.amdgcn.struct.buffer.load.i8(<4 x i32> %rsrc, i32 
%vindex, i32 %voffset, i32 %soffset, i32 0)
+  %trunc = trunc i8 %val to i4
+  %ext = sext i4 %trunc to i32
+  %cast = bitcast i32 %ext to float
+  ret float %cast
+}
+
 define amdgpu_ps float 
@struct_buffer_load_i16_zext__sgpr_rsrc__vgpr_vindex__vgpr_voffset__sgpr_soffset(<4
 x i32> inreg %rsrc, i32 %vindex, i32 %voffset, i32 inreg %soffset) {
   ; GFX8-LABEL: name: 
struct_buffer_load_i16_zext__sgpr_rsrc__vgpr_vindex__vgpr_voffset__sgpr_soffset
   ; GFX8: bb.1 (%ir-block.0):
@@ -580,6 +621,47 @@ define amdgpu_ps float 
@struct_buffer_load_i16_sext__sgpr_rsrc__vgpr_vindex__vgp
   ret float %cast
 }
 
+define amdgpu_ps float @struct_buffer_load_i16_sext_wrong_width(<4 x i32> 
inreg %rsrc, i32 %vindex, i32 %voffset, i32 inreg %soffset) {
+  ; GFX8-LABEL: name: struct_buffer_load_i16_sext_wrong_width
+  ; GFX8: bb.1 (%ir-block.0):
+  ; GFX8-NEXT:   liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, 
$vgpr1
+  ; GFX8-NEXT: {{  $}}
+  ; GFX8-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr2
+  ; GFX8-NEXT:   [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr3
+  ; GFX8-NEXT:   [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr4
+  ; GFX8-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr5
+  ; GFX8-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], 
%subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], 
%subreg.sub3
+  ;

[llvm] [clang] [clang-tools-extra] [AMDGPU] Flip the default value of maybeAtomic. NFCI. (PR #75220)

2024-01-09 Thread Jay Foad via cfe-commits



@@ -29,6 +29,7 @@ class SM_Pseudo  patt
   let mayStore = 0;
   let mayLoad = 1;
   let hasSideEffects = 0;
+  let maybeAtomic = 0;

jayfoad wrote:

#77443

https://github.com/llvm/llvm-project/pull/75220
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang-tools-extra] [llvm] [AMDGPU] Flip the default value of maybeAtomic. NFCI. (PR #75220)

2024-01-09 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/75220
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [llvm] [clang] [AMDGPU] Flip the default value of maybeAtomic. NFCI. (PR #75220)

2024-01-09 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/75220

>From 429d0a22cd4208eb0c854ccf98df1ba86fd3b0cb Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Tue, 12 Dec 2023 17:15:26 +
Subject: [PATCH] [AMDGPU] Flip the default value of maybeAtomic. NFCI.

In practice maybeAtomic = 0 is used to prevent SIMemoryLegalizer from
interfering with instructions that are mayLoad or mayStore but lack
MachineMemOperands. These instructions should be the exception not the
rule, so this patch sets maybeAtomic = 1 by default and only overrides
it to 0 where necessary.
---
 llvm/lib/Target/AMDGPU/BUFInstructions.td| 4 
 llvm/lib/Target/AMDGPU/DSInstructions.td | 1 -
 llvm/lib/Target/AMDGPU/EXPInstructions.td| 1 +
 llvm/lib/Target/AMDGPU/FLATInstructions.td   | 7 ---
 llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | 1 +
 llvm/lib/Target/AMDGPU/SIInstrFormats.td | 2 +-
 llvm/lib/Target/AMDGPU/SIInstructions.td | 2 +-
 llvm/lib/Target/AMDGPU/SMInstructions.td | 1 +
 8 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td 
b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 44fd4ef8641270..4696ea47f9cefd 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -477,7 +477,6 @@ class MUBUF_Load_Pseudo .ret;
   let mayLoad = 0;
   let mayStore = 1;
-  let maybeAtomic = 1;
   let elements = getMUBUFElements.ret;
   let tfe = isTFE;
 }
@@ -618,7 +616,6 @@ class MUBUF_Pseudo_Store_Lds
   let LGKM_CNT = 1;
   let mayLoad = 1;
   let mayStore = 1;
-  let maybeAtomic = 1;
 
   let has_vdata = 0;
   let has_vaddr = 0;
@@ -680,7 +677,6 @@ class MUBUF_Atomic_Pseudo patt
   // Most instruction load and store data, so set this as the default.
   let mayLoad = 1;
   let mayStore = 1;
-  let maybeAtomic = 1;
 
   let hasSideEffects = 0;
   let SchedRW = [WriteLDS];
diff --git a/llvm/lib/Target/AMDGPU/EXPInstructions.td 
b/llvm/lib/Target/AMDGPU/EXPInstructions.td
index ff1d661ef6fe1d..4cfee7d013ef1a 100644
--- a/llvm/lib/Target/AMDGPU/EXPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/EXPInstructions.td
@@ -20,6 +20,7 @@ class EXPCommon : InstSI<
   let EXP_CNT = 1;
   let mayLoad = done;
   let mayStore = 1;
+  let maybeAtomic = 0;
   let UseNamedOperandTable = 1;
   let Uses = !if(row, [EXEC, M0], [EXEC]);
   let SchedRW = [WriteExport];
diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td 
b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index c0251164faee8b..a1ff3af663352e 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td
+++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td
@@ -173,7 +173,6 @@ class FLAT_Load_Pseudo  {
@@ -221,7 +219,6 @@ class FLAT_Global_Load_AddTid_Pseudo  {
@@ -450,7 +444,6 @@ class FLAT_AtomicNoRet_Pseudo : InstSI<
   let hasSideEffects = 0;
   let mayLoad = 1;
   let mayStore = 0;
+  let maybeAtomic = 0;
 
   string Mnemonic = opName;
   let UseNamedOperandTable = 1;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrFormats.td 
b/llvm/lib/Target/AMDGPU/SIInstrFormats.td
index 585a3eb7861878..1b66d163714fbc 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrFormats.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrFormats.td
@@ -91,7 +91,7 @@ class InstSI  {
   let hasSideEffects = 1;
-  let maybeAtomic = 1;
 }
 
 let hasSideEffects = 0, mayLoad = 0, mayStore = 0, Uses = [EXEC] in {
@@ -557,6 +556,7 @@ def SI_MASKED_UNREACHABLE : SPseudoInstSI <(outs), (ins),
   let hasNoSchedulingInfo = 1;
   let FixedSize = 1;
   let isMeta = 1;
+  let maybeAtomic = 0;
 }
 
 // Used as an isel pseudo to directly emit initialization with an
diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td 
b/llvm/lib/Target/AMDGPU/SMInstructions.td
index c18846483cf95a..323f49ab91f01e 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -29,6 +29,7 @@ class SM_Pseudo  patt
   let mayStore = 0;
   let mayLoad = 1;
   let hasSideEffects = 0;
+  let maybeAtomic = 0;
   let UseNamedOperandTable = 1;
   let SchedRW = [WriteSMEM];
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang-tools-extra] [clang] [AMDGPU][GFX12] Default component broadcast store (PR #76212)

2024-01-05 Thread Jay Foad via cfe-commits


https://github.com/jayfoad approved this pull request.

LGTM. @arsenm does this address your concerns?

https://github.com/llvm/llvm-project/pull/76212
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[openmp] [clang] [libc] [mlir] [lldb] [flang] [llvm] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (PR #76149)

2024-01-02 Thread Jay Foad via cfe-commits


https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/76149
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [openmp] [flang] [lldb] [libc] [mlir] [llvm] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (PR #76149)

2024-01-02 Thread Jay Foad via cfe-commits


jayfoad wrote:

Ping!

https://github.com/llvm/llvm-project/pull/76149
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lldb] [llvm] [mlir] [openmp] [libc] [flang] [clang] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (PR #76149)

2023-12-21 Thread Jay Foad via cfe-commits


https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/76149

>From b14a554a15e4de88c9afc428f9c6898090e6eb23 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 21 Dec 2023 12:00:26 +
Subject: [PATCH] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and
 intrinsic

---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  | 10 ++-
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td  |  1 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  1 +
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |  1 +
 llvm/lib/Target/AMDGPU/FLATInstructions.td| 11 +++-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  1 +
 ...vm.amdgcn.global.atomic.ordered.add.b64.ll | 65 +++
 llvm/test/MC/AMDGPU/gfx11_unsupported.s   |  3 +
 llvm/test/MC/AMDGPU/gfx12_asm_vflat.s | 24 +++
 .../Disassembler/AMDGPU/gfx12_dasm_vflat.txt  | 12 
 10 files changed, 124 insertions(+), 5 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.atomic.ordered.add.b64.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 51bd9b63c127ed..3985c8871e1615 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -10,6 +10,8 @@
 //
 
//===--===//
 
+def global_ptr_ty : LLVMQualPointerType<1>;
+
 class AMDGPUReadPreloadRegisterIntrinsic
   : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
 
@@ -2353,10 +2355,10 @@ def int_amdgcn_s_get_waveid_in_workgroup :
   Intrinsic<[llvm_i32_ty], [],
 [IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, 
IntrNoFree]>;
 
-class AMDGPUGlobalAtomicRtn : Intrinsic <
+class AMDGPUGlobalAtomicRtn : 
Intrinsic <
   [vt],
-  [llvm_anyptr_ty,// vaddr
-   vt],   // vdata(VGPR)
+  [pt,  // vaddr
+   vt], // vdata(VGPR)
   [IntrArgMemOnly, IntrWillReturn, NoCapture>, IntrNoCallback, 
IntrNoFree], "",
   [SDNPMemOperand]>;
 
@@ -2486,6 +2488,8 @@ def int_amdgcn_permlanex16_var : 
ClangBuiltin<"__builtin_amdgcn_permlanex16_var"
 [IntrNoMem, IntrConvergent, IntrWillReturn,
  ImmArg>, ImmArg>, IntrNoCallback, 
IntrNoFree]>;
 
+def int_amdgcn_global_atomic_ordered_add_b64 : 
AMDGPUGlobalAtomicRtn;
+
 def int_amdgcn_flat_atomic_fmin_num   : 
AMDGPUGlobalAtomicRtn;
 def int_amdgcn_flat_atomic_fmax_num   : 
AMDGPUGlobalAtomicRtn;
 def int_amdgcn_global_atomic_fmin_num : 
AMDGPUGlobalAtomicRtn;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index eaf72d7157ee2d..36e07d944c942c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -642,6 +642,7 @@ defm int_amdgcn_global_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_csub : noret_op;
 defm int_amdgcn_flat_atomic_fadd : local_addr_space_atomic_op;
 defm int_amdgcn_ds_fadd_v2bf16 : noret_op;
+defm int_amdgcn_global_atomic_ordered_add_b64 : noret_op;
 defm int_amdgcn_flat_atomic_fmin_num : noret_op;
 defm int_amdgcn_flat_atomic_fmax_num : noret_op;
 defm int_amdgcn_global_atomic_fmin_num : noret_op;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index c9412f720c62ec..fba060464a6e74 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4690,6 +4690,7 @@ AMDGPURegisterBankInfo::getInstrMapping(const 
MachineInstr ) const {
 case Intrinsic::amdgcn_flat_atomic_fmax_num:
 case Intrinsic::amdgcn_global_atomic_fadd_v2bf16:
 case Intrinsic::amdgcn_flat_atomic_fadd_v2bf16:
+case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
   return getDefaultMappingAllVGPR(MI);
 case Intrinsic::amdgcn_ds_ordered_add:
 case Intrinsic::amdgcn_ds_ordered_swap:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td 
b/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
index beb670669581f1..4cc8871a00fe1f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
@@ -243,6 +243,7 @@ def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;
+def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;
diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td 
b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index 0dd2b3f5c2c912..615f8cd54d8f9c 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td
+++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td
@@ -926,9 +926,11 @@ defm GLOBAL_LOAD_LDS_USHORT : FLAT_Global_Load_LDS_Pseudo 
<"global_load_lds_usho
 defm GLOBAL_LOAD_LDS_SSHORT : FLAT_Global_Load_LDS_Pseudo 
<"global_load_lds_sshort">;
 defm GLOBAL_LOAD_LDS_DWORD  : FLAT_Global_Load_LDS_Pseudo 
<"global_load_lds_dword">;
 
-} // End is_flat_global = 1
-
+let

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2023-12-21 Thread Jay Foad via cfe-commits

jayfoad wrote:

> The referenced issue violates the spec for finite-only math only by
> using a return value for a constant infinity.

You mean this issue? 
https://github.com/llvm/llvm-project/commit/5a36904c515b#commitcomment-129847939

Can you explain how your patch "broke" it? If you return infinity from a 
function marked with `ninf`, I would expect your patch to have no effect, 
because `DemandedMask & Known.KnownFPClasses` will be empty so 
`getFPClassConstant` will return `nullptr`.

https://github.com/llvm/llvm-project/pull/74056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [clang-tools-extra] [lld] [llvm] [compiler-rt] [lldb] [libc] [libcxx] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Jay Foad via cfe-commits


jayfoad wrote:

> > How does this work in a case like this?
> > ```
> > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> > addrspace(3) @lds.3, i32 4, i32 0, i32 0, i32 0, i32 0)
> > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> > addrspace(3) %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
> > %val.3 = load float, ptr addrspace(3) @lds.3, align 4
> > ```
> > 
> > 
> > 
> >   
> > 
> > 
> >   
> > 
> > 
> > 
> >   
> > i.e.
> > ```
> > * store to known lds address `@lds.3` (this will use slot 0 and another 
> > slot e.g. slot 3?)
> > 
> > * store to unknown lds address (this will use slot 0?)
> > 
> > * load from known lds address `@lds.3` (this will use slot 3?)
> > ```
> 
> It does not know the pointer, so it uses default slot 0 and waits till 0.

Test case:
```
@lds.0 = internal addrspace(3) global [64 x float] poison, align 16
@lds.1 = internal addrspace(3) global [64 x float] poison, align 16

declare void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
nocapture, i32 %size, i32 %voffset, i32 %soffset, i32 %offset, i32 %aux)

define amdgpu_kernel void @f(<4 x i32> %rsrc, i32 %i1, i32 %i2, ptr 
addrspace(1) %out, ptr addrspace(3) %ptr) {
main_body:
  call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
@lds.0, i32 4, i32 0, i32 0, i32 0, i32 0)
  call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
%ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
  %gep.0 = getelementptr float, ptr addrspace(3) @lds.0, i32 %i1
  %gep.1 = getelementptr float, ptr addrspace(3) @lds.1, i32 %i2
  %val.0 = load volatile float, ptr addrspace(3) %gep.0, align 4
  %val.1 = load volatile float, ptr addrspace(3) %gep.1, align 4
  %out.gep.1 = getelementptr float, ptr addrspace(1) %out, i32 1
  store float %val.0, ptr addrspace(1) %out
  store float %val.1, ptr addrspace(1) %out.gep.1
  ret void
}
```
Generates:
```
s_load_dwordx8 s[4:11], s[0:1], 0x24
s_load_dword s2, s[0:1], 0x44
s_mov_b32 m0, 0
v_mov_b32_e32 v2, 0
s_waitcnt lgkmcnt(0)
buffer_load_dword off, s[4:7], 0 lds
s_mov_b32 m0, s2
s_lshl_b32 s0, s8, 2
buffer_load_dword off, s[4:7], 0 lds
s_lshl_b32 s1, s9, 2
v_mov_b32_e32 v0, s0
v_mov_b32_e32 v1, s1
s_waitcnt vmcnt(1)
ds_read_b32 v0, v0
s_waitcnt vmcnt(0)
ds_read_b32 v1, v1 offset:256
s_waitcnt lgkmcnt(0)
global_store_dwordx2 v2, v[0:1], s[10:11]
s_endpgm
```
The `s_waitcnt vmcnt(1)` seems incorrect, because the second buffer-load-to-lds 
might clobber `@lds.0`.

> I have to tell anyone interested here: before I even wrote this code it 
> didn't know of the dependency and did not wait for anything at all. Everyone 
> was happy.

I am still happy, because buffer/flat/global-load-to-lds was removed in GFX11.

https://github.com/llvm/llvm-project/pull/74537
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [libcxx] [lldb] [clang] [lld] [flang] [compiler-rt] [clang-tools-extra] [libc] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Jay Foad via cfe-commits


jayfoad wrote:

How does this work in a case like this?
```
call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
@lds.3, i32 4, i32 0, i32 0, i32 0, i32 0)
call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
%ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
%val.3 = load float, ptr addrspace(3) @lds.3, align 4
```
i.e.
- store to known lds address `@lds.3` (this will use slot 0 and another slot 
e.g. slot 3?)
- store to unknown lds address (this will use slot 0?)
- load from known lds address `@lds.3` (this will use slot 3?)

https://github.com/llvm/llvm-project/pull/74537
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [llvm] [clang-tools-extra] [clang] [libcxx] [libc] [compiler-rt] [AMDGPU] Produce better memoperand for LDS DMA (PR #75247)

2023-12-19 Thread Jay Foad via cfe-commits


jayfoad wrote:

> Use PoisonValue instead of nullptr for load memop as a Value.

What is the effect of that? I thought nullptr was supposed to represent an 
unknown value, so you have to conservatively assume it might alias with 
anything.

https://github.com/llvm/llvm-project/pull/75247
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [RISCV] Implement multi-lib reuse rule for RISC-V bare-metal toolchain (PR #73765)

2023-12-18 Thread Jay Foad via cfe-commits


jayfoad wrote:

The new test is crashing in my Release+Asserts build:
```
FAIL: Clang :: Driver/riscv-toolchain-gcc-multilib-reuse.c (1081 of 1081)
 TEST 'Clang :: 
Driver/riscv-toolchain-gcc-multilib-reuse.c' FAILED 
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/jayfoad2/llvm-release/bin/clang 
/home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c
-target riscv64-unknown-elf
--gcc-toolchain=/home/jayfoad2/git/llvm-project/clang/test/Driver/Inputs/multilib_riscv_elf_sdk
--print-multi-directory-march=rv32imc -mabi=ilp32| 
/home/jayfoad2/llvm-release/bin/FileCheck 
-check-prefix=GCC-MULTI-LIB-REUSE-RV32IMC-ILP32 
/home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c
+ /home/jayfoad2/llvm-release/bin/FileCheck 
-check-prefix=GCC-MULTI-LIB-REUSE-RV32IMC-ILP32 
/home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c
+ /home/jayfoad2/llvm-release/bin/clang 
/home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c
 -target riscv64-unknown-elf 
--gcc-toolchain=/home/jayfoad2/git/llvm-project/clang/test/Driver/Inputs/multilib_riscv_elf_sdk
 --print-multi-directory -march=rv32imc -mabi=ilp32
clang: 
/home/jayfoad2/git/llvm-project/clang/lib/Driver/ToolChains/CommonArgs.cpp:2189:
 void clang::driver::tools::addMultilibFlag(bool, const llvm::StringRef, 
Multilib::flags_list &): Assertion `Flag.front() == '-'' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and 
include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.  Program arguments: /home/jayfoad2/llvm-release/bin/clang 
/home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c
 -target riscv64-unknown-elf 
--gcc-toolchain=/home/jayfoad2/git/llvm-project/clang/test/Driver/Inputs/multilib_riscv_elf_sdk
 --print-multi-directory -march=rv32imc -mabi=ilp32
1.  Compilation construction
 #0 0x070bfaf7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) 
(/home/jayfoad2/llvm-release/bin/clang+0x70bfaf7)
 #1 0x070bd6ae llvm::sys::RunSignalHandlers() 
(/home/jayfoad2/llvm-release/bin/clang+0x70bd6ae)
 #2 0x070c01ca SignalHandler(int) Signals.cpp:0:0
 #3 0x7fc909c42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x7fc909c969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x7fc909c969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x7fc909c969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x7fc909c42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x7fc909c287f3 abort ./stdlib/abort.c:81:7
 #9 0x7fc909c2871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x7fc909c39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x07b32257 clang::driver::tools::addMultilibFlag(bool, 
llvm::StringRef, std::vector, std::allocator>, 
std::allocator, 
std::allocator>>>&) (/home/jayfoad2/llvm-release/bin/clang+0x7b32257)
#12 0x07abb016 clang::driver::MultilibBuilder::flag(llvm::StringRef, 
bool) (/home/jayfoad2/llvm-release/bin/clang+0x7abb016)
#13 0x07b9ddbf findRISCVMultilibs(clang::driver::Driver const&, 
llvm::Triple const&, llvm::StringRef, llvm::opt::ArgList const&, 
clang::driver::DetectedMultilibs&) Gnu.cpp:0:0
#14 0x07b95459 
clang::driver::toolchains::Generic_GCC::GCCInstallationDetector::ScanGCCForMultilibs(llvm::Triple
 const&, llvm::opt::ArgList const&, llvm::StringRef, bool) Gnu.cpp:0:0
#15 0x07b9b164 
clang::driver::toolchains::Generic_GCC::GCCInstallationDetector::ScanLibDirForGCCTriple(llvm::Triple
 const&, llvm::opt::ArgList const&, std::__cxx11::basic_string, std::allocator> const&, llvm::StringRef, bool, 
bool, bool) Gnu.cpp:0:0
#16 0x07b9324c 
clang::driver::toolchains::Generic_GCC::GCCInstallationDetector::init(llvm::Triple
 const&, llvm::opt::ArgList const&) Gnu.cpp:0:0
#17 0x07bf34ba 
clang::driver::toolchains::RISCVToolChain::RISCVToolChain(clang::driver::Driver 
const&, llvm::Triple const&, llvm::opt::ArgList const&) RISCVToolchain.cpp:0:0
#18 0x07a31458 clang::driver::Driver::getToolChain(llvm::opt::ArgList 
const&, llvm::Triple const&) const 
(/home/jayfoad2/llvm-release/bin/clang+0x7a31458)
#19 0x07a38bbe 
clang::driver::Driver::BuildCompilation(llvm::ArrayRef) 
(/home/jayfoad2/llvm-release/bin/clang+0x7a38bbe)
#20 0x04a8a25a clang_main(int, char**, llvm::ToolContext const&) 
(/home/jayfoad2/llvm-release/bin/clang+0x4a8a25a)
#21 0x04a9bb61 main (/home/jayfoad2/llvm-release/bin/clang+0x4a9bb61)
#22 0x7fc909c29d90 __libc_start_call_main 
./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#23 0x7fc909c29e40 call_init ./csu/../csu/libc-start.c:128:20
#24 0x7fc909c29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#25 0x04a875a5 _start

[llvm] [flang] [compiler-rt] [lld] [libcxx] [clang] [libcxxabi] [clang-tools-extra] [lldb] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)

2023-12-14 Thread Jay Foad via cfe-commits



@@ -3164,6 +3164,18 @@ def : GCNPat <
 (as_i1timm $bound_ctrl))
 >;
 
+class SMPrefetchGetPcPat : GCNPat <

jayfoad wrote:

This pattern also interprets the "address" argument as being an offset from PC, 
so it should also be removed from this version of the patch.

https://github.com/llvm/llvm-project/pull/74576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

1 2 >

1 - 100 of 175 matches

Mail list logo