[clang] Compiler messages on HIP SDK for Windows (PR #97668)
jayfoad wrote:

> Compiler messages on HIP SDK for Windows

Please rewrite this to say what the patch does or what problem it fixes.

https://github.com/llvm/llvm-project/pull/97668

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)
@@ -14,13 +14,14 @@
 #define LLVM_CODEGEN_MACHINEBRANCHPROBABILITYINFO_H

 #include "llvm/CodeGen/MachineBasicBlock.h"
-#include "llvm/CodeGen/MachinePassManager.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/BranchProbability.h"

 namespace llvm {

-class MachineBranchProbabilityInfo {
+class MachineBranchProbabilityInfo : public ImmutablePass {

jayfoad wrote:

Why does the PR include all this unrelated stuff? Which part am I supposed to review? Normally I just look at the "Files changed" tab.

https://github.com/llvm/llvm-project/pull/96442
[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)
@@ -402,34 +413,30 @@ Value *AMDGPUAtomicOptimizerImpl::buildReduction(IRBuilder<> &B,
   // Reduce within each pair of rows (i.e. 32 lanes).
   assert(ST->hasPermLaneX16());
-  V = B.CreateBitCast(V, IntNTy);

jayfoad wrote:

Please submit an NFC cleanup patch that just removes the unnecessary bitcasting, before adding support for new atomic operations.

https://github.com/llvm/llvm-project/pull/96473
[clang] [Debug Info] Fix debug info ptr to ptr test (PR #95637)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/95637
[clang] [Debug Info] Fix debug info ptr to ptr test (PR #95637)
jayfoad wrote:

I'll merge to fix the build.

https://github.com/llvm/llvm-project/pull/95637
[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/95373
[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/95373

None

>From 6d326a96d2651f8836b29ff1e3edef022f41549e Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Thu, 13 Jun 2024 09:46:48 +0100
Subject: [PATCH] [llvm-project] Fix typo "seperate"

---
 clang-tools-extra/clangd/TidyProvider.cpp       | 10
 .../include/clang/Frontend/FrontendOptions.h    |  2 +-
 .../include/clang/InstallAPI/DylibVerifier.h    |  2 +-
 clang/lib/InstallAPI/Visitor.cpp                |  2 +-
 clang/lib/Serialization/ASTWriterStmt.cpp       |  2 +-
 compiler-rt/test/dfsan/custom.cpp               |  2 +-
 .../Linux/ppc64/trivial-tls-pwr10.test          |  2 +-
 .../FlangOmpReport/yaml_summarizer.py           |  2 +-
 flang/lib/Semantics/check-omp-structure.cpp     | 10
 flang/test/Driver/mllvm_vs_mmlir.f90            |  2 +-
 libc/src/__support/FPUtil/x86_64/FEnvImpl.h     |  2 +-
 .../stdio/printf_core/float_hex_converter.h     | 10
 .../str_to_float_comparison_test.cpp            |  2 +-
 lld/test/wasm/data-segments.ll                  |  2 +-
 .../lldb/Expression/DWARFExpressionList.h       |  2 +-
 lldb/include/lldb/Target/MemoryTagManager.h     |  2 +-
 .../NativeRegisterContextLinux_arm64.cpp        |  2 +-
 lldb/test/API/CMakeLists.txt                    |  2 +-
 .../TestGdbRemoteMemoryTagging.py               |  2 +-
 .../DW_AT_data_bit_offset-DW_OP_stack_value.s   |  2 +-
 llvm/include/llvm/CodeGen/LiveRegUnits.h        |  2 +-
 llvm/include/llvm/CodeGen/MIRFormatter.h        |  2 +-
 llvm/include/llvm/MC/MCAsmInfo.h                |  2 +-
 llvm/include/llvm/Support/raw_socket_stream.h   |  2 +-
 llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.h     |  2 +-
 .../CodeGen/AssignmentTrackingAnalysis.cpp      |  6 ++---
 .../SelectionDAG/SelectionDAGBuilder.cpp        |  4 ++--
 llvm/lib/FileCheck/FileCheck.cpp                |  2 +-
 llvm/lib/IR/DebugInfo.cpp                       |  2 +-
 llvm/lib/MC/MCPseudoProbe.cpp                   |  2 +-
 llvm/lib/Support/VirtualFileSystem.cpp          |  2 +-
 llvm/lib/Support/raw_socket_stream.cpp          |  2 +-
 llvm/lib/Target/ARM/ARMISelLowering.cpp         |  2 +-
 .../Target/RISCV/RISCVMachineFunctionInfo.h     |  2 +-
 llvm/lib/TargetParser/RISCVISAInfo.cpp          |  2 +-
 llvm/lib/TextAPI/Utils.cpp                      |  2 +-
 llvm/lib/Transforms/IPO/Attributor.cpp          |  4 ++--
 .../lib/Transforms/IPO/SampleProfileProbe.cpp   |  2 +-
 .../Scalar/RewriteStatepointsForGC.cpp          |  2 +-
 .../Transforms/Utils/LoopUnrollRuntime.cpp      |  2 +-
 llvm/test/CodeGen/X86/AMX/amx-greedy-ra.ll      |  2 +-
 llvm/test/CodeGen/X86/apx/shift-eflags.ll       | 24 +--
 .../X86/merge-consecutive-stores-nt.ll          |  4 ++--
 llvm/test/CodeGen/X86/shift-eflags.ll           | 24 +--
 .../InstSimplify/constant-fold-fp-denormal.ll   |  2 +-
 .../LoopVectorize/LoongArch/defaults.ll         |  2 +-
 .../LoopVectorize/RISCV/defaults.ll             |  2 +-
 .../split-gep-or-as-add.ll                      |  2 +-
 llvm/test/Verifier/alloc-size-failedparse.ll    |  2 +-
 llvm/test/tools/llvm-ar/windows-path.test       |  2 +-
 .../ELF/mirror-permissions-win.test             |  2 +-
 llvm/tools/llvm-cov/CodeCoverage.cpp            |  2 +-
 llvm/tools/llvm-profgen/PerfReader.cpp          |  2 +-
 llvm/unittests/Support/Path.cpp                 |  4 ++--
 .../Analysis/Presburger/IntegerRelation.h       |  2 +-
 .../Analysis/Presburger/PresburgerSpace.h       |  2 +-
 .../mlir/Dialect/OpenMP/OpenMPInterfaces.h      |  2 +-
 .../Analysis/Presburger/PresburgerSpace.cpp     |  2 +-
 .../lib/Conversion/GPUCommon/GPUOpsLowering.h   |  2 +-
 .../LLVMIR/IR/BasicPtxBuilderInterface.cpp      |  2 +-
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp        |  6 ++---
 .../CPU/sparse_reduce_custom_prod.mlir          |  2 +-
 .../omptarget-constant-alloca-raise.mlir        |  2 +-
 openmp/tools/Modules/FindOpenMPTarget.cmake     |  2 +-
 64 files changed, 106 insertions(+), 106 deletions(-)

diff --git a/clang-tools-extra/clangd/TidyProvider.cpp b/clang-tools-extra/clangd/TidyProvider.cpp
index a4121df30d3df..a87238e0c0938 100644
--- a/clang-tools-extra/clangd/TidyProvider.cpp
+++ b/clang-tools-extra/clangd/TidyProvider.cpp
@@ -195,10 +195,10 @@ TidyProvider addTidyChecks(llvm::StringRef Checks,
 }

 TidyProvider disableUnusableChecks(llvm::ArrayRef<std::string> ExtraBadChecks) {
-  constexpr llvm::StringLiteral Seperator(",");
+  constexpr llvm::StringLiteral Separator(",");
   static const std::string BadChecks = llvm::join_items(
-      Seperator,
-      // We want this list to start with a seperator to
+      Separator,
+      // We want this list to start with a separator to
       // simplify appending in the lambda. So including an
       // empty string here will force that.
       "",
@@ -227,7 +227,7 @@ TidyProvider disableUnusableChecks(llvm::ArrayRef<std::string> ExtraBadChecks) {
   for (const std::string &Str : ExtraBadChecks) {
     if (Str.empty())
       continue;
-    Size += Seperator.size();
+    Size += Separator.size();
     if (LLVM_LIKELY(Str.front() !=
[clang] fixup cuda-builtin-vars.cu broken in IntrRange change (PR #94639)
https://github.com/jayfoad approved this pull request.

Works for me.

https://github.com/llvm/llvm-project/pull/94639
[clang] [flang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)
@@ -785,6 +785,7 @@ enum : unsigned {
   EF_AMDGPU_MACH_AMDGCN_GFX1200       = 0x048,
   EF_AMDGPU_MACH_AMDGCN_RESERVED_0X49 = 0x049,
   EF_AMDGPU_MACH_AMDGCN_GFX1151       = 0x04a,
+  EF_AMDGPU_MACH_AMDGCN_GFX1152       = 0x055,

jayfoad wrote:

This table is supposed to be in ELF number order. Can you please move the new entry? Consider it pre-approved.

https://github.com/llvm/llvm-project/pull/94534
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
@@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x()

jayfoad wrote:

I see now that it fails (deterministically) if the NVPTX target is not being built.

https://github.com/llvm/llvm-project/pull/94422
[clang] [llvm] [NVPTX] Revamp NVVMIntrRange pass (PR #94422)
@@ -6,21 +6,21 @@ __attribute__((global)) void kernel(int *out) {
   int i = 0;
-  out[i++] = threadIdx.x; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x()
-  out[i++] = threadIdx.y; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.y()
-  out[i++] = threadIdx.z; // CHECK: call noundef i32 @llvm.nvvm.read.ptx.sreg.tid.z()
+  out[i++] = threadIdx.x; // CHECK: call noundef {{.*}} i32 @llvm.nvvm.read.ptx.sreg.tid.x()

jayfoad wrote:

@AlexMaclean I also see this problem on some internal test machines. It seems suspicious - is there some nondeterminism? Or is there a good reason why some machines would not add the range metadata here?

https://github.com/llvm/llvm-project/pull/94422
[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)
jayfoad wrote:

Is there really a good use case for this? Can you use regular stores to addrspace(7) instead? @krzysz00

Also, do you really need a separate builtin for every legal type, or is there some way they can be type-overloaded?

https://github.com/llvm/llvm-project/pull/94576
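As a hedged sketch of the alternative being suggested (the function and value names below are hypothetical, not from the PR): an ordinary store through a buffer "fat pointer" in addrspace(7) lets the backend select the buffer store instruction, without a dedicated builtin per element type.

```llvm
; Hypothetical illustration: a plain store through an addrspace(7)
; buffer fat pointer, instead of a per-type raw.buffer.store builtin.
define amdgpu_kernel void @store_via_fat_pointer(ptr addrspace(7) %buf, float %val) {
  store float %val, ptr addrspace(7) %buf, align 4
  ret void
}
```

Because the store is type-agnostic at the IR level, one form covers every legal element type.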
[clang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)
https://github.com/jayfoad approved this pull request.

LGTM. Could also update `flang/cmake/modules/AddFlangOffloadRuntime.cmake`, but I don't really know if it's our responsibility to update Flang.

https://github.com/llvm/llvm-project/pull/94534
[clang] [libclc] [llvm] [AMDGPU] Add a new target gfx1152 (PR #94534)
@@ -1534,6 +1534,12 @@ def FeatureISAVersion11_5_1 : FeatureSet<
     FeatureVGPRSingleUseHintInsts,
     Feature1_5xVGPRs])>;

+def FeatureISAVersion11_5_2 : FeatureSet<

jayfoad wrote:

I don't have a good answer to this except "it's what we normally do". Other parts of the software stack (kernel drivers etc.) need to distinguish gfx1150 from gfx1152, and I guess they don't want to map "gfx1152" -> "gfx1150" before invoking the compiler. Also, it will make it easier for us to implement gfx1152-specific optimizations and workarounds in future if there are any.

https://github.com/llvm/llvm-project/pull/94534
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
jayfoad wrote:

There is a latent problem to do with convergence. If you add a new test case like this:

```diff
diff --git a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
index 238f6ab39e83..22995083293d 100644
--- a/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
+++ b/llvm/test/CodeGen/AMDGPU/convergence-tokens.ll
@@ -55,6 +55,21 @@ else:
   ret i32 %p
 }

+define i64 @basic_branch_i64(i64 %src, i1 %cond) #0 {
+entry:
+  %t = call token @llvm.experimental.convergence.anchor()
+  %x = add i64 %src, 1
+  br i1 %cond, label %then, label %else
+
+then:
+  %r = call i64 @llvm.amdgcn.readfirstlane.i64(i64 %x) [ "convergencectrl"(token %t) ]
+  br label %else
+
+else:
+  %p = phi i64 [%r, %then], [%x, %entry]
+  ret i64 %p
+}
+
 ; CHECK-LABEL: name:basic_loop
 ; CHECK:    [[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR
 ; CHECK: bb.[[#]].loop:
```

Then it will fail with:

```
*** Bad machine code: Cannot mix controlled and uncontrolled convergence in the same function. ***
```

This is related to #87509. Since the readlane/readfirstlane/writelane intrinsics are IntrConvergent, the corresponding ISD nodes should be marked with SDNPInGlue or SDNPOptInGlue.

@ssahasra FYI

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/jayfoad commented:

Does this need IR autoupgrade?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5496,6 +5496,9 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)
+  NODE_NAME_CASE(WRITELANE)

jayfoad wrote:

Add this to `SITargetLowering::isSDNodeSourceOfDivergence`

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5496,6 +5496,9 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(LDS)
   NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
   NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
+  NODE_NAME_CASE(READLANE)
+  NODE_NAME_CASE(READFIRSTLANE)

jayfoad wrote:

Add these to `AMDGPUTargetLowering::isSDNodeAlwaysUniform`

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
jayfoad wrote:

> 1. What's the proper way to legalize f16 and bf16 for SDAG case without bitcasts? (I would think "fp_extend -> LaneOp -> Fptrunc" is wrong)

Bitcast to i16, anyext to i32, laneop, trunc to i16, bitcast to original type. Why wouldn't you use bitcasts?

https://github.com/llvm/llvm-project/pull/89217
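In IR terms, the suggested chain can be sketched as follows (a hedged illustration with hypothetical value names, using the i32-only form of the intrinsic that this thread is extending; SDAG would use any_extend, where the high bits are unspecified):

```llvm
; Sketch of legalizing a 16-bit lane op: bitcast to i16, extend to i32,
; do the 32-bit lane op, truncate, and bitcast back to the original type.
define half @readfirstlane_f16(half %x) {
  %bc   = bitcast half %x to i16
  %ext  = zext i16 %bc to i32        ; any_extend in SDAG; zext is one valid choice
  %lane = call i32 @llvm.amdgcn.readfirstlane(i32 %ext)
  %tr   = trunc i32 %lane to i16
  %res  = bitcast i16 %tr to half
  ret half %res
}

declare i32 @llvm.amdgcn.readfirstlane(i32)
```

Since every step only moves or narrows bits, the low 16 bits round-trip unchanged, which is why no fp_extend/fptrunc (and no rounding) is involved.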
[clang] [flang] [libc] [libcxx] [llvm] [mlir] Fix typo "indicies" (PR #92232)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/92232
[clang] [flang] [libc] [libcxx] [llvm] [mlir] Fix typo "indicies" (PR #92232)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/92232

None

>From a02c63497b0d60f55e1846f5a050820082fb5c86 Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Wed, 15 May 2024 10:04:57 +0100
Subject: [PATCH] Fix typo "indicies"

---
 clang/include/clang/AST/VTTBuilder.h            |  6 +-
 clang/lib/AST/VTTBuilder.cpp                    |  2 +-
 clang/lib/CodeGen/CGVTT.cpp                     | 17 ++---
 clang/lib/CodeGen/CGVTables.h                   |  6 +-
 .../command/commands/DexExpectStepOrder.py      |  2 +-
 flang/docs/HighLevelFIR.md                      |  2 +-
 flang/test/Lower/HLFIR/forall.f90               |  2 +-
 libc/src/stdio/printf_core/parser.h             |  2 +-
 .../views/mdspan/CustomTestLayouts.h            |  2 +-
 llvm/docs/GlobalISel/GenericOpcode.rst          |  4 +-
 llvm/include/llvm/Target/Target.td              |  4 +-
 llvm/lib/Analysis/DependenceAnalysis.cpp        | 10 +--
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp       | 12 ++--
 llvm/lib/Bitcode/Writer/ValueEnumerator.cpp     |  2 +-
 llvm/lib/Bitcode/Writer/ValueEnumerator.h       |  2 +-
 .../LiveDebugValues/VarLocBasedImpl.cpp         |  2 +-
 llvm/lib/CodeGen/MLRegAllocEvictAdvisor.cpp     |  2 +-
 llvm/lib/CodeGen/PrologEpilogInserter.cpp       |  2 +-
 llvm/lib/Support/ELFAttributeParser.cpp         | 10 +--
 .../Target/AArch64/AArch64ISelLowering.cpp      |  2 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp       |  2 +-
 .../DirectX/DXILWriter/DXILBitcodeWriter.cpp    | 10 +--
 .../DXILWriter/DXILValueEnumerator.cpp          |  2 +-
 .../DirectX/DXILWriter/DXILValueEnumerator.h    |  2 +-
 llvm/lib/Target/PowerPC/PPCISelLowering.cpp     |  2 +-
 .../Transforms/InstCombine/InstCombinePHI.cpp   |  4 +-
 .../Scalar/SeparateConstOffsetFromGEP.cpp       |  2 +-
 .../Utils/SampleProfileInference.cpp            |  2 +-
 .../Transforms/Vectorize/SLPVectorizer.cpp      | 64 +--
 llvm/test/CodeGen/X86/avx-vperm2x128.ll         |  2 +-
 .../test/DebugInfo/PDB/Inputs/every-type.yaml   |  4 +-
 ...h-directive-personalityindex-diagnostics.s   |  6 +-
 .../InstCombine/phi-extractvalue.ll             |  8 +--
 .../InstCombine/phi-of-insertvalues.ll          |  6 +-
 .../VectorCombine/X86/scalarize-vector-gep.ll   | 12 ++--
 .../Linalg/Transforms/Vectorization.cpp         |  6 +-
 36 files changed, 114 insertions(+), 113 deletions(-)

diff --git a/clang/include/clang/AST/VTTBuilder.h b/clang/include/clang/AST/VTTBuilder.h
index 4acbc1f9e96b2..3c19e61a8701c 100644
--- a/clang/include/clang/AST/VTTBuilder.h
+++ b/clang/include/clang/AST/VTTBuilder.h
@@ -92,7 +92,7 @@ class VTTBuilder {
   using AddressPointsMapTy = llvm::DenseMap<BaseSubobject, uint64_t>;

   /// The sub-VTT indices for the bases of the most derived class.
-  llvm::DenseMap<BaseSubobject, uint64_t> SubVTTIndicies;
+  llvm::DenseMap<BaseSubobject, uint64_t> SubVTTIndices;

   /// The secondary virtual pointer indices of all subobjects of
   /// the most derived class.
@@ -148,8 +148,8 @@ class VTTBuilder {
   }

   /// Returns a reference to the sub-VTT indices.
-  const llvm::DenseMap<BaseSubobject, uint64_t> &getSubVTTIndicies() const {
-    return SubVTTIndicies;
+  const llvm::DenseMap<BaseSubobject, uint64_t> &getSubVTTIndices() const {
+    return SubVTTIndices;
   }

   /// Returns a reference to the secondary virtual pointer indices.
diff --git a/clang/lib/AST/VTTBuilder.cpp b/clang/lib/AST/VTTBuilder.cpp
index d58e875177852..464a2014c430a 100644
--- a/clang/lib/AST/VTTBuilder.cpp
+++ b/clang/lib/AST/VTTBuilder.cpp
@@ -189,7 +189,7 @@ void VTTBuilder::LayoutVTT(BaseSubobject Base, bool BaseIsVirtual) {

   if (!IsPrimaryVTT) {
     // Remember the sub-VTT index.
-    SubVTTIndicies[Base] = VTTComponents.size();
+    SubVTTIndices[Base] = VTTComponents.size();
   }

   uint64_t VTableIndex = VTTVTables.size();
diff --git a/clang/lib/CodeGen/CGVTT.cpp b/clang/lib/CodeGen/CGVTT.cpp
index d2376b14dd582..4cebb750c89e8 100644
--- a/clang/lib/CodeGen/CGVTT.cpp
+++ b/clang/lib/CodeGen/CGVTT.cpp
@@ -138,23 +138,24 @@ uint64_t CodeGenVTables::getSubVTTIndex(const CXXRecordDecl *RD,
                                         BaseSubobject Base) {
   BaseSubobjectPairTy ClassSubobjectPair(RD, Base);

-  SubVTTIndiciesMapTy::iterator I = SubVTTIndicies.find(ClassSubobjectPair);
-  if (I != SubVTTIndicies.end())
+  SubVTTIndicesMapTy::iterator I = SubVTTIndices.find(ClassSubobjectPair);
+  if (I != SubVTTIndices.end())
     return I->second;

   VTTBuilder Builder(CGM.getContext(), RD, /*GenerateDefinition=*/false);

-  for (llvm::DenseMap<BaseSubobject, uint64_t>::const_iterator I =
-           Builder.getSubVTTIndicies().begin(),
-       E = Builder.getSubVTTIndicies().end(); I != E; ++I) {
+  for (llvm::DenseMap<BaseSubobject, uint64_t>::const_iterator
+           I = Builder.getSubVTTIndices().begin(),
+           E = Builder.getSubVTTIndices().end();
+       I != E; ++I) {
     // Insert all indices.
     BaseSubobjectPairTy ClassSubobjectPair(RD, I->first);

-    SubVTTIndicies.insert(std::make_pair(ClassSubobjectPair, I->second));
+    SubVTTIndices.insert(std::make_pair(ClassSubobjectPair, I->second));
   }

-  I = SubVTTIndicies.find(ClassSubobjectPair);
-  assert(I !=
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -493,8 +493,8 @@ Value *AMDGPUAtomicOptimizerImpl::buildScan(IRBuilder<> &B,
   if (!ST->isWave32()) {
     // Combine lane 31 into lanes 32..63.
     V = B.CreateBitCast(V, IntNTy);
-    Value *const Lane31 = B.CreateIntrinsic(Intrinsic::amdgcn_readlane, {},
-                                            {V, B.getInt32(31)});
+    Value *const Lane31 = B.CreateIntrinsic(
+        Intrinsic::amdgcn_readlane, B.getInt32Ty(), {V, B.getInt32(31)});

jayfoad wrote:

Changes like this should disappear if we merge #91583 first.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }

+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    if (Ty.isScalar())
+      // Already legal
+      return true;
+
+    Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);

jayfoad wrote:

I think you can just reuse Src0 instead of declaring a new Src0Valid. Same for Src2, and same for the SDAG code.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,130 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }

+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  auto createLaneOp = [&](Register &Src0, Register &Src1,
+                          Register &Src2) -> Register {
+    auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0);
+    if (Src2.isValid())
+      return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0);
+    if (Src1.isValid())
+      return (LaneOpDst.addUse(Src1)).getReg(0);
+    return LaneOpDst.getReg(0);
+  };
+
+  Register Src1, Src2, Src0Valid, Src2Valid;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    if (Ty.isScalar())
+      // Already legal
+      return true;
+
+    Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+    if (Src2.isValid())
+      Src2Valid = B.buildBitcast(S32, Src2).getReg(0);
+    Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+    B.buildBitcast(DstReg, LaneOp);
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if (Size < 32) {
+    Register Src0Cast = MRI.getType(Src0).isScalar()
+                            ? Src0
+                            : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0);
+    Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0);
+
+    if (Src2.isValid()) {
+      Register Src2Cast =
+          MRI.getType(Src2).isScalar()
+              ? Src2
+              : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0);
+      Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0);
+    }
+    Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid);
+    if (Ty.isScalar())
+      B.buildTrunc(DstReg, LaneOp);
+    else {
+      auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOp);
+      B.buildBitcast(DstReg, Trunc);
+    }
+
+    MI.eraseFromParent();
+    return true;
+  }
+
+  if ((Size % 32) == 0) {
+    SmallVector<Register> PartialRes;
+    unsigned NumParts = Size / 32;
+    auto Src0Parts = B.buildUnmerge(S32, Src0);
+
+    switch (IID) {
+    case Intrinsic::amdgcn_readlane: {
+      Register Src1 = MI.getOperand(3).getReg();
+      for (unsigned i = 0; i < NumParts; ++i)
+        PartialRes.push_back(
+            (B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32})
+                 .addUse(Src0Parts.getReg(i))
+                 .addUse(Src1))
+                .getReg(0));

jayfoad wrote:

Yes, separate patch

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper,
   return true;
 }

+bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper,
+                                         MachineInstr &MI,
+                                         Intrinsic::ID IID) const {
+
+  MachineIRBuilder &B = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *B.getMRI();
+
+  Register DstReg = MI.getOperand(0).getReg();
+  Register Src0 = MI.getOperand(2).getReg();
+
+  Register Src1, Src2;
+  if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) {
+    Src1 = MI.getOperand(3).getReg();
+    if (IID == Intrinsic::amdgcn_writelane) {
+      Src2 = MI.getOperand(4).getReg();
+    }
+  }
+
+  LLT Ty = MRI.getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+
+  if (Size == 32) {
+    if (Ty.isScalar())
+      // Already legal
+      return true;
+
+    Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0);
+    MachineInstrBuilder LaneOpDst;
+    switch (IID) {

jayfoad wrote:

Can you use a `createLaneOp` helper to build the intrinsic, like you do in the SDAG path?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/jayfoad commented:

LGTM overall.

> add f32 pattern to select read/writelane operations

Why would you need this? Don't you legalize f32 to i32?

https://github.com/llvm/llvm-project/pull/89217
[clang] [compiler-rt] [libc] [libclc] [libcxxabi] [lld] [lldb] [llvm] [mlir] Add clarifying parenthesis around non-trivial conditions in ternary expressions. (PR #90391)
jayfoad wrote:

AMDGPU changes are fine.

https://github.com/llvm/llvm-project/pull/90391
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
jayfoad wrote:

Previous attempts:

* https://reviews.llvm.org/D84639
* https://reviews.llvm.org/D86154
* https://reviews.llvm.org/D147732
* #87334

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
jayfoad wrote:

No further comments.

https://github.com/llvm/llvm-project/pull/79236
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
jayfoad wrote:

Can you add at least one test for a VMEM (flat or scratch or global or buffer or image) atomic without return? That should use vscnt on GFX10.

Apart from that, the SIInsertWaitcnts.cpp changes and tests look good to me. I have not reviewed the clang parts, but it looks like @Pierre-vh approved them previously?

https://github.com/llvm/llvm-project/pull/79236
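A sketch of the kind of test case being requested (hypothetical function name; one possible shape of a no-return VMEM atomic, not the actual test from the PR):

```llvm
; Hypothetical no-return global atomic: the result is unused, so on
; GFX10 the precise-memory wait after it should involve vscnt (the
; VMEM store counter) rather than vmcnt.
define void @global_atomic_add_nortn(ptr addrspace(1) %ptr, i32 %val) {
  %unused = atomicrmw add ptr addrspace(1) %ptr, i32 %val syncscope("agent") seq_cst
  ret void
}
```

Running this through `llc` with `-mcpu=gfx1010 -mattr=+precise-memory` and checking for `s_waitcnt_vscnt` would cover the missing case.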
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -0,0 +1,1406 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX10
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+enable-flat-scratch,+precise-memory < %s | FileCheck --check-prefixes=GFX9-FLATSCR %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX11
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX12
+
+; from atomicrmw-expand.ll
+; covers flat_load, flat_atomic (atomic with return)
+;
+define void @syncscope_workgroup_nortn(ptr %addr, float %val) {
+; GFX9-LABEL: syncscope_workgroup_nortn:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    flat_load_dword v4, v[0:1]
+; GFX9-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    s_mov_b64 s[4:5], 0
+; GFX9-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX9-NEXT:    v_add_f32_e32 v3, v4, v2
+; GFX9-NEXT:    flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX9-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, v3, v4
+; GFX9-NEXT:    s_or_b64 s[4:5], vcc, s[4:5]
+; GFX9-NEXT:    v_mov_b32_e32 v4, v3
+; GFX9-NEXT:    s_andn2_b64 exec, exec, s[4:5]
+; GFX9-NEXT:    s_cbranch_execnz .LBB0_1
+; GFX9-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX9-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL: syncscope_workgroup_nortn:
+; GFX90A:       ; %bb.0:
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    flat_load_dword v5, v[0:1]
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_mov_b64 s[4:5], 0
+; GFX90A-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX90A-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX90A-NEXT:    v_add_f32_e32 v4, v5, v2
+; GFX90A-NEXT:    flat_atomic_cmpswap v3, v[0:1], v[4:5] glc
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_cmp_eq_u32_e32 vcc, v3, v5
+; GFX90A-NEXT:    s_or_b64 s[4:5], vcc, s[4:5]
+; GFX90A-NEXT:    v_mov_b32_e32 v5, v3
+; GFX90A-NEXT:    s_andn2_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:    s_cbranch_execnz .LBB0_1
+; GFX90A-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX90A-NEXT:    s_or_b64 exec, exec, s[4:5]
+; GFX90A-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: syncscope_workgroup_nortn:
+; GFX10:       ; %bb.0:
+; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:    flat_load_dword v4, v[0:1]
+; GFX10-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX10-NEXT:    s_mov_b32 s4, 0
+; GFX10-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX10-NEXT:    v_add_f32_e32 v3, v4, v2
+; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:    flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX10-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX10-NEXT:    buffer_gl0_inv
+; GFX10-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v3, v4
+; GFX10-NEXT:    v_mov_b32_e32 v4, v3
+; GFX10-NEXT:    s_or_b32 s4, vcc_lo, s4
+; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:    s_cbranch_execnz .LBB0_1
+; GFX10-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX9-FLATSCR-LABEL: syncscope_workgroup_nortn:
+; GFX9-FLATSCR:       ; %bb.0:
+; GFX9-FLATSCR-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:    flat_load_dword v4, v[0:1]
+; GFX9-FLATSCR-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:    s_mov_b64 s[0:1], 0
+; GFX9-FLATSCR-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX9-FLATSCR-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX9-FLATSCR-NEXT:    v_add_f32_e32 v3, v4, v2
+; GFX9-FLATSCR-NEXT:    flat_atomic_cmpswap v3, v[0:1], v[3:4] glc
+; GFX9-FLATSCR-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR-NEXT:    v_cmp_eq_u32_e32 vcc, v3, v4
+; GFX9-FLATSCR-NEXT:    s_or_b64 s[0:1], vcc, s[0:1]
+; GFX9-FLATSCR-NEXT:    v_mov_b32_e32 v4, v3
+; GFX9-FLATSCR-NEXT:    s_andn2_b64 exec, exec, s[0:1]
+; GFX9-FLATSCR-NEXT:    s_cbranch_execnz .LBB0_1
+; GFX9-FLATSCR-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX9-FLATSCR-NEXT:    s_or_b64 exec, exec, s[0:1]
+; GFX9-FLATSCR-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: syncscope_workgroup_nortn:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    flat_load_b32 v4, v[0:1]
+; GFX11-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_mov_b32 s0, 0
+; GFX11-NEXT:  .LBB0_1: ; %atomicrmw.start
+; GFX11-NEXT:    ; =>This Inner Loop Header: Depth=1
+; GFX11-NEXT:
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2594,12 +2594,10 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent || MOI.getFailureOrdering() == AtomicOrdering::Acquire || MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent) { - Changed |= CC->insertWait(MI, MOI.getScope(), -MOI.getInstrAddrSpace(), -isAtomicRet(*MI) ? SIMemOp::LOAD : - SIMemOp::STORE, -MOI.getIsCrossAddressSpaceOrdering(), -Position::AFTER); + Changed |= jayfoad wrote: Remove this. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } #endif +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + AMDGPU::Waitcnt Wait; + if (ST->hasExtendedWaitCounts()) +Wait = AMDGPU::Waitcnt(0, 0, 0, 0, 0, 0, 0); + else +Wait = AMDGPU::Waitcnt(0, 0, 0, 0); + + if (!Inst.mayStore()) +Wait.StoreCnt = ~0u; jayfoad wrote: GFX10 introduced a separate counter for **VMEM** stores with the name VScnt. GFX12 just renamed it to STOREcnt. No architecture has a separate store counter for DS or SMEM. So `ds_add_u32 v0, v1` followed by `s_waitcnt lgkmcnt(0)` (pre-GFX12) or `s_wait_dscnt 0` (GFX12) is fine. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
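The counter rules in the comment above can be summarised in a small lookup (an illustrative Python sketch, not part of the original review; `mem_type` and `gfx` are stand-in names, and pre-GFX12 counters are given in their `s_waitcnt` spelling):

```python
def store_counter(mem_type: str, gfx: int) -> str:
    """Which counter tracks completion of a store, per memory type and GPU generation."""
    if mem_type == "vmem":
        # Only VMEM stores ever get a dedicated store counter:
        # VScnt on GFX10/11, renamed STOREcnt on GFX12.
        if gfx >= 12:
            return "STOREcnt"
        if gfx >= 10:
            return "VScnt"
        return "vmcnt"  # pre-GFX10: stores share the load counter
    if mem_type == "ds":
        # No separate DS store counter on any generation.
        return "DScnt" if gfx >= 12 else "lgkmcnt"
    if mem_type == "smem":
        # No separate SMEM store counter either.
        return "KMcnt" if gfx >= 12 else "lgkmcnt"
    raise ValueError(mem_type)

# The ds_add_u32 example from the comment: lgkmcnt pre-GFX12, dscnt on GFX12.
assert store_counter("ds", 9) == "lgkmcnt"
assert store_counter("ds", 12) == "DScnt"
```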
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } #endif +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + AMDGPU::Waitcnt Wait; + if (ST->hasExtendedWaitCounts()) +Wait = AMDGPU::Waitcnt(0, 0, 0, 0, 0, 0, 0); + else +Wait = AMDGPU::Waitcnt(0, 0, 0, 0); + + if (!Inst.mayStore()) +Wait.StoreCnt = ~0u; jayfoad wrote: ```suggestion AMDGPU::Waitcnt Wait = WCG->getAllZeroWaitcnt(Inst.mayStore()); ``` However, as a general rule: - loads and atomics-with-return update LOADcnt - stores and atomics-without-return update STOREcnt so it might be more accurate to use the condition `Inst.mayStore() && !SIInstrInfo::isAtomicRet(Inst)`. Please make sure you have tests for atomics with and without return. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
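The load/store counter rule in this comment can be sketched as a tiny predicate (illustrative Python only; `may_store` and `is_atomic_ret` stand in for `Inst.mayStore()` and `SIInstrInfo::isAtomicRet(Inst)`):

```python
def needs_storecnt_wait(may_store: bool, is_atomic_ret: bool) -> bool:
    # Stores and atomics-without-return update STOREcnt;
    # loads and atomics-with-return update LOADcnt instead,
    # so an atomic-with-return does not need a STOREcnt wait.
    return may_store and not is_atomic_ret

assert needs_storecnt_wait(True, False)       # plain store
assert not needs_storecnt_wait(True, True)    # atomic with return
assert not needs_storecnt_wait(False, False)  # plain load
```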
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -355,6 +356,18 @@ class SICacheControl { MachineBasicBlock::iterator &MI) const { return false; } +public: + // The following is for supporting precise memory mode. When the feature + // precise-memory is enabled, an s_waitcnt instruction is inserted + // after each memory instruction. + + virtual bool + handleNonAtomicForPreciseMemory(MachineBasicBlock::iterator &MI) = 0; + /// Handles atomic instruction \p MI with \p IsAtomicWithRet indicating + /// whether \p MI returns a result. + virtual bool handleAtomicForPreciseMemory(MachineBasicBlock::iterator &MI, jayfoad wrote: This function is never even called. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2378,6 +2409,215 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal( return Changed; } +bool SIGfx6CacheControl::handleNonAtomicForPreciseMemory( +MachineBasicBlock::iterator ) { + assert(MI->mayLoadOrStore()); + + MachineInstr = *MI; + AMDGPU::Waitcnt Wait; + + if (TII->isSMRD(Inst)) { // scalar +if (Inst.mayStore()) + return false; +Wait.DsCnt = 0; // LgkmCnt + } else { // vector +if (Inst.mayLoad()) { // vector load + if (TII->isVMEM(Inst))// VMEM load +Wait.LoadCnt = 0; // VmCnt + else if (TII->isFLAT(Inst)) { // Flat load +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else// LDS load +Wait.DsCnt = 0; // LgkmCnt +} else {// vector store + if (TII->isVMEM(Inst))// VMEM store +Wait.LoadCnt = 0; // VmCnt + else if (TII->isFLAT(Inst)) { // Flat store +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else +Wait.DsCnt = 0; // LDS store; LgkmCnt +} + } + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + return true; +} + +bool SIGfx6CacheControl::handleAtomicForPreciseMemory( +MachineBasicBlock::iterator , bool IsAtomicWithRet) { + assert(MI->mayLoadOrStore()); + + AMDGPU::Waitcnt Wait; + + Wait.LoadCnt = 0; // VmCnt + Wait.DsCnt = 0; // LgkmCnt + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + return true; +} + +bool SIGfx10CacheControl::handleNonAtomicForPreciseMemory( +MachineBasicBlock::iterator ) { + assert(MI->mayLoadOrStore()); + + MachineInstr = *MI; + AMDGPU::Waitcnt Wait; + + bool BuildWaitCnt = true; + bool BuildVsCnt = false; + + if (TII->isSMRD(Inst)) { // scalar +if (Inst.mayStore()) + return false; +Wait.DsCnt = 0; // LgkmCnt + } else { // vector +if (Inst.mayLoad()) { // vector load + if (TII->isVMEM(Inst))// VMEM load +Wait.LoadCnt = 0; // VmCnt + else if 
(TII->isFLAT(Inst)) { // Flat load +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else// LDS load +Wait.DsCnt = 0; // LgkmCnt +} + +// For some vector instructions, mayLoad() and mayStore() can be both true. +if (Inst.mayStore()) { // vector store; an instruction can be both + // load/store + if (TII->isVMEM(Inst)) { // VMEM store +if (!Inst.mayLoad()) + BuildWaitCnt = false; +BuildVsCnt = true; + } else if (TII->isFLAT(Inst)) { // Flat store +Wait.DsCnt = 0; // LgkmCnt +BuildVsCnt = true; + } else +Wait.DsCnt = 0; // LDS store; LgkmCnt +} + } + + MachineBasicBlock = *MI->getParent(); + if (BuildWaitCnt) { +unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); +BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); +--MI; + } + + if (BuildVsCnt) { +BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT)) +.addReg(AMDGPU::SGPR_NULL, RegState::Undef) +.addImm(0); +--MI; + } + return true; +} + +bool SIGfx10CacheControl ::handleAtomicForPreciseMemory( +MachineBasicBlock::iterator , bool IsAtomicWithRet) { + assert(MI->mayLoadOrStore()); + + AMDGPU::Waitcnt Wait; + + Wait.DsCnt = 0; // LgkmCnt + if (IsAtomicWithRet) +Wait.LoadCnt = 0; // VmCnt + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + if (!IsAtomicWithRet) { +BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT)) +.addReg(AMDGPU::SGPR_NULL, RegState::Undef) +.addImm(0); +--MI; + } + return true; +} + +bool SIGfx12CacheControl ::handleNonAtomicForPreciseMemory( +MachineBasicBlock::iterator ) { + assert(MI->mayLoadOrStore()); + + MachineInstr = *MI; + unsigned WaitType = 0; + // For some vector instructions, mayLoad() and mayStore() can be both true. 
+ bool LoadAndStore = false; + + if (TII->isSMRD(Inst)) { // scalar +if (Inst.mayStore()) + return false; + +WaitType = AMDGPU::S_WAIT_KMCNT; + } else { // vector +if (Inst.mayLoad() && Inst.mayStore()) { + WaitType = AMDGPU::S_WAIT_LOADCNT; + LoadAndStore = true; +} else if (Inst.mayLoad()) { // vector load + if (TII->isVMEM(Inst)) // VMEM load +WaitType =
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2378,6 +2409,215 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal( return Changed; } +bool SIGfx6CacheControl::handleNonAtomicForPreciseMemory( +MachineBasicBlock::iterator ) { + assert(MI->mayLoadOrStore()); + + MachineInstr = *MI; + AMDGPU::Waitcnt Wait; + + if (TII->isSMRD(Inst)) { // scalar +if (Inst.mayStore()) + return false; +Wait.DsCnt = 0; // LgkmCnt + } else { // vector +if (Inst.mayLoad()) { // vector load + if (TII->isVMEM(Inst))// VMEM load +Wait.LoadCnt = 0; // VmCnt + else if (TII->isFLAT(Inst)) { // Flat load +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else// LDS load +Wait.DsCnt = 0; // LgkmCnt +} else {// vector store + if (TII->isVMEM(Inst))// VMEM store +Wait.LoadCnt = 0; // VmCnt + else if (TII->isFLAT(Inst)) { // Flat store +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else +Wait.DsCnt = 0; // LDS store; LgkmCnt +} + } + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + return true; +} + +bool SIGfx6CacheControl::handleAtomicForPreciseMemory( +MachineBasicBlock::iterator , bool IsAtomicWithRet) { + assert(MI->mayLoadOrStore()); + + AMDGPU::Waitcnt Wait; + + Wait.LoadCnt = 0; // VmCnt + Wait.DsCnt = 0; // LgkmCnt + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + return true; +} + +bool SIGfx10CacheControl::handleNonAtomicForPreciseMemory( +MachineBasicBlock::iterator ) { + assert(MI->mayLoadOrStore()); + + MachineInstr = *MI; + AMDGPU::Waitcnt Wait; + + bool BuildWaitCnt = true; + bool BuildVsCnt = false; + + if (TII->isSMRD(Inst)) { // scalar +if (Inst.mayStore()) + return false; +Wait.DsCnt = 0; // LgkmCnt + } else { // vector +if (Inst.mayLoad()) { // vector load + if (TII->isVMEM(Inst))// VMEM load +Wait.LoadCnt = 0; // VmCnt + else if 
(TII->isFLAT(Inst)) { // Flat load +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else// LDS load +Wait.DsCnt = 0; // LgkmCnt +} + +// For some vector instructions, mayLoad() and mayStore() can be both true. +if (Inst.mayStore()) { // vector store; an instruction can be both + // load/store + if (TII->isVMEM(Inst)) { // VMEM store +if (!Inst.mayLoad()) + BuildWaitCnt = false; +BuildVsCnt = true; + } else if (TII->isFLAT(Inst)) { // Flat store +Wait.DsCnt = 0; // LgkmCnt +BuildVsCnt = true; + } else +Wait.DsCnt = 0; // LDS store; LgkmCnt +} + } + + MachineBasicBlock = *MI->getParent(); + if (BuildWaitCnt) { +unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); +BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); +--MI; + } + + if (BuildVsCnt) { +BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT)) +.addReg(AMDGPU::SGPR_NULL, RegState::Undef) +.addImm(0); +--MI; + } + return true; +} + +bool SIGfx10CacheControl ::handleAtomicForPreciseMemory( +MachineBasicBlock::iterator , bool IsAtomicWithRet) { + assert(MI->mayLoadOrStore()); + + AMDGPU::Waitcnt Wait; + + Wait.DsCnt = 0; // LgkmCnt + if (IsAtomicWithRet) +Wait.LoadCnt = 0; // VmCnt + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + if (!IsAtomicWithRet) { +BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT)) +.addReg(AMDGPU::SGPR_NULL, RegState::Undef) +.addImm(0); +--MI; + } + return true; +} + +bool SIGfx12CacheControl ::handleNonAtomicForPreciseMemory( +MachineBasicBlock::iterator ) { + assert(MI->mayLoadOrStore()); + + MachineInstr = *MI; + unsigned WaitType = 0; + // For some vector instructions, mayLoad() and mayStore() can be both true. jayfoad wrote: What kind of (non-atomic) instructions is this supposed to handle? 
https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
https://github.com/jayfoad requested changes to this pull request. I've added _some_ inline comments, but really I don't want to spend the time to review this properly (or maintain it, or extend it for new architectures in future). All this logic already exists in SIInsertWaitcnts. Duplicating it here is not a good design. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -157,6 +157,27 @@ static uint32_t getLit16Encoding(uint16_t Val, const MCSubtargetInfo ) { return 255; } +static uint32_t getLitBF16Encoding(uint16_t Val) { + uint16_t IntImm = getIntInlineImmEncoding(static_cast(Val)); + if (IntImm != 0) +return IntImm; + + // clang-format off + switch (Val) { jayfoad wrote: Yeah, I really don't like having 4 different copies of this list of hex values (0x3f00, 0xbf00...). https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)
@@ -157,6 +157,27 @@ static uint32_t getLit16Encoding(uint16_t Val, const MCSubtargetInfo ) { return 255; } +static uint32_t getLitBF16Encoding(uint16_t Val) { + uint16_t IntImm = getIntInlineImmEncoding(static_cast(Val)); + if (IntImm != 0) +return IntImm; + + // clang-format off + switch (Val) { jayfoad wrote: Can this call `getInlineEncodingV2BF16`? https://github.com/llvm/llvm-project/pull/80908 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
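For context on the repeated hex values in this thread: each one is just an inline-constant float truncated to bfloat16, i.e. the top 16 bits of its IEEE-754 binary32 encoding. A quick sketch (illustrative, not the patch's code):

```python
import struct

def bf16_bits(x: float) -> int:
    """bfloat16 is the high 16 bits of the IEEE-754 binary32 encoding."""
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", x))
    return f32_bits >> 16

# The magic numbers duplicated across the encoder are these truncations.
assert bf16_bits(0.5) == 0x3F00
assert bf16_bits(-0.5) == 0xBF00
assert bf16_bits(1.0) == 0x3F80
assert bf16_bits(-1.0) == 0xBF80
```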
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
jayfoad wrote: > This logic would need updating again for GFX12. It seems like it's > duplicating a lot of knowledge which is already implemented in > SIInsertWaitcnts. Just to demonstrate, you could implement this feature in SIInsertWaitcnts for **all** supported architectures with something like: ```diff diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index 6ecb1c8bf6e1..910cd094f8f2 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -2299,6 +2299,12 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, updateEventWaitcntAfter(Inst, &ScoreBrackets); +AMDGPU::Waitcnt Wait = +AMDGPU::Waitcnt::allZeroExceptVsCnt(ST->hasExtendedWaitCounts()); +ScoreBrackets.simplifyWaitcnt(Wait); +Modified |= generateWaitcnt(Wait, std::next(Inst.getIterator()), Block, +ScoreBrackets, /*OldWaitcntInstr=*/nullptr); + #if 0 // TODO: implement resource type check controlled by options with ub = LB. // If this instruction generates a S_SETVSKIP because it is an // indexed resource, and we are on Tahiti, then it will also force ``` Handling VSCNT/STORECNT correctly is a little more complicated but not much. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/79980 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)
jayfoad wrote: After this change is there any value in having two different builtins? You could just have one that always returns 64 bits. https://github.com/llvm/llvm-project/pull/80183 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
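The suggestion works because in wave32 mode the upper 32 lanes simply do not exist, so the high bits of a 64-bit ballot result read as zero. A toy model of the semantics (illustrative Python, not the builtin's implementation):

```python
def ballot64(lane_predicates) -> int:
    """Pack per-lane predicate bits into a 64-bit mask; absent lanes contribute 0."""
    mask = 0
    for lane, p in enumerate(lane_predicates):
        if p:
            mask |= 1 << lane
    return mask & 0xFFFFFFFFFFFFFFFF

# wave32: only lanes 0..31, so bits 32..63 of the 64-bit result are always clear.
m = ballot64([True] * 32)
assert m == 0xFFFFFFFF
assert (m >> 32) == 0
```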
[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)
jayfoad wrote: > Do you think it makes sense to add two gfx11 tests where _w32 variant is now > rejected with w64, and _w64 variant rejected with w32? Maybe, but I didn't have the energy to add yet more tests. > Maybe what is being printed in *-gfx10-err.cl test is enough, though. Right, that was my thinking. https://github.com/llvm/llvm-project/pull/79980 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)
@@ -21,14 +21,14 @@ void test_amdgcn_wmma_f32_16x16x16_bf16_w64(global v4f* out4f, v16h a16h, v16h b global v8s* out8s, v4i a4i, v4i b4i, v8s c8s, global v4i* out4i, v2i a2i, v2i b2i, v4i c4i) { - *out4f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w64(a16h, b16h, c4f); // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w64' needs target feature gfx11-insts}} - *out4f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w64(a16s, b16s, c4f); // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64' needs target feature gfx11-insts}} - *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w64' needs target feature gfx11-insts}} - *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64' needs target feature gfx11-insts}} - *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64' needs target feature gfx11-insts}} - *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64' needs target feature gfx11-insts}} - *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w64(true, a4i, true, b4i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64' needs target feature gfx11-insts}} - *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w64(true, a2i, true, b2i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64' needs target feature gfx11-insts}} + *out4f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w64(a16h, b16h, c4f); // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w64' needs target feature gfx11-insts,wavefrontsize64}} + *out4f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w64(a16s, b16s, c4f); // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64' needs target feature 
gfx11-insts,wavefrontsize64}} + *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w64' needs target feature gfx11-insts,wavefrontsize64}} + *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64' needs target feature gfx11-insts,wavefrontsize64}} + *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64' needs target feature gfx11-insts,wavefrontsize64}} + *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64' needs target feature gfx11-insts,wavefrontsize64}} + *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w64(true, a4i, true, b4i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64' needs target feature gfx11-insts,wavefrontsize64}} + *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w64(true, a2i, true, b2i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64' needs target feature gfx11-insts,wavefrontsize64}} } -#endif \ No newline at end of file +#endif jayfoad wrote: Yes. My editor did that. Previously there was no newline on the end of the `#endif`. Lots of tools flag that as unusual. https://github.com/llvm/llvm-project/pull/79980 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/79980 None >From cace712a8f379df3498dd76bc1f95eb4671e997c Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Tue, 30 Jan 2024 11:04:33 + Subject: [PATCH] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 34 +-- .../builtins-amdgcn-wmma-w32-gfx10-err.cl | 16 - .../builtins-amdgcn-wmma-w64-gfx10-err.cl | 18 +- .../CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl | 2 +- 4 files changed, 35 insertions(+), 35 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 74dfd1d214e8..e9dd8dcd0b60 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -292,23 +292,23 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wait_event_export_ready, "v", "n", "gfx11-inst // Postfix w32 indicates the builtin requires wavefront size of 32. // Postfix w64 indicates the builtin requires wavefront size of 64. 
//===--===// -TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, "V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, "V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts") - -TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, "V4fV16hV16hV4f", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, "V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts") -TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, "V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", "nc", "gfx11-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", "nc", "gfx11-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, "V16hV16hV16hV16hIb", "nc", 
"gfx11-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, "V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts,wavefrontsize32") +TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, "V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts,wavefrontsize32") + +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, "V4fV16hV16hV4f", "nc", "gfx11-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", "nc", "gfx11-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, "V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts,wavefrontsize64") +TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, "V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts,wavefrontsize64") TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtn, "UiUIi", "n", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtnl, "UWiUIi", "n", "gfx11-insts") diff --git
[llvm] [clang] [clang-tools-extra] [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (PR #77438)
jayfoad wrote: > @jayfoad, can you link to the documentation where these new registers are > described? Preferably from a comment in the top of the file(s). It would make > it easier to review for correctness. ISA documentation will be linked from https://llvm.org/docs/AMDGPUUsage.html#additional-documentation when it is made public. https://github.com/llvm/llvm-project/pull/77438 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang-tools-extra] [clang] [AMDGPU] Update SITargetLowering::getAddrModeArguments (PR #78740)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/78740 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [flang] [llvm] [clang] [compiler-rt] [AMDGPU] Fold operand after shrinking instruction in SIFoldOperands (PR #68426)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/68426 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[mlir] [clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
jayfoad wrote: > Also need to be updated: > > https://github.com/llvm/llvm-project/blob/bb6a4850553dd4140a5bd63187ec1b14d0b731f9/llvm/lib/Target/AMDGPU/SMInstructions.td#L14 What needs to be updated and why? https://github.com/llvm/llvm-project/pull/77795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-tools-extra] [llvm] [AMDGPU] Update SITargetLowering::getAddrModeArguments (PR #78740)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/78740 >From c7636536d65a3792223e083dc5bacd0a8e6ff3d7 Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Fri, 19 Jan 2024 16:06:00 + Subject: [PATCH] [AMDGPU] Update SITargetLowering::getAddrModeArguments Handle every intrinsic for which getTgtMemIntrinsic returns with Info.ptrVal set to one of the intrinsic's operands. A bunch of these cases were missing. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 36 +++ 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index cc0c4d4e36eaa8e..66ae9222fb50c89 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1406,31 +1406,41 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , bool SITargetLowering::getAddrModeArguments(IntrinsicInst *II, SmallVectorImpl , Type *) const { + Value *Ptr = nullptr; switch (II->getIntrinsicID()) { - case Intrinsic::amdgcn_ds_ordered_add: - case Intrinsic::amdgcn_ds_ordered_swap: + case Intrinsic::amdgcn_atomic_cond_sub_u32: case Intrinsic::amdgcn_ds_append: case Intrinsic::amdgcn_ds_consume: case Intrinsic::amdgcn_ds_fadd: - case Intrinsic::amdgcn_ds_fmin: case Intrinsic::amdgcn_ds_fmax: - case Intrinsic::amdgcn_global_atomic_fadd: + case Intrinsic::amdgcn_ds_fmin: + case Intrinsic::amdgcn_ds_ordered_add: + case Intrinsic::amdgcn_ds_ordered_swap: case Intrinsic::amdgcn_flat_atomic_fadd: - case Intrinsic::amdgcn_flat_atomic_fmin: + case Intrinsic::amdgcn_flat_atomic_fadd_v2bf16: case Intrinsic::amdgcn_flat_atomic_fmax: - case Intrinsic::amdgcn_flat_atomic_fmin_num: case Intrinsic::amdgcn_flat_atomic_fmax_num: + case Intrinsic::amdgcn_flat_atomic_fmin: + case Intrinsic::amdgcn_flat_atomic_fmin_num: + case Intrinsic::amdgcn_global_atomic_csub: + case Intrinsic::amdgcn_global_atomic_fadd: case Intrinsic::amdgcn_global_atomic_fadd_v2bf16: - case 
Intrinsic::amdgcn_flat_atomic_fadd_v2bf16: - case Intrinsic::amdgcn_global_atomic_csub: { -Value *Ptr = II->getArgOperand(0); -AccessTy = II->getType(); -Ops.push_back(Ptr); -return true; - } + case Intrinsic::amdgcn_global_atomic_fmax: + case Intrinsic::amdgcn_global_atomic_fmax_num: + case Intrinsic::amdgcn_global_atomic_fmin: + case Intrinsic::amdgcn_global_atomic_fmin_num: + case Intrinsic::amdgcn_global_atomic_ordered_add_b64: +Ptr = II->getArgOperand(0); +break; + case Intrinsic::amdgcn_global_load_lds: +Ptr = II->getArgOperand(1); +break; default: return false; } + AccessTy = II->getType(); + Ops.push_back(Ptr); + return true; } bool SITargetLowering::isLegalFlatAddressingMode(const AddrMode ,
[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)
@@ -4,10 +4,114 @@ typedef unsigned int uint; -kernel void test_builtins_amdgcn_gws_insts(uint a, uint b) { +#pragma OPENCL EXTENSION cl_khr_fp64:enable + +typedef float v2f __attribute__((ext_vector_type(2))); +typedef float v4f __attribute__((ext_vector_type(4))); +typedef float v16f __attribute__((ext_vector_type(16))); +typedef float v32f __attribute__((ext_vector_type(32))); +typedef half v4h __attribute__((ext_vector_type(4))); +typedef half v8h __attribute__((ext_vector_type(8))); +typedef half v16h __attribute__((ext_vector_type(16))); +typedef half v32h __attribute__((ext_vector_type(32))); +typedef intv2i __attribute__((ext_vector_type(2))); +typedef intv4i __attribute__((ext_vector_type(4))); +typedef intv16i __attribute__((ext_vector_type(16))); +typedef intv32i __attribute__((ext_vector_type(32))); +typedef short v2s __attribute__((ext_vector_type(2))); +typedef short v4s __attribute__((ext_vector_type(4))); +typedef short v8s __attribute__((ext_vector_type(8))); +typedef short v16s __attribute__((ext_vector_type(16))); +typedef short v32s __attribute__((ext_vector_type(32))); +typedef double v4d __attribute__((ext_vector_type(4))); + +void builtin_test_unsupported(global v32f*out_v32f, + global v16f*out_v16f, + global v4f* out_v4f, + global v32i*out_v32i, + global v16i*out_v16i, + global v4i* out_v4i, + global v4d* out_v4d, + global double* out_double, + double a_double , double b_double , double c_double, jayfoad wrote: Nit: you don't really need separate out/a/b/c versions of all these types. You could just test expressions like: ``` x_v32f = __builtin_amdgcn_mfma_f32_32x32x1f32(x_float, x_float, x_v32f, 0, 0, 0); ``` https://github.com/llvm/llvm-project/pull/78729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
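[Editor's note] Spelled out slightly more, the combined style the reviewer suggests uses one variable per type as both source and destination, so the separate out/a/b/c parameters collapse. A minimal sketch (the `x_*` names are hypothetical; the builtin call mirrors the one quoted in the comment):

```
// One pointer parameter per vector type doubles as input and output:
kernel void builtin_test_unsupported(global v32f *x_v32f, float x_float) {
  *x_v32f = __builtin_amdgcn_mfma_f32_32x32x1f32(x_float, x_float, *x_v32f, 0, 0, 0);
}
```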
[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)
https://github.com/jayfoad approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/78729
[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)
https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/78729
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo , return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction ) { + const GCNSubtarget = MF.getSubtarget(); + const SIInstrInfo *TII = ST.getInstrInfo(); + IsaVersion IV = getIsaVersion(ST.getCPU()); + + bool Changed = false; + + for (auto : MF) { +for (auto MI = MBB.begin(); MI != MBB.end();) { + MachineInstr = *MI; + ++MI; + if (Inst.mayLoadOrStore() == false) +continue; + + // Todo: if next insn is an s_waitcnt + AMDGPU::Waitcnt Wait; + + if (!(Inst.getDesc().TSFlags & SIInstrFlags::maybeAtomic)) { +if (TII->isSMRD(Inst)) { // scalar jayfoad wrote: This logic would need updating again for GFX12. It seems like it's duplicating a lot of knowledge which is already implemented in SIInsertWaitcnts. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libcxx] [lld] [libc] [lldb] [clang-tools-extra] [clang] [openmp] [compiler-rt] [llvm] [flang] [mlir] AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (PR #79104)
https://github.com/jayfoad approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/79104
[lldb] [llvm] [mlir] [openmp] [libcxx] [flang] [clang] [clang-tools-extra] [compiler-rt] [lld] [libc] AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (PR #79104)
@@ -1348,6 +1348,14 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, MachineMemOperand::MOVolatile; return true; } + case Intrinsic::amdgcn_global_load_tr: { jayfoad wrote: This case should also be handled in getAddrModeArguments below. https://github.com/llvm/llvm-project/pull/79104
[clang] [AMDGPU][GFX12] Add tests for unsupported builtins (PR #78729)
@@ -0,0 +1,105 @@ +// REQUIRES: amdgpu-registered-target jayfoad wrote: Maybe just add these to `test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl` instead of a new file? https://github.com/llvm/llvm-project/pull/78729
[llvm] [clang] [AMDGPU] Remove gws feature from GFX12 (PR #78711)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/78711
[llvm] [clang] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (PR #78709)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/78709
[libc] [compiler-rt] [flang] [lld] [llvm] [clang] [lldb] [clang-tools-extra] [libcxx] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)
jayfoad wrote: Can you add a GFX12 RUN line to clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl? That will probably require adding "fp8-conversion-insts" to the GFX12 part of TargetParser.cpp. You can do this in a separate patch if you want. https://github.com/llvm/llvm-project/pull/78414
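[Editor's note] For illustration, the requested RUN line would look roughly like the gfx12 RUN lines in the other tests quoted in this digest — the exact flags below are an assumption, not taken from the patch:

```
// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1200 -S -emit-llvm -o - %s | FileCheck %s
```

and the TargetParser.cpp side would add a line such as `Features["fp8-conversion-insts"] = true;` under the GFX12 cases of `fillAMDGPUFeatureMap`.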
[llvm] [clang] [AMDGPU] Remove gws feature from GFX12 (PR #78711)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/78711 This was already done for LLVM. This patch just updates the Clang builtin handling to match. >From 8ec83bbc08c6a364efda3724d5886dbd568f956f Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Fri, 19 Jan 2024 13:34:37 + Subject: [PATCH] [AMDGPU] Remove gws feature from GFX12 This was already done for LLVM. This patch just updates the Clang builtin handling to match. --- .../builtins-amdgcn-gfx12-err.cl | 27 ++- .../builtins-amdgcn-gfx12-param-err.cl| 24 + llvm/lib/TargetParser/TargetParser.cpp| 1 - 3 files changed, 32 insertions(+), 20 deletions(-) create mode 100644 clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl index 5e0153c42825e3..bcaea9a2482d18 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-err.cl @@ -2,23 +2,12 @@ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1200 -verify -S -emit-llvm -o - %s -kernel void builtins_amdgcn_s_barrier_signal_err(global int* in, global int* out, int barrier) { - - __builtin_amdgcn_s_barrier_signal(barrier); // expected-error {{'__builtin_amdgcn_s_barrier_signal' must be a constant integer}} - __builtin_amdgcn_s_barrier_wait(-1); - *out = *in; -} - -kernel void builtins_amdgcn_s_barrier_wait_err(global int* in, global int* out, int barrier) { - - __builtin_amdgcn_s_barrier_signal(-1); - __builtin_amdgcn_s_barrier_wait(barrier); // expected-error {{'__builtin_amdgcn_s_barrier_wait' must be a constant integer}} - *out = *in; -} - -kernel void builtins_amdgcn_s_barrier_signal_isfirst_err(global int* in, global int* out, int barrier) { - - __builtin_amdgcn_s_barrier_signal_isfirst(barrier); // expected-error {{'__builtin_amdgcn_s_barrier_signal_isfirst' must be a constant integer}} - __builtin_amdgcn_s_barrier_wait(-1); - *out = *in; 
+typedef unsigned int uint; + +kernel void test_builtins_amdgcn_gws_insts(uint a, uint b) { + __builtin_amdgcn_ds_gws_init(a, b); // expected-error {{'__builtin_amdgcn_ds_gws_init' needs target feature gws}} + __builtin_amdgcn_ds_gws_barrier(a, b); // expected-error {{'__builtin_amdgcn_ds_gws_barrier' needs target feature gws}} + __builtin_amdgcn_ds_gws_sema_v(a); // expected-error {{'__builtin_amdgcn_ds_gws_sema_v' needs target feature gws}} + __builtin_amdgcn_ds_gws_sema_br(a, b); // expected-error {{'__builtin_amdgcn_ds_gws_sema_br' needs target feature gws}} + __builtin_amdgcn_ds_gws_sema_p(a); // expected-error {{'__builtin_amdgcn_ds_gws_sema_p' needs target feature gws}} } diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl new file mode 100644 index 00..5e0153c42825e3 --- /dev/null +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12-param-err.cl @@ -0,0 +1,24 @@ +// REQUIRES: amdgpu-registered-target + +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1200 -verify -S -emit-llvm -o - %s + +kernel void builtins_amdgcn_s_barrier_signal_err(global int* in, global int* out, int barrier) { + + __builtin_amdgcn_s_barrier_signal(barrier); // expected-error {{'__builtin_amdgcn_s_barrier_signal' must be a constant integer}} + __builtin_amdgcn_s_barrier_wait(-1); + *out = *in; +} + +kernel void builtins_amdgcn_s_barrier_wait_err(global int* in, global int* out, int barrier) { + + __builtin_amdgcn_s_barrier_signal(-1); + __builtin_amdgcn_s_barrier_wait(barrier); // expected-error {{'__builtin_amdgcn_s_barrier_wait' must be a constant integer}} + *out = *in; +} + +kernel void builtins_amdgcn_s_barrier_signal_isfirst_err(global int* in, global int* out, int barrier) { + + __builtin_amdgcn_s_barrier_signal_isfirst(barrier); // expected-error {{'__builtin_amdgcn_s_barrier_signal_isfirst' must be a constant integer}} + __builtin_amdgcn_s_barrier_wait(-1); + *out = *in; +} diff 
--git a/llvm/lib/TargetParser/TargetParser.cpp b/llvm/lib/TargetParser/TargetParser.cpp index 2cfe23676d20f8..6bd477aed8fa64 100644 --- a/llvm/lib/TargetParser/TargetParser.cpp +++ b/llvm/lib/TargetParser/TargetParser.cpp @@ -295,7 +295,6 @@ void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple , Features["gfx12-insts"] = true; Features["atomic-fadd-rtn-insts"] = true; Features["image-insts"] = true; - Features["gws"] = true; break; case GK_GFX1151: case GK_GFX1150:
[llvm] [clang] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (PR #78709)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/78709 That instruction is not supported on GFX12. Added a testcase which previously crashed without this change. >From b212d63828ae87b8e40f9d6de7622bc7a14ce48f Mon Sep 17 00:00:00 2001 From: pvanhout Date: Mon, 30 Oct 2023 08:03:17 +0100 Subject: [PATCH] [AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 That instruction is not supported on GFX12. Added a testcase which previously crashed without this change. --- clang/test/CodeGenOpenCL/amdgpu-features.cl | 4 ++-- llvm/lib/Target/AMDGPU/AMDGPU.td | 1 - llvm/lib/TargetParser/TargetParser.cpp| 1 - llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll | 4 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index 7495bca72a9df5..1ba2b129f6895a 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -100,8 +100,8 @@ // GFX1103: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" // GFX1150: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" // GFX1151: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" -// GFX1200: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" -// GFX1201: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" +// GFX1200: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" +// GFX1201: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" // GFX1103-W64: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize64" diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 7b7fa906b2b1a3..92985f971f17a7 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -1487,7 +1487,6 @@ def FeatureISAVersion12 : FeatureSet< [FeatureGFX12, 
FeatureLDSBankCount32, FeatureDLInsts, - FeatureDot5Insts, FeatureDot7Insts, FeatureDot8Insts, FeatureDot9Insts, diff --git a/llvm/lib/TargetParser/TargetParser.cpp b/llvm/lib/TargetParser/TargetParser.cpp index 2cfe23676d20f8..f6d5bfe913b419 100644 --- a/llvm/lib/TargetParser/TargetParser.cpp +++ b/llvm/lib/TargetParser/TargetParser.cpp @@ -275,7 +275,6 @@ void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple , case GK_GFX1201: case GK_GFX1200: Features["ci-insts"] = true; - Features["dot5-insts"] = true; Features["dot7-insts"] = true; Features["dot8-insts"] = true; Features["dot9-insts"] = true; diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll index 240997aeb9a687..26e6bde97f499d 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll @@ -3,12 +3,14 @@ ; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100
[clang] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)
jayfoad wrote: Some of the tests in this patch need regenerating now that #77438 has been merged. https://github.com/llvm/llvm-project/pull/77795
[clang] [llvm] [clang-tools-extra] [AMDGPU] Update uses of new VOP2 pseudos for GFX12 (PR #78155)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/78155
[clang-tools-extra] [clang] [llvm] [AMDGPU] Update uses of new VOP2 pseudos for GFX12 (PR #78155)
@@ -1,7 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck --check-prefixes=SI %s jayfoad wrote: Done as part of a merge from main to fix conflicts. https://github.com/llvm/llvm-project/pull/78155
[clang-tools-extra] [clang] [llvm] [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (PR #77438)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/77438
[clang-tools-extra] [clang] [llvm] [AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (PR #78186)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/78186
[clang] [clang-tools-extra] [llvm] [AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (PR #78186)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/78186 >From d3f4ebf849f6ef1ea373e5c7f93398db6681b2b6 Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Mon, 15 Jan 2024 15:02:08 + Subject: [PATCH 1/4] Add GFX11/12 test coverage --- llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll | 103 +- 1 file changed, 77 insertions(+), 26 deletions(-) diff --git a/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll b/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll index 598d7a8033c2e54..2c1baeeeda21697 100644 --- a/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll +++ b/llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll @@ -1,32 +1,83 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stress-regalloc=2 -verify-machineinstrs < %s | FileCheck %s - +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stress-regalloc=2 -verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX9 +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -stress-regalloc=2 -verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX11 +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -stress-regalloc=2 -verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX12 define void @test_remat_s_getpc_b64() { -; CHECK-LABEL: test_remat_s_getpc_b64: -; CHECK: ; %bb.0: ; %entry -; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1 -; CHECK-NEXT:buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill -; CHECK-NEXT:s_mov_b64 exec, s[4:5] -; CHECK-NEXT:v_writelane_b32 v0, s30, 0 -; CHECK-NEXT:s_getpc_b64 s[4:5] -; CHECK-NEXT:v_writelane_b32 v0, s31, 1 -; CHECK-NEXT:;;#ASMSTART -; CHECK-NEXT:;;#ASMEND -; CHECK-NEXT:;;#ASMSTART -; CHECK-NEXT:;;#ASMEND -; CHECK-NEXT:s_getpc_b64 s[4:5] -; CHECK-NEXT:v_mov_b32_e32 v1, s4 -; CHECK-NEXT:v_mov_b32_e32 v2, s5 -; CHECK-NEXT:global_store_dwordx2 v[1:2], v[1:2], off -; CHECK-NEXT:v_readlane_b32 s31, v0, 1 -; CHECK-NEXT:v_readlane_b32 s30, v0, 
0 -; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1 -; CHECK-NEXT:buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload -; CHECK-NEXT:s_mov_b64 exec, s[4:5] -; CHECK-NEXT:s_waitcnt vmcnt(0) -; CHECK-NEXT:s_setpc_b64 s[30:31] +; GFX9-LABEL: test_remat_s_getpc_b64: +; GFX9: ; %bb.0: ; %entry +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_xor_saveexec_b64 s[4:5], -1 +; GFX9-NEXT:buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill +; GFX9-NEXT:s_mov_b64 exec, s[4:5] +; GFX9-NEXT:v_writelane_b32 v0, s30, 0 +; GFX9-NEXT:s_getpc_b64 s[4:5] +; GFX9-NEXT:v_writelane_b32 v0, s31, 1 +; GFX9-NEXT:;;#ASMSTART +; GFX9-NEXT:;;#ASMEND +; GFX9-NEXT:;;#ASMSTART +; GFX9-NEXT:;;#ASMEND +; GFX9-NEXT:s_getpc_b64 s[4:5] +; GFX9-NEXT:v_mov_b32_e32 v1, s4 +; GFX9-NEXT:v_mov_b32_e32 v2, s5 +; GFX9-NEXT:global_store_dwordx2 v[1:2], v[1:2], off +; GFX9-NEXT:v_readlane_b32 s31, v0, 1 +; GFX9-NEXT:v_readlane_b32 s30, v0, 0 +; GFX9-NEXT:s_xor_saveexec_b64 s[4:5], -1 +; GFX9-NEXT:buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload +; GFX9-NEXT:s_mov_b64 exec, s[4:5] +; GFX9-NEXT:s_waitcnt vmcnt(0) +; GFX9-NEXT:s_setpc_b64 s[30:31] +; +; GFX11-LABEL: test_remat_s_getpc_b64: +; GFX11: ; %bb.0: ; %entry +; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX11-NEXT:s_xor_saveexec_b32 s0, -1 +; GFX11-NEXT:scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill +; GFX11-NEXT:s_mov_b32 exec_lo, s0 +; GFX11-NEXT:v_writelane_b32 v0, s30, 0 +; GFX11-NEXT:s_getpc_b64 s[0:1] +; GFX11-NEXT:;;#ASMSTART +; GFX11-NEXT:;;#ASMEND +; GFX11-NEXT:v_writelane_b32 v0, s31, 1 +; GFX11-NEXT:;;#ASMSTART +; GFX11-NEXT:;;#ASMEND +; GFX11-NEXT:s_getpc_b64 s[0:1] +; GFX11-NEXT:s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_2) +; GFX11-NEXT:v_dual_mov_b32 v2, s1 :: v_dual_mov_b32 v1, s0 +; GFX11-NEXT:v_readlane_b32 s31, v0, 1 +; GFX11-NEXT:v_readlane_b32 s30, v0, 0 +; GFX11-NEXT:global_store_b64 v[1:2], v[1:2], off +; GFX11-NEXT:s_xor_saveexec_b32 s0, -1 +; 
GFX11-NEXT:scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload +; GFX11-NEXT:s_mov_b32 exec_lo, s0 +; GFX11-NEXT:s_waitcnt vmcnt(0) +; GFX11-NEXT:s_setpc_b64 s[30:31] +; +; GFX12-LABEL: test_remat_s_getpc_b64: +; GFX12: ; %bb.0: ; %entry +; GFX12-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX12-NEXT:s_xor_saveexec_b32 s0, -1 +; GFX12-NEXT:scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill +; GFX12-NEXT:s_mov_b32 exec_lo, s0 +; GFX12-NEXT:v_writelane_b32 v0, s30, 0 +; GFX12-NEXT:s_getpc_b64 s[0:1] +; GFX12-NEXT:;;#ASMSTART +; GFX12-NEXT:;;#ASMEND +; GFX12-NEXT:v_writelane_b32 v0, s31, 1 +; GFX12-NEXT:;;#ASMSTART +;
[clang] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (PR #77926)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/77926
[clang] [clang-tools-extra] [llvm] [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (PR #77929)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/77929
[clang-tools-extra] [llvm] [clang] [AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (PR #77438)
jayfoad wrote: @Pierre-vh @arsen ping! (Sorry, I know it has only been a few days.) https://github.com/llvm/llvm-project/pull/77438
[clang] [clang-tools-extra] [llvm] [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (PR #77929)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/77929 >From 4299ba898449f782c642b0c27f0ec9970aee0a1c Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Fri, 12 Jan 2024 11:34:02 + Subject: [PATCH 1/2] [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 --- llvm/lib/Target/AMDGPU/AMDGPU.td| 3 ++- llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir | 1 + llvm/test/MC/AMDGPU/gfx12_asm_features.s| 17 + .../Disassembler/AMDGPU/gfx12_dasm_features.txt | 13 + 4 files changed, 33 insertions(+), 1 deletion(-) create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index b27edb1e9e14bb..682ca6c57c973b 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -1502,7 +1502,8 @@ def FeatureISAVersion12 : FeatureSet< FeatureHasRestrictedSOffset, FeatureVGPRSingleUseHintInsts, FeatureMADIntraFwdBug, - FeatureScalarDwordx3Loads]>; + FeatureScalarDwordx3Loads, + FeatureDPPSrc1SGPR]>; //===--===// diff --git a/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir b/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir index fe1345e29f133d..7d081a1491da6e 100644 --- a/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir +++ b/llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir @@ -1,5 +1,6 @@ # RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=gcn-dpp-combine -verify-machineinstrs -o - %s | FileCheck %s -check-prefixes=GCN,GFX1100 # RUN: llc -march=amdgcn -mcpu=gfx1150 -run-pass=gcn-dpp-combine -verify-machineinstrs -o - %s | FileCheck %s -check-prefixes=GCN,GFX1150 +# RUN: llc -march=amdgcn -mcpu=gfx1200 -run-pass=gcn-dpp-combine -verify-machineinstrs -o - %s | FileCheck %s -check-prefixes=GCN,GFX1150 --- diff --git a/llvm/test/MC/AMDGPU/gfx12_asm_features.s b/llvm/test/MC/AMDGPU/gfx12_asm_features.s index 7e58bdb3b444e1..da4464c6494dbf 100644 --- a/llvm/test/MC/AMDGPU/gfx12_asm_features.s +++ b/llvm/test/MC/AMDGPU/gfx12_asm_features.s @@ -1,5 +1,22 @@ 
// RUN: llvm-mc -arch=amdgcn -show-encoding -mcpu=gfx1200 %s | FileCheck --check-prefix=GFX12 %s +// +// Subtargets allow src1 of VOP3 DPP instructions to be SGPR or inlinable +// constant. +// + +v_add3_u32_e64_dpp v5, v1, s2, v3 quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf +// GFX1150: encoding: [0x05,0x00,0x55,0xd6,0xfa,0x04,0x0c,0x04,0x01,0x1b,0x00,0xff] + +v_add3_u32_e64_dpp v5, v1, 42, v3 quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf +// GFX1150: encoding: [0x05,0x00,0x55,0xd6,0xfa,0x54,0x0d,0x04,0x01,0x1b,0x00,0xff] + +v_add3_u32_e64_dpp v5, v1, s2, v0 dpp8:[7,6,5,4,3,2,1,0] +// GFX1150: encoding: [0x05,0x00,0x55,0xd6,0xe9,0x04,0x00,0x04,0x01,0x77,0x39,0x05] + +v_add3_u32_e64_dpp v5, v1, 42, v0 dpp8:[7,6,5,4,3,2,1,0] +// GFX1150: encoding: [0x05,0x00,0x55,0xd6,0xe9,0x54,0x01,0x04,0x01,0x77,0x39,0x05] + // // Elements of CPol operand can be given in any order // diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt b/llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt new file mode 100644 index 00..2c64522422ad0d --- /dev/null +++ b/llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt @@ -0,0 +1,13 @@ +# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1200 -disassemble -show-encoding < %s | FileCheck -check-prefixes=GFX12 %s + +# GFX12: v_add3_u32_e64_dpp v5, v1, s2, v3 quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf ; encoding: [0x05,0x00,0x55,0xd6,0xfa,0x04,0x0c,0x04,0x01,0x1b,0x00,0xff] +0x05,0x00,0x55,0xd6,0xfa,0x04,0x0c,0x04,0x01,0x1b,0x00,0xff + +# GFX12: v_add3_u32_e64_dpp v5, v1, 42, v3 quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf ; encoding: [0x05,0x00,0x55,0xd6,0xfa,0x54,0x0d,0x04,0x01,0x1b,0x00,0xff] +0x05,0x00,0x55,0xd6,0xfa,0x54,0x0d,0x04,0x01,0x1b,0x00,0xff + +# GFX12: v_add3_u32_e64_dpp v5, v1, s2, v0 dpp8:[7,6,5,4,3,2,1,0] ; encoding: [0x05,0x00,0x55,0xd6,0xe9,0x04,0x00,0x04,0x01,0x77,0x39,0x05] +0x05,0x00,0x55,0xd6,0xe9,0x04,0x00,0x04,0x01,0x77,0x39,0x05 + +# GFX12: v_add3_u32_e64_dpp v5, v1, 42, v0 dpp8:[7,6,5,4,3,2,1,0] ; 
encoding: [0x05,0x00,0x55,0xd6,0xe9,0x54,0x01,0x04,0x01,0x77,0x39,0x05] +0x05,0x00,0x55,0xd6,0xe9,0x54,0x01,0x04,0x01,0x77,0x39,0x05 >From a65834ad3d8aed3e9cb1414d7576d5244a31f8a2 Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Wed, 17 Jan 2024 14:39:09 + Subject: [PATCH 2/2] More tests --- llvm/test/MC/AMDGPU/gfx1150_asm_features.s | 6 ++ llvm/test/MC/AMDGPU/gfx12_asm_features.s | 6 ++ llvm/test/MC/Disassembler/AMDGPU/gfx1150_dasm_features.txt | 6 ++ llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_features.txt | 6 ++ 4 files changed, 24 insertions(+) diff --git a/llvm/test/MC/AMDGPU/gfx1150_asm_features.s b/llvm/test/MC/AMDGPU/gfx1150_asm_features.s index a4904c40b40ae7..55c855175a89e0 100644 ---
[flang] [libc] [llvm] [clang-tools-extra] [clang] [compiler-rt] [libcxx] [AMDGPU] Fix llvm.amdgcn.s.wait.event.export.ready for GFX12 (PR #78191)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/78191
[clang-tools-extra] [clang] [llvm] [AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (PR #77927)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/77927
[libcxx] [clang] [libc] [llvm] [clang-tools-extra] [flang] [compiler-rt] [AMDGPU] Fix llvm.amdgcn.s.wait.event.export.ready for GFX12 (PR #78191)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/78191 >From 9990fbc26ed3dc245a5127345326050acac49d66 Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Fri, 21 Apr 2023 10:46:43 +0100 Subject: [PATCH] [AMDGPU] Fix llvm.amdgcn.s.wait.event.export.ready for GFX12 The meaning of bit 0 of the immediate operand of S_WAIT_EVENT has been flipped from GFX11. --- llvm/lib/Target/AMDGPU/SOPInstructions.td| 8 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll | 9 ++--- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td index 46fa3d57a21cb2..b78d900c9bbf42 100644 --- a/llvm/lib/Target/AMDGPU/SOPInstructions.td +++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td @@ -1768,10 +1768,10 @@ def : GCNPat< (S_SEXT_I32_I16 $src) >; -def : GCNPat < - (int_amdgcn_s_wait_event_export_ready), -(S_WAIT_EVENT (i16 0)) ->; +let SubtargetPredicate = isNotGFX12Plus in + def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 0))>; +let SubtargetPredicate = isGFX12Plus in + def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 1))>; // The first 10 bits of the mode register are the core FP mode on all // subtargets. 
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll index 3e95e4dec67a2b..25b5ddcf946b35 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll @@ -1,8 +1,11 @@ -; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefix=GCN %s -; RUN: llc -global-isel -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefix=GCN %s +; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefixes=GCN,GFX11 %s +; RUN: llc -global-isel=1 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefixes=GCN,GFX11 %s +; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1200 < %s | FileCheck -check-prefixes=GCN,GFX12 %s +; RUN: llc -global-isel=1 -march=amdgcn -verify-machineinstrs -mcpu=gfx1200 < %s | FileCheck -check-prefixes=GCN,GFX12 %s ; GCN-LABEL: {{^}}test_wait_event: -; GCN: s_wait_event 0x0 +; GFX11: s_wait_event 0x0 +; GFX12: s_wait_event 0x1 define amdgpu_ps void @test_wait_event() #0 { entry: ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [clang-tools-extra] [AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (PR #77927)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/77927 >From 3f3bcdb89adf032e26c95807abf5e3b23ff50e4a Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Fri, 12 Jan 2024 12:24:28 + Subject: [PATCH 1/3] Precommit extra GFX12 test coverage --- .../GlobalISel/inst-select-mad_64_32.mir | 21 ++ llvm/test/CodeGen/AMDGPU/llvm.mulo.ll | 163 ++ llvm/test/CodeGen/AMDGPU/mad_64_32.ll | 211 ++ 3 files changed, 395 insertions(+) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir index 698281caca245e..6e33ef37397d6b 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir @@ -1,6 +1,7 @@ # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py # RUN: llc -march=amdgcn -mcpu=gfx1030 -run-pass=instruction-select -global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o - 2>%t | FileCheck -check-prefix=GFX10 %s # RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=instruction-select -global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o - 2>%t | FileCheck -check-prefix=GFX11 %s +# RUN: llc -march=amdgcn -mcpu=gfx1200 -run-pass=instruction-select -global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o - 2>%t | FileCheck -check-prefix=GFX12 %s --- name: mad_u64_u32_vvv @@ -18,6 +19,7 @@ body: | ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; GFX10-NEXT: [[V_MAD_U64_U32_e64_:%[0-9]+]]:vreg_64, [[V_MAD_U64_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_e64_]], implicit [[V_MAD_U64_U32_e64_1]] +; ; GFX11-LABEL: name: mad_u64_u32_vvv ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 ; GFX11-NEXT: {{ $}} @@ -26,6 +28,15 @@ body: | ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; 
GFX11-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit [[V_MAD_U64_U32_gfx11_e64_1]] +; +; GFX12-LABEL: name: mad_u64_u32_vvv +; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 +; GFX12-NEXT: {{ $}} +; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 +; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 +; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 +; GFX12-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec +; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit [[V_MAD_U64_U32_gfx11_e64_1]] %0:vgpr(s32) = COPY $vgpr0 %1:vgpr(s32) = COPY $vgpr1 %2:vgpr(s32) = COPY $vgpr2 @@ -51,6 +62,7 @@ body: | ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; GFX10-NEXT: [[V_MAD_I64_I32_e64_:%[0-9]+]]:vreg_64, [[V_MAD_I64_I32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_e64_]], implicit [[V_MAD_I64_I32_e64_1]] +; ; GFX11-LABEL: name: mad_i64_i32_vvv ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 ; GFX11-NEXT: {{ $}} @@ -59,6 +71,15 @@ body: | ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; GFX11-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit [[V_MAD_I64_I32_gfx11_e64_1]] +; +; GFX12-LABEL: name: mad_i64_i32_vvv +; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 +; GFX12-NEXT: {{ $}} +; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 +; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY 
$vgpr1 +; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 +; GFX12-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec +; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit [[V_MAD_I64_I32_gfx11_e64_1]] %0:vgpr(s32) = COPY $vgpr0 %1:vgpr(s32) = COPY $vgpr1 %2:vgpr(s32) = COPY $vgpr2 diff --git a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll index 249acec639540b..b9b03e52ec865c 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll @@ -3,6 +3,7 @@ ; RUN: llc -march=amdgcn
[llvm] [clang] [clang-tools-extra] [AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (PR #77927)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/77927 >From 3f3bcdb89adf032e26c95807abf5e3b23ff50e4a Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Fri, 12 Jan 2024 12:24:28 + Subject: [PATCH 1/2] Precommit extra GFX12 test coverage --- .../GlobalISel/inst-select-mad_64_32.mir | 21 ++ llvm/test/CodeGen/AMDGPU/llvm.mulo.ll | 163 ++ llvm/test/CodeGen/AMDGPU/mad_64_32.ll | 211 ++ 3 files changed, 395 insertions(+) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir index 698281caca245e9..6e33ef37397d6b4 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir @@ -1,6 +1,7 @@ # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py # RUN: llc -march=amdgcn -mcpu=gfx1030 -run-pass=instruction-select -global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o - 2>%t | FileCheck -check-prefix=GFX10 %s # RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=instruction-select -global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o - 2>%t | FileCheck -check-prefix=GFX11 %s +# RUN: llc -march=amdgcn -mcpu=gfx1200 -run-pass=instruction-select -global-isel-abort=2 -pass-remarks-missed='gisel*' -verify-machineinstrs %s -o - 2>%t | FileCheck -check-prefix=GFX12 %s --- name: mad_u64_u32_vvv @@ -18,6 +19,7 @@ body: | ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; GFX10-NEXT: [[V_MAD_U64_U32_e64_:%[0-9]+]]:vreg_64, [[V_MAD_U64_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_e64_]], implicit [[V_MAD_U64_U32_e64_1]] +; ; GFX11-LABEL: name: mad_u64_u32_vvv ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 ; GFX11-NEXT: {{ $}} @@ -26,6 +28,15 @@ body: | ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; 
GFX11-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit [[V_MAD_U64_U32_gfx11_e64_1]] +; +; GFX12-LABEL: name: mad_u64_u32_vvv +; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 +; GFX12-NEXT: {{ $}} +; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 +; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 +; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 +; GFX12-NEXT: [[V_MAD_U64_U32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_U64_U32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_U64_U32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec +; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_U64_U32_gfx11_e64_]], implicit [[V_MAD_U64_U32_gfx11_e64_1]] %0:vgpr(s32) = COPY $vgpr0 %1:vgpr(s32) = COPY $vgpr1 %2:vgpr(s32) = COPY $vgpr2 @@ -51,6 +62,7 @@ body: | ; GFX10-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; GFX10-NEXT: [[V_MAD_I64_I32_e64_:%[0-9]+]]:vreg_64, [[V_MAD_I64_I32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX10-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_e64_]], implicit [[V_MAD_I64_I32_e64_1]] +; ; GFX11-LABEL: name: mad_i64_i32_vvv ; GFX11: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 ; GFX11-NEXT: {{ $}} @@ -59,6 +71,15 @@ body: | ; GFX11-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 ; GFX11-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit [[V_MAD_I64_I32_gfx11_e64_1]] +; +; GFX12-LABEL: name: mad_i64_i32_vvv +; GFX12: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3 +; GFX12-NEXT: {{ $}} +; GFX12-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0 +; GFX12-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY 
$vgpr1 +; GFX12-NEXT: [[COPY2:%[0-9]+]]:vreg_64 = COPY $vgpr3 +; GFX12-NEXT: [[V_MAD_I64_I32_gfx11_e64_:%[0-9]+]]:vreg_64, [[V_MAD_I64_I32_gfx11_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_MAD_I64_I32_gfx11_e64 [[COPY]], [[COPY1]], [[COPY2]], 0, implicit $exec +; GFX12-NEXT: S_ENDPGM 0, implicit [[V_MAD_I64_I32_gfx11_e64_]], implicit [[V_MAD_I64_I32_gfx11_e64_1]] %0:vgpr(s32) = COPY $vgpr0 %1:vgpr(s32) = COPY $vgpr1 %2:vgpr(s32) = COPY $vgpr2 diff --git a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll index 249acec639540b3..b9b03e52ec865c0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.mulo.ll @@ -3,6 +3,7 @@ ; RUN: llc -march=amdgcn
[clang-tools-extra] [llvm] [clang] [AMDGPU][GFX12] Add Atomic cond_sub_u32 (PR #76224)
jayfoad wrote:
> Adding support in atomicrmw. This will require adding a new operation to atomicrmw: "cond_sub"

Yes, and we have (Matt has) done this in the past, but it will require a wider consensus. I think it's fine to add AMDGPU intrinsics for this in the meantime.

https://github.com/llvm/llvm-project/pull/76224
[clang] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (PR #77926)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/77926 None >From 3d4b8547514f2315130599230e769a8c73be01c3 Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Fri, 12 Jan 2024 12:43:16 + Subject: [PATCH] [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl | 15 +++ 2 files changed, 16 insertions(+) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index e562ef04a30194..d0c4b664bf0313 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -410,6 +410,7 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts") // GFX12+ only builtins. //===--===// +TARGET_BUILTIN(__builtin_amdgcn_s_sleep_var, "vUi", "n", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_permlane16_var, "UiUiUiUiIbIb", "nc", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_permlanex16_var, "UiUiUiUiIbIb", "nc", "gfx12-insts") TARGET_BUILTIN(__builtin_amdgcn_s_barrier_signal, "vIi", "n", "gfx12-insts") diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl index 2899d9e5c28898..ebd367bba0cdc1 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx12.cl @@ -5,6 +5,21 @@ typedef unsigned int uint; +// CHECK-LABEL: @test_s_sleep_var( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[D_ADDR:%.*]] = alloca i32, align 4, addrspace(5) +// CHECK-NEXT:store i32 [[D:%.*]], ptr addrspace(5) [[D_ADDR]], align 4 +// CHECK-NEXT:[[TMP0:%.*]] = load i32, ptr addrspace(5) [[D_ADDR]], align 4 +// CHECK-NEXT:call void @llvm.amdgcn.s.sleep.var(i32 [[TMP0]]) +// CHECK-NEXT:call void @llvm.amdgcn.s.sleep.var(i32 15) +// CHECK-NEXT:ret void +// +void test_s_sleep_var(int d) +{ + __builtin_amdgcn_s_sleep_var(d); + __builtin_amdgcn_s_sleep_var(15); +} + // CHECK-LABEL: 
@test_permlane16_var( // CHECK-NEXT: entry: // CHECK-NEXT:[[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
[clang-tools-extra] [libc] [mlir] [lld] [libcxx] [libclc] [llvm] [clang] [flang] [libunwind] [lldb] [compiler-rt] [AMDGPU] Fix broken sign-extended subword buffer load combine (PR #77470)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/77470
[clang-tools-extra] [libclc] [lld] [flang] [mlir] [libcxx] [libunwind] [clang] [lldb] [libc] [llvm] [compiler-rt] [AMDGPU] Fix broken sign-extended subword buffer load combine (PR #77470)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/77470 >From ae231d88c5b5e2e0996edefd45389992f8e97d05 Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Tue, 9 Jan 2024 13:16:24 + Subject: [PATCH 1/3] [AMDGPU] Precommit tests for broken combine Add tests for sign-extending the result of an unsigned subword buffer load from the wrong width. --- .../llvm.amdgcn.struct.buffer.load.ll | 82 +++ 1 file changed, 82 insertions(+) diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll index 81c0f7557e6417..fcd7821a86897e 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.ll @@ -500,6 +500,47 @@ define amdgpu_ps float @struct_buffer_load_i8_sext__sgpr_rsrc__vgpr_vindex__vgpr ret float %cast } +define amdgpu_ps float @struct_buffer_load_i8_sext_wrong_width(<4 x i32> inreg %rsrc, i32 %vindex, i32 %voffset, i32 inreg %soffset) { + ; GFX8-LABEL: name: struct_buffer_load_i8_sext_wrong_width + ; GFX8: bb.1 (%ir-block.0): + ; GFX8-NEXT: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; GFX8-NEXT: {{ $}} + ; GFX8-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; GFX8-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; GFX8-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; GFX8-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; GFX8-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3 + ; GFX8-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; GFX8-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; GFX8-NEXT: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; GFX8-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY4]], %subreg.sub0, [[COPY5]], %subreg.sub1 + ; GFX8-NEXT: [[BUFFER_LOAD_SBYTE_BOTHEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_SBYTE_BOTHEN 
[[REG_SEQUENCE1]], [[REG_SEQUENCE]], [[COPY6]], 0, 0, 0, implicit $exec :: (dereferenceable load (s8), addrspace 8) + ; GFX8-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_SBYTE_BOTHEN]] + ; GFX8-NEXT: SI_RETURN_TO_EPILOG implicit $vgpr0 + ; + ; GFX12-LABEL: name: struct_buffer_load_i8_sext_wrong_width + ; GFX12: bb.1 (%ir-block.0): + ; GFX12-NEXT: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; GFX12-NEXT: {{ $}} + ; GFX12-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; GFX12-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; GFX12-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; GFX12-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; GFX12-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3 + ; GFX12-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr0 + ; GFX12-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr1 + ; GFX12-NEXT: [[COPY6:%[0-9]+]]:sreg_32 = COPY $sgpr6 + ; GFX12-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY4]], %subreg.sub0, [[COPY5]], %subreg.sub1 + ; GFX12-NEXT: [[BUFFER_LOAD_SBYTE_VBUFFER_BOTHEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_SBYTE_VBUFFER_BOTHEN [[REG_SEQUENCE1]], [[REG_SEQUENCE]], [[COPY6]], 0, 0, 0, implicit $exec :: (dereferenceable load (s8), addrspace 8) + ; GFX12-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_SBYTE_VBUFFER_BOTHEN]] + ; GFX12-NEXT: SI_RETURN_TO_EPILOG implicit $vgpr0 + %val = call i8 @llvm.amdgcn.struct.buffer.load.i8(<4 x i32> %rsrc, i32 %vindex, i32 %voffset, i32 %soffset, i32 0) + %trunc = trunc i8 %val to i4 + %ext = sext i4 %trunc to i32 + %cast = bitcast i32 %ext to float + ret float %cast +} + define amdgpu_ps float @struct_buffer_load_i16_zext__sgpr_rsrc__vgpr_vindex__vgpr_voffset__sgpr_soffset(<4 x i32> inreg %rsrc, i32 %vindex, i32 %voffset, i32 inreg %soffset) { ; GFX8-LABEL: name: struct_buffer_load_i16_zext__sgpr_rsrc__vgpr_vindex__vgpr_voffset__sgpr_soffset ; GFX8: bb.1 (%ir-block.0): @@ -580,6 +621,47 @@ 
define amdgpu_ps float @struct_buffer_load_i16_sext__sgpr_rsrc__vgpr_vindex__vgp ret float %cast } +define amdgpu_ps float @struct_buffer_load_i16_sext_wrong_width(<4 x i32> inreg %rsrc, i32 %vindex, i32 %voffset, i32 inreg %soffset) { + ; GFX8-LABEL: name: struct_buffer_load_i16_sext_wrong_width + ; GFX8: bb.1 (%ir-block.0): + ; GFX8-NEXT: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $vgpr0, $vgpr1 + ; GFX8-NEXT: {{ $}} + ; GFX8-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr2 + ; GFX8-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr3 + ; GFX8-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr4 + ; GFX8-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr5 + ; GFX8-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY3]], %subreg.sub3 + ;
[llvm] [clang] [clang-tools-extra] [AMDGPU] Flip the default value of maybeAtomic. NFCI. (PR #75220)
@@ -29,6 +29,7 @@ class SM_Pseudo patt
   let mayStore = 0;
   let mayLoad = 1;
   let hasSideEffects = 0;
+  let maybeAtomic = 0;

jayfoad wrote: #77443

https://github.com/llvm/llvm-project/pull/75220
[clang] [clang-tools-extra] [llvm] [AMDGPU] Flip the default value of maybeAtomic. NFCI. (PR #75220)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/75220
[clang-tools-extra] [llvm] [clang] [AMDGPU] Flip the default value of maybeAtomic. NFCI. (PR #75220)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/75220 >From 429d0a22cd4208eb0c854ccf98df1ba86fd3b0cb Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Tue, 12 Dec 2023 17:15:26 + Subject: [PATCH] [AMDGPU] Flip the default value of maybeAtomic. NFCI. In practice maybeAtomic = 0 is used to prevent SIMemoryLegalizer from interfering with instructions that are mayLoad or mayStore but lack MachineMemOperands. These instructions should be the exception not the rule, so this patch sets maybeAtomic = 1 by default and only overrides it to 0 where necessary. --- llvm/lib/Target/AMDGPU/BUFInstructions.td| 4 llvm/lib/Target/AMDGPU/DSInstructions.td | 1 - llvm/lib/Target/AMDGPU/EXPInstructions.td| 1 + llvm/lib/Target/AMDGPU/FLATInstructions.td | 7 --- llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | 1 + llvm/lib/Target/AMDGPU/SIInstrFormats.td | 2 +- llvm/lib/Target/AMDGPU/SIInstructions.td | 2 +- llvm/lib/Target/AMDGPU/SMInstructions.td | 1 + 8 files changed, 5 insertions(+), 14 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 44fd4ef8641270..4696ea47f9cefd 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -477,7 +477,6 @@ class MUBUF_Load_Pseudo .ret; let mayLoad = 0; let mayStore = 1; - let maybeAtomic = 1; let elements = getMUBUFElements.ret; let tfe = isTFE; } @@ -618,7 +616,6 @@ class MUBUF_Pseudo_Store_Lds let LGKM_CNT = 1; let mayLoad = 1; let mayStore = 1; - let maybeAtomic = 1; let has_vdata = 0; let has_vaddr = 0; @@ -680,7 +677,6 @@ class MUBUF_Atomic_Pseudo patt // Most instruction load and store data, so set this as the default. 
let mayLoad = 1; let mayStore = 1; - let maybeAtomic = 1; let hasSideEffects = 0; let SchedRW = [WriteLDS]; diff --git a/llvm/lib/Target/AMDGPU/EXPInstructions.td b/llvm/lib/Target/AMDGPU/EXPInstructions.td index ff1d661ef6fe1d..4cfee7d013ef1a 100644 --- a/llvm/lib/Target/AMDGPU/EXPInstructions.td +++ b/llvm/lib/Target/AMDGPU/EXPInstructions.td @@ -20,6 +20,7 @@ class EXPCommon : InstSI< let EXP_CNT = 1; let mayLoad = done; let mayStore = 1; + let maybeAtomic = 0; let UseNamedOperandTable = 1; let Uses = !if(row, [EXEC, M0], [EXEC]); let SchedRW = [WriteExport]; diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td index c0251164faee8b..a1ff3af663352e 100644 --- a/llvm/lib/Target/AMDGPU/FLATInstructions.td +++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td @@ -173,7 +173,6 @@ class FLAT_Load_Pseudo { @@ -221,7 +219,6 @@ class FLAT_Global_Load_AddTid_Pseudo { @@ -450,7 +444,6 @@ class FLAT_AtomicNoRet_Pseudo : InstSI< let hasSideEffects = 0; let mayLoad = 1; let mayStore = 0; + let maybeAtomic = 0; string Mnemonic = opName; let UseNamedOperandTable = 1; diff --git a/llvm/lib/Target/AMDGPU/SIInstrFormats.td b/llvm/lib/Target/AMDGPU/SIInstrFormats.td index 585a3eb7861878..1b66d163714fbc 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrFormats.td +++ b/llvm/lib/Target/AMDGPU/SIInstrFormats.td @@ -91,7 +91,7 @@ class InstSI { let hasSideEffects = 1; - let maybeAtomic = 1; } let hasSideEffects = 0, mayLoad = 0, mayStore = 0, Uses = [EXEC] in { @@ -557,6 +556,7 @@ def SI_MASKED_UNREACHABLE : SPseudoInstSI <(outs), (ins), let hasNoSchedulingInfo = 1; let FixedSize = 1; let isMeta = 1; + let maybeAtomic = 0; } // Used as an isel pseudo to directly emit initialization with an diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td index c18846483cf95a..323f49ab91f01e 100644 --- a/llvm/lib/Target/AMDGPU/SMInstructions.td +++ b/llvm/lib/Target/AMDGPU/SMInstructions.td @@ -29,6 +29,7 @@ class 
SM_Pseudo patt let mayStore = 0; let mayLoad = 1; let hasSideEffects = 0; + let maybeAtomic = 0; let UseNamedOperandTable = 1; let SchedRW = [WriteSMEM];
[llvm] [clang-tools-extra] [clang] [AMDGPU][GFX12] Default component broadcast store (PR #76212)
https://github.com/jayfoad approved this pull request. LGTM. @arsenm does this address your concerns? https://github.com/llvm/llvm-project/pull/76212
[openmp] [clang] [libc] [mlir] [lldb] [flang] [llvm] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (PR #76149)
https://github.com/jayfoad closed https://github.com/llvm/llvm-project/pull/76149
[clang] [openmp] [flang] [lldb] [libc] [mlir] [llvm] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (PR #76149)
jayfoad wrote: Ping! https://github.com/llvm/llvm-project/pull/76149
[lldb] [llvm] [mlir] [openmp] [libc] [flang] [clang] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (PR #76149)
https://github.com/jayfoad updated https://github.com/llvm/llvm-project/pull/76149

From b14a554a15e4de88c9afc428f9c6898090e6eb23 Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Thu, 21 Dec 2023 12:00:26 +
Subject: [PATCH] [AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and
 intrinsic

---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td      | 10 ++-
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td  |  1 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  1 +
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |  1 +
 llvm/lib/Target/AMDGPU/FLATInstructions.td    | 11 +++-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp     |  1 +
 ...vm.amdgcn.global.atomic.ordered.add.b64.ll | 65 +++
 llvm/test/MC/AMDGPU/gfx11_unsupported.s       |  3 +
 llvm/test/MC/AMDGPU/gfx12_asm_vflat.s         | 24 +++
 .../Disassembler/AMDGPU/gfx12_dasm_vflat.txt  | 12
 10 files changed, 124 insertions(+), 5 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.atomic.ordered.add.b64.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 51bd9b63c127ed..3985c8871e1615 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -10,6 +10,8 @@
 //
 //===--===//

+def global_ptr_ty : LLVMQualPointerType<1>;
+
 class AMDGPUReadPreloadRegisterIntrinsic
   : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;

@@ -2353,10 +2355,10 @@ def int_amdgcn_s_get_waveid_in_workgroup :
   Intrinsic<[llvm_i32_ty], [],
     [IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

-class AMDGPUGlobalAtomicRtn : Intrinsic <
+class AMDGPUGlobalAtomicRtn : Intrinsic <
   [vt],
-  [llvm_anyptr_ty,// vaddr
-   vt], // vdata(VGPR)
+  [pt, // vaddr
+   vt], // vdata(VGPR)
   [IntrArgMemOnly, IntrWillReturn, NoCapture>, IntrNoCallback, IntrNoFree], "",
   [SDNPMemOperand]>;

@@ -2486,6 +2488,8 @@ def int_amdgcn_permlanex16_var : ClangBuiltin<"__builtin_amdgcn_permlanex16_var"
   [IntrNoMem, IntrConvergent, IntrWillReturn,
    ImmArg>, ImmArg>, IntrNoCallback, IntrNoFree]>;

+def int_amdgcn_global_atomic_ordered_add_b64 : AMDGPUGlobalAtomicRtn;
+
 def int_amdgcn_flat_atomic_fmin_num : AMDGPUGlobalAtomicRtn;
 def int_amdgcn_flat_atomic_fmax_num : AMDGPUGlobalAtomicRtn;
 def int_amdgcn_global_atomic_fmin_num : AMDGPUGlobalAtomicRtn;

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index eaf72d7157ee2d..36e07d944c942c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -642,6 +642,7 @@ defm int_amdgcn_global_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_csub : noret_op;
 defm int_amdgcn_flat_atomic_fadd : local_addr_space_atomic_op;
 defm int_amdgcn_ds_fadd_v2bf16 : noret_op;
+defm int_amdgcn_global_atomic_ordered_add_b64 : noret_op;
 defm int_amdgcn_flat_atomic_fmin_num : noret_op;
 defm int_amdgcn_flat_atomic_fmax_num : noret_op;
 defm int_amdgcn_global_atomic_fmin_num : noret_op;

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index c9412f720c62ec..fba060464a6e74 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4690,6 +4690,7 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr ) const {
     case Intrinsic::amdgcn_flat_atomic_fmax_num:
     case Intrinsic::amdgcn_global_atomic_fadd_v2bf16:
     case Intrinsic::amdgcn_flat_atomic_fadd_v2bf16:
+    case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
       return getDefaultMappingAllVGPR(MI);
     case Intrinsic::amdgcn_ds_ordered_add:
     case Intrinsic::amdgcn_ds_ordered_swap:

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td b/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
index beb670669581f1..4cc8871a00fe1f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
@@ -243,6 +243,7 @@ def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;
+def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;
 def : SourceOfDivergence;

diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index 0dd2b3f5c2c912..615f8cd54d8f9c 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td
+++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td
@@ -926,9 +926,11 @@ defm GLOBAL_LOAD_LDS_USHORT : FLAT_Global_Load_LDS_Pseudo <"global_load_lds_usho
 defm GLOBAL_LOAD_LDS_SSHORT : FLAT_Global_Load_LDS_Pseudo <"global_load_lds_sshort">;
 defm GLOBAL_LOAD_LDS_DWORD : FLAT_Global_Load_LDS_Pseudo <"global_load_lds_dword">;

-} // End is_flat_global = 1
-
+let
[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass" (PR #74056)
jayfoad wrote:

> The referenced issue violates the spec for finite-only math only by
> using a return value for a constant infinity.

You mean this issue? https://github.com/llvm/llvm-project/commit/5a36904c515b#commitcomment-129847939

Can you explain how your patch "broke" it? If you return infinity from a function marked with `ninf`, I would expect your patch to have no effect, because `DemandedMask & Known.KnownFPClasses` will be empty so `getFPClassConstant` will return `nullptr`.

https://github.com/llvm/llvm-project/pull/74056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
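A minimal sketch of the mask reasoning in that last sentence. The class-mask constants below are invented for illustration (the real code uses LLVM's `FPClassTest` bits, with more classes and different values):

```python
# Illustrative model of demanded/known FP class masks. The constants are
# made up for this sketch; they are not LLVM's real FPClassTest values.
FC_POS_INF = 1 << 0
FC_NEG_INF = 1 << 1
FC_NAN = 1 << 2
FC_FINITE = 1 << 3  # all finite classes collapsed into one bit here
FC_ALL = FC_POS_INF | FC_NEG_INF | FC_NAN | FC_FINITE

def fp_class_constant(demanded, known):
    # Mirrors the argument above: if nothing in the known class set is
    # demanded, there is no constant to fold to (nullptr in the C++ code).
    if demanded & known == 0:
        return None
    return "representative-constant"  # placeholder

# A return site in an `ninf` function does not demand the infinity classes.
demanded_ninf = FC_ALL & ~(FC_POS_INF | FC_NEG_INF)
# Returning a constant +inf means the only known class is positive infinity.
known = FC_POS_INF
# Empty intersection, so the transform should leave the code alone.
assert fp_class_constant(demanded_ninf, known) is None
```

Under this model the patch is a no-op for the `ninf`-returns-infinity case, which is the point of the question.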
[flang] [clang-tools-extra] [lld] [llvm] [compiler-rt] [lldb] [libc] [libcxx] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
jayfoad wrote:

> > How does this work in a case like this?
> >
> > ```
> > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) @lds.3, i32 4, i32 0, i32 0, i32 0, i32 0)
> > call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
> > %val.3 = load float, ptr addrspace(3) @lds.3, align 4
> > ```
> >
> > i.e.
> >
> > - store to known lds address `@lds.3` (this will use slot 0 and another slot e.g. slot 3?)
> > - store to unknown lds address (this will use slot 0?)
> > - load from known lds address `@lds.3` (this will use slot 3?)
>
> It does not know the pointer, so it uses default slot 0 and waits till 0.

Test case:
```
@lds.0 = internal addrspace(3) global [64 x float] poison, align 16
@lds.1 = internal addrspace(3) global [64 x float] poison, align 16

declare void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) nocapture, i32 %size, i32 %voffset, i32 %soffset, i32 %offset, i32 %aux)

define amdgpu_kernel void @f(<4 x i32> %rsrc, i32 %i1, i32 %i2, ptr addrspace(1) %out, ptr addrspace(3) %ptr) {
main_body:
  call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) @lds.0, i32 4, i32 0, i32 0, i32 0, i32 0)
  call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
  %gep.0 = getelementptr float, ptr addrspace(3) @lds.0, i32 %i1
  %gep.1 = getelementptr float, ptr addrspace(3) @lds.1, i32 %i2
  %val.0 = load volatile float, ptr addrspace(3) %gep.0, align 4
  %val.1 = load volatile float, ptr addrspace(3) %gep.1, align 4
  %out.gep.1 = getelementptr float, ptr addrspace(1) %out, i32 1
  store float %val.0, ptr addrspace(1) %out
  store float %val.1, ptr addrspace(1) %out.gep.1
  ret void
}
```

Generates:
```
  s_load_dwordx8 s[4:11], s[0:1], 0x24
  s_load_dword s2, s[0:1], 0x44
  s_mov_b32 m0, 0
  v_mov_b32_e32 v2, 0
  s_waitcnt lgkmcnt(0)
  buffer_load_dword off, s[4:7], 0 lds
  s_mov_b32 m0, s2
  s_lshl_b32 s0, s8, 2
  buffer_load_dword off, s[4:7], 0 lds
  s_lshl_b32 s1, s9, 2
  v_mov_b32_e32 v0, s0
  v_mov_b32_e32 v1, s1
  s_waitcnt vmcnt(1)
  ds_read_b32 v0, v0
  s_waitcnt vmcnt(0)
  ds_read_b32 v1, v1 offset:256
  s_waitcnt lgkmcnt(0)
  global_store_dwordx2 v2, v[0:1], s[10:11]
  s_endpgm
```

The `s_waitcnt vmcnt(1)` seems incorrect, because the second buffer-load-to-lds might clobber `@lds.0`.

> I have to tell anyone interested here: before I even wrote this code it
> didn't know of the dependency and did not wait for anything at all. Everyone
> was happy.

I am still happy, because buffer/flat/global-load-to-lds was removed in GFX11.

https://github.com/llvm/llvm-project/pull/74537
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
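For readers outside the backend: `vmcnt(n)` waits until at most `n` vector-memory operations remain outstanding, and those operations retire in program order. A rough model of why `vmcnt(1)` is insufficient in the sequence above (a simplification for illustration, not a description of the hardware):

```python
def in_flight_after_wait(issued, n):
    """Ops still outstanding after `s_waitcnt vmcnt(n)`: the wait only
    blocks until at most n of the in-order ops remain in flight."""
    return issued[-n:] if n > 0 else []

# Program order of the two buffer-load-to-LDS operations in the test case.
issued = ["load_lds -> @lds.0", "load_lds -> %ptr"]

# After vmcnt(1) the unknown-pointer load is still in flight, and %ptr may
# alias @lds.0, so a ds_read from @lds.0 issued now can race the clobber.
assert in_flight_after_wait(issued, 1) == ["load_lds -> %ptr"]

# Only vmcnt(0) guarantees both loads have actually landed in LDS.
assert in_flight_after_wait(issued, 0) == []
```

This is the hazard the comment points at: waiting for the first load says nothing about the second, potentially aliasing one.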
[llvm] [libcxx] [lldb] [clang] [lld] [flang] [compiler-rt] [clang-tools-extra] [libc] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)
jayfoad wrote:

How does this work in a case like this?

```
call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) @lds.3, i32 4, i32 0, i32 0, i32 0, i32 0)
call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
%val.3 = load float, ptr addrspace(3) @lds.3, align 4
```

i.e.

- store to known lds address `@lds.3` (this will use slot 0 and another slot e.g. slot 3?)
- store to unknown lds address (this will use slot 0?)
- load from known lds address `@lds.3` (this will use slot 3?)

https://github.com/llvm/llvm-project/pull/74537
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [llvm] [clang-tools-extra] [clang] [libcxx] [libc] [compiler-rt] [AMDGPU] Produce better memoperand for LDS DMA (PR #75247)
jayfoad wrote:

> Use PoisonValue instead of nullptr for load memop as a Value.

What is the effect of that? I thought nullptr was supposed to represent an unknown value, so you have to conservatively assume it might alias with anything.

https://github.com/llvm/llvm-project/pull/75247
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
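The convention the question relies on can be sketched as follows. This is a toy model of an alias query, not the LLVM `AliasAnalysis` interface; the point is only that an unknown underlying value forces the conservative answer:

```python
def may_alias(a, b):
    # A None location stands in for a memory operand with no known
    # underlying Value (the nullptr case the comment describes): nothing
    # rules out overlap, so the query must answer conservatively.
    if a is None or b is None:
        return True
    # Distinct named LDS globals cannot overlap in this toy model.
    return a == b

assert may_alias(None, "@lds.0") is True       # unknown vs. known: may alias
assert may_alias("@lds.0", "@lds.1") is False  # distinct globals
assert may_alias("@lds.0", "@lds.0") is True
```

Whether substituting a `PoisonValue` actually changes these answers is exactly what the question is asking the patch author to explain.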
[clang] [RISCV] Implement multi-lib reuse rule for RISC-V bare-metal toolchain (PR #73765)
jayfoad wrote:

The new test is crashing in my Release+Asserts build:
```
FAIL: Clang :: Driver/riscv-toolchain-gcc-multilib-reuse.c (1081 of 1081)
 TEST 'Clang :: Driver/riscv-toolchain-gcc-multilib-reuse.c' FAILED 
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/jayfoad2/llvm-release/bin/clang /home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c -target riscv64-unknown-elf --gcc-toolchain=/home/jayfoad2/git/llvm-project/clang/test/Driver/Inputs/multilib_riscv_elf_sdk --print-multi-directory -march=rv32imc -mabi=ilp32 | /home/jayfoad2/llvm-release/bin/FileCheck -check-prefix=GCC-MULTI-LIB-REUSE-RV32IMC-ILP32 /home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c
+ /home/jayfoad2/llvm-release/bin/FileCheck -check-prefix=GCC-MULTI-LIB-REUSE-RV32IMC-ILP32 /home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c
+ /home/jayfoad2/llvm-release/bin/clang /home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c -target riscv64-unknown-elf --gcc-toolchain=/home/jayfoad2/git/llvm-project/clang/test/Driver/Inputs/multilib_riscv_elf_sdk --print-multi-directory -march=rv32imc -mabi=ilp32
clang: /home/jayfoad2/git/llvm-project/clang/lib/Driver/ToolChains/CommonArgs.cpp:2189: void clang::driver::tools::addMultilibFlag(bool, const llvm::StringRef, Multilib::flags_list &): Assertion `Flag.front() == '-'' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/jayfoad2/llvm-release/bin/clang /home/jayfoad2/git/llvm-project/clang/test/Driver/riscv-toolchain-gcc-multilib-reuse.c -target riscv64-unknown-elf --gcc-toolchain=/home/jayfoad2/git/llvm-project/clang/test/Driver/Inputs/multilib_riscv_elf_sdk --print-multi-directory -march=rv32imc -mabi=ilp32
1.	Compilation construction
 #0 0x070bfaf7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/jayfoad2/llvm-release/bin/clang+0x70bfaf7)
 #1 0x070bd6ae llvm::sys::RunSignalHandlers() (/home/jayfoad2/llvm-release/bin/clang+0x70bd6ae)
 #2 0x070c01ca SignalHandler(int) Signals.cpp:0:0
 #3 0x7fc909c42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x7fc909c969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x7fc909c969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x7fc909c969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x7fc909c42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x7fc909c287f3 abort ./stdlib/abort.c:81:7
 #9 0x7fc909c2871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x7fc909c39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x07b32257 clang::driver::tools::addMultilibFlag(bool, llvm::StringRef, std::vector, std::allocator>, std::allocator, std::allocator>>>&) (/home/jayfoad2/llvm-release/bin/clang+0x7b32257)
#12 0x07abb016 clang::driver::MultilibBuilder::flag(llvm::StringRef, bool) (/home/jayfoad2/llvm-release/bin/clang+0x7abb016)
#13 0x07b9ddbf findRISCVMultilibs(clang::driver::Driver const&, llvm::Triple const&, llvm::StringRef, llvm::opt::ArgList const&, clang::driver::DetectedMultilibs&) Gnu.cpp:0:0
#14 0x07b95459 clang::driver::toolchains::Generic_GCC::GCCInstallationDetector::ScanGCCForMultilibs(llvm::Triple const&, llvm::opt::ArgList const&, llvm::StringRef, bool) Gnu.cpp:0:0
#15 0x07b9b164 clang::driver::toolchains::Generic_GCC::GCCInstallationDetector::ScanLibDirForGCCTriple(llvm::Triple const&, llvm::opt::ArgList const&, std::__cxx11::basic_string, std::allocator> const&, llvm::StringRef, bool, bool, bool) Gnu.cpp:0:0
#16 0x07b9324c clang::driver::toolchains::Generic_GCC::GCCInstallationDetector::init(llvm::Triple const&, llvm::opt::ArgList const&) Gnu.cpp:0:0
#17 0x07bf34ba clang::driver::toolchains::RISCVToolChain::RISCVToolChain(clang::driver::Driver const&, llvm::Triple const&, llvm::opt::ArgList const&) RISCVToolchain.cpp:0:0
#18 0x07a31458 clang::driver::Driver::getToolChain(llvm::opt::ArgList const&, llvm::Triple const&) const (/home/jayfoad2/llvm-release/bin/clang+0x7a31458)
#19 0x07a38bbe clang::driver::Driver::BuildCompilation(llvm::ArrayRef) (/home/jayfoad2/llvm-release/bin/clang+0x7a38bbe)
#20 0x04a8a25a clang_main(int, char**, llvm::ToolContext const&) (/home/jayfoad2/llvm-release/bin/clang+0x4a8a25a)
#21 0x04a9bb61 main (/home/jayfoad2/llvm-release/bin/clang+0x4a9bb61)
#22 0x7fc909c29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#23 0x7fc909c29e40 call_init ./csu/../csu/libc-start.c:128:20
#24 0x7fc909c29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#25 0x04a875a5 _start
```
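The assertion in the trace is a precondition check: every multilib flag handed to `addMultilibFlag` must already carry its leading dash. A stripped-down Python rendering of the failure mode (the real function is C++ in `CommonArgs.cpp`; this only models the checked invariant, not the rest of its behavior):

```python
def add_multilib_flag(enabled, flag, flags):
    # Models the precondition that fired: Assertion `Flag.front() == '-'`.
    assert flag.startswith("-"), "multilib flag must start with '-'"
    if enabled:
        flags.append(flag)

flags = []
add_multilib_flag(True, "-march=rv32imc", flags)  # well-formed flag: fine
assert flags == ["-march=rv32imc"]

# A flag string built without the dash reproduces the assertion failure
# seen in the Release+Asserts test run.
try:
    add_multilib_flag(True, "march=rv32imc", flags)
    crashed = False
except AssertionError:
    crashed = True
assert crashed
```

This suggests the multilib-reuse code is constructing a flag string somewhere without the leading `-` before passing it down.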
[llvm] [flang] [compiler-rt] [lld] [libcxx] [clang] [libcxxabi] [clang-tools-extra] [lldb] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)
@@ -3164,6 +3164,18 @@ def : GCNPat <
   (as_i1timm $bound_ctrl))
 >;

+class SMPrefetchGetPcPat : GCNPat <

jayfoad wrote:

This pattern also interprets the "address" argument as being an offset from PC, so it should also be removed from this version of the patch.

https://github.com/llvm/llvm-project/pull/74576
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits