[llvm-branch-commits] [llvm] release/18.x: [PPCMergeStringPool] Avoid replacing constant with instruction (#88846) (PR #91557)
https://github.com/nikic created https://github.com/llvm/llvm-project/pull/91557

Backport of 3a3aeb8eba40e981d3a9ff92175f949c2f3d4434 to the release branch.

From 7764bb3a47f241ca4e4d3fe42e96ab6bdecbdbe0 Mon Sep 17 00:00:00 2001
From: Nikita Popov
Date: Thu, 9 May 2024 13:27:20 +0900
Subject: [PATCH] [PPCMergeStringPool] Avoid replacing constant with instruction (#88846)

String pool merging currently, for a reason that's not entirely clear to me, tries to create GEP instructions instead of GEP constant expressions when replacing constant references. It only uses constant expressions in cases where this is required. However, it does not catch all cases where such a requirement exists. For example, the landingpad catch clause has to be a constant.

Fix this by always using the constant expression variant, which also makes the implementation simpler.

Additionally, there are some edge cases where even replacement with a constant GEP is not legal. The one I am aware of is the llvm.eh.typeid.for intrinsic, so add a special case to forbid replacements for it.

Fixes https://github.com/llvm/llvm-project/issues/88844.
(cherry picked from commit 3a3aeb8eba40e981d3a9ff92175f949c2f3d4434)
---
 .../lib/Target/PowerPC/PPCMergeStringPool.cpp | 57 ++-
 .../mergeable-string-pool-exceptions.ll | 47 +++
 .../mergeable-string-pool-pass-only.mir | 18 +++---
 3 files changed, 73 insertions(+), 49 deletions(-)
 create mode 100644 llvm/test/CodeGen/PowerPC/mergeable-string-pool-exceptions.ll

diff --git a/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp b/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp
index d9465e86d8966..ebd876d50c44e 100644
--- a/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp
+++ b/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp
@@ -23,6 +23,7 @@
 #include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/ValueSymbolTable.h"
 #include "llvm/Pass.h"
@@ -116,9 +117,20 @@ class PPCMergeStringPool : public ModulePass {
   // sure that they can be replaced.
   static bool hasReplaceableUsers(GlobalVariable &GV) {
     for (User *CurrentUser : GV.users()) {
-      // Instruction users are always valid.
-      if (isa<Instruction>(CurrentUser))
+      if (auto *I = dyn_cast<Instruction>(CurrentUser)) {
+        // Do not merge globals in exception pads.
+        if (I->isEHPad())
+          return false;
+
+        if (auto *II = dyn_cast<IntrinsicInst>(I)) {
+          // Some intrinsics require a plain global.
+          if (II->getIntrinsicID() == Intrinsic::eh_typeid_for)
+            return false;
+        }
+
+        // Other instruction users are always valid.
         continue;
+      }
 
       // We cannot replace GlobalValue users because they are not just nodes
       // in IR. To replace a user like this we would need to create a new
@@ -302,14 +314,6 @@ void PPCMergeStringPool::replaceUsesWithGEP(GlobalVariable *GlobalToReplace,
     Users.push_back(CurrentUser);
 
   for (User *CurrentUser : Users) {
-    Instruction *UserInstruction = dyn_cast<Instruction>(CurrentUser);
-    Constant *UserConstant = dyn_cast<Constant>(CurrentUser);
-
-    // At this point we expect that the user is either an instruction or a
-    // constant.
-    assert((UserConstant || UserInstruction) &&
-           "Expected the user to be an instruction or a constant.");
-
     // The user was not found so it must have been replaced earlier.
     if (!userHasOperand(CurrentUser, GlobalToReplace))
       continue;
@@ -318,38 +322,13 @@ void PPCMergeStringPool::replaceUsesWithGEP(GlobalVariable *GlobalToReplace,
     if (isa<GlobalValue>(CurrentUser))
       continue;
 
-    if (!UserInstruction) {
-      // User is a constant type.
-      Constant *ConstGEP = ConstantExpr::getInBoundsGetElementPtr(
-          PooledStructType, GPool, Indices);
-      UserConstant->handleOperandChange(GlobalToReplace, ConstGEP);
-      continue;
-    }
-
-    if (PHINode *UserPHI = dyn_cast<PHINode>(UserInstruction)) {
-      // GEP instructions cannot be added before PHI nodes.
-      // With getInBoundsGetElementPtr we create the GEP and then replace it
-      // inline into the PHI.
-      Constant *ConstGEP = ConstantExpr::getInBoundsGetElementPtr(
-          PooledStructType, GPool, Indices);
-      UserPHI->replaceUsesOfWith(GlobalToReplace, ConstGEP);
-      continue;
-    }
-    // The user is a valid instruction that is not a PHINode.
-    GetElementPtrInst *GEPInst =
-        GetElementPtrInst::Create(PooledStructType, GPool, Indices);
-    GEPInst->insertBefore(UserInstruction);
-
-    LLVM_DEBUG(dbgs() << "Inserting GEP before:\n");
-    LLVM_DEBUG(UserInstruction->dump());
-
+    Constant *ConstGEP = ConstantExpr::getInBoundsGetElementPtr(
+        PooledStructType, GPool, Indices);
     LLVM_DEBUG(dbgs() << "Replacing this global:\n");
     LLVM_DEBUG(GlobalToReplace->dump());
     LLVM_DEBUG(dbgs() << "with this:\n");
-    LLVM_DEBUG(GEPInst->dump());
-
-    //
[llvm-branch-commits] [llvm] release/18.x: [PPCMergeStringPool] Avoid replacing constant with instruction (#88846) (PR #91557)
https://github.com/nikic milestoned https://github.com/llvm/llvm-project/pull/91557 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
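The user check added by the patch above can be illustrated outside of LLVM with a small model. This is only a sketch under stated assumptions: `UserKind` and its enumerators are hypothetical stand-ins, not LLVM classes, and treating a `GlobalValue` user as non-replaceable is inferred from the context comment in the diff rather than shown by it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the kinds of users a pooled global can have.
// These are illustrative only and are not LLVM classes.
enum class UserKind {
  PlainInstruction, // e.g. a load or call that only needs the address
  EHPad,            // an exception pad such as a landingpad
  EHTypeIdFor,      // a call to the llvm.eh.typeid.for intrinsic
  GlobalValue       // another global that references this one
};

// Returns true only if every user could accept a constant GEP into the pool.
bool hasReplaceableUsers(const std::vector<UserKind> &Users) {
  for (UserKind K : Users) {
    switch (K) {
    case UserKind::EHPad:       // "Do not merge globals in exception pads."
    case UserKind::EHTypeIdFor: // "Some intrinsics require a plain global."
    case UserKind::GlobalValue: // assumption: rejected, per the context comment
      return false;
    case UserKind::PlainInstruction:
      break; // other instruction users are always valid
    }
  }
  return true;
}
```

A global used by an ordinary instruction stays mergeable in this model, while a single exception-pad or `llvm.eh.typeid.for` user is enough to keep it out of the pool.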
[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)
https://github.com/illwieckz updated https://github.com/llvm/llvm-project/pull/91553

From dcb8d6bea11cabb60483bd3e12aa4df7b76ca204 Mon Sep 17 00:00:00 2001
From: Thomas Debesse
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] release/18.x: [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )
[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)
https://github.com/illwieckz updated https://github.com/llvm/llvm-project/pull/91553

From 604b95fa0ea0278eadfb631ee2ac15386f85edaf Mon Sep 17 00:00:00 2001
From: Thomas Debesse
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] release/18.x: [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551.
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )
[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)
https://github.com/illwieckz updated https://github.com/llvm/llvm-project/pull/91553

From 1326001c4386a0296f1e6230c6a5228d9109ee12 Mon Sep 17 00:00:00 2001
From: Thomas Debesse
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] release/18.x: [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551.
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )
[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)
https://github.com/illwieckz edited https://github.com/llvm/llvm-project/pull/91553
[llvm-branch-commits] [libclc] [libclc] Fix linking against libIRReader (release/18.x) (PR #91553)
https://github.com/illwieckz created https://github.com/llvm/llvm-project/pull/91553

Fixes #91551:

- https://github.com/llvm/llvm-project/issues/91551

The patch is not needed in `main` because a larger patch already merged into `main` includes this change: https://github.com/llvm/llvm-project/commit/61efea7142e904e6492e1ce0566ec23d9d221c1e .

This one-line patch is enough to fix the build on the LLVM 18 branch, so it is probably a good idea to merge it: it is obvious, non-intrusive and can't do harm.

From 8a040018e59c9cb9e745885f5292f0e7967197ee Mon Sep 17 00:00:00 2001
From: Thomas Debesse
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551.
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )
[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: AtariDreams (AtariDreams)

Changes

As well as flipping the sense of the bit, GFX12 moved it from bit 0 to bit 1 in the encoded simm16 operand.

(cherry picked from commit e0a763c490d8ef58dca867e0ef834978ccf8e17d)

---
Full diff: https://github.com/llvm/llvm-project/pull/91034.diff

2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SOPInstructions.td (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll (+3-7)

```diff
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index ae5ef0541929b..5762efde73f02 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1786,7 +1786,7 @@ def : GCNPat<
 let SubtargetPredicate = isNotGFX12Plus in
   def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 0))>;
 let SubtargetPredicate = isGFX12Plus in
-  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 1))>;
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 2))>;
 
 // The first 10 bits of the mode register are the core FP mode on all
 // subtargets.
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
index 08c77148f6ae1..433fefa434988 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -5,14 +5,10 @@
 
 ; GCN-LABEL: {{^}}test_wait_event:
 ; GFX11: s_wait_event 0x0
-; GFX12: s_wait_event 0x1
+; GFX12: s_wait_event 0x2
 
-define amdgpu_ps void @test_wait_event() #0 {
+define amdgpu_ps void @test_wait_event() {
 entry:
-  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  call void @llvm.amdgcn.s.wait.event.export.ready()
   ret void
 }
-
-declare void @llvm.amdgcn.s.wait.event.export.ready() #0
-
-attributes #0 = { nounwind }
```

https://github.com/llvm/llvm-project/pull/91034
[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/91034
[llvm-branch-commits] [llvm] bce9393 - [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622)
Author: Jay Foad
Date: 2024-05-08T20:17:31-07:00
New Revision: bce9393291a2daa8006d1da629aa2765e00f4e70

URL: https://github.com/llvm/llvm-project/commit/bce9393291a2daa8006d1da629aa2765e00f4e70
DIFF: https://github.com/llvm/llvm-project/commit/bce9393291a2daa8006d1da629aa2765e00f4e70.diff

LOG: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622)

As well as flipping the sense of the bit, GFX12 moved it from bit 0 to bit 1 in the encoded simm16 operand.

(cherry picked from commit e0a763c490d8ef58dca867e0ef834978ccf8e17d)

Added:

Modified:
    llvm/lib/Target/AMDGPU/SOPInstructions.td
    llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll

Removed:

diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index ae5ef0541929b..5762efde73f02 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1786,7 +1786,7 @@ def : GCNPat<
 let SubtargetPredicate = isNotGFX12Plus in
   def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 0))>;
 let SubtargetPredicate = isGFX12Plus in
-  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 1))>;
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 2))>;
 
 // The first 10 bits of the mode register are the core FP mode on all
 // subtargets.
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
index 08c77148f6ae1..433fefa434988 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -5,14 +5,10 @@
 
 ; GCN-LABEL: {{^}}test_wait_event:
 ; GFX11: s_wait_event 0x0
-; GFX12: s_wait_event 0x1
+; GFX12: s_wait_event 0x2
 
-define amdgpu_ps void @test_wait_event() #0 {
+define amdgpu_ps void @test_wait_event() {
 entry:
-  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  call void @llvm.amdgcn.s.wait.event.export.ready()
   ret void
 }
-
-declare void @llvm.amdgcn.s.wait.event.export.ready() #0
-
-attributes #0 = { nounwind }
[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91034

From bce9393291a2daa8006d1da629aa2765e00f4e70 Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Tue, 23 Apr 2024 14:38:45 +0100
Subject: [PATCH] [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622)

As well as flipping the sense of the bit, GFX12 moved it from bit 0 to bit 1 in the encoded simm16 operand.

(cherry picked from commit e0a763c490d8ef58dca867e0ef834978ccf8e17d)
---
 llvm/lib/Target/AMDGPU/SOPInstructions.td | 2 +-
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll | 10 +++---
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index ae5ef0541929b..5762efde73f02 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1786,7 +1786,7 @@ def : GCNPat<
 let SubtargetPredicate = isNotGFX12Plus in
   def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 0))>;
 let SubtargetPredicate = isGFX12Plus in
-  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 1))>;
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 2))>;
 
 // The first 10 bits of the mode register are the core FP mode on all
 // subtargets.
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
index 08c77148f6ae1..433fefa434988 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -5,14 +5,10 @@
 
 ; GCN-LABEL: {{^}}test_wait_event:
 ; GFX11: s_wait_event 0x0
-; GFX12: s_wait_event 0x1
+; GFX12: s_wait_event 0x2
 
-define amdgpu_ps void @test_wait_event() #0 {
+define amdgpu_ps void @test_wait_event() {
 entry:
-  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  call void @llvm.amdgcn.s.wait.event.export.ready()
   ret void
 }
-
-declare void @llvm.amdgcn.s.wait.event.export.ready() #0
-
-attributes #0 = { nounwind }
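The encoding change in this thread comes down to which immediate is selected for the intrinsic. The following is a minimal sketch, assuming only what the two selection patterns and the commit message state: the immediate is 0 before GFX12 and has bit 1 set (value 2) on GFX12 and later. The function name `waitEventExportReadyImm` is hypothetical and does not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Immediate selected for llvm.amdgcn.s.wait.event.export.ready, following the
// two patterns in the patch. The helper itself is hypothetical.
constexpr std::uint16_t waitEventExportReadyImm(bool IsGFX12Plus) {
  // Per the commit message, GFX12 uses bit 1 of the simm16 operand.
  constexpr unsigned GFX12ExportReadyBit = 1;
  return IsGFX12Plus ? static_cast<std::uint16_t>(1u << GFX12ExportReadyBit)
                     : static_cast<std::uint16_t>(0);
}

// These mirror the GFX11 and GFX12 check lines in the updated test.
static_assert(waitEventExportReadyImm(false) == 0x0, "pre-GFX12 encodes 0x0");
static_assert(waitEventExportReadyImm(true) == 0x2, "GFX12 encodes 0x2");
```

The previous GFX12 pattern used the value 1, which sets bit 0 rather than bit 1; that is the mismatch the patch corrects.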
[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Mark frame index as "aliased" at argument copy elison (PR #91035)
llvmbot wrote:

@llvm/pr-subscribers-llvm-selectiondag

Author: AtariDreams (AtariDreams)

Changes

This is a fix for miscompiles reported in https://github.com/llvm/llvm-project/issues/89060

After argument copy elision the IR value for the eliminated alloca aliases the fixed stack object. This patch marks the fixed stack object as aliased with IR values so that, for example, schedulers do not reorder accesses to it. Such reordering could otherwise happen when there is a mix of MemOperands referring to the shared fixed stack slot via both the IR value for the elided alloca and a fixed stack pseudo source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253be56b3e9073b3e59123cf2da0bcde20c63)

---
Full diff: https://github.com/llvm/llvm-project/pull/91035.diff

3 Files Affected:

- (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+7)
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+2-1)
- (added) llvm/test/CodeGen/Hexagon/arg-copy-elison.ll (+39)

```diff
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..c35faac09c4d9 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -697,6 +697,13 @@ class MachineFrameInfo {
     return Objects[ObjectIdx+NumFixedObjects].isAliased;
   }
 
+  /// Set "maybe pointed to by an LLVM IR value" for an object.
+  void setIsAliasedObjectIndex(int ObjectIdx, bool IsAliased) {
+    assert(unsigned(ObjectIdx+NumFixedObjects) < Objects.size() &&
+           "Invalid Object Idx!");
+    Objects[ObjectIdx+NumFixedObjects].isAliased = IsAliased;
+  }
+
   /// Returns true if the specified index corresponds to an immutable object.
   bool isImmutableObjectIndex(int ObjectIdx) const {
     // Tail calling functions can clobber their function arguments.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1..7406a8ac1611d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10888,7 +10888,7 @@ static void tryToElideArgumentCopy(
   }
 
   // Perform the elision. Delete the old stack object and replace its only use
-  // in the variable info map. Mark the stack object as mutable.
+  // in the variable info map. Mark the stack object as mutable and aliased.
   LLVM_DEBUG({
     dbgs() << "Eliding argument copy from " << Arg << " to " << *AI << '\n'
            << " Replacing frame index " << OldIndex << " with " << FixedIndex
@@ -10896,6 +10896,7 @@ static void tryToElideArgumentCopy(
   });
   MFI.RemoveStackObject(OldIndex);
   MFI.setIsImmutableObjectIndex(FixedIndex, false);
+  MFI.setIsAliasedObjectIndex(FixedIndex, true);
   AllocaIndex = FixedIndex;
   ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
   for (SDValue ArgVal : ArgVals)
diff --git a/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
new file mode 100644
index 0..f0c30c301f446
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple hexagon-- -o - %s | FileCheck %s
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/89060
+;
+; Problem was a bug in argument copy elison. Given that the %alloca is
+; eliminated, the same frame index will be used for accessing %alloca and %a
+; on the fixed stack. Care must be taken when setting up
+; MachinePointerInfo/MemOperands for those accesses to either make sure that
+; we always refer to the fixed stack slot the same way (not using the
+; ir.alloca name), or make sure that we still detect that they alias each
+; other if using different kinds of MemOperands to identify the same fixed
+; stack entry.
+;
+define i32 @f(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 %q1, i32 %a, i32 %q2) {
+; CHECK-LABEL: f:
+; CHECK: .cfi_startproc
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = memw(r29+#36)
+; CHECK-NEXT: r1 = memw(r29+#28)
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = sub(r1,r0)
+; CHECK-NEXT: r2 = memw(r29+#32)
+; CHECK-NEXT: memw(r29+#32) = ##666
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = xor(r0,r2)
+; CHECK-NEXT: jumpr r31
+; CHECK-NEXT:}
+  %alloca = alloca i32
+  store i32 %a, ptr %alloca ; Should be elided.
+  store i32 666, ptr %alloca
+  %x = sub i32 %q1, %q2
+  %y = xor i32 %x, %a ; Results in a load of %a from fixed stack.
+; Using same frame index as elided %alloca.
+  ret i32 %y
+}
```

https://github.com/llvm/llvm-project/pull/91035
[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Mark frame index as "aliased" at argument copy elison (PR #91035)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/91035
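The new setter in this change relies on the `ObjectIdx+NumFixedObjects` indexing visible in the diff, where fixed stack objects are addressed with negative indices. A self-contained model of that scheme is sketched below. `FrameInfoModel`, `createFixedObject` and `createStackObject` are hypothetical simplifications for illustration and do not reproduce the real `MachineFrameInfo` interface; only the bodies of the two `isAliased` accessors follow the patch.

```cpp
#include <cassert>
#include <vector>

// Simplified, hypothetical model: fixed objects are stored first in the
// vector and are addressed with negative indices.
class FrameInfoModel {
  struct StackObject {
    bool isAliased = false;
  };
  std::vector<StackObject> Objects;
  unsigned NumFixedObjects = 0;

public:
  // Adds a fixed object at the front and returns its negative index.
  int createFixedObject() {
    Objects.insert(Objects.begin(), StackObject{});
    return -static_cast<int>(++NumFixedObjects);
  }

  // Adds an ordinary stack object and returns its non-negative index.
  int createStackObject() {
    Objects.push_back(StackObject{});
    return static_cast<int>(Objects.size() - NumFixedObjects) - 1;
  }

  bool isAliasedObjectIndex(int ObjectIdx) const {
    assert(unsigned(ObjectIdx + NumFixedObjects) < Objects.size() &&
           "Invalid Object Idx!");
    return Objects[ObjectIdx + NumFixedObjects].isAliased;
  }

  // Same shape as the setter added by the patch.
  void setIsAliasedObjectIndex(int ObjectIdx, bool IsAliased) {
    assert(unsigned(ObjectIdx + NumFixedObjects) < Objects.size() &&
           "Invalid Object Idx!");
    Objects[ObjectIdx + NumFixedObjects].isAliased = IsAliased;
  }
};
```

In this model, marking the fixed slot as aliased after elision corresponds to calling `setIsAliasedObjectIndex(FixedIndex, true)` with a negative `FixedIndex`, which leaves the flags of ordinary stack objects untouched.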
[llvm-branch-commits] [llvm] f5f572f - [SelectionDAG] Mark frame index as "aliased" at argument copy elison (#89712)
Author: Björn Pettersson
Date: 2024-05-08T20:16:03-07:00
New Revision: f5f572f54b32f6ff3ae450fa421ed6d478f09ec8

URL: https://github.com/llvm/llvm-project/commit/f5f572f54b32f6ff3ae450fa421ed6d478f09ec8
DIFF: https://github.com/llvm/llvm-project/commit/f5f572f54b32f6ff3ae450fa421ed6d478f09ec8.diff

LOG: [SelectionDAG] Mark frame index as "aliased" at argument copy elison (#89712)

This is a fix for miscompiles reported in https://github.com/llvm/llvm-project/issues/89060

After argument copy elision the IR value for the eliminated alloca aliases the fixed stack object. This patch marks the fixed stack object as aliased with IR values so that, for example, schedulers do not reorder accesses to it. Such reordering could otherwise happen when there is a mix of MemOperands referring to the shared fixed stack slot via both the IR value for the elided alloca and a fixed stack pseudo source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253be56b3e9073b3e59123cf2da0bcde20c63)

Added:
    llvm/test/CodeGen/Hexagon/arg-copy-elison.ll

Modified:
    llvm/include/llvm/CodeGen/MachineFrameInfo.h
    llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Removed:

diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..c35faac09c4d9 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -697,6 +697,13 @@ class MachineFrameInfo {
     return Objects[ObjectIdx+NumFixedObjects].isAliased;
   }
 
+  /// Set "maybe pointed to by an LLVM IR value" for an object.
+  void setIsAliasedObjectIndex(int ObjectIdx, bool IsAliased) {
+    assert(unsigned(ObjectIdx+NumFixedObjects) < Objects.size() &&
+           "Invalid Object Idx!");
+    Objects[ObjectIdx+NumFixedObjects].isAliased = IsAliased;
+  }
+
   /// Returns true if the specified index corresponds to an immutable object.
   bool isImmutableObjectIndex(int ObjectIdx) const {
     // Tail calling functions can clobber their function arguments.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1..7406a8ac1611d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10888,7 +10888,7 @@ static void tryToElideArgumentCopy(
   }
 
   // Perform the elision. Delete the old stack object and replace its only use
-  // in the variable info map. Mark the stack object as mutable.
+  // in the variable info map. Mark the stack object as mutable and aliased.
   LLVM_DEBUG({
     dbgs() << "Eliding argument copy from " << Arg << " to " << *AI << '\n'
            << " Replacing frame index " << OldIndex << " with " << FixedIndex
@@ -10896,6 +10896,7 @@ static void tryToElideArgumentCopy(
   });
   MFI.RemoveStackObject(OldIndex);
   MFI.setIsImmutableObjectIndex(FixedIndex, false);
+  MFI.setIsAliasedObjectIndex(FixedIndex, true);
   AllocaIndex = FixedIndex;
   ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
   for (SDValue ArgVal : ArgVals)
diff --git a/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
new file mode 100644
index 0..f0c30c301f446
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple hexagon-- -o - %s | FileCheck %s
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/89060
+;
+; Problem was a bug in argument copy elison. Given that the %alloca is
+; eliminated, the same frame index will be used for accessing %alloca and %a
+; on the fixed stack. Care must be taken when setting up
+; MachinePointerInfo/MemOperands for those accesses to either make sure that
+; we always refer to the fixed stack slot the same way (not using the
+; ir.alloca name), or make sure that we still detect that they alias each
+; other if using different kinds of MemOperands to identify the same fixed
+; stack entry.
+;
+define i32 @f(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 %q1, i32 %a, i32 %q2) {
+; CHECK-LABEL: f:
+; CHECK: .cfi_startproc
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = memw(r29+#36)
+; CHECK-NEXT: r1 = memw(r29+#28)
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = sub(r1,r0)
+; CHECK-NEXT: r2 = memw(r29+#32)
+; CHECK-NEXT: memw(r29+#32) = ##666
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = xor(r0,r2)
+; CHECK-NEXT: jumpr r31
+; CHECK-NEXT:}
+  %alloca = alloca i32
+  store i32 %a, ptr %alloca ; Should be elided.
+  store i32 666, ptr %alloca
+  %x = sub i32
[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Mark frame index as "aliased" at argument copy elison (PR #91035)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91035

From f5f572f54b32f6ff3ae450fa421ed6d478f09ec8 Mon Sep 17 00:00:00 2001
From: Björn Pettersson
Date: Tue, 23 Apr 2024 13:49:18 +0200
Subject: [PATCH] [SelectionDAG] Mark frame index as "aliased" at argument copy elison (#89712)

This is a fix for miscompiles reported in https://github.com/llvm/llvm-project/issues/89060

After argument copy elision the IR value for the eliminated alloca aliases the fixed stack object. This patch marks the fixed stack object as aliased with IR values so that, for example, schedulers do not reorder accesses to it. Such reordering could otherwise happen when there is a mix of MemOperands referring to the shared fixed stack slot via both the IR value for the elided alloca and a fixed stack pseudo source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253be56b3e9073b3e59123cf2da0bcde20c63)
---
 llvm/include/llvm/CodeGen/MachineFrameInfo.h | 7
 .../SelectionDAG/SelectionDAGBuilder.cpp | 3 +-
 llvm/test/CodeGen/Hexagon/arg-copy-elison.ll | 39 +++
 3 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/CodeGen/Hexagon/arg-copy-elison.ll

diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..c35faac09c4d9 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -697,6 +697,13 @@ class MachineFrameInfo {
     return Objects[ObjectIdx+NumFixedObjects].isAliased;
   }
 
+  /// Set "maybe pointed to by an LLVM IR value" for an object.
+  void setIsAliasedObjectIndex(int ObjectIdx, bool IsAliased) {
+    assert(unsigned(ObjectIdx+NumFixedObjects) < Objects.size() &&
+           "Invalid Object Idx!");
+    Objects[ObjectIdx+NumFixedObjects].isAliased = IsAliased;
+  }
+
   /// Returns true if the specified index corresponds to an immutable object.
   bool isImmutableObjectIndex(int ObjectIdx) const {
     // Tail calling functions can clobber their function arguments.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1..7406a8ac1611d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10888,7 +10888,7 @@ static void tryToElideArgumentCopy(
   }
 
   // Perform the elision. Delete the old stack object and replace its only use
-  // in the variable info map. Mark the stack object as mutable.
+  // in the variable info map. Mark the stack object as mutable and aliased.
   LLVM_DEBUG({
     dbgs() << "Eliding argument copy from " << Arg << " to " << *AI << '\n'
            << " Replacing frame index " << OldIndex << " with " << FixedIndex
@@ -10896,6 +10896,7 @@ static void tryToElideArgumentCopy(
   });
   MFI.RemoveStackObject(OldIndex);
   MFI.setIsImmutableObjectIndex(FixedIndex, false);
+  MFI.setIsAliasedObjectIndex(FixedIndex, true);
   AllocaIndex = FixedIndex;
   ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
   for (SDValue ArgVal : ArgVals)
diff --git a/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
new file mode 100644
index 0..f0c30c301f446
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple hexagon-- -o - %s | FileCheck %s
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/89060
+;
+; Problem was a bug in argument copy elison. Given that the %alloca is
+; eliminated, the same frame index will be used for accessing %alloca and %a
+; on the fixed stack. Care must be taken when setting up
+; MachinePointerInfo/MemOperands for those accesses to either make sure that
+; we always refer to the fixed stack slot the same way (not using the
+; ir.alloca name), or make sure that we still detect that they alias each
+; other if using different kinds of MemOperands to identify the same fixed
+; stack entry.
+;
+define i32 @f(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 %q1, i32 %a, i32 %q2) {
+; CHECK-LABEL: f:
+; CHECK: .cfi_startproc
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = memw(r29+#36)
+; CHECK-NEXT: r1 = memw(r29+#28)
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = sub(r1,r0)
+; CHECK-NEXT: r2 = memw(r29+#32)
+; CHECK-NEXT: memw(r29+#32) = ##666
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = xor(r0,r2)
+; CHECK-NEXT: jumpr r31
+; CHECK-NEXT:}
+  %alloca = alloca i32
+  store i32 %a, ptr %alloca ; Should be elided.
+  store i32 666, ptr %alloca
+  %x = sub i32 %q1, %q2
+  %y = xor i32 %x, %a
[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/91425
[llvm-branch-commits] [llvm] dfc89f8 - [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125)
Author: Phoebe Wang
Date: 2024-05-08T20:14:03-07:00
New Revision: dfc89f89ed14ebf22effe9dd9605608a975c4ed8

URL: https://github.com/llvm/llvm-project/commit/dfc89f89ed14ebf22effe9dd9605608a975c4ed8
DIFF: https://github.com/llvm/llvm-project/commit/dfc89f89ed14ebf22effe9dd9605608a975c4ed8.diff

LOG: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125)

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes #91005

Added:
    llvm/test/CodeGen/X86/pr91005.ll

Modified:
    llvm/lib/Target/X86/X86ISelLowering.cpp

Removed:


diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index c572b27fe401e..3e4ecab8443a9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7295,7 +7295,7 @@ static SDValue lowerBuildVectorAsBroadcast(BuildVectorSDNode *BVOp,
   // With pattern matching, the VBROADCAST node may become a VMOVDDUP.
   if (ScalarSize == 32 ||
       (ScalarSize == 64 && (IsGE256 || Subtarget.hasVLX())) ||
-      CVT == MVT::f16 ||
+      (CVT == MVT::f16 && Subtarget.hasAVX2()) ||
       (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2()))) {
     const Constant *C = nullptr;
     if (ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Ld))
diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll
new file mode 100644
index 0..16b78bf1e7e17
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr91005.ll
@@ -0,0 +1,40 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s
+
+define void @PR91005(ptr %0) minsize {
+; CHECK-LABEL: PR91005:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    testb %al, %al
+; CHECK-NEXT:    je .LBB0_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    vpcmpeqw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:    vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:    vpextrw $0, %xmm0, %eax
+; CHECK-NEXT:    movzwl %ax, %eax
+; CHECK-NEXT:    vmovd %eax, %xmm0
+; CHECK-NEXT:    vcvtph2ps %xmm0, %xmm0
+; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    vmulss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
+; CHECK-NEXT:    vmovd %xmm0, %eax
+; CHECK-NEXT:    movw %ax, (%rdi)
+; CHECK-NEXT:  .LBB0_2: # %common.ret
+; CHECK-NEXT:    retq
+  %2 = bitcast <2 x half> poison to <2 x i16>
+  %3 = icmp eq <2 x i16> %2,
+  br i1 poison, label %4, label %common.ret
+
+common.ret: ; preds = %4, %1
+  ret void
+
+4: ; preds = %1
+  %5 = select <2 x i1> %3, <2 x half> , <2 x half> zeroinitializer
+  %6 = fmul <2 x half> %5, zeroinitializer
+  %7 = fsub <2 x half> %6, zeroinitializer
+  %8 = extractelement <2 x half> %7, i64 0
+  store half %8, ptr %0, align 2
+  br label %common.ret
+}
+
+declare <2 x half> @llvm.fabs.v2f16(<2 x half>)
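To make the effect of the one-line change in X86ISelLowering.cpp easier to see, the patched condition can be modelled outside of LLVM. This is only a sketch under stated assumptions: `X86FeatureModel` and `mayUseBroadcastLoad` are hypothetical names made up for illustration, the feature flags are plain booleans rather than real `X86Subtarget` queries, and the function mirrors only the boolean shape of the `if` in `lowerBuildVectorAsBroadcast`, not the lowering code around it.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget feature queries used in the patch.
// This is NOT LLVM's X86Subtarget; it only carries the two flags we need.
struct X86FeatureModel {
  bool HasAVX2;
  bool HasVLX;
};

// Mirrors the shape of the patched condition: an f16 scalar qualifies for
// the broadcast path only when AVX2 is available.
bool mayUseBroadcastLoad(const X86FeatureModel &ST, unsigned ScalarSize,
                         bool IsF16, bool IsGE256, bool OptForSize) {
  return ScalarSize == 32 ||
         (ScalarSize == 64 && (IsGE256 || ST.HasVLX)) ||
         (IsF16 && ST.HasAVX2) ||
         (OptForSize && (ScalarSize == 64 || ST.HasAVX2));
}
```

In this model, a 16-bit f16 scalar on a target without AVX2 (roughly the `-mattr=+f16c` configuration of pr91005.ll, compiled with `minsize`) no longer satisfies the condition, whereas before the patch the unconditional `CVT == MVT::f16` term would have accepted it.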
[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91425 >From dfc89f89ed14ebf22effe9dd9605608a975c4ed8 Mon Sep 17 00:00:00 2001 From: Phoebe Wang Date: Mon, 6 May 2024 10:59:44 +0800 Subject: [PATCH] [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) AVX doesn't provide 16-bit BROADCAST instruction. Fixes #91005 --- llvm/lib/Target/X86/X86ISelLowering.cpp | 2 +- llvm/test/CodeGen/X86/pr91005.ll| 40 + 2 files changed, 41 insertions(+), 1 deletion(-) create mode 100644 llvm/test/CodeGen/X86/pr91005.ll diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index c572b27fe401e..3e4ecab8443a9 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -7295,7 +7295,7 @@ static SDValue lowerBuildVectorAsBroadcast(BuildVectorSDNode *BVOp, // With pattern matching, the VBROADCAST node may become a VMOVDDUP. if (ScalarSize == 32 || (ScalarSize == 64 && (IsGE256 || Subtarget.hasVLX())) || -CVT == MVT::f16 || +(CVT == MVT::f16 && Subtarget.hasAVX2()) || (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2( { const Constant *C = nullptr; if (ConstantSDNode *CI = dyn_cast(Ld)) diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll new file mode 100644 index 0..16b78bf1e7e17 --- /dev/null +++ b/llvm/test/CodeGen/X86/pr91005.ll @@ -0,0 +1,40 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4 +; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s + +define void @PR91005(ptr %0) minsize { +; CHECK-LABEL: PR91005: +; CHECK: # %bb.0: +; CHECK-NEXT:xorl %eax, %eax +; CHECK-NEXT:testb %al, %al +; CHECK-NEXT:je .LBB0_2 +; CHECK-NEXT: # %bb.1: +; CHECK-NEXT:vpcmpeqw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0 +; CHECK-NEXT:vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0 +; CHECK-NEXT:vpextrw $0, %xmm0, %eax +; CHECK-NEXT:movzwl %ax, %eax +; 
CHECK-NEXT:vmovd %eax, %xmm0 +; CHECK-NEXT:vcvtph2ps %xmm0, %xmm0 +; CHECK-NEXT:vxorps %xmm1, %xmm1, %xmm1 +; CHECK-NEXT:vmulss %xmm1, %xmm0, %xmm0 +; CHECK-NEXT:vcvtps2ph $4, %xmm0, %xmm0 +; CHECK-NEXT:vmovd %xmm0, %eax +; CHECK-NEXT:movw %ax, (%rdi) +; CHECK-NEXT: .LBB0_2: # %common.ret +; CHECK-NEXT:retq + %2 = bitcast <2 x half> poison to <2 x i16> + %3 = icmp eq <2 x i16> %2, + br i1 poison, label %4, label %common.ret + +common.ret: ; preds = %4, %1 + ret void + +4:; preds = %1 + %5 = select <2 x i1> %3, <2 x half> , <2 x half> zeroinitializer + %6 = fmul <2 x half> %5, zeroinitializer + %7 = fsub <2 x half> %6, zeroinitializer + %8 = extractelement <2 x half> %7, i64 0 + store half %8, ptr %0, align 2 + br label %common.ret +} + +declare <2 x half> @llvm.fabs.v2f16(<2 x half>) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91425 >From 2fc32a278e4fd46c6dd085845e69e84c321a3f75 Mon Sep 17 00:00:00 2001 From: Phoebe Wang Date: Mon, 6 May 2024 10:59:44 +0800 Subject: [PATCH 1/2] [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) AVX doesn't provide 16-bit BROADCAST instruction. Fixes #91005 --- llvm/lib/Target/X86/X86ISelLowering.cpp | 2 +- llvm/test/CodeGen/X86/pr91005.ll| 39 + 2 files changed, 40 insertions(+), 1 deletion(-) create mode 100644 llvm/test/CodeGen/X86/pr91005.ll diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index c572b27fe401e..3e4ecab8443a9 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -7295,7 +7295,7 @@ static SDValue lowerBuildVectorAsBroadcast(BuildVectorSDNode *BVOp, // With pattern matching, the VBROADCAST node may become a VMOVDDUP. if (ScalarSize == 32 || (ScalarSize == 64 && (IsGE256 || Subtarget.hasVLX())) || -CVT == MVT::f16 || +(CVT == MVT::f16 && Subtarget.hasAVX2()) || (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2( { const Constant *C = nullptr; if (ConstantSDNode *CI = dyn_cast(Ld)) diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll new file mode 100644 index 0..97fd1ce456882 --- /dev/null +++ b/llvm/test/CodeGen/X86/pr91005.ll @@ -0,0 +1,39 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4 +; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s + +define void @PR91005(ptr %0) minsize { +; CHECK-LABEL: PR91005: +; CHECK: # %bb.0: +; CHECK-NEXT:xorl %eax, %eax +; CHECK-NEXT:testb %al, %al +; CHECK-NEXT:je .LBB0_2 +; CHECK-NEXT: # %bb.1: +; CHECK-NEXT:vbroadcastss {{.*#+}} xmm0 = [31744,31744,31744,31744] +; CHECK-NEXT:vpcmpeqw %xmm0, %xmm0, %xmm0 +; CHECK-NEXT:vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1 +; CHECK-NEXT:vpand %xmm1, 
%xmm0, %xmm0 +; CHECK-NEXT:vcvtph2ps %xmm0, %xmm0 +; CHECK-NEXT:vpxor %xmm1, %xmm1, %xmm1 +; CHECK-NEXT:vmulss %xmm1, %xmm0, %xmm0 +; CHECK-NEXT:vcvtps2ph $4, %xmm0, %xmm0 +; CHECK-NEXT:vmovd %xmm0, %eax +; CHECK-NEXT:movw %ax, (%rdi) +; CHECK-NEXT: .LBB0_2: # %common.ret +; CHECK-NEXT:retq + %2 = bitcast <2 x half> poison to <2 x i16> + %3 = icmp eq <2 x i16> %2, + br i1 poison, label %4, label %common.ret + +common.ret: ; preds = %4, %1 + ret void + +4:; preds = %1 + %5 = select <2 x i1> %3, <2 x half> , <2 x half> zeroinitializer + %6 = fmul <2 x half> %5, zeroinitializer + %7 = fsub <2 x half> %6, zeroinitializer + %8 = extractelement <2 x half> %7, i64 0 + store half %8, ptr %0, align 2 + br label %common.ret +} + +declare <2 x half> @llvm.fabs.v2f16(<2 x half>) >From 4d284b853f26a6cb848028720163561cabf63d95 Mon Sep 17 00:00:00 2001 From: Phoebe Wang Date: Wed, 8 May 2024 10:59:31 +0800 Subject: [PATCH 2/2] Fix difference with LLVM 18 release --- llvm/test/CodeGen/X86/pr91005.ll | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll index 97fd1ce456882..16b78bf1e7e17 100644 --- a/llvm/test/CodeGen/X86/pr91005.ll +++ b/llvm/test/CodeGen/X86/pr91005.ll @@ -8,12 +8,13 @@ define void @PR91005(ptr %0) minsize { ; CHECK-NEXT:testb %al, %al ; CHECK-NEXT:je .LBB0_2 ; CHECK-NEXT: # %bb.1: -; CHECK-NEXT:vbroadcastss {{.*#+}} xmm0 = [31744,31744,31744,31744] -; CHECK-NEXT:vpcmpeqw %xmm0, %xmm0, %xmm0 -; CHECK-NEXT:vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1 -; CHECK-NEXT:vpand %xmm1, %xmm0, %xmm0 +; CHECK-NEXT:vpcmpeqw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0 +; CHECK-NEXT:vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0 +; CHECK-NEXT:vpextrw $0, %xmm0, %eax +; CHECK-NEXT:movzwl %ax, %eax +; CHECK-NEXT:vmovd %eax, %xmm0 ; CHECK-NEXT:vcvtph2ps %xmm0, %xmm0 -; CHECK-NEXT:vpxor %xmm1, %xmm1, %xmm1 +; CHECK-NEXT:vxorps %xmm1, %xmm1, %xmm1 ; CHECK-NEXT:vmulss %xmm1, %xmm0, 
%xmm0 ; CHECK-NEXT:vcvtps2ph $4, %xmm0, %xmm0 ; CHECK-NEXT:vmovd %xmm0, %eax ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GISEL] Consider fcmp true and fcmp false in cond code selection (#86972) (PR #91126)
tstellar wrote: @marcauberer You can just manually create a pull request against the release/18.x branch with the fixes. https://github.com/llvm/llvm-project/pull/91126
[llvm-branch-commits] [llvm] release/18.x: [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) (PR #91118)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/91118
[llvm-branch-commits] [llvm] 047cd91 - [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106)
Author: Phoebe Wang Date: 2024-05-08T20:10:38-07:00 New Revision: 047cd915b86a4f35543ad4e691953aaa5a91c4fe URL: https://github.com/llvm/llvm-project/commit/047cd915b86a4f35543ad4e691953aaa5a91c4fe DIFF: https://github.com/llvm/llvm-project/commit/047cd915b86a4f35543ad4e691953aaa5a91c4fe.diff LOG: [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) With KNL/KNC being deprecated, we don't need to care about such no VLX cases anymore. We may remove such patterns in the future. Fixes #90844 (cherry picked from commit 7963d9a2b3c20561278a85b19e156e013231342c) Added: llvm/test/CodeGen/X86/pr90844.ll Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86InstrAVX512.td Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 71fc6b5047eaa..c572b27fe401e 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -29841,7 +29841,9 @@ static SDValue LowerRotate(SDValue Op, const X86Subtarget , return R; // AVX512 implicitly uses modulo rotation amounts. - if (Subtarget.hasAVX512() && 32 <= EltSizeInBits) { + if ((Subtarget.hasVLX() || + (Subtarget.hasAVX512() && Subtarget.hasEVEX512())) && + 32 <= EltSizeInBits) { // Attempt to rotate by immediate. if (IsCstSplat) { unsigned RotOpc = IsROTL ? X86ISD::VROTLI : X86ISD::VROTRI; diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td b/llvm/lib/Target/X86/X86InstrAVX512.td index bb5e22c714279..0564f2167d8ee 100644 --- a/llvm/lib/Target/X86/X86InstrAVX512.td +++ b/llvm/lib/Target/X86/X86InstrAVX512.td @@ -814,7 +814,7 @@ defm : vextract_for_size_lowering<"VEXTRACTF64x4Z", v32f16_info, v16f16x_info, // A 128-bit extract from bits [255:128] of a 512-bit vector should use a // smaller extract to enable EVEX->VEX. 
-let Predicates = [NoVLX] in { +let Predicates = [NoVLX, HasEVEX512] in { def : Pat<(v2i64 (extract_subvector (v8i64 VR512:$src), (iPTR 2))), (v2i64 (VEXTRACTI128rr (v4i64 (EXTRACT_SUBREG (v8i64 VR512:$src), sub_ymm)), @@ -3068,7 +3068,7 @@ def : Pat<(Narrow.KVT (and Narrow.KRC:$mask, addr:$src2, (X86cmpm_imm_commute timm:$cc)), Narrow.KRC)>; } -let Predicates = [HasAVX512, NoVLX] in { +let Predicates = [HasAVX512, NoVLX, HasEVEX512] in { defm : axv512_icmp_packed_cc_no_vlx_lowering; defm : axv512_icmp_packed_cc_no_vlx_lowering; @@ -3099,7 +3099,7 @@ let Predicates = [HasAVX512, NoVLX] in { defm : axv512_cmp_packed_cc_no_vlx_lowering<"VCMPPD", v2f64x_info, v8f64_info>; } -let Predicates = [HasBWI, NoVLX] in { +let Predicates = [HasBWI, NoVLX, HasEVEX512] in { defm : axv512_icmp_packed_cc_no_vlx_lowering; defm : axv512_icmp_packed_cc_no_vlx_lowering; @@ -3493,7 +3493,7 @@ multiclass mask_move_lowering; defm : mask_move_lowering<"VMOVDQA32Z", v4i32x_info, v16i32_info>; defm : mask_move_lowering<"VMOVAPSZ", v8f32x_info, v16f32_info>; @@ -3505,7 +3505,7 @@ let Predicates = [HasAVX512, NoVLX] in { defm : mask_move_lowering<"VMOVDQA64Z", v4i64x_info, v8i64_info>; } -let Predicates = [HasBWI, NoVLX] in { +let Predicates = [HasBWI, NoVLX, HasEVEX512] in { defm : mask_move_lowering<"VMOVDQU8Z", v16i8x_info, v64i8_info>; defm : mask_move_lowering<"VMOVDQU8Z", v32i8x_info, v64i8_info>; @@ -4998,8 +4998,8 @@ defm VPMINUD : avx512_binop_rm_vl_d<0x3B, "vpminud", umin, defm VPMINUQ : avx512_binop_rm_vl_q<0x3B, "vpminuq", umin, SchedWriteVecALU, HasAVX512, 1>, T8; -// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX. -let Predicates = [HasDQI, NoVLX] in { +// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX, HasEVEX512. 
+let Predicates = [HasDQI, NoVLX, HasEVEX512] in { def : Pat<(v4i64 (mul (v4i64 VR256X:$src1), (v4i64 VR256X:$src2))), (EXTRACT_SUBREG (VPMULLQZrr @@ -5055,7 +5055,7 @@ multiclass avx512_min_max_lowering { sub_xmm)>; } -let Predicates = [HasAVX512, NoVLX] in { +let Predicates = [HasAVX512, NoVLX, HasEVEX512] in { defm : avx512_min_max_lowering<"VPMAXUQZ", umax>; defm : avx512_min_max_lowering<"VPMINUQZ", umin>; defm : avx512_min_max_lowering<"VPMAXSQZ", smax>; @@ -6032,7 +6032,7 @@ defm VPSRL : avx512_shift_types<0xD2, 0xD3, 0xD1, "vpsrl", X86vsrl, SchedWriteVecShift>; // Use 512bit VPSRA/VPSRAI version to implement v2i64/v4i64 in case NoVLX. -let Predicates = [HasAVX512, NoVLX] in { +let Predicates = [HasAVX512, NoVLX, HasEVEX512] in { def : Pat<(v4i64 (X86vsra (v4i64 VR256X:$src1), (v2i64 VR128X:$src2))), (EXTRACT_SUBREG (v8i64 (VPSRAQZrr @@ -6161,14 +6161,14 @@ defm VPSRLV : avx512_var_shift_types<0x45, "vpsrlv", X86vsrlv, SchedWriteVarVecS defm VPRORV :
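The `LowerRotate` part of this patch can be read as a small truth table over three subtarget features. The sketch below models it with plain booleans. `Avx512FeatureModel`, `usesAvx512RotateBefore`, and `usesAvx512RotateAfter` are hypothetical names and not LLVM APIs; only the guard condition shown in the diff is reproduced, and the rest of the rotate lowering is left out.

```cpp
#include <cassert>

// Hypothetical feature flags; real code queries X86Subtarget instead.
struct Avx512FeatureModel {
  bool HasAVX512;
  bool HasVLX;
  bool HasEVEX512;
};

// Guard as it appears on the removed line of the diff.
bool usesAvx512RotateBefore(const Avx512FeatureModel &F,
                            unsigned EltSizeInBits) {
  return F.HasAVX512 && 32 <= EltSizeInBits;
}

// Guard as it appears on the added lines: without VLX, the AVX512 rotate
// path is taken only if 512-bit EVEX encodings are allowed.
bool usesAvx512RotateAfter(const Avx512FeatureModel &F,
                           unsigned EltSizeInBits) {
  return (F.HasVLX || (F.HasAVX512 && F.HasEVEX512)) && 32 <= EltSizeInBits;
}
```

The two guards differ only for a target that has AVX512 but neither VLX nor EVEX512. This matches the TableGen side of the patch, where the `NoVLX` patterns that use the 512-bit form to implement narrower operations now also require `HasEVEX512`.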
[llvm-branch-commits] [llvm] release/18.x: [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) (PR #91118)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91118 >From 047cd915b86a4f35543ad4e691953aaa5a91c4fe Mon Sep 17 00:00:00 2001 From: Phoebe Wang Date: Sun, 5 May 2024 18:40:27 +0800 Subject: [PATCH] [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) With KNL/KNC being deprecated, we don't need to care about such no VLX cases anymore. We may remove such patterns in the future. Fixes #90844 (cherry picked from commit 7963d9a2b3c20561278a85b19e156e013231342c) --- llvm/lib/Target/X86/X86ISelLowering.cpp | 4 ++- llvm/lib/Target/X86/X86InstrAVX512.td | 42 - llvm/test/CodeGen/X86/pr90844.ll| 19 +++ 3 files changed, 43 insertions(+), 22 deletions(-) create mode 100644 llvm/test/CodeGen/X86/pr90844.ll diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 71fc6b5047eaa..c572b27fe401e 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -29841,7 +29841,9 @@ static SDValue LowerRotate(SDValue Op, const X86Subtarget , return R; // AVX512 implicitly uses modulo rotation amounts. - if (Subtarget.hasAVX512() && 32 <= EltSizeInBits) { + if ((Subtarget.hasVLX() || + (Subtarget.hasAVX512() && Subtarget.hasEVEX512())) && + 32 <= EltSizeInBits) { // Attempt to rotate by immediate. if (IsCstSplat) { unsigned RotOpc = IsROTL ? X86ISD::VROTLI : X86ISD::VROTRI; diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td b/llvm/lib/Target/X86/X86InstrAVX512.td index bb5e22c714279..0564f2167d8ee 100644 --- a/llvm/lib/Target/X86/X86InstrAVX512.td +++ b/llvm/lib/Target/X86/X86InstrAVX512.td @@ -814,7 +814,7 @@ defm : vextract_for_size_lowering<"VEXTRACTF64x4Z", v32f16_info, v16f16x_info, // A 128-bit extract from bits [255:128] of a 512-bit vector should use a // smaller extract to enable EVEX->VEX. 
-let Predicates = [NoVLX] in { +let Predicates = [NoVLX, HasEVEX512] in { def : Pat<(v2i64 (extract_subvector (v8i64 VR512:$src), (iPTR 2))), (v2i64 (VEXTRACTI128rr (v4i64 (EXTRACT_SUBREG (v8i64 VR512:$src), sub_ymm)), @@ -3068,7 +3068,7 @@ def : Pat<(Narrow.KVT (and Narrow.KRC:$mask, addr:$src2, (X86cmpm_imm_commute timm:$cc)), Narrow.KRC)>; } -let Predicates = [HasAVX512, NoVLX] in { +let Predicates = [HasAVX512, NoVLX, HasEVEX512] in { defm : axv512_icmp_packed_cc_no_vlx_lowering; defm : axv512_icmp_packed_cc_no_vlx_lowering; @@ -3099,7 +3099,7 @@ let Predicates = [HasAVX512, NoVLX] in { defm : axv512_cmp_packed_cc_no_vlx_lowering<"VCMPPD", v2f64x_info, v8f64_info>; } -let Predicates = [HasBWI, NoVLX] in { +let Predicates = [HasBWI, NoVLX, HasEVEX512] in { defm : axv512_icmp_packed_cc_no_vlx_lowering; defm : axv512_icmp_packed_cc_no_vlx_lowering; @@ -3493,7 +3493,7 @@ multiclass mask_move_lowering; defm : mask_move_lowering<"VMOVDQA32Z", v4i32x_info, v16i32_info>; defm : mask_move_lowering<"VMOVAPSZ", v8f32x_info, v16f32_info>; @@ -3505,7 +3505,7 @@ let Predicates = [HasAVX512, NoVLX] in { defm : mask_move_lowering<"VMOVDQA64Z", v4i64x_info, v8i64_info>; } -let Predicates = [HasBWI, NoVLX] in { +let Predicates = [HasBWI, NoVLX, HasEVEX512] in { defm : mask_move_lowering<"VMOVDQU8Z", v16i8x_info, v64i8_info>; defm : mask_move_lowering<"VMOVDQU8Z", v32i8x_info, v64i8_info>; @@ -4998,8 +4998,8 @@ defm VPMINUD : avx512_binop_rm_vl_d<0x3B, "vpminud", umin, defm VPMINUQ : avx512_binop_rm_vl_q<0x3B, "vpminuq", umin, SchedWriteVecALU, HasAVX512, 1>, T8; -// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX. -let Predicates = [HasDQI, NoVLX] in { +// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX, HasEVEX512. 
+let Predicates = [HasDQI, NoVLX, HasEVEX512] in { def : Pat<(v4i64 (mul (v4i64 VR256X:$src1), (v4i64 VR256X:$src2))), (EXTRACT_SUBREG (VPMULLQZrr @@ -5055,7 +5055,7 @@ multiclass avx512_min_max_lowering { sub_xmm)>; } -let Predicates = [HasAVX512, NoVLX] in { +let Predicates = [HasAVX512, NoVLX, HasEVEX512] in { defm : avx512_min_max_lowering<"VPMAXUQZ", umax>; defm : avx512_min_max_lowering<"VPMINUQZ", umin>; defm : avx512_min_max_lowering<"VPMAXSQZ", smax>; @@ -6032,7 +6032,7 @@ defm VPSRL : avx512_shift_types<0xD2, 0xD3, 0xD1, "vpsrl", X86vsrl, SchedWriteVecShift>; // Use 512bit VPSRA/VPSRAI version to implement v2i64/v4i64 in case NoVLX. -let Predicates = [HasAVX512, NoVLX] in { +let Predicates = [HasAVX512, NoVLX, HasEVEX512] in { def : Pat<(v4i64 (X86vsra (v4i64 VR256X:$src1), (v2i64 VR128X:$src2))), (EXTRACT_SUBREG (v8i64 (VPSRAQZrr @@ -6161,14 +6161,14 @@ defm VPSRLV : avx512_var_shift_types<0x45, "vpsrlv", X86vsrlv, SchedWriteVarVecS defm VPRORV : avx512_var_shift_types<0x14, "vprorv", rotr,
[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/90719
[llvm-branch-commits] [llvm] 58e44d3 - [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595)
Author: David Stuttard Date: 2024-05-08T20:08:59-07:00 New Revision: 58e44d3c6f67d5402ec38913d4262b94e73ac123 URL: https://github.com/llvm/llvm-project/commit/58e44d3c6f67d5402ec38913d4262b94e73ac123 DIFF: https://github.com/llvm/llvm-project/commit/58e44d3c6f67d5402ec38913d4262b94e73ac123.diff LOG: [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait so need to enhance the existing code to look for a barrier start (which is just an S_BARRIER for earlier architectures). Added: Modified: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp llvm/lib/Target/AMDGPU/SIInstrInfo.h llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index 6ecb1c8bf6e1d..7a3198612f86f 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -1832,7 +1832,7 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr , // not, we need to ensure the subtarget is capable of backing off barrier // instructions in case there are any outstanding memory operations that may // cause an exception. Otherwise, insert an explicit S_WAITCNT 0 here. 
- if (MI.getOpcode() == AMDGPU::S_BARRIER && + if (TII->isBarrierStart(MI.getOpcode()) && !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) { Wait = Wait.combined( AMDGPU::Waitcnt::allZero(ST->hasExtendedWaitCounts(), ST->hasVscnt())); diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h index 1c9dacc09f815..626d903c0c695 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h @@ -908,6 +908,17 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo { return MI.getDesc().TSFlags & SIInstrFlags::IsNeverUniform; } + // Check to see if opcode is for a barrier start. Pre gfx12 this is just the + // S_BARRIER, but after support for S_BARRIER_SIGNAL* / S_BARRIER_WAIT we want + // to check for the barrier start (S_BARRIER_SIGNAL*) + bool isBarrierStart(unsigned Opcode) const { +return Opcode == AMDGPU::S_BARRIER || + Opcode == AMDGPU::S_BARRIER_SIGNAL_M0 || + Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_M0 || + Opcode == AMDGPU::S_BARRIER_SIGNAL_IMM || + Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM; + } + static bool doesNotReadTiedSource(const MachineInstr ) { return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead; } diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll index a7d3115af29bf..47c021769aa56 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll @@ -96,6 +96,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 { ; VARIANT4-NEXT:s_wait_kmcnt 0x0 ; VARIANT4-NEXT:v_xad_u32 v1, v0, -1, s2 ; VARIANT4-NEXT:global_store_b32 v3, v0, s[0:1] +; VARIANT4-NEXT:s_wait_storecnt 0x0 ; VARIANT4-NEXT:s_barrier_signal -1 ; VARIANT4-NEXT:s_barrier_wait -1 ; VARIANT4-NEXT:v_ashrrev_i32_e32 v2, 31, v1 @@ -142,6 +143,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 { ; VARIANT6-NEXT:v_dual_mov_b32 
v4, s1 :: v_dual_mov_b32 v3, s0 ; VARIANT6-NEXT:v_sub_nc_u32_e32 v1, s2, v0 ; VARIANT6-NEXT:global_store_b32 v5, v0, s[0:1] +; VARIANT6-NEXT:s_wait_storecnt 0x0 ; VARIANT6-NEXT:s_barrier_signal -1 ; VARIANT6-NEXT:s_barrier_wait -1 ; VARIANT6-NEXT:v_ashrrev_i32_e32 v2, 31, v1 diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll index 4ab5e97964a85..38a34ec6daf73 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll @@ -12,6 +12,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 { ; GCN-NEXT:v_sub_nc_u32_e32 v0, v1, v0 ; GCN-NEXT:s_wait_kmcnt 0x0 ; GCN-NEXT:global_store_b32 v3, v2, s[0:1] +; GCN-NEXT:s_wait_storecnt 0x0 ; GCN-NEXT:s_barrier_signal -1 ; GCN-NEXT:s_barrier_wait -1 ; GCN-NEXT:global_store_b32 v3, v0, s[0:1] @@ -28,6 +29,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 { ; GLOBAL-ISEL-NEXT:v_sub_nc_u32_e32 v0, v1, v0 ; GLOBAL-ISEL-NEXT:s_wait_kmcnt 0x0 ; GLOBAL-ISEL-NEXT:global_store_b32 v3, v2, s[0:1] +; GLOBAL-ISEL-NEXT:s_wait_storecnt 0x0 ; GLOBAL-ISEL-NEXT:s_barrier_signal -1 ; GLOBAL-ISEL-NEXT:
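The barrier check introduced by this patch can be sketched in isolation as follows. This is a simplified model and not the LLVM implementation: the `BarrierOpcode` enum and `BarrierSubtargetModel` struct are hypothetical stand-ins for the `AMDGPU::` opcodes and the `GCNSubtarget` queries, and `needsFullWaitBeforeBarrier` is an invented helper that reproduces only the condition guarding the `allZero` wait in `generateWaitcntInstBefore`.

```cpp
#include <cassert>

// Hypothetical opcode list; the names follow the opcodes in the patch.
enum class BarrierOpcode {
  S_BARRIER,
  S_BARRIER_SIGNAL_M0,
  S_BARRIER_SIGNAL_ISFIRST_M0,
  S_BARRIER_SIGNAL_IMM,
  S_BARRIER_SIGNAL_ISFIRST_IMM,
  S_BARRIER_WAIT,
  OTHER
};

// Hypothetical subtarget flags, modelled as plain booleans.
struct BarrierSubtargetModel {
  bool HasAutoWaitcntBeforeBarrier;
  bool SupportsBackOffBarrier;
};

// Same set of opcodes as SIInstrInfo::isBarrierStart in the diff:
// S_BARRIER plus the S_BARRIER_SIGNAL* forms, but not S_BARRIER_WAIT.
bool isBarrierStart(BarrierOpcode Op) {
  return Op == BarrierOpcode::S_BARRIER ||
         Op == BarrierOpcode::S_BARRIER_SIGNAL_M0 ||
         Op == BarrierOpcode::S_BARRIER_SIGNAL_ISFIRST_M0 ||
         Op == BarrierOpcode::S_BARRIER_SIGNAL_IMM ||
         Op == BarrierOpcode::S_BARRIER_SIGNAL_ISFIRST_IMM;
}

// A full wait is requested before a barrier start unless the subtarget
// waits automatically or can back off the barrier.
bool needsFullWaitBeforeBarrier(BarrierOpcode Op,
                                const BarrierSubtargetModel &ST) {
  return isBarrierStart(Op) && !ST.HasAutoWaitcntBeforeBarrier &&
         !ST.SupportsBackOffBarrier;
}
```

Under this model a signal instruction now triggers the wait on a subtarget with neither feature, which is consistent with the `s_wait_storecnt 0x0` lines added before `s_barrier_signal -1` in the updated tests.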
[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/90719 >From 58e44d3c6f67d5402ec38913d4262b94e73ac123 Mon Sep 17 00:00:00 2001 From: David Stuttard Date: Wed, 1 May 2024 11:37:13 +0100 Subject: [PATCH] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait so need to enhance the existing code to look for a barrier start (which is just an S_BARRIER for earlier architectures). --- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 2 +- llvm/lib/Target/AMDGPU/SIInstrInfo.h | 11 ++ .../CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll | 2 ++ .../AMDGPU/llvm.amdgcn.s.barrier.wait.ll | 22 +++ 4 files changed, 36 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp index 6ecb1c8bf6e1d..7a3198612f86f 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp @@ -1832,7 +1832,7 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr , // not, we need to ensure the subtarget is capable of backing off barrier // instructions in case there are any outstanding memory operations that may // cause an exception. Otherwise, insert an explicit S_WAITCNT 0 here. 
- if (MI.getOpcode() == AMDGPU::S_BARRIER && + if (TII->isBarrierStart(MI.getOpcode()) && !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) { Wait = Wait.combined( AMDGPU::Waitcnt::allZero(ST->hasExtendedWaitCounts(), ST->hasVscnt())); diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h index 1c9dacc09f815..626d903c0c695 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h @@ -908,6 +908,17 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo { return MI.getDesc().TSFlags & SIInstrFlags::IsNeverUniform; } + // Check to see if opcode is for a barrier start. Pre gfx12 this is just the + // S_BARRIER, but after support for S_BARRIER_SIGNAL* / S_BARRIER_WAIT we want + // to check for the barrier start (S_BARRIER_SIGNAL*) + bool isBarrierStart(unsigned Opcode) const { +return Opcode == AMDGPU::S_BARRIER || + Opcode == AMDGPU::S_BARRIER_SIGNAL_M0 || + Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_M0 || + Opcode == AMDGPU::S_BARRIER_SIGNAL_IMM || + Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM; + } + static bool doesNotReadTiedSource(const MachineInstr ) { return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead; } diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll index a7d3115af29bf..47c021769aa56 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll @@ -96,6 +96,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 { ; VARIANT4-NEXT:s_wait_kmcnt 0x0 ; VARIANT4-NEXT:v_xad_u32 v1, v0, -1, s2 ; VARIANT4-NEXT:global_store_b32 v3, v0, s[0:1] +; VARIANT4-NEXT:s_wait_storecnt 0x0 ; VARIANT4-NEXT:s_barrier_signal -1 ; VARIANT4-NEXT:s_barrier_wait -1 ; VARIANT4-NEXT:v_ashrrev_i32_e32 v2, 31, v1 @@ -142,6 +143,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) %out, i32 %size) #0 { ; VARIANT6-NEXT:v_dual_mov_b32 
v4, s1 :: v_dual_mov_b32 v3, s0 ; VARIANT6-NEXT:v_sub_nc_u32_e32 v1, s2, v0 ; VARIANT6-NEXT:global_store_b32 v5, v0, s[0:1] +; VARIANT6-NEXT:s_wait_storecnt 0x0 ; VARIANT6-NEXT:s_barrier_signal -1 ; VARIANT6-NEXT:s_barrier_wait -1 ; VARIANT6-NEXT:v_ashrrev_i32_e32 v2, 31, v1 diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll index 4ab5e97964a85..38a34ec6daf73 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll @@ -12,6 +12,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 { ; GCN-NEXT:v_sub_nc_u32_e32 v0, v1, v0 ; GCN-NEXT:s_wait_kmcnt 0x0 ; GCN-NEXT:global_store_b32 v3, v2, s[0:1] +; GCN-NEXT:s_wait_storecnt 0x0 ; GCN-NEXT:s_barrier_signal -1 ; GCN-NEXT:s_barrier_wait -1 ; GCN-NEXT:global_store_b32 v3, v0, s[0:1] @@ -28,6 +29,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr addrspace(1) %out) #0 { ; GLOBAL-ISEL-NEXT:v_sub_nc_u32_e32 v0, v1, v0 ; GLOBAL-ISEL-NEXT:s_wait_kmcnt 0x0 ; GLOBAL-ISEL-NEXT:global_store_b32 v3, v2, s[0:1] +; GLOBAL-ISEL-NEXT:s_wait_storecnt 0x0 ; GLOBAL-ISEL-NEXT:s_barrier_signal -1 ; GLOBAL-ISEL-NEXT:s_barrier_wait -1 ; GLOBAL-ISEL-NEXT:global_store_b32 v3, v0, s[0:1] @@ -56,6 +58,7 @@ define amdgpu_kernel
[llvm-branch-commits] [clang] [llvm] Backport some fixes for building the release binaries (PR #91095)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/91095
[llvm-branch-commits] [clang] ce88e86 - [CMake][Release] Refactor cache file and use two stages for non-PGO builds (#89812)
Author: Tom Stellard Date: 2024-05-08T19:47:50-07:00 New Revision: ce88e86e428be7eea517201ddee8d62150ae8de4 URL: https://github.com/llvm/llvm-project/commit/ce88e86e428be7eea517201ddee8d62150ae8de4 DIFF: https://github.com/llvm/llvm-project/commit/ce88e86e428be7eea517201ddee8d62150ae8de4.diff LOG: [CMake][Release] Refactor cache file and use two stages for non-PGO builds (#89812) Completely refactor the cache file to simplify it and remove unnecessary variables. The main functional change here is that the non-PGO builds now use two stages, so `ninja -C build stage2-package` can be used with both PGO and non-PGO builds. (cherry picked from commit 6473fbf2d68c8486d168f29afc35d3e8a6fabe69) Added: Modified: clang/cmake/caches/Release.cmake Removed: diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake index fa972636553f1..c164d5497275f 100644 --- a/clang/cmake/caches/Release.cmake +++ b/clang/cmake/caches/Release.cmake @@ -1,95 +1,93 @@ # Plain options configure the first build. # BOOTSTRAP_* options configure the second build. # BOOTSTRAP_BOOTSTRAP_* options configure the third build. +# PGO Builds have 3 stages (stage1, stage2-instrumented, stage2) +# non-PGO Builds have 2 stages (stage1, stage2) -# General Options + +function (set_final_stage_var name value type) + if (LLVM_RELEASE_ENABLE_PGO) +set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "") + else() +set(BOOTSTRAP_${name} ${value} CACHE ${type} "") + endif() +endfunction() + +function (set_instrument_and_final_stage_var name value type) + # This sets the varaible for the final stage in non-PGO builds and in + # the stage2-instrumented stage for PGO builds. + set(BOOTSTRAP_${name} ${value} CACHE ${type} "") + if (LLVM_RELEASE_ENABLE_PGO) +# Set the variable in the final stage for PGO builds. 
+set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "") + endif() +endfunction() + +# General Options: +# If you want to override any of the LLVM_RELEASE_* variables you can set them +# on the command line via -D, but you need to do this before you pass this +# cache file to CMake via -C. e.g. +# +# cmake -D LLVM_RELEASE_ENABLE_PGO=ON -C Release.cmake set(LLVM_RELEASE_ENABLE_LTO THIN CACHE STRING "") set(LLVM_RELEASE_ENABLE_PGO OFF CACHE BOOL "") - +set(LLVM_RELEASE_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" CACHE STRING "") +set(LLVM_RELEASE_ENABLE_PROJECTS "clang;lld;lldb;clang-tools-extra;bolt;polly;mlir;flang" CACHE STRING "") +# Note we don't need to add install here, since it is one of the pre-defined +# steps. +set(LLVM_RELEASE_FINAL_STAGE_TARGETS "clang;package;check-all;check-llvm;check-clang" CACHE STRING "") set(CMAKE_BUILD_TYPE RELEASE CACHE STRING "") -# Stage 1 Bootstrap Setup +# Stage 1 Options +set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "") set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "") + +set(STAGE1_PROJECTS "clang") +set(STAGE1_RUNTIMES "") + if (LLVM_RELEASE_ENABLE_PGO) + list(APPEND STAGE1_PROJECTS "lld") + list(APPEND STAGE1_RUNTIMES "compiler-rt") set(CLANG_BOOTSTRAP_TARGETS generate-profdata -stage2 stage2-package stage2-clang -stage2-distribution stage2-install -stage2-install-distribution -stage2-install-distribution-toolchain stage2-check-all stage2-check-llvm -stage2-check-clang -stage2-test-suite CACHE STRING "") -else() - set(CLANG_BOOTSTRAP_TARGETS -clang -check-all -check-llvm -check-clang -test-suite -stage3 -stage3-clang -stage3-check-all -stage3-check-llvm -stage3-check-clang -stage3-install -stage3-test-suite CACHE STRING "") -endif() +stage2-check-clang CACHE STRING "") -# Stage 1 Options -set(STAGE1_PROJECTS "clang") -set(STAGE1_RUNTIMES "") + # Configuration for stage2-instrumented + set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "") + # This enables the build targets for the final stage which is called 
stage2. + set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS ${LLVM_RELEASE_FINAL_STAGE_TARGETS} CACHE STRING "") + set(BOOTSTRAP_LLVM_BUILD_INSTRUMENTED IR CACHE STRING "") + set(BOOTSTRAP_LLVM_ENABLE_RUNTIMES "compiler-rt" CACHE STRING "") + set(BOOTSTRAP_LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "") -if (LLVM_RELEASE_ENABLE_PGO) - list(APPEND STAGE1_PROJECTS "lld") - list(APPEND STAGE1_RUNTIMES "compiler-rt") +else() + if (LLVM_RELEASE_ENABLE_LTO) +list(APPEND STAGE1_PROJECTS "lld") + endif() + # Any targets added here will be given the target name stage2-${target}, so + # if you want to run them you can just use: + # ninja -C $BUILDDIR stage2-${target} + set(CLANG_BOOTSTRAP_TARGETS ${LLVM_RELEASE_FINAL_STAGE_TARGETS} CACHE STRING "") endif() +# Stage 1 Common Config set(LLVM_ENABLE_RUNTIMES ${STAGE1_RUNTIMES} CACHE STRING "") set(LLVM_ENABLE_PROJECTS ${STAGE1_PROJECTS} CACHE STRING "")
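The two helper functions added in this patch choose a cache-variable prefix based on whether PGO is enabled. A minimal Python sketch of that naming rule follows; it only models which variable names get set, it is not part of the patch, and the function names `final_stage_vars` and `instrument_and_final_stage_vars` are hypothetical stand-ins for the CMake functions.

```python
# Sketch (not from the patch): models which cache variables the CMake helpers
# set_final_stage_var and set_instrument_and_final_stage_var would define.
# The Python function names here are hypothetical.

def final_stage_vars(name, pgo_enabled):
    """Variable names set by set_final_stage_var(name ...)."""
    # PGO builds have 3 stages, so the final stage is two bootstraps deep.
    if pgo_enabled:
        return ["BOOTSTRAP_BOOTSTRAP_" + name]
    return ["BOOTSTRAP_" + name]


def instrument_and_final_stage_vars(name, pgo_enabled):
    """Variable names set by set_instrument_and_final_stage_var(name ...)."""
    # BOOTSTRAP_<name> is the final stage for non-PGO builds and the
    # stage2-instrumented stage for PGO builds.
    names = ["BOOTSTRAP_" + name]
    if pgo_enabled:
        # PGO builds also need the value in the final stage.
        names.append("BOOTSTRAP_BOOTSTRAP_" + name)
    return names


if __name__ == "__main__":
    for pgo in (False, True):
        print(pgo, final_stage_vars("LLVM_ENABLE_LTO", pgo),
              instrument_and_final_stage_vars("LLVM_ENABLE_LTO", pgo))
```

In both configurations the second build carries the plain `BOOTSTRAP_` prefix, which appears to be why the non-PGO path can reuse the `stage2-*` target names.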
[llvm-branch-commits] [clang] b7e2397 - [CMake][Release] Enable CMAKE_POSITION_INDEPENDENT_CODE (#90139)
Author: Tom Stellard
Date: 2024-05-08T19:47:50-07:00
New Revision: b7e2397c54b7cddac8fa188e68073f78e895a57a

URL: https://github.com/llvm/llvm-project/commit/b7e2397c54b7cddac8fa188e68073f78e895a57a
DIFF: https://github.com/llvm/llvm-project/commit/b7e2397c54b7cddac8fa188e68073f78e895a57a.diff

LOG: [CMake][Release] Enable CMAKE_POSITION_INDEPENDENT_CODE (#90139)

Set this in the cache file directly instead of via the test-release.sh
script so that the release builds can be reproduced with just the cache
file.

(cherry picked from commit 53ff002c6f7ec64a75ab0990b1314cc6b4bb67cf)

Added:

Modified:
    clang/cmake/caches/Release.cmake
    llvm/utils/release/test-release.sh

Removed:

diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake
index c164d5497275f..c0bfcbdfc1c2a 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -82,6 +82,7 @@ set(LLVM_ENABLE_PROJECTS ${STAGE1_PROJECTS} CACHE STRING "")
 # stage2-instrumented and Final Stage Config:
 # Options that need to be set in both the instrumented stage (if we are doing
 # a pgo build) and the final stage.
+set_instrument_and_final_stage_var(CMAKE_POSITION_INDEPENDENT_CODE "ON" STRING)
 set_instrument_and_final_stage_var(LLVM_ENABLE_LTO "${LLVM_RELEASE_ENABLE_LTO}" STRING)
 if (LLVM_RELEASE_ENABLE_LTO)
   set_instrument_and_final_stage_var(LLVM_ENABLE_LLD "ON" BOOL)
diff --git a/llvm/utils/release/test-release.sh b/llvm/utils/release/test-release.sh
index 4314b565e11b0..050004aa08c49 100755
--- a/llvm/utils/release/test-release.sh
+++ b/llvm/utils/release/test-release.sh
@@ -353,8 +353,7 @@ function build_with_cmake_cache() {
   env CC="$c_compiler" CXX="$cxx_compiler" \
   cmake -G "$generator" -B $CMakeBuildDir -S $SrcDir/llvm \
         -C $SrcDir/clang/cmake/caches/Release.cmake \
-        -DCLANG_BOOTSTRAP_PASSTHROUGH="CMAKE_POSITION_INDEPENDENT_CODE;LLVM_LIT_ARGS" \
-        -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
+        -DCLANG_BOOTSTRAP_PASSTHROUGH="LLVM_LIT_ARGS" \
         -DLLVM_LIT_ARGS="-j $NumJobs $LitVerbose" \
         $ExtraConfigureFlags 2>&1 | tee $LogDir/llvm.configure-$Flavor.log
 
[llvm-branch-commits] [clang] f2c5a10 - [CMake][Release] Add stage2-package target (#89517)
Author: Tom Stellard
Date: 2024-05-08T19:47:50-07:00
New Revision: f2c5a10e1f27768b031b8b54cb056fd4e261ad8f

URL: https://github.com/llvm/llvm-project/commit/f2c5a10e1f27768b031b8b54cb056fd4e261ad8f
DIFF: https://github.com/llvm/llvm-project/commit/f2c5a10e1f27768b031b8b54cb056fd4e261ad8f.diff

LOG: [CMake][Release] Add stage2-package target (#89517)

This target will be used to generate the release binary package for
uploading to GitHub.

(cherry picked from commit a38f201f1ec70c2b1f3cf46e7f291c53bb16753e)

Added:

Modified:
    clang/cmake/caches/Release.cmake

Removed:

diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake
index bd1f688d61a7e..fa972636553f1 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -14,6 +14,7 @@ if (LLVM_RELEASE_ENABLE_PGO)
   set(CLANG_BOOTSTRAP_TARGETS
     generate-profdata
     stage2
+    stage2-package
     stage2-clang
     stage2-distribution
     stage2-install
@@ -57,6 +58,7 @@ set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "")
 set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS
   clang
+  package
   check-all
   check-llvm
   check-clang CACHE STRING "")
[llvm-branch-commits] [clang] [llvm] Backport some fixes for building the release binaries (PR #91095)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91095 >From f2c5a10e1f27768b031b8b54cb056fd4e261ad8f Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Wed, 24 Apr 2024 07:47:42 -0700 Subject: [PATCH 1/7] [CMake][Release] Add stage2-package target (#89517) This target will be used to generate the release binary package for uploading to GitHub. (cherry picked from commit a38f201f1ec70c2b1f3cf46e7f291c53bb16753e) --- clang/cmake/caches/Release.cmake | 2 ++ 1 file changed, 2 insertions(+) diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake index bd1f688d61a7e..fa972636553f1 100644 --- a/clang/cmake/caches/Release.cmake +++ b/clang/cmake/caches/Release.cmake @@ -14,6 +14,7 @@ if (LLVM_RELEASE_ENABLE_PGO) set(CLANG_BOOTSTRAP_TARGETS generate-profdata stage2 +stage2-package stage2-clang stage2-distribution stage2-install @@ -57,6 +58,7 @@ set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "") set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "") set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS clang + package check-all check-llvm check-clang CACHE STRING "") >From ce88e86e428be7eea517201ddee8d62150ae8de4 Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Thu, 25 Apr 2024 15:32:08 -0700 Subject: [PATCH 2/7] [CMake][Release] Refactor cache file and use two stages for non-PGO builds (#89812) Completely refactor the cache file to simplify it and remove unnecessary variables. The main functional change here is that the non-PGO builds now use two stages, so `ninja -C build stage2-package` can be used with both PGO and non-PGO builds. 
(cherry picked from commit 6473fbf2d68c8486d168f29afc35d3e8a6fabe69) --- clang/cmake/caches/Release.cmake | 134 +++ 1 file changed, 66 insertions(+), 68 deletions(-) diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake index fa972636553f1..c164d5497275f 100644 --- a/clang/cmake/caches/Release.cmake +++ b/clang/cmake/caches/Release.cmake @@ -1,95 +1,93 @@ # Plain options configure the first build. # BOOTSTRAP_* options configure the second build. # BOOTSTRAP_BOOTSTRAP_* options configure the third build. +# PGO Builds have 3 stages (stage1, stage2-instrumented, stage2) +# non-PGO Builds have 2 stages (stage1, stage2) -# General Options + +function (set_final_stage_var name value type) + if (LLVM_RELEASE_ENABLE_PGO) +set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "") + else() +set(BOOTSTRAP_${name} ${value} CACHE ${type} "") + endif() +endfunction() + +function (set_instrument_and_final_stage_var name value type) + # This sets the varaible for the final stage in non-PGO builds and in + # the stage2-instrumented stage for PGO builds. + set(BOOTSTRAP_${name} ${value} CACHE ${type} "") + if (LLVM_RELEASE_ENABLE_PGO) +# Set the variable in the final stage for PGO builds. +set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "") + endif() +endfunction() + +# General Options: +# If you want to override any of the LLVM_RELEASE_* variables you can set them +# on the command line via -D, but you need to do this before you pass this +# cache file to CMake via -C. e.g. +# +# cmake -D LLVM_RELEASE_ENABLE_PGO=ON -C Release.cmake set(LLVM_RELEASE_ENABLE_LTO THIN CACHE STRING "") set(LLVM_RELEASE_ENABLE_PGO OFF CACHE BOOL "") - +set(LLVM_RELEASE_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" CACHE STRING "") +set(LLVM_RELEASE_ENABLE_PROJECTS "clang;lld;lldb;clang-tools-extra;bolt;polly;mlir;flang" CACHE STRING "") +# Note we don't need to add install here, since it is one of the pre-defined +# steps. 
+set(LLVM_RELEASE_FINAL_STAGE_TARGETS "clang;package;check-all;check-llvm;check-clang" CACHE STRING "") set(CMAKE_BUILD_TYPE RELEASE CACHE STRING "") -# Stage 1 Bootstrap Setup +# Stage 1 Options +set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "") set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "") + +set(STAGE1_PROJECTS "clang") +set(STAGE1_RUNTIMES "") + if (LLVM_RELEASE_ENABLE_PGO) + list(APPEND STAGE1_PROJECTS "lld") + list(APPEND STAGE1_RUNTIMES "compiler-rt") set(CLANG_BOOTSTRAP_TARGETS generate-profdata -stage2 stage2-package stage2-clang -stage2-distribution stage2-install -stage2-install-distribution -stage2-install-distribution-toolchain stage2-check-all stage2-check-llvm -stage2-check-clang -stage2-test-suite CACHE STRING "") -else() - set(CLANG_BOOTSTRAP_TARGETS -clang -check-all -check-llvm -check-clang -test-suite -stage3 -stage3-clang -stage3-check-all -stage3-check-llvm -stage3-check-clang -stage3-install -stage3-test-suite CACHE STRING "") -endif() +stage2-check-clang CACHE STRING "") -# Stage 1 Options -set(STAGE1_PROJECTS "clang") -set(STAGE1_RUNTIMES "") + # Configuration for stage2-instrumented + set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "") + # This
[llvm-branch-commits] [clang] [llvm] Backport some fixes for building the release binaries (PR #91095)
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 5)
+  set(LLVM_VERSION_PATCH 6)

tstellar wrote:

I just merged this commit in another PR.

https://github.com/llvm/llvm-project/pull/91095
[llvm-branch-commits] [llvm] Bump version to 18.1.6 (PR #91094)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/91094
[llvm-branch-commits] [llvm] dd3aa6d - Bump version to 18.1.6 (#91094)
Author: Tom Stellard
Date: 2024-05-08T19:41:30-07:00
New Revision: dd3aa6d0e9a8355c14d86b4c607fa89b30c52ec0

URL: https://github.com/llvm/llvm-project/commit/dd3aa6d0e9a8355c14d86b4c607fa89b30c52ec0
DIFF: https://github.com/llvm/llvm-project/commit/dd3aa6d0e9a8355c14d86b4c607fa89b30c52ec0.diff

LOG: Bump version to 18.1.6 (#91094)

Added:

Modified:
    llvm/CMakeLists.txt
    llvm/utils/lit/lit/__init__.py

Removed:

diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index f82be164ac9c4..26b7b01bb1f8d 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 5)
+  set(LLVM_VERSION_PATCH 6)
 endif()
 if(NOT DEFINED LLVM_VERSION_SUFFIX)
   set(LLVM_VERSION_SUFFIX)
diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index 1cfcc7d37813b..d8b0e3bd1c69e 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (18, 1, 5)
+__versioninfo__ = (18, 1, 6)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []
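For reference, the lit version string is derived from the tuple changed in this commit. The short sketch below repeats the same join expression outside of lit; the helper name `format_version` is hypothetical and exists only for illustration.

```python
# Sketch: reproduces the expression lit uses to build __version__ from
# __versioninfo__. format_version is a hypothetical helper, not a lit API.

def format_version(versioninfo, suffix="dev"):
    """Join the numeric components with dots and append the suffix."""
    return ".".join(str(v) for v in versioninfo) + suffix


if __name__ == "__main__":
    print(format_version((18, 1, 5)))  # value before this commit
    print(format_version((18, 1, 6)))  # value after this commit
```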
[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91550 >From 8ea4c39bef000973979cc75a39006e5f87481ee2 Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Fri, 16 Feb 2024 21:34:02 + Subject: [PATCH 1/3] [workflows] Rework pre-commit CI for the release branch This rewrites the pre-commit CI for the release branch so that it behaves almost exactly like the current buildkite builders. It builds every project and uses a better filtering method for selecting which projects to build. In addition, with this change we drop the Linux and Windows test configs, since these are already covered by buildkite and add a config for macos/aarch64. --- .github/workflows/ci-tests.yml| 156 + .../compute-projects-to-test/action.yml | 21 ++ .../compute-projects-to-test.sh | 221 ++ .github/workflows/continue-timeout-job.yml| 75 ++ .github/workflows/get-job-id/action.yml | 30 +++ .github/workflows/lld-tests.yml | 38 --- .../workflows/pr-sccache-restore/action.yml | 26 +++ .github/workflows/pr-sccache-save/action.yml | 50 .github/workflows/timeout-restore/action.yml | 33 +++ .github/workflows/timeout-save/action.yml | 94 .../unprivileged-download-artifact/action.yml | 77 ++ 11 files changed, 783 insertions(+), 38 deletions(-) create mode 100644 .github/workflows/ci-tests.yml create mode 100644 .github/workflows/compute-projects-to-test/action.yml create mode 100755 .github/workflows/compute-projects-to-test/compute-projects-to-test.sh create mode 100644 .github/workflows/continue-timeout-job.yml create mode 100644 .github/workflows/get-job-id/action.yml delete mode 100644 .github/workflows/lld-tests.yml create mode 100644 .github/workflows/pr-sccache-restore/action.yml create mode 100644 .github/workflows/pr-sccache-save/action.yml create mode 100644 .github/workflows/timeout-restore/action.yml create mode 100644 .github/workflows/timeout-save/action.yml create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml diff --git 
a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml new file mode 100644 index 0..e1d1c02755939 --- /dev/null +++ b/.github/workflows/ci-tests.yml @@ -0,0 +1,156 @@ +name: "CI Tests" + +permissions: + contents: read + +on: + pull_request: +types: + - opened + - synchronize + - reopened + # When a PR is closed, we still start this workflow, but then skip + # all the jobs, which makes it effectively a no-op. The reason to + # do this is that it allows us to take advantage of concurrency groups + # to cancel in progress CI jobs whenever the PR is closed. + - closed +branches: + - 'release/**' + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number }} + cancel-in-progress: True + +jobs: + compute-test-configs: +name: "Compute Configurations to Test" +if: >- + github.repository_owner == 'llvm' && + github.event.action != 'closed' +runs-on: ubuntu-22.04 +outputs: + projects: ${{ steps.vars.outputs.projects }} + check-targets: ${{ steps.vars.outputs.check-targets }} + test-build: ${{ steps.vars.outputs.check-targets != '' }} + test-platforms: ${{ steps.platforms.outputs.result }} +steps: + - name: Fetch LLVM sources +uses: actions/checkout@v4 +with: + fetch-depth: 2 + + - name: Compute projects to test +id: vars +uses: ./.github/workflows/compute-projects-to-test + + - name: Compute platforms to test +uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea #v7.0.1 +id: platforms +with: + script: | +linuxConfig = { + name: "linux-x86_64", + runs_on: "ubuntu-22.04" +} +windowsConfig = { + name: "windows-x86_64", + runs_on: "windows-2022" +} +macConfig = { + name: "macos-x86_64", + runs_on: "macos-13" +} +macArmConfig = { + name: "macos-aarch64", + runs_on: "macos-14" +} + +configs = [] + +const base_ref = process.env.GITHUB_BASE_REF; +if (base_ref.startsWith('release/')) { + // This is a pull request against a release branch. 
+ configs.push(macConfig) + configs.push(macArmConfig) +} + +return configs; + + ci-build-test: +# If this job name is changed, then we need to update the job-name +# paramater for the timeout-save step below. +name: "Build" +needs: + - compute-test-configs +permissions: + actions: write #pr-sccache-save may delete artifacts. +runs-on: ${{ matrix.runs_on }} +strategy: + fail-fast: false +
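The "Compute platforms to test" step above returns a list of platform configs that the build job later expands into its matrix. The following Python sketch mirrors that selection logic under stated assumptions: the workflow itself runs JavaScript via `actions/github-script`, and the names `compute_platforms`, `MAC_X86` and `MAC_ARM` are hypothetical.

```python
# Sketch of the platform-selection logic from the workflow's github-script
# step, rewritten in Python for illustration. Names are hypothetical.
import json

MAC_X86 = {"name": "macos-x86_64", "runs_on": "macos-13"}
MAC_ARM = {"name": "macos-aarch64", "runs_on": "macos-14"}


def compute_platforms(base_ref):
    """Return the platform configs to test for a PR targeting base_ref."""
    configs = []
    if base_ref.startswith("release/"):
        # Pull request against a release branch: only the macOS configs are
        # added, since Linux and Windows are already covered by buildkite.
        configs.append(MAC_X86)
        configs.append(MAC_ARM)
    return configs


if __name__ == "__main__":
    # The step result is serialized to JSON; printing it here shows the
    # shape of the data the matrix would receive.
    print(json.dumps(compute_platforms("release/18.x")))
    print(json.dumps(compute_platforms("main")))
```

For a base ref that does not start with `release/`, the list is empty, so the matrix would have no entries to build.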
[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91550 >From 8ea4c39bef000973979cc75a39006e5f87481ee2 Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Fri, 16 Feb 2024 21:34:02 + Subject: [PATCH 1/2] [workflows] Rework pre-commit CI for the release branch This rewrites the pre-commit CI for the release branch so that it behaves almost exactly like the current buildkite builders. It builds every project and uses a better filtering method for selecting which projects to build. In addition, with this change we drop the Linux and Windows test configs, since these are already covered by buildkite and add a config for macos/aarch64. --- .github/workflows/ci-tests.yml| 156 + .../compute-projects-to-test/action.yml | 21 ++ .../compute-projects-to-test.sh | 221 ++ .github/workflows/continue-timeout-job.yml| 75 ++ .github/workflows/get-job-id/action.yml | 30 +++ .github/workflows/lld-tests.yml | 38 --- .../workflows/pr-sccache-restore/action.yml | 26 +++ .github/workflows/pr-sccache-save/action.yml | 50 .github/workflows/timeout-restore/action.yml | 33 +++ .github/workflows/timeout-save/action.yml | 94 .../unprivileged-download-artifact/action.yml | 77 ++ 11 files changed, 783 insertions(+), 38 deletions(-) create mode 100644 .github/workflows/ci-tests.yml create mode 100644 .github/workflows/compute-projects-to-test/action.yml create mode 100755 .github/workflows/compute-projects-to-test/compute-projects-to-test.sh create mode 100644 .github/workflows/continue-timeout-job.yml create mode 100644 .github/workflows/get-job-id/action.yml delete mode 100644 .github/workflows/lld-tests.yml create mode 100644 .github/workflows/pr-sccache-restore/action.yml create mode 100644 .github/workflows/pr-sccache-save/action.yml create mode 100644 .github/workflows/timeout-restore/action.yml create mode 100644 .github/workflows/timeout-save/action.yml create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml diff --git 
a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml new file mode 100644 index 0..e1d1c02755939 --- /dev/null +++ b/.github/workflows/ci-tests.yml @@ -0,0 +1,156 @@ +name: "CI Tests" + +permissions: + contents: read + +on: + pull_request: +types: + - opened + - synchronize + - reopened + # When a PR is closed, we still start this workflow, but then skip + # all the jobs, which makes it effectively a no-op. The reason to + # do this is that it allows us to take advantage of concurrency groups + # to cancel in progress CI jobs whenever the PR is closed. + - closed +branches: + - 'release/**' + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number }} + cancel-in-progress: True + +jobs: + compute-test-configs: +name: "Compute Configurations to Test" +if: >- + github.repository_owner == 'llvm' && + github.event.action != 'closed' +runs-on: ubuntu-22.04 +outputs: + projects: ${{ steps.vars.outputs.projects }} + check-targets: ${{ steps.vars.outputs.check-targets }} + test-build: ${{ steps.vars.outputs.check-targets != '' }} + test-platforms: ${{ steps.platforms.outputs.result }} +steps: + - name: Fetch LLVM sources +uses: actions/checkout@v4 +with: + fetch-depth: 2 + + - name: Compute projects to test +id: vars +uses: ./.github/workflows/compute-projects-to-test + + - name: Compute platforms to test +uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea #v7.0.1 +id: platforms +with: + script: | +linuxConfig = { + name: "linux-x86_64", + runs_on: "ubuntu-22.04" +} +windowsConfig = { + name: "windows-x86_64", + runs_on: "windows-2022" +} +macConfig = { + name: "macos-x86_64", + runs_on: "macos-13" +} +macArmConfig = { + name: "macos-aarch64", + runs_on: "macos-14" +} + +configs = [] + +const base_ref = process.env.GITHUB_BASE_REF; +if (base_ref.startsWith('release/')) { + // This is a pull request against a release branch. 
+ configs.push(macConfig) + configs.push(macArmConfig) +} + +return configs; + + ci-build-test: +# If this job name is changed, then we need to update the job-name +# paramater for the timeout-save step below. +name: "Build" +needs: + - compute-test-configs +permissions: + actions: write #pr-sccache-save may delete artifacts. +runs-on: ${{ matrix.runs_on }} +strategy: + fail-fast: false +
[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)
https://github.com/tstellar updated https://github.com/llvm/llvm-project/pull/91550 >From 8ea4c39bef000973979cc75a39006e5f87481ee2 Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Fri, 16 Feb 2024 21:34:02 + Subject: [PATCH] [workflows] Rework pre-commit CI for the release branch This rewrites the pre-commit CI for the release branch so that it behaves almost exactly like the current buildkite builders. It builds every project and uses a better filtering method for selecting which projects to build. In addition, with this change we drop the Linux and Windows test configs, since these are already covered by buildkite and add a config for macos/aarch64. --- .github/workflows/ci-tests.yml| 156 + .../compute-projects-to-test/action.yml | 21 ++ .../compute-projects-to-test.sh | 221 ++ .github/workflows/continue-timeout-job.yml| 75 ++ .github/workflows/get-job-id/action.yml | 30 +++ .github/workflows/lld-tests.yml | 38 --- .../workflows/pr-sccache-restore/action.yml | 26 +++ .github/workflows/pr-sccache-save/action.yml | 50 .github/workflows/timeout-restore/action.yml | 33 +++ .github/workflows/timeout-save/action.yml | 94 .../unprivileged-download-artifact/action.yml | 77 ++ 11 files changed, 783 insertions(+), 38 deletions(-) create mode 100644 .github/workflows/ci-tests.yml create mode 100644 .github/workflows/compute-projects-to-test/action.yml create mode 100755 .github/workflows/compute-projects-to-test/compute-projects-to-test.sh create mode 100644 .github/workflows/continue-timeout-job.yml create mode 100644 .github/workflows/get-job-id/action.yml delete mode 100644 .github/workflows/lld-tests.yml create mode 100644 .github/workflows/pr-sccache-restore/action.yml create mode 100644 .github/workflows/pr-sccache-save/action.yml create mode 100644 .github/workflows/timeout-restore/action.yml create mode 100644 .github/workflows/timeout-save/action.yml create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml diff --git 
a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml new file mode 100644 index 0..e1d1c02755939 --- /dev/null +++ b/.github/workflows/ci-tests.yml @@ -0,0 +1,156 @@ +name: "CI Tests" + +permissions: + contents: read + +on: + pull_request: +types: + - opened + - synchronize + - reopened + # When a PR is closed, we still start this workflow, but then skip + # all the jobs, which makes it effectively a no-op. The reason to + # do this is that it allows us to take advantage of concurrency groups + # to cancel in progress CI jobs whenever the PR is closed. + - closed +branches: + - 'release/**' + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number }} + cancel-in-progress: True + +jobs: + compute-test-configs: +name: "Compute Configurations to Test" +if: >- + github.repository_owner == 'llvm' && + github.event.action != 'closed' +runs-on: ubuntu-22.04 +outputs: + projects: ${{ steps.vars.outputs.projects }} + check-targets: ${{ steps.vars.outputs.check-targets }} + test-build: ${{ steps.vars.outputs.check-targets != '' }} + test-platforms: ${{ steps.platforms.outputs.result }} +steps: + - name: Fetch LLVM sources +uses: actions/checkout@v4 +with: + fetch-depth: 2 + + - name: Compute projects to test +id: vars +uses: ./.github/workflows/compute-projects-to-test + + - name: Compute platforms to test +uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea #v7.0.1 +id: platforms +with: + script: | +linuxConfig = { + name: "linux-x86_64", + runs_on: "ubuntu-22.04" +} +windowsConfig = { + name: "windows-x86_64", + runs_on: "windows-2022" +} +macConfig = { + name: "macos-x86_64", + runs_on: "macos-13" +} +macArmConfig = { + name: "macos-aarch64", + runs_on: "macos-14" +} + +configs = [] + +const base_ref = process.env.GITHUB_BASE_REF; +if (base_ref.startsWith('release/')) { + // This is a pull request against a release branch. 
+ configs.push(macConfig) + configs.push(macArmConfig) +} + +return configs; + + ci-build-test: +# If this job name is changed, then we need to update the job-name +# paramater for the timeout-save step below. +name: "Build" +needs: + - compute-test-configs +permissions: + actions: write #pr-sccache-save may delete artifacts. +runs-on: ${{ matrix.runs_on }} +strategy: + fail-fast: false +
[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)
llvmbot wrote: @llvm/pr-subscribers-github-workflow Author: Tom Stellard (tstellar) Changes This rewrites the pre-commit CI for the release branch so that it behaves almost exactly like the current buildkite builders. It builds every project and uses a better filtering method for selecting which projects to build. In addition, with this change we drop the Linux and Windows test configs, since these are already covered by buildkite and add a config for macos/aarch64. --- Patch is 25.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91550.diff 10 Files Affected: - (added) .github/workflows/ci-tests.yml (+154) - (added) .github/workflows/compute-projects-to-test/action.yml (+21) - (added) .github/workflows/compute-projects-to-test/compute-projects-to-test.sh (+221) - (added) .github/workflows/continue-timeout-job.yml (+75) - (added) .github/workflows/get-job-id/action.yml (+30) - (added) .github/workflows/pr-sccache-restore/action.yml (+26) - (added) .github/workflows/pr-sccache-save/action.yml (+50) - (added) .github/workflows/timeout-restore/action.yml (+33) - (added) .github/workflows/timeout-save/action.yml (+94) - (added) .github/workflows/unprivileged-download-artifact/action.yml (+77) ``diff diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml new file mode 100644 index 0..22e39174abee7 --- /dev/null +++ b/.github/workflows/ci-tests.yml @@ -0,0 +1,154 @@ +name: "CI Tests" + +permissions: + contents: read + +on: + pull_request: +types: + - opened + - synchronize + - reopened + # When a PR is closed, we still start this workflow, but then skip + # all the jobs, which makes it effectively a no-op. The reason to + # do this is that it allows us to take advantage of concurrency groups + # to cancel in progress CI jobs whenever the PR is closed. 
+ - closed +branches: + - main + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number }} + cancel-in-progress: True + +jobs: + compute-test-configs: +name: "Compute Configurations to Test" +if: github.event.action != 'closed' +runs-on: ubuntu-22.04 +outputs: + projects: ${{ steps.vars.outputs.projects }} + check-targets: ${{ steps.vars.outputs.check-targets }} + test-build: ${{ steps.vars.outputs.check-targets != '' }} + test-platforms: ${{ steps.platforms.outputs.result }} +steps: + - name: Fetch LLVM sources +uses: actions/checkout@v4 +with: + fetch-depth: 2 + + - name: Compute projects to test +id: vars +uses: ./.github/workflows/compute-projects-to-test + + - name: Compute platforms to test +uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea #v7.0.1 +id: platforms +with: + script: | +linuxConfig = { + name: "linux-x86_64", + runs_on: "ubuntu-22.04" +} +windowsConfig = { + name: "windows-x86_64", + runs_on: "windows-2022" +} +macConfig = { + name: "macos-x86_64", + runs_on: "macos-13" +} +macArmConfig = { + name: "macos-aarch64", + runs_on: "macos-14" +} + +configs = [] + +const base_ref = process.env.GITHUB_BASE_REF; +if (base_ref.startsWith('release/')) { + // This is a pull request against a release branch. + configs.push(macConfig) + configs.push(macArmConfig) +} + +return configs; + + ci-build-test: +# If this job name is changed, then we need to update the job-name +# paramater for the timeout-save step below. +name: "Build" +needs: + - compute-test-configs +permissions: + actions: write #pr-sccache-save may delete artifacts. 
+runs-on: ${{ matrix.runs_on }} +strategy: + fail-fast: false + matrix: +include: ${{ fromJson(needs.compute-test-configs.outputs.test-platforms) }} +if: needs.compute-test-configs.outputs.test-build == 'true' +steps: + - name: Fetch LLVM sources +uses: actions/checkout@v4 + + - name: Timeout Restore +id: timeout +uses: ./.github/workflows/timeout-restore +with: + artifact-name-suffix: ${{ matrix.name }} + + - name: Setup Windows +uses: llvm/actions/setup-windows@main +if: ${{ runner.os == 'Windows' }} +with: + arch: amd64 + + - name: Install Ninja +uses: llvm/actions/install-ninja@main + + - name: Setup sccache +uses: hendrikmuhs/ccache-action@v1 +with: + max-size: 2G + variant: sccache + key: ci-${{ matrix.name }} + + - name: Restore sccache from previous PR run +
[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)
https://github.com/tstellar created https://github.com/llvm/llvm-project/pull/91550 This rewrites the pre-commit CI for the release branch so that it behaves almost exactly like the current buildkite builders. It builds every project and uses a better filtering method for selecting which projects to build. In addition, with this change we drop the Linux and Windows test configs, since these are already covered by buildkite and add a config for macos/aarch64. >From a590088cbdf37d3c4d274c5ab9d6d4e4de9c922c Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Fri, 16 Feb 2024 21:34:02 + Subject: [PATCH] [workflows] Rework pre-commit CI for the release branch This rewrites the pre-commit CI for the release branch so that it behaves almost exactly like the current buildkite builders. It builds every project and uses a better filtering method for selecting which projects to build. In addition, with this change we drop the Linux and Windows test configs, since these are already covered by buildkite and add a config for macos/aarch64. 
--- .github/workflows/ci-tests.yml| 154 .../compute-projects-to-test/action.yml | 21 ++ .../compute-projects-to-test.sh | 221 ++ .github/workflows/continue-timeout-job.yml| 75 ++ .github/workflows/get-job-id/action.yml | 30 +++ .../workflows/pr-sccache-restore/action.yml | 26 +++ .github/workflows/pr-sccache-save/action.yml | 50 .github/workflows/timeout-restore/action.yml | 33 +++ .github/workflows/timeout-save/action.yml | 94 .../unprivileged-download-artifact/action.yml | 77 ++ 10 files changed, 781 insertions(+) create mode 100644 .github/workflows/ci-tests.yml create mode 100644 .github/workflows/compute-projects-to-test/action.yml create mode 100755 .github/workflows/compute-projects-to-test/compute-projects-to-test.sh create mode 100644 .github/workflows/continue-timeout-job.yml create mode 100644 .github/workflows/get-job-id/action.yml create mode 100644 .github/workflows/pr-sccache-restore/action.yml create mode 100644 .github/workflows/pr-sccache-save/action.yml create mode 100644 .github/workflows/timeout-restore/action.yml create mode 100644 .github/workflows/timeout-save/action.yml create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml new file mode 100644 index 0..22e39174abee7 --- /dev/null +++ b/.github/workflows/ci-tests.yml @@ -0,0 +1,154 @@ +name: "CI Tests" + +permissions: + contents: read + +on: + pull_request: +types: + - opened + - synchronize + - reopened + # When a PR is closed, we still start this workflow, but then skip + # all the jobs, which makes it effectively a no-op. The reason to + # do this is that it allows us to take advantage of concurrency groups + # to cancel in progress CI jobs whenever the PR is closed. 
+ - closed +branches: + - main + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number }} + cancel-in-progress: True + +jobs: + compute-test-configs: +name: "Compute Configurations to Test" +if: github.event.action != 'closed' +runs-on: ubuntu-22.04 +outputs: + projects: ${{ steps.vars.outputs.projects }} + check-targets: ${{ steps.vars.outputs.check-targets }} + test-build: ${{ steps.vars.outputs.check-targets != '' }} + test-platforms: ${{ steps.platforms.outputs.result }} +steps: + - name: Fetch LLVM sources +uses: actions/checkout@v4 +with: + fetch-depth: 2 + + - name: Compute projects to test +id: vars +uses: ./.github/workflows/compute-projects-to-test + + - name: Compute platforms to test +uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea #v7.0.1 +id: platforms +with: + script: | +linuxConfig = { + name: "linux-x86_64", + runs_on: "ubuntu-22.04" +} +windowsConfig = { + name: "windows-x86_64", + runs_on: "windows-2022" +} +macConfig = { + name: "macos-x86_64", + runs_on: "macos-13" +} +macArmConfig = { + name: "macos-aarch64", + runs_on: "macos-14" +} + +configs = [] + +const base_ref = process.env.GITHUB_BASE_REF; +if (base_ref.startsWith('release/')) { + // This is a pull request against a release branch. + configs.push(macConfig) + configs.push(macArmConfig) +} + +return configs; + + ci-build-test: +# If this job name is changed, then we need to update the job-name +# paramater for the timeout-save step below. +name: "Build" +needs:
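The platform-selection script in this workflow is small enough to model outside of GitHub Actions. The following is only a sketch of its logic in C++, not the workflow's actual code: `PlatformConfig` and `computeTestPlatforms` are hypothetical names, and the configuration values are copied from the script above.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one entry of the job matrix.
struct PlatformConfig {
  std::string name;
  std::string runs_on;
};

// Mirrors the script's logic: only pull requests whose base ref starts
// with "release/" get the two macOS configurations; all others get none.
std::vector<PlatformConfig> computeTestPlatforms(std::string_view baseRef) {
  std::vector<PlatformConfig> configs;
  if (baseRef.substr(0, 8) == "release/") {
    configs.push_back({"macos-x86_64", "macos-13"});
    configs.push_back({"macos-aarch64", "macos-14"});
  }
  return configs;
}
```

An empty result corresponds to an empty matrix, which is why the Linux and Windows configs defined in the script are never scheduled.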
[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)
ilovepi wrote: This tends to bite anyone using LTO with RISCV. In particular, I'm concerned about the impact on Rust, since they'll pin LLVM until the LLVM 19 release. About 60% of Fuchsia is implemented in Rust, and more if you count only userland. We're hoping to avoid a situation where we can't use LTO on RISCV Fuchsia targets, as we're starting to rely more on LTO configurations to enable features like control flow integrity. https://github.com/llvm/llvm-project/pull/91514 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [InstSimplify] Do not simplify freeze in `simplifyWithOpReplaced` (#91215) (PR #91419)
https://github.com/nikic approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/91419
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
teresajohnson wrote:

> #87600 is a functional change and the diffbase of this patch, and `llvm/test/ThinLTO/X86/import_callee_declaration.ll` should be a test case for both patches.
>
> In the [diffbase](https://github.com/llvm/llvm-project/pull/87600), bitcode writer takes maps as additional parameters to populate import status, and it's not straightforward to construct regression tests there without this patch. I wonder if I shall introduce `cl::list` in llvm-lto/llvm-lto2 (as a repeated arg) to specify `filename:GUID` to test the diffbase alone.

Rather than add an option just for testing that one alone, I have a suggestion for splitting up the PRs slightly differently. What if you submitted this one first, minus the modified calls to writeIndexToFile and the part of the test that checks the disassembled index (just have the testing for this one check the number of declarations imported and other debug messages)? Then move the modified calls to writeIndexToFile and the index disassembly checking to PR87600, which can be committed as a follow-on. That way each change comes with a test.

https://github.com/llvm/llvm-project/pull/88024
[llvm-branch-commits] [llvm] release/18.x: [AArch64][GISEL] Consider fcmp true and fcmp false in cond code selection (#86972) (PR #91126)
marcauberer wrote: @arsenm How are lit test failures handled in case of cherry-picks? It seems GISEL behaves a bit differently on `release/18.x`. I have a fix prepared locally, but how should I push it? I can't push to an llvmbot branch, can I? https://github.com/llvm/llvm-project/pull/91126
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
@@ -1670,11 +1798,15 @@ Expected FunctionImporter::importFunctions( if (!GV.hasName()) continue; auto GUID = GV.getGUID(); - auto Import = ImportGUIDs.count(GUID); - LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing global " -<< GUID << " " << GV.getName() << " from " -<< SrcModule->getSourceFileName() << "\n"); - if (Import) { + auto ImportType = maybeGetImportType(ImportGUIDs, GUID); + if (!ImportType) teresajohnson wrote: Or do what I suggested above which goes back to only needing one LLVM_DEBUG https://github.com/llvm/llvm-project/pull/88024
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
@@ -245,8 +256,10 @@ static auto qualifyCalleeCandidates( } /// Given a list of possible callee implementation for a call site, select one -/// that fits the \p Threshold. If none are found, the Reason will give the last -/// reason for the failure (last, in the order of CalleeSummaryList entries). +/// that fits the \p Threshold for function definition import. If none are +/// found, the Reason will give the last reason for the failure (last, in the +/// order of CalleeSummaryList entries). If caller wants to select eligible +/// summary teresajohnson wrote: dangling sentence? https://github.com/llvm/llvm-project/pull/88024
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
https://github.com/teresajohnson commented: I only had time for a cursory review; some comments and suggestions are below. I also have a suggestion for the testing issue with respect to the other patch, which I will note separately. https://github.com/llvm/llvm-project/pull/88024
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
@@ -1634,17 +1752,27 @@ Expected FunctionImporter::importFunctions( return std::move(Err); auto = FunctionsToImportPerModule->second; + // Find the globals to import SetVector GlobalsToImport; for (Function : *SrcModule) { if (!F.hasName()) continue; auto GUID = F.getGUID(); - auto Import = ImportGUIDs.count(GUID); - LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing function " -<< GUID << " " << F.getName() << " from " -<< SrcModule->getSourceFileName() << "\n"); - if (Import) { + auto ImportType = maybeGetImportType(ImportGUIDs, GUID); + + if (!ImportType) {

teresajohnson wrote: You could combine this with the below ImportDefinition checking to keep the same flow as before with one debug message, e.g.:

```
auto ImportType = maybeGetImportType(...);
auto ImportDefinition = false;
if (ImportType) {
  ImportDefinition = ...;
}
LLVM_DEBUG(dbgs() << (ImportDefinition ...
if (ImportDefinition) {
...
```

https://github.com/llvm/llvm-project/pull/88024
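The flow suggested in this review comment (look up the import type once, derive a single `ImportDefinition` flag, emit one debug message) can be sketched as a standalone program. This is an illustration only, not the patch's code: `ImportKind`, `GlobalId`, `ImportMap`, and `describeImport` are hypothetical names, and the `maybeGetImportType` below is a simplified assumption about what the real helper does.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for the types used in the patch.
enum class ImportKind { Definition, Declaration };
using GlobalId = std::uint64_t;
using ImportMap = std::map<GlobalId, ImportKind>;

// Assumed behavior: return the recorded import kind for Id, or
// std::nullopt when Id is not in the import list at all.
std::optional<ImportKind> maybeGetImportType(const ImportMap &Imports,
                                             GlobalId Id) {
  auto It = Imports.find(Id);
  if (It == Imports.end())
    return std::nullopt;
  return It->second;
}

// One lookup, one flag, one message: the same shape as the pre-patch
// code, where a single debug line covered both outcomes.
std::string describeImport(const ImportMap &Imports, GlobalId Id,
                           const std::string &Name) {
  std::optional<ImportKind> ImportType = maybeGetImportType(Imports, Id);
  bool ImportDefinition =
      ImportType && *ImportType == ImportKind::Definition;
  std::string Msg = ImportDefinition ? "Is" : "Not";
  Msg += " importing function " + std::to_string(Id) + " " + Name;
  return Msg;
}
```

With this shape, a symbol that is absent from the import list and one that is imported only as a declaration both take the "Not importing" path, so no second debug statement is needed.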
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
@@ -1634,17 +1752,27 @@ Expected FunctionImporter::importFunctions( return std::move(Err); auto = FunctionsToImportPerModule->second; + // Find the globals to import SetVector GlobalsToImport; for (Function : *SrcModule) { if (!F.hasName()) continue; auto GUID = F.getGUID(); - auto Import = ImportGUIDs.count(GUID); - LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing function " -<< GUID << " " << F.getName() << " from " -<< SrcModule->getSourceFileName() << "\n"); - if (Import) { + auto ImportType = maybeGetImportType(ImportGUIDs, GUID); + + if (!ImportType) { teresajohnson wrote: Also consider indicating which are imported as declarations in the debug message? https://github.com/llvm/llvm-project/pull/88024
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
@@ -158,7 +158,7 @@ void llvm::computeLTOCacheKey( std::vector ExportsGUID; ExportsGUID.reserve(ExportList.size()); - for (const auto : ExportList) { + for (const auto &[VI, UnusedImportType] : ExportList) { teresajohnson wrote: We should probably include the new import type result in the cache key, because if that changes then presumably the cached object should be invalidated, as it would be different? https://github.com/llvm/llvm-project/pull/88024
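To illustrate the concern raised here, the sketch below builds a toy cache key from an export list with and without the import type. It is not `computeLTOCacheKey`: `ImportKind`, `ExportEntry`, and `buildExportKey` are hypothetical names, and a plain string stands in for whatever hash the real code feeds.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a GUID paired with how the symbol is imported.
enum class ImportKind : std::uint8_t { Definition = 0, Declaration = 1 };
using ExportEntry = std::pair<std::uint64_t, ImportKind>;

// Builds a deterministic key from the export list. Entries are sorted so
// the key does not depend on iteration order. When IncludeImportKind is
// false, only the GUIDs contribute to the key.
std::string buildExportKey(std::vector<ExportEntry> Exports,
                           bool IncludeImportKind) {
  std::sort(Exports.begin(), Exports.end());
  std::string Key;
  for (const auto &[Guid, Kind] : Exports) {
    Key += std::to_string(Guid);
    if (IncludeImportKind) {
      Key += ':';
      Key += std::to_string(static_cast<unsigned>(Kind));
    }
    Key += ';';
  }
  return Key;
}
```

If one symbol changes from a definition import to a declaration import, the GUID-only key stays the same, so a stale cached object would be reused; the key that includes the kind changes and forces a rebuild.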
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
https://github.com/teresajohnson edited https://github.com/llvm/llvm-project/pull/88024
[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)
@@ -1670,11 +1798,15 @@ Expected FunctionImporter::importFunctions( if (!GV.hasName()) continue; auto GUID = GV.getGUID(); - auto Import = ImportGUIDs.count(GUID); - LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing global " -<< GUID << " " << GV.getName() << " from " -<< SrcModule->getSourceFileName() << "\n"); - if (Import) { + auto ImportType = maybeGetImportType(ImportGUIDs, GUID); + if (!ImportType) teresajohnson wrote: Do we need to emit a debug message in this case like you are doing for functions above? Ditto for aliases below https://github.com/llvm/llvm-project/pull/88024
[llvm-branch-commits] [BOLT][NFCI] Use heuristic for matching split global functions (PR #90429)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/90429
[llvm-branch-commits] [BOLT][BAT] Fix translate for branches added by BOLT (PR #90811)
https://github.com/aaupov closed https://github.com/llvm/llvm-project/pull/90811
[llvm-branch-commits] [BOLT][NFCI] Allow non-simple functions to be in disassembled state (PR #90806)
https://github.com/aaupov closed https://github.com/llvm/llvm-project/pull/90806
[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)
andrewrk wrote: Thanks! https://github.com/llvm/llvm-project/pull/91425
[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)
topperc wrote:

> Can you briefly summarize why this is important to backport? At first glance, this is only relevant for LTO with mixed architecture specifications, which... I can see someone might want it, I guess, but it seems pretty easy to work around not having it.

It's not just mixed architecture specifications. Even in a non-mixed situation the Compressed instruction flag in the ELF header doesn't get set correctly for LTO. Prior to these patches, the flag is set using the subtarget features from the TargetMachine which are empty in an LTO build. The linker needs this flag to do linker relaxation for alignment correctly. The workaround is to pass `-Wl,-plugin-opt=-mattr=+c`. CC @ilovepi who asked me to try to backport it.

https://github.com/llvm/llvm-project/pull/91514
[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)
efriedma-quic wrote: Can you briefly summarize why this is important to backport? At first glance, this is only relevant for LTO with mixed architecture specifications, which... I can see someone might want it, I guess, but it seems pretty easy to work around not having it. https://github.com/llvm/llvm-project/pull/91514
[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)
https://github.com/aaupov updated https://github.com/llvm/llvm-project/pull/90807 >From acf58ceb37d2aa917e8d84d243faadc58f5f3a7d Mon Sep 17 00:00:00 2001 From: Amir Ayupov Date: Mon, 6 May 2024 13:35:04 -0700 Subject: [PATCH 1/3] Simplify IsReturn check Created using spr 1.3.4 --- bolt/lib/Profile/DataAggregator.cpp | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp index e4a7324c38175..d02e4499014ed 100644 --- a/bolt/lib/Profile/DataAggregator.cpp +++ b/bolt/lib/Profile/DataAggregator.cpp @@ -778,13 +778,13 @@ bool DataAggregator::doBranch(uint64_t From, uint64_t To, uint64_t Count, if (BinaryFunction *Func = getBinaryFunctionContainingAddress(Addr)) { Addr -= Func->getAddress(); if (IsFrom) { -if (Func->hasInstructions()) { - if (MCInst *Inst = Func->getInstructionAtOffset(Addr)) -IsReturn = BC->MIB->isReturn(*Inst); -} else if (std::optional Inst = -Func->disassembleInstructionAtOffset(Addr)) { - IsReturn = BC->MIB->isReturn(*Inst); -} +auto checkReturn = [&](auto MaybeInst) { + IsReturn = MaybeInst && BC->MIB->isReturn(*MaybeInst); +}; +if (Func->hasInstructions()) + checkReturn(Func->getInstructionAtOffset(Addr)); +else + checkReturn(Func->disassembleInstructionAtOffset(Addr)); } if (BAT) >From 22052e461e5671f376fe2dcb733446b0a63e956d Mon Sep 17 00:00:00 2001 From: Amir Ayupov Date: Tue, 7 May 2024 18:30:48 -0700 Subject: [PATCH 2/3] drop const from disassembleInstructionAtOffset Created using spr 1.3.4 --- bolt/include/bolt/Core/BinaryFunction.h | 3 +-- bolt/lib/Core/BinaryFunction.cpp| 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h index b21312f92c485..3c641581e247a 100644 --- a/bolt/include/bolt/Core/BinaryFunction.h +++ b/bolt/include/bolt/Core/BinaryFunction.h @@ -930,8 +930,7 @@ class BinaryFunction { return const_cast(this)->getInstructionAtOffset(Offset); } 
- const std::optional - disassembleInstructionAtOffset(uint64_t Offset) const; + std::optional disassembleInstructionAtOffset(uint64_t Offset) const; /// Return offset for the first instruction. If there is data at the /// beginning of a function then offset of the first instruction could diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp index 5f3c0cb1ad754..fb81fc3f2ba7b 100644 --- a/bolt/lib/Core/BinaryFunction.cpp +++ b/bolt/lib/Core/BinaryFunction.cpp @@ -1167,7 +1167,7 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst , } } -const std::optional +std::optional BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const { assert(CurrentState == State::Empty); assert(Offset < MaxSize && "invalid offset"); >From 63725510bf85c9e3862800830f5881099ab4b21f Mon Sep 17 00:00:00 2001 From: Amir Ayupov Date: Wed, 8 May 2024 11:59:59 -0700 Subject: [PATCH 3/3] Assert messages Created using spr 1.3.4 --- bolt/lib/Core/BinaryFunction.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp index fb81fc3f2ba7b..4721a247ee2e2 100644 --- a/bolt/lib/Core/BinaryFunction.cpp +++ b/bolt/lib/Core/BinaryFunction.cpp @@ -1169,10 +1169,10 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst , std::optional BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const { - assert(CurrentState == State::Empty); - assert(Offset < MaxSize && "invalid offset"); + assert(CurrentState == State::Empty && "Function should not be disassembled"); + assert(Offset < MaxSize && "Invalid offset"); ErrorOr> FunctionData = getData(); - assert(FunctionData && "cannot get function as data"); + assert(FunctionData && "Cannot get function as data"); MCInst Instr; uint64_t InstrSize = 0; const uint64_t InstrAddress = getAddress() + Offset; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org 
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
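The first patch in this series replaces two near-identical branches with one generic lambda that accepts either a pointer or a `std::optional`. A minimal standalone sketch of that pattern follows; `Inst`, `isReturn`, and `isReturnAt` are hypothetical stand-ins for `MCInst`, `BC->MIB->isReturn`, and the surrounding BOLT code.

```cpp
#include <optional>

// Hypothetical stand-in for MCInst.
struct Inst {
  bool IsReturn;
};

// Hypothetical stand-in for BC->MIB->isReturn.
bool isReturn(const Inst &I) { return I.IsReturn; }

// Cached models the result of getInstructionAtOffset (a pointer that may
// be null); Disassembled models disassembleInstructionAtOffset (an
// optional that may be empty).
bool isReturnAt(const Inst *Cached, const std::optional<Inst> &Disassembled,
                bool HasInstructions) {
  bool IsReturn = false;
  // Both argument types convert to bool in a logical expression and
  // support unary *, so one generic lambda serves both call sites.
  auto checkReturn = [&](auto MaybeInst) {
    IsReturn = MaybeInst && isReturn(*MaybeInst);
  };
  if (HasInstructions)
    checkReturn(Cached);
  else
    checkReturn(Disassembled);
  return IsReturn;
}
```

Because the parameter is taken by value with `auto`, the lambda is instantiated once for `const Inst *` and once for `std::optional<Inst>`, with no common base type required.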
[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)
https://github.com/topperc updated https://github.com/llvm/llvm-project/pull/91514 From ee109e3627e5b93297bfc7908f684eedb5feb5ec Mon Sep 17 00:00:00 2001 From: Craig Topper Date: Tue, 13 Feb 2024 16:17:50 -0800 Subject: [PATCH 1/3] [RISCV] Add canonical ISA string as Module metadata in IR. (#80760) In an LTO build, we don't set the ELF attributes to indicate what extensions were compiled with. The target CPU/Attrs in RISCVTargetMachine do not get set for an LTO build. Each function gets a target-cpu/feature attribute, but this isn't usable to set ELF attributes since we wouldn't know what function to use. We can't just pick one since it might have been compiled with an attribute like target_version. This patch adds the ISA as Module metadata so we can retrieve it in the backend. Individual translation units can still be compiled with different strings so we need to collect the unique set when Modules are merged. The backend will need to combine the unique ISA strings to produce a single value for the ELF attributes. This will be done in a separate patch.
--- clang/lib/CodeGen/CodeGenModule.cpp | 14 + .../RISCV/ntlh-intrinsics/riscv32-zihintntl.c | 350 +- .../test/CodeGen/RISCV/riscv-metadata-arch.c | 20 + 3 files changed, 209 insertions(+), 175 deletions(-) create mode 100644 clang/test/CodeGen/RISCV/riscv-metadata-arch.c diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index 1280bcd36de94..eb13cd40eb8a2 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -67,6 +67,7 @@ #include "llvm/Support/CommandLine.h" #include "llvm/Support/ConvertUTF.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/RISCVISAInfo.h" #include "llvm/Support/TimeProfiler.h" #include "llvm/Support/xxhash.h" #include "llvm/TargetParser/Triple.h" @@ -1059,6 +1060,19 @@ void CodeGenModule::Release() { llvm::LLVMContext = TheModule.getContext(); getModule().addModuleFlag(llvm::Module::Error, "target-abi", llvm::MDString::get(Ctx, ABIStr)); + +// Add the canonical ISA string as metadata so the backend can set the ELF +// attributes correctly. We use AppendUnique so LTO will keep all of the +// unique ISA strings that were linked together. +const std::vector = +getTarget().getTargetOpts().Features; +auto ParseResult = llvm::RISCVISAInfo::parseFeatures( +Arch == llvm::Triple::riscv64 ? 
64 : 32, Features); +if (!errorToBool(ParseResult.takeError())) + getModule().addModuleFlag( + llvm::Module::AppendUnique, "riscv-isa", + llvm::MDNode::get( + Ctx, llvm::MDString::get(Ctx, (*ParseResult)->toString(; } if (CodeGenOpts.SanitizeCfiCrossDso) { diff --git a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c index 897edbc6450af..b11c2ca010e7c 100644 --- a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c +++ b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c @@ -28,190 +28,190 @@ vint8m1_t *scvc1, *scvc2; // clang-format off void ntl_all_sizes() { // CHECK-LABEL: ntl_all_sizes - uc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5 - sc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5 - us = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5 - ss = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5 - ui = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5 - si = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5 - ull = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5 - sll = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5 - h1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load half{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5 - f1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // 
CHECK: load float{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5 - d1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load double{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5 - v4si1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load <4 x i32>{{.*}}align 16, !nontemporal !4,
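The commit message above relies on the `AppendUnique` module flag behavior to collect one ISA string per distinct translation-unit configuration when modules are merged. The sketch below models that merge on plain strings, assuming the behavior is "append, skipping entries already present". `appendUnique` is a hypothetical helper rather than an LLVM API, and the ISA strings in the usage note are illustrative values only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Appends each ISA string from Src to Dst unless it is already present,
// preserving first-seen order. This models the effect of merging two
// "riscv-isa" module flags during LTO linking.
void appendUnique(std::vector<std::string> &Dst,
                  const std::vector<std::string> &Src) {
  for (const std::string &S : Src)
    if (std::find(Dst.begin(), Dst.end(), S) == Dst.end())
      Dst.push_back(S);
}
```

For example, merging `{"rv64i2p1_c2p0"}` with another module carrying the same string leaves one entry, while merging with `{"rv64i2p1"}` yields two entries that the backend must then combine into a single value for the ELF attributes.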
[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)
llvmbot wrote: @llvm/pr-subscribers-backend-risc-v Author: Craig Topper (topperc) Changes Resolves #91513 --- Patch is 57.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91514.diff 10 Files Affected: - (modified) clang/lib/CodeGen/CodeGenModule.cpp (+14) - (modified) clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c (+175-175) - (added) clang/test/CodeGen/RISCV/riscv-metadata-arch.c (+20) - (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVELFStreamer.cpp (+4-4) - (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVELFStreamer.h (-1) - (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.cpp (+5) - (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.h (+5) - (modified) llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp (+28-4) - (added) llvm/test/CodeGen/RISCV/attributes-module-flag.ll (+17) - (added) llvm/test/CodeGen/RISCV/module-elf-flags.ll (+13) ``diff diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp index 1280bcd36de94..f576cd8b853c2 100644 --- a/clang/lib/CodeGen/CodeGenModule.cpp +++ b/clang/lib/CodeGen/CodeGenModule.cpp @@ -67,6 +67,7 @@ #include "llvm/Support/CommandLine.h" #include "llvm/Support/ConvertUTF.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/RISCVISAInfo.h" #include "llvm/Support/TimeProfiler.h" #include "llvm/Support/xxhash.h" #include "llvm/TargetParser/Triple.h" @@ -1059,6 +1060,19 @@ void CodeGenModule::Release() { llvm::LLVMContext = TheModule.getContext(); getModule().addModuleFlag(llvm::Module::Error, "target-abi", llvm::MDString::get(Ctx, ABIStr)); + +// Add the canonical ISA string as metadata so the backend can set the ELF +// attributes correctly. We use AppendUnique so LTO will keep all of the +// unique ISA strings that were linked together. +const std::vector = +getTarget().getTargetOpts().Features; +auto ParseResult = +llvm::RISCVISAInfo::parseFeatures(T.isRISCV64() ? 
64 : 32, Features); +if (!errorToBool(ParseResult.takeError())) + getModule().addModuleFlag( + llvm::Module::AppendUnique, "riscv-isa", + llvm::MDNode::get( + Ctx, llvm::MDString::get(Ctx, (*ParseResult)->toString(; } if (CodeGenOpts.SanitizeCfiCrossDso) { diff --git a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c index 897edbc6450af..b11c2ca010e7c 100644 --- a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c +++ b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c @@ -28,190 +28,190 @@ vint8m1_t *scvc1, *scvc2; // clang-format off void ntl_all_sizes() { // CHECK-LABEL: ntl_all_sizes - uc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5 - sc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5 - us = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5 - ss = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5 - ui = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5 - si = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5 - ull = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5 - sll = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5 - h1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load half{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5 - f1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // 
CHECK: load float{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5 - d1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load double{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5 - v4si1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load <4 x i32>{{.*}}align 16, !nontemporal !4, !riscv-nontemporal-domain !5 - v8ss1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load <8 x i16>{{.*}}align 16, !nontemporal !4, !riscv-nontemporal-domain !5 - v16sc1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load <16 x i8>{{.*}}align 16, !nontemporal !4, !riscv-nontemporal-domain !5 - *scvi1 = __riscv_ntl_load(scvi2, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load
[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)
https://github.com/topperc created https://github.com/llvm/llvm-project/pull/91514 Resolves #91513 From f45df1cf1b74957e2f9609b982e964654f9af824 Mon Sep 17 00:00:00 2001 From: Craig Topper Date: Tue, 13 Feb 2024 16:17:50 -0800 Subject: [PATCH 1/3] [RISCV] Add canonical ISA string as Module metadata in IR. (#80760) In an LTO build, we don't set the ELF attributes to indicate what extensions were compiled with. The target CPU/Attrs in RISCVTargetMachine do not get set for an LTO build. Each function gets a target-cpu/feature attribute, but this isn't usable to set ELF attributes since we wouldn't know what function to use. We can't just pick one since it might have been compiled with an attribute like target_version. This patch adds the ISA as Module metadata so we can retrieve it in the backend. Individual translation units can still be compiled with different strings so we need to collect the unique set when Modules are merged. The backend will need to combine the unique ISA strings to produce a single value for the ELF attributes. This will be done in a separate patch.
---
 clang/lib/CodeGen/CodeGenModule.cpp           |  14 +
 .../RISCV/ntlh-intrinsics/riscv32-zihintntl.c | 350 +-
 .../test/CodeGen/RISCV/riscv-metadata-arch.c  |  20 +
 3 files changed, 209 insertions(+), 175 deletions(-)
 create mode 100644 clang/test/CodeGen/RISCV/riscv-metadata-arch.c

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 1280bcd36de94..f576cd8b853c2 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -67,6 +67,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/RISCVISAInfo.h"
 #include "llvm/Support/TimeProfiler.h"
 #include "llvm/Support/xxhash.h"
 #include "llvm/TargetParser/Triple.h"
@@ -1059,6 +1060,19 @@ void CodeGenModule::Release() {
     llvm::LLVMContext &Ctx = TheModule.getContext();
     getModule().addModuleFlag(llvm::Module::Error, "target-abi",
                               llvm::MDString::get(Ctx, ABIStr));
+
+    // Add the canonical ISA string as metadata so the backend can set the ELF
+    // attributes correctly. We use AppendUnique so LTO will keep all of the
+    // unique ISA strings that were linked together.
+    const std::vector<std::string> &Features =
+        getTarget().getTargetOpts().Features;
+    auto ParseResult =
+        llvm::RISCVISAInfo::parseFeatures(T.isRISCV64() ? 64 : 32, Features);
+    if (!errorToBool(ParseResult.takeError()))
+      getModule().addModuleFlag(
+          llvm::Module::AppendUnique, "riscv-isa",
+          llvm::MDNode::get(
+              Ctx, llvm::MDString::get(Ctx, (*ParseResult)->toString())));
   }
 
   if (CodeGenOpts.SanitizeCfiCrossDso) {
diff --git a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
index 897edbc6450af..b11c2ca010e7c 100644
--- a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
+++ b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
@@ -28,190 +28,190 @@ vint8m1_t *scvc1, *scvc2;
 
 // clang-format off
 void ntl_all_sizes() { // CHECK-LABEL: ntl_all_sizes
-  uc = __riscv_ntl_load(&uc, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  sc = __riscv_ntl_load(&sc, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  us = __riscv_ntl_load(&us, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ss = __riscv_ntl_load(&ss, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ui = __riscv_ntl_load(&ui, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  si = __riscv_ntl_load(&si, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  ull = __riscv_ntl_load(&ull, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  sll = __riscv_ntl_load(&sll, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  h1 = __riscv_ntl_load(&h1, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load half{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  f1 = __riscv_ntl_load(&f1, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load float{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  d1 = __riscv_ntl_load(&d1, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load double{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  v4si1 = __riscv_ntl_load(&v4si1, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load <4 x i32>{{.*}}align 16, !nontemporal !4,
[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)
https://github.com/topperc milestoned https://github.com/llvm/llvm-project/pull/91514 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)
https://github.com/maksfb edited https://github.com/llvm/llvm-project/pull/90807
[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)
@@ -1167,6 +1167,21 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst &Instruction,
   }
 }
 
+std::optional<MCInst>
+BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
+  assert(CurrentState == State::Empty);
+  assert(Offset < MaxSize && "invalid offset");

maksfb wrote:

nit: capitalize the message.

https://github.com/llvm/llvm-project/pull/90807
[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)
@@ -1167,6 +1167,21 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst &Instruction,
   }
 }
 
+std::optional<MCInst>
+BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
+  assert(CurrentState == State::Empty);
+  assert(Offset < MaxSize && "invalid offset");
+  ErrorOr<ArrayRef<uint8_t>> FunctionData = getData();
+  assert(FunctionData && "cannot get function as data");

maksfb wrote:

nit: ditto.

https://github.com/llvm/llvm-project/pull/90807
[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)
@@ -1167,6 +1167,21 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst &Instruction,
   }
 }
 
+std::optional<MCInst>
+BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
+  assert(CurrentState == State::Empty);

maksfb wrote:

nit: add message.

https://github.com/llvm/llvm-project/pull/90807
[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)
https://github.com/maksfb approved this pull request. Please address the nits. Otherwise - good to go. https://github.com/llvm/llvm-project/pull/90807
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -4991,3 +4971,38 @@ OMPClause *Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc);
   return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data);
 }
+
+bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+                                       SourceLocation &ClauseNameLoc,
+                                       SourceLocation &OpenLoc,
+                                       SourceLocation &CloseLoc,
+                                       SmallVectorImpl<Expr *> &Exprs,
+                                       bool ReqIntConst) {
+  assert(getOpenMPClauseName(Kind) == PP.getSpelling(Tok) &&
+         "Expected parsing to start at clause name");
+  ClauseNameLoc = ConsumeToken();
+
+  // Parse inside of '(' and ')'.
+  BalancedDelimiterTracker T(*this, tok::l_paren, tok::annot_pragma_openmp_end);
+  if (T.consumeOpen()) {
+    Diag(Tok, diag::err_expected) << tok::l_paren;
+    return true;
+  }
+
+  // Parse the list with interleaved commas.
+  do {
+    ExprResult Val =
+        ReqIntConst ? ParseConstantExpression() : ParseAssignmentExpression();
+    if (!Val.isUsable()) {
+      // Encountered something other than an expression; abort to ')'.
+      T.skipToEnd();
+      return true;

Meinersbur wrote:

Callers should not use the output parameters when returning true. This seems to be common for `Parse...` methods, such as `ParseOpenACCIntExprList`, `parseOpenMPDeclareMapperVarDecl`, `ParseOpenMPParensExpr`, ...

https://github.com/llvm/llvm-project/pull/91345
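A minimal sketch of the convention described in this reply: a parse routine returns true on error, and the caller reads the output parameter only on success. `parseIntList` and `sumOrMinusOne` are hypothetical stand-ins over plain strings, not Clang parser APIs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Parse... method: parses "(1, 2, 3)" into Values.
// Returns true on error. On error, Values may be partially filled, so callers
// must not read it.
bool parseIntList(const std::string &Text, std::vector<int> &Values) {
  std::size_t Pos = 0;
  auto SkipSpaces = [&] {
    while (Pos < Text.size() &&
           std::isspace(static_cast<unsigned char>(Text[Pos])))
      ++Pos;
  };

  SkipSpaces();
  if (Pos >= Text.size() || Text[Pos] != '(')
    return true; // expected '('
  ++Pos;

  // Parse the list with interleaved commas.
  do {
    SkipSpaces();
    std::size_t Start = Pos;
    while (Pos < Text.size() &&
           std::isdigit(static_cast<unsigned char>(Text[Pos])))
      ++Pos;
    if (Pos == Start)
      return true; // not a number; earlier elements stay in Values
    Values.push_back(std::stoi(Text.substr(Start, Pos - Start)));
    SkipSpaces();
  } while (Pos < Text.size() && Text[Pos] == ',' && ++Pos);

  if (Pos >= Text.size() || Text[Pos] != ')')
    return true; // expected ')'
  return false;
}

// Caller side: the output parameter is only used when parsing succeeded.
int sumOrMinusOne(const std::string &Text) {
  std::vector<int> Values;
  if (parseIntList(Text, Values))
    return -1;
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```

With this convention the callee does not need to clear its outputs on an early exit, because no caller is allowed to look at them.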
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -17432,16 +17457,54 @@ OMPClause *SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef<Expr *> SizeExprs,
                                                SourceLocation StartLoc,
                                                SourceLocation LParenLoc,
                                                SourceLocation EndLoc) {
-  for (Expr *SizeExpr : SizeExprs) {
-    ExprResult NumForLoopsResult = VerifyPositiveIntegerConstantInClause(
-        SizeExpr, OMPC_sizes, /*StrictlyPositive=*/true);
-    if (!NumForLoopsResult.isUsable())
-      return nullptr;
+  SmallVector<Expr *> SanitizedSizeExprs;
+  llvm::append_range(SanitizedSizeExprs, SizeExprs);
+
+  for (Expr *&SizeExpr : SanitizedSizeExprs) {
+    // Skip if already sanitized, e.g. during a partial template instantiation.
+    if (!SizeExpr)
+      continue;
+
+    bool IsValid = isNonNegativeIntegerValue(SizeExpr, SemaRef, OMPC_sizes,
+                                             /*StrictlyPositive=*/true);
+
+    // isNonNegativeIntegerValue returns true for non-integral types (but still
+    // emits error diagnostic), so check for the expected type explicitly.
+    QualType SizeTy = SizeExpr->getType();
+    if (!SizeTy->isIntegerType())
+      IsValid = false;
+
+    // Handling in templates is tricky. There are four possibilities to
+    // consider:
+    //
+    // 1a. The expression is valid and we are in a instantiated template or not
+    //     in a template:
+    //       Pass valid expression to be further analysed later in Sema.
+    // 1b. The expression is valid and we are in a template (including partial
+    //     instantiation):
+    //       isNonNegativeIntegerValue skipped any checks so there is no
+    //       guarantee it will be correct after instantiation.
+    //       ActOnOpenMPSizesClause will be called again at instantiation when
+    //       it is not in a dependent context anymore. This may cause warnings
+    //       to be emitted multiple times.
+    // 2a. The expression is invalid and we are in an instantiated template or
+    //     not in a template:
+    //       Invalidate the expression with a clearly wrong value (nullptr) so
+    //       later in Sema we do not have to do the same validity analysis again
+    //       or crash from unexpected data. Error diagnostics have already been
+    //       emitted.
+    // 2b. The expression is invalid and we are in a template (including partial
+    //     instantiation):
+    //       Pass the invalid expression as-is, template instantiation may
+    //       replace unexpected types/values with valid ones. The directives
+    //       with this clause must not try to use these expressions in dependent
+    //       contexts.

Meinersbur wrote:

Case 2b is already adhered to in `ActOnOpenMPTileDirective`:
```
  // Delay tiling to when template is completely instantiated.
  if (SemaRef.CurContext->isDependentContext())
    return OMPTileDirective::Create(Context, StartLoc, EndLoc, Clauses,
                                    NumLoops, AStmt, nullptr, nullptr);
```

Delaying further analysis seems generally to be what OpenMP does, e.g.
```
isNonNegativeIntegerValue(...) {
  if (!ValExpr->isTypeDependent() && !ValExpr->isValueDependent() &&
      !ValExpr->isInstantiationDependent()) {
```

https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -15197,6 +15202,36 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
   // Once the original iteration values are set, append the innermost body.
   Stmt *Inner = Body;
 
+  auto MakeDimTileSize = [&SemaRef = this->SemaRef, &CopyTransformer, &Context,
+                          SizesClause, CurScope](int I) -> Expr * {
+    Expr *DimTileSizeExpr = SizesClause->getSizesRefs()[I];
+    if (isa<ConstantExpr>(DimTileSizeExpr))
+      return AssertSuccess(CopyTransformer.TransformExpr(DimTileSizeExpr));
+
+    // When the tile size is not a constant but a variable, it is possible to
+    // pass non-positive numbers. To preserve the invariant that every loop

Meinersbur wrote:

```
int a = 0;
#pragma omp tile sizes(a)
for (int i = 0; i < 42; ++i)
  body(i);
```

While I don't think it can be expected this gives any useful tiling, it should still execute `body` 42 times.

https://github.com/llvm/llvm-project/pull/91345
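The invariant in this comment can be modeled outside the compiler. The sketch below hand-writes a one-dimensional tiled loop with a runtime tile size; `countTiledIterations` is a hypothetical helper, and clamping the step to at least 1 is an assumption of this sketch, chosen because a non-positive step would either skip iterations or never terminate.

```cpp
#include <algorithm>

// Hypothetical model of a tiled loop with a tile size known only at runtime.
// Returns how many times the loop body would run for the iteration space
// [0, N).
int countTiledIterations(int N, int TileSize) {
  // Assumption of this sketch: treat any non-positive tile size as 1.
  int Step = std::max(TileSize, 1);
  int Count = 0;
  for (int Floor = 0; Floor < N; Floor += Step)             // floor loop
    for (int I = Floor; I < std::min(Floor + Step, N); ++I) // tile loop
      ++Count; // stands in for body(I)
  return Count;
}
```

For the example in the comment, `countTiledIterations(42, 0)` yields 42, the same count as with any positive tile size.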
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
https://github.com/Meinersbur updated https://github.com/llvm/llvm-project/pull/91345

>From a2aa6950ce3880b8e669025d95ac9e72245e26a7 Mon Sep 17 00:00:00 2001
From: Michael Kruse
Date: Tue, 7 May 2024 16:42:41 +0200
Subject: [PATCH 1/3] Allow non-constant tile sizes

---
 clang/include/clang/Parse/Parser.h            |  17 ++
 clang/lib/Parse/ParseOpenMP.cpp               |  65 --
 clang/lib/Sema/SemaOpenMP.cpp                 | 113 +++--
 clang/test/OpenMP/tile_ast_print.cpp          |  17 ++
 clang/test/OpenMP/tile_codegen.cpp            | 216 --
 clang/test/OpenMP/tile_messages.cpp           |  50 +++-
 openmp/runtime/test/transform/tile/intfor.c   | 187 +++
 .../test/transform/tile/negtile_intfor.c      |  44
 .../tile/parallel-wsloop-collapse-intfor.cpp  | 100
 9 files changed, 737 insertions(+), 72 deletions(-)
 create mode 100644 openmp/runtime/test/transform/tile/intfor.c
 create mode 100644 openmp/runtime/test/transform/tile/negtile_intfor.c
 create mode 100644 openmp/runtime/test/transform/tile/parallel-wsloop-collapse-intfor.cpp

diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index daefd4f28f011..1b500c11457f4 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -3553,6 +3553,23 @@ class Parser : public CodeCompletionHandler {
   OMPClause *ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
                                       OpenMPClauseKind Kind, bool ParseOnly);
 
+  /// Parses a clause consisting of a list of expressions.
+  ///
+  /// \param Kind          The clause to parse.
+  /// \param ClauseNameLoc [out] The location of the clause name.
+  /// \param OpenLoc       [out] The location of '('.
+  /// \param CloseLoc      [out] The location of ')'.
+  /// \param Exprs         [out] The parsed expressions.
+  /// \param ReqIntConst   If true, each expression must be an integer constant.
+  ///
+  /// \return Whether the clause was parsed successfully.
+  bool ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+                                 SourceLocation &ClauseNameLoc,
+                                 SourceLocation &OpenLoc,
+                                 SourceLocation &CloseLoc,
+                                 SmallVectorImpl<Expr *> &Exprs,
+                                 bool ReqIntConst = false);
+
   /// Parses and creates OpenMP 5.0 iterators expression:
   /// <iterators> = 'iterator' '(' { [ <iterator-type> ] identifier =
   /// <range-specification> }+ ')'
diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 18ba1185ee8de..b8b32f9546c4f 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -3107,34 +3107,14 @@ bool Parser::ParseOpenMPSimpleVarList(
 }
 
 OMPClause *Parser::ParseOpenMPSizesClause() {
-  SourceLocation ClauseNameLoc = ConsumeToken();
+  SourceLocation ClauseNameLoc, OpenLoc, CloseLoc;
   SmallVector<Expr *, 4> ValExprs;
-
-  BalancedDelimiterTracker T(*this, tok::l_paren, tok::annot_pragma_openmp_end);
-  if (T.consumeOpen()) {
-    Diag(Tok, diag::err_expected) << tok::l_paren;
+  if (ParseOpenMPExprListClause(OMPC_sizes, ClauseNameLoc, OpenLoc, CloseLoc,
+                                ValExprs))
     return nullptr;
-  }
-
-  while (true) {
-    ExprResult Val = ParseConstantExpression();
-    if (!Val.isUsable()) {
-      T.skipToEnd();
-      return nullptr;
-    }
-
-    ValExprs.push_back(Val.get());
-
-    if (Tok.is(tok::r_paren) || Tok.is(tok::annot_pragma_openmp_end))
-      break;
-
-    ExpectAndConsume(tok::comma);
-  }
-
-  T.consumeClose();
 
-  return Actions.OpenMP().ActOnOpenMPSizesClause(
-      ValExprs, ClauseNameLoc, T.getOpenLocation(), T.getCloseLocation());
+  return Actions.OpenMP().ActOnOpenMPSizesClause(ValExprs, ClauseNameLoc,
+                                                 OpenLoc, CloseLoc);
 }
 
 OMPClause *Parser::ParseOpenMPUsesAllocatorClause(OpenMPDirectiveKind DKind) {
@@ -4991,3 +4971,38 @@ OMPClause *Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc);
   return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data);
 }
+
+bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+                                       SourceLocation &ClauseNameLoc,
+                                       SourceLocation &OpenLoc,
+                                       SourceLocation &CloseLoc,
+                                       SmallVectorImpl<Expr *> &Exprs,
+                                       bool ReqIntConst) {
+  assert(getOpenMPClauseName(Kind) == PP.getSpelling(Tok) &&
+         "Expected parsing to start at clause name");
+  ClauseNameLoc = ConsumeToken();
+
+  // Parse inside of '(' and ')'.
+  BalancedDelimiterTracker T(*this, tok::l_paren, tok::annot_pragma_openmp_end);
+  if (T.consumeOpen()) {
+    Diag(Tok, diag::err_expected) << tok::l_paren;
+    return true;
+  }
+
+  // Parse the list with interleaved commas.
+  do {
+    ExprResult Val =
+        ReqIntConst ?
[llvm-branch-commits] [openmp] release/18.x: [OpenMP] Fix child processes to use affinity_none (#91391) (PR #91479)
llvmbot wrote: @jhuber6 What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/91479
[llvm-branch-commits] [openmp] release/18.x: [OpenMP] Fix child processes to use affinity_none (#91391) (PR #91479)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/91479
[llvm-branch-commits] [openmp] release/18.x: [OpenMP] Fix child processes to use affinity_none (#91391) (PR #91479)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/91479

Backport 73bb8d9

Requested by: @jpeyton52

>From 8665ddef7954319a892cc7ce46099d1d31f59a1c Mon Sep 17 00:00:00 2001
From: Jonathan Peyton
Date: Wed, 8 May 2024 09:23:50 -0500
Subject: [PATCH] [OpenMP] Fix child processes to use affinity_none (#91391)

When a child process is forked with OpenMP already initialized, the
child process resets its affinity mask and sets proc-bind-var to false
so that the entire original affinity mask is used. This patch corrects
an issue with the affinity initialization code setting affinity to
compact instead of none for this special case of forked children.

The test trying to catch this only tested an explicit setting of
KMP_AFFINITY=none. Add a test run with no KMP_AFFINITY setting.

Fixes: #91098
(cherry picked from commit 73bb8d9d92f689863c94d48517e89d35dae0ebcf)
---
 openmp/runtime/src/kmp_settings.cpp     | 2 ++
 openmp/runtime/test/affinity/redetect.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/openmp/runtime/src/kmp_settings.cpp b/openmp/runtime/src/kmp_settings.cpp
index ec86ee07472c1..58f19ea5b8ab7 100644
--- a/openmp/runtime/src/kmp_settings.cpp
+++ b/openmp/runtime/src/kmp_settings.cpp
@@ -6426,6 +6426,8 @@ void __kmp_env_initialize(char const *string) {
     }
     if ((__kmp_nested_proc_bind.bind_types[0] != proc_bind_intel) &&
         (__kmp_nested_proc_bind.bind_types[0] != proc_bind_default)) {
+      if (__kmp_nested_proc_bind.bind_types[0] == proc_bind_false)
+        __kmp_affinity.type = affinity_none;
       if (__kmp_affinity.type == affinity_default) {
         __kmp_affinity.type = affinity_compact;
         __kmp_affinity.flags.dups = FALSE;
diff --git a/openmp/runtime/test/affinity/redetect.c b/openmp/runtime/test/affinity/redetect.c
index dba83b72cc42e..4b96d1bd92ee7 100644
--- a/openmp/runtime/test/affinity/redetect.c
+++ b/openmp/runtime/test/affinity/redetect.c
@@ -1,4 +1,5 @@
 // RUN: %libomp-compile
+// RUN: %libomp-run
 // RUN: env KMP_AFFINITY=none %libomp-run
 // REQUIRES: linux
 
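The effect of the two added lines in `kmp_settings.cpp` can be shown with a simplified model of the branch in that hunk. The enums and `resolveAffinity` below are hypothetical stand-ins for the runtime's globals, not libomp code; they only mirror the order of the checks.

```cpp
// Hypothetical, reduced model of the decision in __kmp_env_initialize.
enum class ProcBind { Default, Intel, False, True };
enum class Affinity { Default, None, Compact };

Affinity resolveAffinity(ProcBind Bind, Affinity Requested) {
  Affinity Result = Requested;
  if (Bind != ProcBind::Intel && Bind != ProcBind::Default) {
    // The check added by the patch: proc-bind false forces affinity none, so
    // the fallback to compact below no longer applies to forked children.
    if (Bind == ProcBind::False)
      Result = Affinity::None;
    if (Result == Affinity::Default)
      Result = Affinity::Compact;
  }
  return Result;
}
```

In this model, without the added check, `ProcBind::False` combined with `Affinity::Default` would fall through to `Affinity::Compact`, which is the behavior the commit message describes as incorrect.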
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -15197,6 +15202,36 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
   // Once the original iteration values are set, append the innermost body.
   Stmt *Inner = Body;
 
+  auto MakeDimTileSize = [&SemaRef = this->SemaRef, &CopyTransformer, &Context,
+                          SizesClause, CurScope](int I) -> Expr * {
+    Expr *DimTileSizeExpr = SizesClause->getSizesRefs()[I];
+    if (isa<ConstantExpr>(DimTileSizeExpr))
+      return AssertSuccess(CopyTransformer.TransformExpr(DimTileSizeExpr));
+
+    // When the tile size is not a constant but a variable, it is possible to
+    // pass non-positive numbers. To preserve the invariant that every loop

alexey-bataev wrote:

```suggestion
    // pass positive numbers. To preserve the invariant that every loop
```

https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -17432,16 +17457,54 @@ OMPClause *SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef<Expr *> SizeExprs,
                                                SourceLocation StartLoc,
                                                SourceLocation LParenLoc,
                                                SourceLocation EndLoc) {
-  for (Expr *SizeExpr : SizeExprs) {
-    ExprResult NumForLoopsResult = VerifyPositiveIntegerConstantInClause(
-        SizeExpr, OMPC_sizes, /*StrictlyPositive=*/true);
-    if (!NumForLoopsResult.isUsable())
-      return nullptr;
+  SmallVector<Expr *> SanitizedSizeExprs;
+  llvm::append_range(SanitizedSizeExprs, SizeExprs);

alexey-bataev wrote:

```suggestion
  SmallVector<Expr *> SanitizedSizeExprs(SizeExprs.begin(), SizeExprs.end());
```

https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -17432,16 +17457,54 @@ OMPClause *SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef<Expr *> SizeExprs,
                                                SourceLocation StartLoc,
                                                SourceLocation LParenLoc,
                                                SourceLocation EndLoc) {
-  for (Expr *SizeExpr : SizeExprs) {
-    ExprResult NumForLoopsResult = VerifyPositiveIntegerConstantInClause(
-        SizeExpr, OMPC_sizes, /*StrictlyPositive=*/true);
-    if (!NumForLoopsResult.isUsable())
-      return nullptr;
+  SmallVector<Expr *> SanitizedSizeExprs;
+  llvm::append_range(SanitizedSizeExprs, SizeExprs);
+
+  for (Expr *&SizeExpr : SanitizedSizeExprs) {
+    // Skip if already sanitized, e.g. during a partial template instantiation.
+    if (!SizeExpr)
+      continue;
+
+    bool IsValid = isNonNegativeIntegerValue(SizeExpr, SemaRef, OMPC_sizes,
+                                             /*StrictlyPositive=*/true);
+
+    // isNonNegativeIntegerValue returns true for non-integral types (but still
+    // emits error diagnostic), so check for the expected type explicitly.
+    QualType SizeTy = SizeExpr->getType();
+    if (!SizeTy->isIntegerType())
+      IsValid = false;
+
+    // Handling in templates is tricky. There are four possibilities to
+    // consider:
+    //
+    // 1a. The expression is valid and we are in a instantiated template or not
+    //     in a template:
+    //       Pass valid expression to be further analysed later in Sema.
+    // 1b. The expression is valid and we are in a template (including partial
+    //     instantiation):
+    //       isNonNegativeIntegerValue skipped any checks so there is no
+    //       guarantee it will be correct after instantiation.
+    //       ActOnOpenMPSizesClause will be called again at instantiation when
+    //       it is not in a dependent context anymore. This may cause warnings
+    //       to be emitted multiple times.
+    // 2a. The expression is invalid and we are in an instantiated template or
+    //     not in a template:
+    //       Invalidate the expression with a clearly wrong value (nullptr) so
+    //       later in Sema we do not have to do the same validity analysis again
+    //       or crash from unexpected data. Error diagnostics have already been
+    //       emitted.
+    // 2b. The expression is invalid and we are in a template (including partial
+    //     instantiation):
+    //       Pass the invalid expression as-is, template instantiation may
+    //       replace unexpected types/values with valid ones. The directives
+    //       with this clause must not try to use these expressions in dependent
+    //       contexts.

alexey-bataev wrote:

This must be fixed, even if tricky

https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -15111,13 +15111,11 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
   ASTContext &Context = getASTContext();
   Scope *CurScope = SemaRef.getCurScope();
 
-  auto SizesClauses =
-      OMPExecutableDirective::getClausesOfKind<OMPSizesClause>(Clauses);
-  if (SizesClauses.empty()) {
-    // A missing 'sizes' clause is already reported by the parser.
+  const OMPSizesClause *SizesClause =

alexey-bataev wrote:

```suggestion
  const auto *SizesClause =
```

https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -15197,6 +15202,36 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
   // Once the original iteration values are set, append the innermost body.
   Stmt *Inner = Body;
 
+  auto MakeDimTileSize = [&SemaRef = this->SemaRef, &CopyTransformer, &Context,
+                          SizesClause, CurScope](int I) -> Expr * {
+    Expr *DimTileSizeExpr = SizesClause->getSizesRefs()[I];
+    if (isa<ConstantExpr>(DimTileSizeExpr))
+      return AssertSuccess(CopyTransformer.TransformExpr(DimTileSizeExpr));
+
+    // When the tile size is not a constant but a variable, it is possible to
+    // pass non-positive numbers. To preserve the invariant that every loop

alexey-bataev wrote:

Or what do you mean here?

https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
@@ -4991,3 +4971,38 @@ OMPClause *Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc);
   return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data);
 }
+
+bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+                                       SourceLocation &ClauseNameLoc,
+                                       SourceLocation &OpenLoc,
+                                       SourceLocation &CloseLoc,
+                                       SmallVectorImpl<Expr *> &Exprs,
+                                       bool ReqIntConst) {
+  assert(getOpenMPClauseName(Kind) == PP.getSpelling(Tok) &&
+         "Expected parsing to start at clause name");
+  ClauseNameLoc = ConsumeToken();
+
+  // Parse inside of '(' and ')'.
+  BalancedDelimiterTracker T(*this, tok::l_paren, tok::annot_pragma_openmp_end);
+  if (T.consumeOpen()) {
+    Diag(Tok, diag::err_expected) << tok::l_paren;
+    return true;
+  }
+
+  // Parse the list with interleaved commas.
+  do {
+    ExprResult Val =
+        ReqIntConst ? ParseConstantExpression() : ParseAssignmentExpression();
+    if (!Val.isUsable()) {
+      // Encountered something other than an expression; abort to ')'.
+      T.skipToEnd();
+      return true;

alexey-bataev wrote:

Do you need to clear output parameters if early exited?

https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [mlir] [MLIR][Mem2Reg] Change API to always retry promotion after changes (PR #91464)
@@ -636,20 +636,36 @@ LogicalResult mlir::tryToPromoteMemorySlots(
   // lazily and cached to avoid expensive recomputation.
   BlockIndexCache blockIndexCache;
 
-  for (PromotableAllocationOpInterface allocator : allocators) {
-    for (MemorySlot slot : allocator.getPromotableSlots()) {
-      if (slot.ptr.use_empty())
-        continue;
-
-      MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
-      std::optional<MemorySlotPromotionInfo> info = analyzer.computeInfo();
-      if (info) {
-        MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
-                           std::move(*info), statistics, blockIndexCache)
-            .promoteSlot();
-        promotedAny = true;
+  SmallVector<PromotableAllocationOpInterface> workList(allocators.begin(),
+                                                        allocators.end());
+
+  SmallVector<PromotableAllocationOpInterface> newWorkList;
+  newWorkList.reserve(workList.size());
+  while (true) {
+    for (PromotableAllocationOpInterface allocator : workList) {
+      for (MemorySlot slot : allocator.getPromotableSlots()) {
+        if (slot.ptr.use_empty())
+          continue;
+
+        MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
+        std::optional<MemorySlotPromotionInfo> info = analyzer.computeInfo();
+        if (info) {
+          MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
+                             std::move(*info), statistics, blockIndexCache)
+              .promoteSlot();
+          promotedAny = true;
+          continue;
+        }
+        newWorkList.push_back(allocator);

gysit wrote:

I think we may add the same allocator multiple times here, if the allocator returns multiple slots?

https://github.com/llvm/llvm-project/pull/91464
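One way to address the concern in this comment, sketched on toy data rather than MLIR types: record per allocator whether any slot failed, and re-queue the allocator once after its slot loop. `Slot`, `Allocator`, `PromotionStats` and `promoteAll` are hypothetical names. The sketch also stops when a round promotes nothing, which differs from the worklist-size comparison in the patch and is an assumption made here so that partial progress within one allocator still triggers another round.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model: a slot can be promoted once the slot it depends on (if any) has
// been promoted.
struct Slot {
  int Id;        // index into the Promoted table
  int BlockedBy; // slot that must be promoted first, or -1
};

struct Allocator {
  std::vector<Slot> Slots;
};

struct PromotionStats {
  int NumPromoted = 0;
  std::size_t MaxRequeued = 0; // largest retry list seen in any round
};

PromotionStats promoteAll(const std::vector<Allocator> &Allocators,
                          int NumSlots) {
  std::vector<bool> Promoted(NumSlots, false);
  std::vector<const Allocator *> WorkList, NewWorkList;
  for (const Allocator &A : Allocators)
    WorkList.push_back(&A);

  PromotionStats Stats;
  while (!WorkList.empty()) {
    bool PromotedThisRound = false;
    for (const Allocator *A : WorkList) {
      bool HasFailedSlot = false;
      for (const Slot &S : A->Slots) {
        if (Promoted[S.Id])
          continue;
        if (S.BlockedBy < 0 || Promoted[S.BlockedBy]) {
          Promoted[S.Id] = true;
          ++Stats.NumPromoted;
          PromotedThisRound = true;
          continue;
        }
        HasFailedSlot = true;
      }
      // Re-queue once per allocator, not once per failed slot.
      if (HasFailedSlot)
        NewWorkList.push_back(A);
    }
    Stats.MaxRequeued = std::max(Stats.MaxRequeued, NewWorkList.size());
    if (!PromotedThisRound)
      break;
    // Reuse the backing memory of both vectors across rounds.
    WorkList.swap(NewWorkList);
    NewWorkList.clear();
  }
  return Stats;
}
```

With an allocator holding two slots that both wait on a slot of a second allocator, the first allocator appears only once in the retry list, and all three slots are promoted after the second round.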
[llvm-branch-commits] [mlir] [MLIR][Mem2Reg] Change API to always retry promotion after changes (PR #91464)
llvmbot wrote:

@llvm/pr-subscribers-mlir-core

Author: Christian Ulmann (Dinistro)

Changes

This commit modifies Mem2Reg's API to always attempt a full promotion on all the passed in "allocators". This ensures that the pass does not require unnecessary walks over the regions and improves caching benefits.

---

Full diff: https://github.com/llvm/llvm-project/pull/91464.diff

2 Files Affected:

- (modified) mlir/include/mlir/Transforms/Mem2Reg.h (+3-3)
- (modified) mlir/lib/Transforms/Mem2Reg.cpp (+36-26)

``diff
diff --git a/mlir/include/mlir/Transforms/Mem2Reg.h b/mlir/include/mlir/Transforms/Mem2Reg.h
index fee7fb312750..6986cad9ae12 100644
--- a/mlir/include/mlir/Transforms/Mem2Reg.h
+++ b/mlir/include/mlir/Transforms/Mem2Reg.h
@@ -9,7 +9,6 @@
 #ifndef MLIR_TRANSFORMS_MEM2REG_H
 #define MLIR_TRANSFORMS_MEM2REG_H
 
-#include "mlir/IR/PatternMatch.h"
 #include "mlir/Interfaces/MemorySlotInterfaces.h"
 #include "llvm/ADT/Statistic.h"
 
@@ -23,8 +22,9 @@ struct Mem2RegStatistics {
   llvm::Statistic *newBlockArgumentAmount = nullptr;
 };
 
-/// Attempts to promote the memory slots of the provided allocators. Succeeds if
-/// at least one memory slot was promoted.
+/// Attempts to promote the memory slots of the provided allocators. Iteratively
+/// retries the promotion of all slots as promoting one slot might enable
+/// subsequent promotions. Succeeds if at least one memory slot was promoted.
 LogicalResult
 tryToPromoteMemorySlots(ArrayRef<PromotableAllocationOpInterface> allocators,
                         OpBuilder &builder, const DataLayout &dataLayout,
diff --git a/mlir/lib/Transforms/Mem2Reg.cpp b/mlir/lib/Transforms/Mem2Reg.cpp
index 8adbbcd01cb4..390d2a3f54b6 100644
--- a/mlir/lib/Transforms/Mem2Reg.cpp
+++ b/mlir/lib/Transforms/Mem2Reg.cpp
@@ -636,20 +636,36 @@ LogicalResult mlir::tryToPromoteMemorySlots(
   // lazily and cached to avoid expensive recomputation.
   BlockIndexCache blockIndexCache;
 
-  for (PromotableAllocationOpInterface allocator : allocators) {
-    for (MemorySlot slot : allocator.getPromotableSlots()) {
-      if (slot.ptr.use_empty())
-        continue;
-
-      MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
-      std::optional<MemorySlotPromotionInfo> info = analyzer.computeInfo();
-      if (info) {
-        MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
-                           std::move(*info), statistics, blockIndexCache)
-            .promoteSlot();
-        promotedAny = true;
+  SmallVector<PromotableAllocationOpInterface> workList(allocators.begin(),
+                                                        allocators.end());
+
+  SmallVector<PromotableAllocationOpInterface> newWorkList;
+  newWorkList.reserve(workList.size());
+  while (true) {
+    for (PromotableAllocationOpInterface allocator : workList) {
+      for (MemorySlot slot : allocator.getPromotableSlots()) {
+        if (slot.ptr.use_empty())
+          continue;
+
+        MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
+        std::optional<MemorySlotPromotionInfo> info = analyzer.computeInfo();
+        if (info) {
+          MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
+                             std::move(*info), statistics, blockIndexCache)
+              .promoteSlot();
+          promotedAny = true;
+          continue;
+        }
+        newWorkList.push_back(allocator);
       }
     }
+    if (workList.size() == newWorkList.size())
+      break;
+
+    // Swap the vector's backing memory and clear the entries in newWorkList
+    // afterwards. This ensures that additional heap allocations can be avoided.
+    workList.swap(newWorkList);
+    newWorkList.clear();
   }
 
   return success(promotedAny);
@@ -677,22 +693,16 @@ struct Mem2Reg : impl::Mem2RegBase<Mem2Reg> {
 
       OpBuilder builder(&region.front(), region.front().begin());
 
-      // Promoting a slot can allow for further promotion of other slots,
-      // promotion is tried until no promotion succeeds.
-      while (true) {
-        SmallVector<PromotableAllocationOpInterface> allocators;
-        // Build a list of allocators to attempt to promote the slots of.
-        region.walk([&](PromotableAllocationOpInterface allocator) {
-          allocators.emplace_back(allocator);
-        });
-
-        // Attempt promoting until no promotion succeeds.
-        if (failed(tryToPromoteMemorySlots(allocators, builder, dataLayout,
-                                           dominance, statistics)))
-          break;
+      SmallVector<PromotableAllocationOpInterface> allocators;
+      // Build a list of allocators to attempt to promote the slots of.
+      region.walk([&](PromotableAllocationOpInterface allocator) {
+        allocators.emplace_back(allocator);
+      });
 
+      // Attempt promoting as many of the slots as possible.
+      if (succeeded(tryToPromoteMemorySlots(allocators, builder, dataLayout,
+                                            dominance, statistics)))
         changed = true;
-      }
     }
     if (!changed)
       markAllAnalysesPreserved();
``

https://github.com/llvm/llvm-project/pull/91464
[llvm-branch-commits] [mlir] [MLIR][Mem2Reg] Change API to always retry promotion after changes (PR #91464)
https://github.com/Dinistro created https://github.com/llvm/llvm-project/pull/91464 This commit modifies Mem2Reg's API to always attempt a full promotion on all the passed in "allocators". This ensures that the pass does not require unnecessary walks over the regions and improves caching benefits. >From c5a6fd716c09d3445db41337c1bfbc9d6626e4da Mon Sep 17 00:00:00 2001 From: Christian Ulmann Date: Wed, 8 May 2024 12:03:56 + Subject: [PATCH] [MLIR][Mem2Reg] Change API to always retry promotion after changes This commit modifies the Mem2Reg's API to always attempt a full promotion on all the passed in "allocators". This ensures that the pass does not require unnecessary walks over the regions and improves caching benefits. --- mlir/include/mlir/Transforms/Mem2Reg.h | 6 +-- mlir/lib/Transforms/Mem2Reg.cpp| 62 +++--- 2 files changed, 39 insertions(+), 29 deletions(-) diff --git a/mlir/include/mlir/Transforms/Mem2Reg.h b/mlir/include/mlir/Transforms/Mem2Reg.h index fee7fb312750..6986cad9ae12 100644 --- a/mlir/include/mlir/Transforms/Mem2Reg.h +++ b/mlir/include/mlir/Transforms/Mem2Reg.h @@ -9,7 +9,6 @@ #ifndef MLIR_TRANSFORMS_MEM2REG_H #define MLIR_TRANSFORMS_MEM2REG_H -#include "mlir/IR/PatternMatch.h" #include "mlir/Interfaces/MemorySlotInterfaces.h" #include "llvm/ADT/Statistic.h" @@ -23,8 +22,9 @@ struct Mem2RegStatistics { llvm::Statistic *newBlockArgumentAmount = nullptr; }; -/// Attempts to promote the memory slots of the provided allocators. Succeeds if -/// at least one memory slot was promoted. +/// Attempts to promote the memory slots of the provided allocators. Iteratively +/// retries the promotion of all slots as promoting one slot might enable +/// subsequent promotions. Succeeds if at least one memory slot was promoted. 
LogicalResult tryToPromoteMemorySlots(ArrayRef allocators, OpBuilder , const DataLayout , diff --git a/mlir/lib/Transforms/Mem2Reg.cpp b/mlir/lib/Transforms/Mem2Reg.cpp index 8adbbcd01cb4..390d2a3f54b6 100644 --- a/mlir/lib/Transforms/Mem2Reg.cpp +++ b/mlir/lib/Transforms/Mem2Reg.cpp @@ -636,20 +636,36 @@ LogicalResult mlir::tryToPromoteMemorySlots( // lazily and cached to avoid expensive recomputation. BlockIndexCache blockIndexCache; - for (PromotableAllocationOpInterface allocator : allocators) { -for (MemorySlot slot : allocator.getPromotableSlots()) { - if (slot.ptr.use_empty()) -continue; - - MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout); - std::optional info = analyzer.computeInfo(); - if (info) { -MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout, - std::move(*info), statistics, blockIndexCache) -.promoteSlot(); -promotedAny = true; + SmallVector workList(allocators.begin(), +allocators.end()); + + SmallVector newWorkList; + newWorkList.reserve(workList.size()); + while (true) { +for (PromotableAllocationOpInterface allocator : workList) { + for (MemorySlot slot : allocator.getPromotableSlots()) { +if (slot.ptr.use_empty()) + continue; + +MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout); +std::optional info = analyzer.computeInfo(); +if (info) { + MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout, + std::move(*info), statistics, blockIndexCache) + .promoteSlot(); + promotedAny = true; + continue; +} +newWorkList.push_back(allocator); } } +if (workList.size() == newWorkList.size()) + break; + +// Swap the vector's backing memory and clear the entries in newWorkList +// afterwards. This ensures that additional heap allocations can be avoided. 
+workList.swap(newWorkList); +newWorkList.clear(); } return success(promotedAny); @@ -677,22 +693,16 @@ struct Mem2Reg : impl::Mem2RegBase { OpBuilder builder((), region.front().begin()); - // Promoting a slot can allow for further promotion of other slots, - // promotion is tried until no promotion succeeds. - while (true) { -SmallVector allocators; -// Build a list of allocators to attempt to promote the slots of. -region.walk([&](PromotableAllocationOpInterface allocator) { - allocators.emplace_back(allocator); -}); - -// Attempt promoting until no promotion succeeds. -if (failed(tryToPromoteMemorySlots(allocators, builder, dataLayout, - dominance, statistics))) - break; + SmallVector allocators; + // Build a list of allocators to attempt to promote the slots of. + region.walk([&](PromotableAllocationOpInterface allocator) { +allocators.emplace_back(allocator); + }); + // Attempt
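For readers skimming the truncated diff, the retry loop it introduces can be sketched outside of MLIR. This is only an illustration under stated assumptions, not the patch's code: `Item`, `blockers`, and `promoteAll` are hypothetical stand-ins for allocators, their promotion preconditions, and `tryToPromoteMemorySlots`. An item is treated as promotable once all of its blockers have been promoted. The two-vector swap mirrors the buffer reuse described in the patch comment.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an allocator: it can be "promoted" only after
// every id listed in `blockers` has been promoted.
struct Item {
  int id;
  std::vector<int> blockers;
};

// Retries promotion over a shrinking work list until a full pass makes no
// progress. Returns the ids in the order they were promoted.
std::vector<int> promoteAll(const std::vector<Item> &items) {
  std::vector<int> promoted;
  auto isPromoted = [&promoted](int id) {
    return std::find(promoted.begin(), promoted.end(), id) != promoted.end();
  };

  std::vector<const Item *> workList;
  workList.reserve(items.size());
  for (const Item &item : items)
    workList.push_back(&item);

  std::vector<const Item *> newWorkList;
  newWorkList.reserve(workList.size());

  while (true) {
    assert(newWorkList.empty() && "each pass starts with an empty retry list");
    for (const Item *item : workList) {
      const bool ready = std::all_of(item->blockers.begin(),
                                     item->blockers.end(), isPromoted);
      if (ready) {
        promoted.push_back(item->id);
        continue;
      }
      // Not promotable yet: keep it for the next pass.
      newWorkList.push_back(item);
    }

    // No item left the work list in this pass, so a fixed point is reached.
    if (workList.size() == newWorkList.size())
      break;

    // Swap the backing storage and clear the stale entries so both buffers
    // are reused instead of reallocated.
    workList.swap(newWorkList);
    newWorkList.clear();
  }
  return promoted;
}
```

With items `0 -> 1 -> 2` (each blocked by the next) plus an item blocked by something that never gets promoted, the sketch needs three productive passes and stops on the fourth, leaving the unpromotable item behind.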
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
llvmbot wrote:

@llvm/pr-subscribers-clang

Author: Michael Kruse (Meinersbur)

Changes

Allow non-constants in the `sizes` clause, such as
```
#pragma omp tile sizes(a)
for (int i = 0; i < n; ++i)
```
This is permitted since tile was introduced in [OpenMP 5.1](https://www.openmp.org/spec-html/5.1/openmpsu53.html#x78-860002.11.9).

It is possible to sneak in negative numbers at runtime, as in
```
int a = -1;
#pragma omp tile sizes(a)
```
Even though this is not well-formed, it should still result in every loop iteration being executed exactly once, an invariant of the tile construct that we should ensure.

`ParseOpenMPExprListClause` is extracted so it can be reused by the `permutation` clause of the `interchange` construct. Some care was put in to ensure correct behavior in template contexts.

This patch also adds end-to-end tests. This is to avoid errors like the off-by-one error I caused with the initial implementation of the unroll construct.

---

Patch is 41.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91345.diff

9 Files Affected:

- (modified) clang/include/clang/Parse/Parser.h (+17)
- (modified) clang/lib/Parse/ParseOpenMP.cpp (+40-25)
- (modified) clang/lib/Sema/SemaOpenMP.cpp (+88-25)
- (modified) clang/test/OpenMP/tile_ast_print.cpp (+17)
- (modified) clang/test/OpenMP/tile_codegen.cpp (+201-15)
- (modified) clang/test/OpenMP/tile_messages.cpp (+43-7)
- (added) openmp/runtime/test/transform/tile/intfor.c (+191)
- (added) openmp/runtime/test/transform/tile/negtile_intfor.c (+44)
- (added) openmp/runtime/test/transform/tile/parallel-wsloop-collapse-intfor.cpp (+100)

``diff diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h index daefd4f28f011..1b500c11457f4 100644 --- a/clang/include/clang/Parse/Parser.h +++ b/clang/include/clang/Parse/Parser.h @@ -3553,6 +3553,23 @@ class Parser : public CodeCompletionHandler { OMPClause *ParseOpenMPVarListClause(OpenMPDirectiveKind DKind, OpenMPClauseKind
Kind, bool ParseOnly); + /// Parses a clause consisting of a list of expressions. + /// + /// \param Kind The clause to parse. + /// \param ClauseNameLoc [out] The location of the clause name. + /// \param OpenLoc [out] The location of '('. + /// \param CloseLoc [out] The location of ')'. + /// \param Exprs [out] The parsed expressions. + /// \param ReqIntConst If true, each expression must be an integer constant. + /// + /// \return Whether the clause was parsed successfully. + bool ParseOpenMPExprListClause(OpenMPClauseKind Kind, + SourceLocation , + SourceLocation , + SourceLocation , + SmallVectorImpl , + bool ReqIntConst = false); + /// Parses and creates OpenMP 5.0 iterators expression: /// = 'iterator' '(' { [ ] identifier = /// }+ ')' diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp index 18ba1185ee8de..b8b32f9546c4f 100644 --- a/clang/lib/Parse/ParseOpenMP.cpp +++ b/clang/lib/Parse/ParseOpenMP.cpp @@ -3107,34 +3107,14 @@ bool Parser::ParseOpenMPSimpleVarList( } OMPClause *Parser::ParseOpenMPSizesClause() { - SourceLocation ClauseNameLoc = ConsumeToken(); + SourceLocation ClauseNameLoc, OpenLoc, CloseLoc; SmallVector ValExprs; - - BalancedDelimiterTracker T(*this, tok::l_paren, tok::annot_pragma_openmp_end); - if (T.consumeOpen()) { -Diag(Tok, diag::err_expected) << tok::l_paren; + if (ParseOpenMPExprListClause(OMPC_sizes, ClauseNameLoc, OpenLoc, CloseLoc, +ValExprs)) return nullptr; - } - - while (true) { -ExprResult Val = ParseConstantExpression(); -if (!Val.isUsable()) { - T.skipToEnd(); - return nullptr; -} - -ValExprs.push_back(Val.get()); - -if (Tok.is(tok::r_paren) || Tok.is(tok::annot_pragma_openmp_end)) - break; - -ExpectAndConsume(tok::comma); - } - - T.consumeClose(); - return Actions.OpenMP().ActOnOpenMPSizesClause( - ValExprs, ClauseNameLoc, T.getOpenLocation(), T.getCloseLocation()); + return Actions.OpenMP().ActOnOpenMPSizesClause(ValExprs, ClauseNameLoc, + OpenLoc, CloseLoc); } OMPClause 
*Parser::ParseOpenMPUsesAllocatorClause(OpenMPDirectiveKind DKind) { @@ -4991,3 +4971,38 @@ OMPClause *Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind, OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc); return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data); } + +bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind, + SourceLocation , + SourceLocation , + SourceLocation , +
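The invariant mentioned in the description (every iteration runs exactly once, even if a negative tile size is supplied at runtime) can be illustrated with a hand-tiled loop. This is a sketch, not the code Clang generates: `tiledVisit` is a hypothetical helper, and clamping non-positive sizes to 1 is an assumption made here for illustration; the truncated patch excerpt does not show how the real implementation handles them.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Visits every i in [0, n) with a manually tiled loop nest and returns the
// visit order. `tileSize` is a runtime value and may be zero or negative.
std::vector<int> tiledVisit(int n, int tileSize) {
  // Assumption for this sketch: treat any non-positive tile size as 1 so the
  // outer loop always makes progress.
  const int ts = tileSize > 0 ? tileSize : 1;
  assert(ts > 0);

  std::vector<int> order;
  // Outer ("floor") loop steps from tile to tile.
  for (int tileStart = 0; tileStart < n;) {
    // Length of this tile; the last tile may be partial. Using n - tileStart
    // avoids computing tileStart + ts, which could overflow for large sizes.
    const int len = std::min(ts, n - tileStart);
    // Inner ("tile") loop visits the iterations of the current tile.
    for (int i = tileStart; i < tileStart + len; ++i)
      order.push_back(i);
    tileStart += len;
  }
  return order;
}
```

Calling `tiledVisit(5, -1)` visits 0 through 4 once each, the same as an untiled loop, which is the behavior the description asks for when the size is not well-formed.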
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
https://github.com/Meinersbur ready_for_review https://github.com/llvm/llvm-project/pull/91345 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)
Meinersbur wrote: Test failure is from unrelated `DataFlowSanitizer-x86_64 :: release_shadow_space.c` https://github.com/llvm/llvm-project/pull/91345
[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/91425