date:20240508

[llvm-branch-commits] [llvm] release/18.x: [PPCMergeStringPool] Avoid replacing constant with instruction (#88846) (PR #91557)

2024-05-08 Thread Nikita Popov via llvm-branch-commits


https://github.com/nikic created https://github.com/llvm/llvm-project/pull/91557

Backport of 3a3aeb8eba40e981d3a9ff92175f949c2f3d4434 to the release branch.

>From 7764bb3a47f241ca4e4d3fe42e96ab6bdecbdbe0 Mon Sep 17 00:00:00 2001
From: Nikita Popov 
Date: Thu, 9 May 2024 13:27:20 +0900
Subject: [PATCH] [PPCMergeStringPool] Avoid replacing constant with
 instruction (#88846)

String pool merging currently, for a reason that's not entirely clear to
me, tries to create GEP instructions instead of GEP constant expressions
when replacing constant references. It only uses constant expressions in
cases where this is required. However, it does not catch all cases where
such a requirement exists. For example, the landingpad catch clause has
to be a constant.

Fix this by always using the constant expression variant, which also
makes the implementation simpler.

Additionally, there are some edge cases where even replacement with a
constant GEP is not legal. The one I am aware of is the
llvm.eh.typeid.for intrinsic, so add a special case to forbid
replacements for it.

Fixes https://github.com/llvm/llvm-project/issues/88844.

(cherry picked from commit 3a3aeb8eba40e981d3a9ff92175f949c2f3d4434)
---
 .../lib/Target/PowerPC/PPCMergeStringPool.cpp | 57 ++-
 .../mergeable-string-pool-exceptions.ll   | 47 +++
 .../mergeable-string-pool-pass-only.mir   | 18 +++---
 3 files changed, 73 insertions(+), 49 deletions(-)
 create mode 100644 
llvm/test/CodeGen/PowerPC/mergeable-string-pool-exceptions.ll

diff --git a/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp 
b/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp
index d9465e86d8966..ebd876d50c44e 100644
--- a/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp
+++ b/llvm/lib/Target/PowerPC/PPCMergeStringPool.cpp
@@ -23,6 +23,7 @@
 #include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/ValueSymbolTable.h"
 #include "llvm/Pass.h"
@@ -116,9 +117,20 @@ class PPCMergeStringPool : public ModulePass {
 // sure that they can be replaced.
 static bool hasReplaceableUsers(GlobalVariable ) {
   for (User *CurrentUser : GV.users()) {
-// Instruction users are always valid.
-if (isa(CurrentUser))
+if (auto *I = dyn_cast(CurrentUser)) {
+  // Do not merge globals in exception pads.
+  if (I->isEHPad())
+return false;
+
+  if (auto *II = dyn_cast(I)) {
+// Some intrinsics require a plain global.
+if (II->getIntrinsicID() == Intrinsic::eh_typeid_for)
+  return false;
+  }
+
+  // Other instruction users are always valid.
   continue;
+}
 
 // We cannot replace GlobalValue users because they are not just nodes
 // in IR. To replace a user like this we would need to create a new
@@ -302,14 +314,6 @@ void PPCMergeStringPool::replaceUsesWithGEP(GlobalVariable 
*GlobalToReplace,
 Users.push_back(CurrentUser);
 
   for (User *CurrentUser : Users) {
-Instruction *UserInstruction = dyn_cast(CurrentUser);
-Constant *UserConstant = dyn_cast(CurrentUser);
-
-// At this point we expect that the user is either an instruction or a
-// constant.
-assert((UserConstant || UserInstruction) &&
-   "Expected the user to be an instruction or a constant.");
-
 // The user was not found so it must have been replaced earlier.
 if (!userHasOperand(CurrentUser, GlobalToReplace))
   continue;
@@ -318,38 +322,13 @@ void 
PPCMergeStringPool::replaceUsesWithGEP(GlobalVariable *GlobalToReplace,
 if (isa(CurrentUser))
   continue;
 
-if (!UserInstruction) {
-  // User is a constant type.
-  Constant *ConstGEP = ConstantExpr::getInBoundsGetElementPtr(
-  PooledStructType, GPool, Indices);
-  UserConstant->handleOperandChange(GlobalToReplace, ConstGEP);
-  continue;
-}
-
-if (PHINode *UserPHI = dyn_cast(UserInstruction)) {
-  // GEP instructions cannot be added before PHI nodes.
-  // With getInBoundsGetElementPtr we create the GEP and then replace it
-  // inline into the PHI.
-  Constant *ConstGEP = ConstantExpr::getInBoundsGetElementPtr(
-  PooledStructType, GPool, Indices);
-  UserPHI->replaceUsesOfWith(GlobalToReplace, ConstGEP);
-  continue;
-}
-// The user is a valid instruction that is not a PHINode.
-GetElementPtrInst *GEPInst =
-GetElementPtrInst::Create(PooledStructType, GPool, Indices);
-GEPInst->insertBefore(UserInstruction);
-
-LLVM_DEBUG(dbgs() << "Inserting GEP before:\n");
-LLVM_DEBUG(UserInstruction->dump());
-
+Constant *ConstGEP = ConstantExpr::getInBoundsGetElementPtr(
+PooledStructType, GPool, Indices);
 LLVM_DEBUG(dbgs() << "Replacing this global:\n");
 LLVM_DEBUG(GlobalToReplace->dump());
 LLVM_DEBUG(dbgs() << "with this:\n");
-LLVM_DEBUG(GEPInst->dump());
-
-//

[llvm-branch-commits] [llvm] release/18.x: [PPCMergeStringPool] Avoid replacing constant with instruction (#88846) (PR #91557)

2024-05-08 Thread Nikita Popov via llvm-branch-commits


https://github.com/nikic milestoned 
https://github.com/llvm/llvm-project/pull/91557
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)

2024-05-08 Thread Thomas Debesse via llvm-branch-commits


https://github.com/illwieckz updated 
https://github.com/llvm/llvm-project/pull/91553

>From dcb8d6bea11cabb60483bd3e12aa4df7b76ca204 Mon Sep 17 00:00:00 2001
From: Thomas Debesse 
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] release/18.x: [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)

2024-05-08 Thread Thomas Debesse via llvm-branch-commits


https://github.com/illwieckz updated 
https://github.com/llvm/llvm-project/pull/91553

>From 604b95fa0ea0278eadfb631ee2ac15386f85edaf Mon Sep 17 00:00:00 2001
From: Thomas Debesse 
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] release/18.x: [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551.
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)

2024-05-08 Thread Thomas Debesse via llvm-branch-commits


https://github.com/illwieckz updated 
https://github.com/llvm/llvm-project/pull/91553

>From 1326001c4386a0296f1e6230c6a5228d9109ee12 Mon Sep 17 00:00:00 2001
From: Thomas Debesse 
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] release/18.x: [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551.
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [libclc] release/18.x: [libclc] Fix linking against libIRReader (PR #91553)

2024-05-08 Thread Thomas Debesse via llvm-branch-commits


https://github.com/illwieckz edited 
https://github.com/llvm/llvm-project/pull/91553
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [libclc] [libclc] Fix linking against libIRReader (release/18.x) (PR #91553)

2024-05-08 Thread Thomas Debesse via llvm-branch-commits


https://github.com/illwieckz created 
https://github.com/llvm/llvm-project/pull/91553

Fixes #91551:

- https://github.com/llvm/llvm-project/issues/91551

The patch is not needed in `main` because another larger patch already merged 
in `main` includes this change: 
https://github.com/llvm/llvm-project/commit/61efea7142e904e6492e1ce0566ec23d9d221c1e
 .

This one line patch is enough to fix the build on LLVM 18 branch so it's 
probably a good idea to merge it, it's obvious, non-intrusive and can't do harm.

>From 8a040018e59c9cb9e745885f5292f0e7967197ee Mon Sep 17 00:00:00 2001
From: Thomas Debesse 
Date: Thu, 9 May 2024 05:18:35 +0200
Subject: [PATCH] [libclc] Fix linking against libIRReader

Fixes https://github.com/llvm/llvm-project/issues/91551.
---
 libclc/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libclc/CMakeLists.txt b/libclc/CMakeLists.txt
index fa1d8e4adbcc4..b7f8bb18c2288 100644
--- a/libclc/CMakeLists.txt
+++ b/libclc/CMakeLists.txt
@@ -114,6 +114,7 @@ include_directories( ${LLVM_INCLUDE_DIRS} )
 set(LLVM_LINK_COMPONENTS
   BitReader
   BitWriter
+  IRReader
   Core
   Support
 )

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)

2024-05-08 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: AtariDreams (AtariDreams)


Changes

As well as flipping the sense of the bit, GFX12 moved it from bit 0 to bit 1 in 
the encoded simm16 operand.

(cherry picked from commit e0a763c490d8ef58dca867e0ef834978ccf8e17d)

---
Full diff: https://github.com/llvm/llvm-project/pull/91034.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SOPInstructions.td (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll (+3-7) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td 
b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index ae5ef0541929b..5762efde73f02 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1786,7 +1786,7 @@ def : GCNPat<
 let SubtargetPredicate = isNotGFX12Plus in
   def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
0))>;
 let SubtargetPredicate = isGFX12Plus in
-  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
1))>;
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
2))>;
 
 // The first 10 bits of the mode register are the core FP mode on all
 // subtargets.
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
index 08c77148f6ae1..433fefa434988 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -5,14 +5,10 @@
 
 ; GCN-LABEL: {{^}}test_wait_event:
 ; GFX11: s_wait_event 0x0
-; GFX12: s_wait_event 0x1
+; GFX12: s_wait_event 0x2
 
-define amdgpu_ps void @test_wait_event() #0 {
+define amdgpu_ps void @test_wait_event() {
 entry:
-  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  call void @llvm.amdgcn.s.wait.event.export.ready()
   ret void
 }
-
-declare void @llvm.amdgcn.s.wait.event.export.ready() #0
-
-attributes #0 = { nounwind }

``




https://github.com/llvm/llvm-project/pull/91034
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/91034
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] bce9393 - [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: Jay Foad
Date: 2024-05-08T20:17:31-07:00
New Revision: bce9393291a2daa8006d1da629aa2765e00f4e70

URL: 
https://github.com/llvm/llvm-project/commit/bce9393291a2daa8006d1da629aa2765e00f4e70
DIFF: 
https://github.com/llvm/llvm-project/commit/bce9393291a2daa8006d1da629aa2765e00f4e70.diff

LOG: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622)

As well as flipping the sense of the bit, GFX12 moved it from bit 0 to
bit 1 in the encoded simm16 operand.

(cherry picked from commit e0a763c490d8ef58dca867e0ef834978ccf8e17d)

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SOPInstructions.td
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td 
b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index ae5ef0541929b..5762efde73f02 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1786,7 +1786,7 @@ def : GCNPat<
 let SubtargetPredicate = isNotGFX12Plus in
   def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
0))>;
 let SubtargetPredicate = isGFX12Plus in
-  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
1))>;
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
2))>;
 
 // The first 10 bits of the mode register are the core FP mode on all
 // subtargets.

diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
index 08c77148f6ae1..433fefa434988 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -5,14 +5,10 @@
 
 ; GCN-LABEL: {{^}}test_wait_event:
 ; GFX11: s_wait_event 0x0
-; GFX12: s_wait_event 0x1
+; GFX12: s_wait_event 0x2
 
-define amdgpu_ps void @test_wait_event() #0 {
+define amdgpu_ps void @test_wait_event() {
 entry:
-  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  call void @llvm.amdgcn.s.wait.event.export.ready()
   ret void
 }
-
-declare void @llvm.amdgcn.s.wait.event.export.ready() #0
-
-attributes #0 = { nounwind }



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91034

>From bce9393291a2daa8006d1da629aa2765e00f4e70 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Tue, 23 Apr 2024 14:38:45 +0100
Subject: [PATCH] [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready
 (#89622)

As well as flipping the sense of the bit, GFX12 moved it from bit 0 to
bit 1 in the encoded simm16 operand.

(cherry picked from commit e0a763c490d8ef58dca867e0ef834978ccf8e17d)
---
 llvm/lib/Target/AMDGPU/SOPInstructions.td|  2 +-
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll | 10 +++---
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td 
b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index ae5ef0541929b..5762efde73f02 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1786,7 +1786,7 @@ def : GCNPat<
 let SubtargetPredicate = isNotGFX12Plus in
   def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
0))>;
 let SubtargetPredicate = isGFX12Plus in
-  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
1))>;
+  def : GCNPat <(int_amdgcn_s_wait_event_export_ready), (S_WAIT_EVENT (i16 
2))>;
 
 // The first 10 bits of the mode register are the core FP mode on all
 // subtargets.
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
index 08c77148f6ae1..433fefa434988 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -5,14 +5,10 @@
 
 ; GCN-LABEL: {{^}}test_wait_event:
 ; GFX11: s_wait_event 0x0
-; GFX12: s_wait_event 0x1
+; GFX12: s_wait_event 0x2
 
-define amdgpu_ps void @test_wait_event() #0 {
+define amdgpu_ps void @test_wait_event() {
 entry:
-  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  call void @llvm.amdgcn.s.wait.event.export.ready()
   ret void
 }
-
-declare void @llvm.amdgcn.s.wait.event.export.ready() #0
-
-attributes #0 = { nounwind }

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Mark frame index as "aliased" at argument copy elison (PR #91035)

2024-05-08 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-llvm-selectiondag

Author: AtariDreams (AtariDreams)


Changes

This is a fix for miscompiles reported in
  https://github.com/llvm/llvm-project/issues/89060

After argument copy elison the IR value for the eliminated alloca is aliasing 
with the fixed stack object. This patch is making sure that we mark the fixed 
stack object as being aliased with IR values to avoid that for example 
schedulers are reordering accesses to the fixed stack object. This could 
otherwise happen when there is a mix of MemOperands referring the shared fixed 
stack slow via both the IR value for the elided alloca, and via a fixed stack 
pseudo source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253be56b3e9073b3e59123cf2da0bcde20c63)

---
Full diff: https://github.com/llvm/llvm-project/pull/91035.diff


3 Files Affected:

- (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+7) 
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+2-1) 
- (added) llvm/test/CodeGen/Hexagon/arg-copy-elison.ll (+39) 


``diff
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h 
b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..c35faac09c4d9 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -697,6 +697,13 @@ class MachineFrameInfo {
 return Objects[ObjectIdx+NumFixedObjects].isAliased;
   }
 
+  /// Set "maybe pointed to by an LLVM IR value" for an object.
+  void setIsAliasedObjectIndex(int ObjectIdx, bool IsAliased) {
+assert(unsigned(ObjectIdx+NumFixedObjects) < Objects.size() &&
+   "Invalid Object Idx!");
+Objects[ObjectIdx+NumFixedObjects].isAliased = IsAliased;
+  }
+
   /// Returns true if the specified index corresponds to an immutable object.
   bool isImmutableObjectIndex(int ObjectIdx) const {
 // Tail calling functions can clobber their function arguments.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1..7406a8ac1611d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10888,7 +10888,7 @@ static void tryToElideArgumentCopy(
   }
 
   // Perform the elision. Delete the old stack object and replace its only use
-  // in the variable info map. Mark the stack object as mutable.
+  // in the variable info map. Mark the stack object as mutable and aliased.
   LLVM_DEBUG({
 dbgs() << "Eliding argument copy from " << Arg << " to " << *AI << '\n'
<< "  Replacing frame index " << OldIndex << " with " << FixedIndex
@@ -10896,6 +10896,7 @@ static void tryToElideArgumentCopy(
   });
   MFI.RemoveStackObject(OldIndex);
   MFI.setIsImmutableObjectIndex(FixedIndex, false);
+  MFI.setIsAliasedObjectIndex(FixedIndex, true);
   AllocaIndex = FixedIndex;
   ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
   for (SDValue ArgVal : ArgVals)
diff --git a/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll 
b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
new file mode 100644
index 0..f0c30c301f446
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple hexagon-- -o - %s | FileCheck %s
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/89060
+;
+; Problem was a bug in argument copy elison. Given that the %alloca is
+; eliminated, the same frame index will be used for accessing %alloca and %a
+; on the fixed stack. Care must be taken when setting up
+; MachinePointerInfo/MemOperands for those accesses to either make sure that
+; we always refer to the fixed stack slot the same way (not using the
+; ir.alloca name), or make sure that we still detect that they alias each
+; other if using different kinds of MemOperands to identify the same fixed
+; stack entry.
+;
+define i32 @f(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, 
i32 %q1, i32 %a, i32 %q2) {
+; CHECK-LABEL: f:
+; CHECK: .cfi_startproc
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = memw(r29+#36)
+; CHECK-NEXT: r1 = memw(r29+#28)
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = sub(r1,r0)
+; CHECK-NEXT: r2 = memw(r29+#32)
+; CHECK-NEXT: memw(r29+#32) = ##666
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = xor(r0,r2)
+; CHECK-NEXT: jumpr r31
+; CHECK-NEXT:}
+  %alloca = alloca i32
+  store i32 %a, ptr %alloca ; Should be elided.
+  store i32 666, ptr %alloca
+  %x = sub i32 %q1, %q2
+  %y = xor i32 %x, %a   ; Results in a load of %a from fixed stack.
+; Using same frame index as elided %alloca.
+  ret i32 %y
+}

``




https://github.com/llvm/llvm-project/pull/91035

[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Mark frame index as "aliased" at argument copy elison (PR #91035)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/91035
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] f5f572f - [SelectionDAG] Mark frame index as "aliased" at argument copy elison (#89712)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: Björn Pettersson
Date: 2024-05-08T20:16:03-07:00
New Revision: f5f572f54b32f6ff3ae450fa421ed6d478f09ec8

URL: 
https://github.com/llvm/llvm-project/commit/f5f572f54b32f6ff3ae450fa421ed6d478f09ec8
DIFF: 
https://github.com/llvm/llvm-project/commit/f5f572f54b32f6ff3ae450fa421ed6d478f09ec8.diff

LOG: [SelectionDAG] Mark frame index as "aliased" at argument copy elison 
(#89712)

This is a fix for miscompiles reported in
  https://github.com/llvm/llvm-project/issues/89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253be56b3e9073b3e59123cf2da0bcde20c63)

Added: 
llvm/test/CodeGen/Hexagon/arg-copy-elison.ll

Modified: 
llvm/include/llvm/CodeGen/MachineFrameInfo.h
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h 
b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..c35faac09c4d9 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -697,6 +697,13 @@ class MachineFrameInfo {
 return Objects[ObjectIdx+NumFixedObjects].isAliased;
   }
 
+  /// Set "maybe pointed to by an LLVM IR value" for an object.
+  void setIsAliasedObjectIndex(int ObjectIdx, bool IsAliased) {
+assert(unsigned(ObjectIdx+NumFixedObjects) < Objects.size() &&
+   "Invalid Object Idx!");
+Objects[ObjectIdx+NumFixedObjects].isAliased = IsAliased;
+  }
+
   /// Returns true if the specified index corresponds to an immutable object.
   bool isImmutableObjectIndex(int ObjectIdx) const {
 // Tail calling functions can clobber their function arguments.

diff  --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1..7406a8ac1611d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10888,7 +10888,7 @@ static void tryToElideArgumentCopy(
   }
 
   // Perform the elision. Delete the old stack object and replace its only use
-  // in the variable info map. Mark the stack object as mutable.
+  // in the variable info map. Mark the stack object as mutable and aliased.
   LLVM_DEBUG({
 dbgs() << "Eliding argument copy from " << Arg << " to " << *AI << '\n'
<< "  Replacing frame index " << OldIndex << " with " << FixedIndex
@@ -10896,6 +10896,7 @@ static void tryToElideArgumentCopy(
   });
   MFI.RemoveStackObject(OldIndex);
   MFI.setIsImmutableObjectIndex(FixedIndex, false);
+  MFI.setIsAliasedObjectIndex(FixedIndex, true);
   AllocaIndex = FixedIndex;
   ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
   for (SDValue ArgVal : ArgVals)

diff  --git a/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll 
b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
new file mode 100644
index 0..f0c30c301f446
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple hexagon-- -o - %s | FileCheck %s
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/89060
+;
+; Problem was a bug in argument copy elison. Given that the %alloca is
+; eliminated, the same frame index will be used for accessing %alloca and %a
+; on the fixed stack. Care must be taken when setting up
+; MachinePointerInfo/MemOperands for those accesses to either make sure that
+; we always refer to the fixed stack slot the same way (not using the
+; ir.alloca name), or make sure that we still detect that they alias each
+; other if using 
diff erent kinds of MemOperands to identify the same fixed
+; stack entry.
+;
+define i32 @f(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, 
i32 %q1, i32 %a, i32 %q2) {
+; CHECK-LABEL: f:
+; CHECK: .cfi_startproc
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = memw(r29+#36)
+; CHECK-NEXT: r1 = memw(r29+#28)
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = sub(r1,r0)
+; CHECK-NEXT: r2 = memw(r29+#32)
+; CHECK-NEXT: memw(r29+#32) = ##666
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = xor(r0,r2)
+; CHECK-NEXT: jumpr r31
+; CHECK-NEXT:}
+  %alloca = alloca i32
+  store i32 %a, ptr %alloca ; Should be elided.
+  store i32 666, ptr %alloca
+  %x = sub i32

[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Mark frame index as "aliased" at argument copy elison (PR #91035)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91035

>From f5f572f54b32f6ff3ae450fa421ed6d478f09ec8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Pettersson?= 
Date: Tue, 23 Apr 2024 13:49:18 +0200
Subject: [PATCH] [SelectionDAG] Mark frame index as "aliased" at argument copy
 elison (#89712)

This is a fix for miscompiles reported in
  https://github.com/llvm/llvm-project/issues/89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).

(cherry picked from commit d8b253be56b3e9073b3e59123cf2da0bcde20c63)
---
 llvm/include/llvm/CodeGen/MachineFrameInfo.h  |  7 
 .../SelectionDAG/SelectionDAGBuilder.cpp  |  3 +-
 llvm/test/CodeGen/Hexagon/arg-copy-elison.ll  | 39 +++
 3 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/CodeGen/Hexagon/arg-copy-elison.ll

diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h 
b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..c35faac09c4d9 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -697,6 +697,13 @@ class MachineFrameInfo {
 return Objects[ObjectIdx+NumFixedObjects].isAliased;
   }
 
+  /// Set "maybe pointed to by an LLVM IR value" for an object.
+  void setIsAliasedObjectIndex(int ObjectIdx, bool IsAliased) {
+assert(unsigned(ObjectIdx+NumFixedObjects) < Objects.size() &&
+   "Invalid Object Idx!");
+Objects[ObjectIdx+NumFixedObjects].isAliased = IsAliased;
+  }
+
   /// Returns true if the specified index corresponds to an immutable object.
   bool isImmutableObjectIndex(int ObjectIdx) const {
 // Tail calling functions can clobber their function arguments.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1..7406a8ac1611d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10888,7 +10888,7 @@ static void tryToElideArgumentCopy(
   }
 
   // Perform the elision. Delete the old stack object and replace its only use
-  // in the variable info map. Mark the stack object as mutable.
+  // in the variable info map. Mark the stack object as mutable and aliased.
   LLVM_DEBUG({
 dbgs() << "Eliding argument copy from " << Arg << " to " << *AI << '\n'
<< "  Replacing frame index " << OldIndex << " with " << FixedIndex
@@ -10896,6 +10896,7 @@ static void tryToElideArgumentCopy(
   });
   MFI.RemoveStackObject(OldIndex);
   MFI.setIsImmutableObjectIndex(FixedIndex, false);
+  MFI.setIsAliasedObjectIndex(FixedIndex, true);
   AllocaIndex = FixedIndex;
   ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
   for (SDValue ArgVal : ArgVals)
diff --git a/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll 
b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
new file mode 100644
index 0..f0c30c301f446
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/arg-copy-elison.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple hexagon-- -o - %s | FileCheck %s
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/89060
+;
+; Problem was a bug in argument copy elison. Given that the %alloca is
+; eliminated, the same frame index will be used for accessing %alloca and %a
+; on the fixed stack. Care must be taken when setting up
+; MachinePointerInfo/MemOperands for those accesses to either make sure that
+; we always refer to the fixed stack slot the same way (not using the
+; ir.alloca name), or make sure that we still detect that they alias each
+; other if using different kinds of MemOperands to identify the same fixed
+; stack entry.
+;
+define i32 @f(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, 
i32 %q1, i32 %a, i32 %q2) {
+; CHECK-LABEL: f:
+; CHECK: .cfi_startproc
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = memw(r29+#36)
+; CHECK-NEXT: r1 = memw(r29+#28)
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = sub(r1,r0)
+; CHECK-NEXT: r2 = memw(r29+#32)
+; CHECK-NEXT: memw(r29+#32) = ##666
+; CHECK-NEXT:}
+; CHECK-NEXT:{
+; CHECK-NEXT: r0 = xor(r0,r2)
+; CHECK-NEXT: jumpr r31
+; CHECK-NEXT:}
+  %alloca = alloca i32
+  store i32 %a, ptr %alloca ; Should be elided.
+  store i32 666, ptr %alloca
+  %x = sub i32 %q1, %q2
+  %y = xor i32 %x, %a

[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/91425
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] dfc89f8 - [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: Phoebe Wang
Date: 2024-05-08T20:14:03-07:00
New Revision: dfc89f89ed14ebf22effe9dd9605608a975c4ed8

URL: 
https://github.com/llvm/llvm-project/commit/dfc89f89ed14ebf22effe9dd9605608a975c4ed8
DIFF: 
https://github.com/llvm/llvm-project/commit/dfc89f89ed14ebf22effe9dd9605608a975c4ed8.diff

LOG: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125)

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes #91005

Added: 
llvm/test/CodeGen/X86/pr91005.ll

Modified: 
llvm/lib/Target/X86/X86ISelLowering.cpp

Removed: 




diff  --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index c572b27fe401e..3e4ecab8443a9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7295,7 +7295,7 @@ static SDValue 
lowerBuildVectorAsBroadcast(BuildVectorSDNode *BVOp,
 // With pattern matching, the VBROADCAST node may become a VMOVDDUP.
 if (ScalarSize == 32 ||
 (ScalarSize == 64 && (IsGE256 || Subtarget.hasVLX())) ||
-CVT == MVT::f16 ||
+(CVT == MVT::f16 && Subtarget.hasAVX2()) ||
 (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2( {
   const Constant *C = nullptr;
   if (ConstantSDNode *CI = dyn_cast(Ld))

diff  --git a/llvm/test/CodeGen/X86/pr91005.ll 
b/llvm/test/CodeGen/X86/pr91005.ll
new file mode 100644
index 0..16b78bf1e7e17
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr91005.ll
@@ -0,0 +1,40 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s
+
+define void @PR91005(ptr %0) minsize {
+; CHECK-LABEL: PR91005:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:xorl %eax, %eax
+; CHECK-NEXT:testb %al, %al
+; CHECK-NEXT:je .LBB0_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:vpcmpeqw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:vpextrw $0, %xmm0, %eax
+; CHECK-NEXT:movzwl %ax, %eax
+; CHECK-NEXT:vmovd %eax, %xmm0
+; CHECK-NEXT:vcvtph2ps %xmm0, %xmm0
+; CHECK-NEXT:vxorps %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:vmulss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:vcvtps2ph $4, %xmm0, %xmm0
+; CHECK-NEXT:vmovd %xmm0, %eax
+; CHECK-NEXT:movw %ax, (%rdi)
+; CHECK-NEXT:  .LBB0_2: # %common.ret
+; CHECK-NEXT:retq
+  %2 = bitcast <2 x half> poison to <2 x i16>
+  %3 = icmp eq <2 x i16> %2, 
+  br i1 poison, label %4, label %common.ret
+
+common.ret:   ; preds = %4, %1
+  ret void
+
+4:; preds = %1
+  %5 = select <2 x i1> %3, <2 x half> , <2 x half> 
zeroinitializer
+  %6 = fmul <2 x half> %5, zeroinitializer
+  %7 = fsub <2 x half> %6, zeroinitializer
+  %8 = extractelement <2 x half> %7, i64 0
+  store half %8, ptr %0, align 2
+  br label %common.ret
+}
+
+declare <2 x half> @llvm.fabs.v2f16(<2 x half>)



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91425

>From dfc89f89ed14ebf22effe9dd9605608a975c4ed8 Mon Sep 17 00:00:00 2001
From: Phoebe Wang 
Date: Mon, 6 May 2024 10:59:44 +0800
Subject: [PATCH] [X86][FP16] Do not create VBROADCAST_LOAD for f16 without
 AVX2 (#91125)

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes #91005
---
 llvm/lib/Target/X86/X86ISelLowering.cpp |  2 +-
 llvm/test/CodeGen/X86/pr91005.ll| 40 +
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/CodeGen/X86/pr91005.ll

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index c572b27fe401e..3e4ecab8443a9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7295,7 +7295,7 @@ static SDValue 
lowerBuildVectorAsBroadcast(BuildVectorSDNode *BVOp,
 // With pattern matching, the VBROADCAST node may become a VMOVDDUP.
 if (ScalarSize == 32 ||
 (ScalarSize == 64 && (IsGE256 || Subtarget.hasVLX())) ||
-CVT == MVT::f16 ||
+(CVT == MVT::f16 && Subtarget.hasAVX2()) ||
 (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2( {
   const Constant *C = nullptr;
   if (ConstantSDNode *CI = dyn_cast(Ld))
diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll
new file mode 100644
index 0..16b78bf1e7e17
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr91005.ll
@@ -0,0 +1,40 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s
+
+define void @PR91005(ptr %0) minsize {
+; CHECK-LABEL: PR91005:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:xorl %eax, %eax
+; CHECK-NEXT:testb %al, %al
+; CHECK-NEXT:je .LBB0_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:vpcmpeqw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:vpextrw $0, %xmm0, %eax
+; CHECK-NEXT:movzwl %ax, %eax
+; CHECK-NEXT:vmovd %eax, %xmm0
+; CHECK-NEXT:vcvtph2ps %xmm0, %xmm0
+; CHECK-NEXT:vxorps %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:vmulss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:vcvtps2ph $4, %xmm0, %xmm0
+; CHECK-NEXT:vmovd %xmm0, %eax
+; CHECK-NEXT:movw %ax, (%rdi)
+; CHECK-NEXT:  .LBB0_2: # %common.ret
+; CHECK-NEXT:retq
+  %2 = bitcast <2 x half> poison to <2 x i16>
+  %3 = icmp eq <2 x i16> %2, 
+  br i1 poison, label %4, label %common.ret
+
+common.ret:   ; preds = %4, %1
+  ret void
+
+4:; preds = %1
+  %5 = select <2 x i1> %3, <2 x half> , <2 x half> 
zeroinitializer
+  %6 = fmul <2 x half> %5, zeroinitializer
+  %7 = fsub <2 x half> %6, zeroinitializer
+  %8 = extractelement <2 x half> %7, i64 0
+  store half %8, ptr %0, align 2
+  br label %common.ret
+}
+
+declare <2 x half> @llvm.fabs.v2f16(<2 x half>)

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91425

>From 2fc32a278e4fd46c6dd085845e69e84c321a3f75 Mon Sep 17 00:00:00 2001
From: Phoebe Wang 
Date: Mon, 6 May 2024 10:59:44 +0800
Subject: [PATCH 1/2] [X86][FP16] Do not create VBROADCAST_LOAD for f16 without
 AVX2 (#91125)

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes #91005
---
 llvm/lib/Target/X86/X86ISelLowering.cpp |  2 +-
 llvm/test/CodeGen/X86/pr91005.ll| 39 +
 2 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/CodeGen/X86/pr91005.ll

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index c572b27fe401e..3e4ecab8443a9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7295,7 +7295,7 @@ static SDValue 
lowerBuildVectorAsBroadcast(BuildVectorSDNode *BVOp,
 // With pattern matching, the VBROADCAST node may become a VMOVDDUP.
 if (ScalarSize == 32 ||
 (ScalarSize == 64 && (IsGE256 || Subtarget.hasVLX())) ||
-CVT == MVT::f16 ||
+(CVT == MVT::f16 && Subtarget.hasAVX2()) ||
 (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2( {
   const Constant *C = nullptr;
   if (ConstantSDNode *CI = dyn_cast(Ld))
diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll
new file mode 100644
index 0..97fd1ce456882
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr91005.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 4
+; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s
+
+define void @PR91005(ptr %0) minsize {
+; CHECK-LABEL: PR91005:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:xorl %eax, %eax
+; CHECK-NEXT:testb %al, %al
+; CHECK-NEXT:je .LBB0_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:vbroadcastss {{.*#+}} xmm0 = [31744,31744,31744,31744]
+; CHECK-NEXT:vpcmpeqw %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; CHECK-NEXT:vpand %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:vcvtph2ps %xmm0, %xmm0
+; CHECK-NEXT:vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:vmulss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:vcvtps2ph $4, %xmm0, %xmm0
+; CHECK-NEXT:vmovd %xmm0, %eax
+; CHECK-NEXT:movw %ax, (%rdi)
+; CHECK-NEXT:  .LBB0_2: # %common.ret
+; CHECK-NEXT:retq
+  %2 = bitcast <2 x half> poison to <2 x i16>
+  %3 = icmp eq <2 x i16> %2, 
+  br i1 poison, label %4, label %common.ret
+
+common.ret:   ; preds = %4, %1
+  ret void
+
+4:; preds = %1
+  %5 = select <2 x i1> %3, <2 x half> , <2 x half> 
zeroinitializer
+  %6 = fmul <2 x half> %5, zeroinitializer
+  %7 = fsub <2 x half> %6, zeroinitializer
+  %8 = extractelement <2 x half> %7, i64 0
+  store half %8, ptr %0, align 2
+  br label %common.ret
+}
+
+declare <2 x half> @llvm.fabs.v2f16(<2 x half>)

>From 4d284b853f26a6cb848028720163561cabf63d95 Mon Sep 17 00:00:00 2001
From: Phoebe Wang 
Date: Wed, 8 May 2024 10:59:31 +0800
Subject: [PATCH 2/2] Fix difference with LLVM 18 release

---
 llvm/test/CodeGen/X86/pr91005.ll | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll
index 97fd1ce456882..16b78bf1e7e17 100644
--- a/llvm/test/CodeGen/X86/pr91005.ll
+++ b/llvm/test/CodeGen/X86/pr91005.ll
@@ -8,12 +8,13 @@ define void @PR91005(ptr %0) minsize {
 ; CHECK-NEXT:testb %al, %al
 ; CHECK-NEXT:je .LBB0_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:vbroadcastss {{.*#+}} xmm0 = [31744,31744,31744,31744]
-; CHECK-NEXT:vpcmpeqw %xmm0, %xmm0, %xmm0
-; CHECK-NEXT:vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
-; CHECK-NEXT:vpand %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:vpcmpeqw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:vpextrw $0, %xmm0, %eax
+; CHECK-NEXT:movzwl %ax, %eax
+; CHECK-NEXT:vmovd %eax, %xmm0
 ; CHECK-NEXT:vcvtph2ps %xmm0, %xmm0
-; CHECK-NEXT:vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:vxorps %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:vmulss %xmm1, %xmm0, %xmm0
 ; CHECK-NEXT:vcvtps2ph $4, %xmm0, %xmm0
 ; CHECK-NEXT:vmovd %xmm0, %eax

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [AArch64][GISEL] Consider fcmp true and fcmp false in cond code selection (#86972) (PR #91126)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


tstellar wrote:

@marcauberer You can just create manually create a pull request against the 
release/18.x branch with the fixes.

https://github.com/llvm/llvm-project/pull/91126
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) (PR #91118)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/91118
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] 047cd91 - [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: Phoebe Wang
Date: 2024-05-08T20:10:38-07:00
New Revision: 047cd915b86a4f35543ad4e691953aaa5a91c4fe

URL: 
https://github.com/llvm/llvm-project/commit/047cd915b86a4f35543ad4e691953aaa5a91c4fe
DIFF: 
https://github.com/llvm/llvm-project/commit/047cd915b86a4f35543ad4e691953aaa5a91c4fe.diff

LOG: [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns 
(#91106)

With KNL/KNC being deprecated, we don't need to care about such no VLX
cases anymore. We may remove such patterns in the future.

Fixes #90844

(cherry picked from commit 7963d9a2b3c20561278a85b19e156e013231342c)

Added: 
llvm/test/CodeGen/X86/pr90844.ll

Modified: 
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86InstrAVX512.td

Removed: 




diff  --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 71fc6b5047eaa..c572b27fe401e 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -29841,7 +29841,9 @@ static SDValue LowerRotate(SDValue Op, const 
X86Subtarget ,
 return R;
 
   // AVX512 implicitly uses modulo rotation amounts.
-  if (Subtarget.hasAVX512() && 32 <= EltSizeInBits) {
+  if ((Subtarget.hasVLX() ||
+   (Subtarget.hasAVX512() && Subtarget.hasEVEX512())) &&
+  32 <= EltSizeInBits) {
 // Attempt to rotate by immediate.
 if (IsCstSplat) {
   unsigned RotOpc = IsROTL ? X86ISD::VROTLI : X86ISD::VROTRI;

diff  --git a/llvm/lib/Target/X86/X86InstrAVX512.td 
b/llvm/lib/Target/X86/X86InstrAVX512.td
index bb5e22c714279..0564f2167d8ee 100644
--- a/llvm/lib/Target/X86/X86InstrAVX512.td
+++ b/llvm/lib/Target/X86/X86InstrAVX512.td
@@ -814,7 +814,7 @@ defm : vextract_for_size_lowering<"VEXTRACTF64x4Z", 
v32f16_info, v16f16x_info,
 
 // A 128-bit extract from bits [255:128] of a 512-bit vector should use a
 // smaller extract to enable EVEX->VEX.
-let Predicates = [NoVLX] in {
+let Predicates = [NoVLX, HasEVEX512] in {
 def : Pat<(v2i64 (extract_subvector (v8i64 VR512:$src), (iPTR 2))),
   (v2i64 (VEXTRACTI128rr
   (v4i64 (EXTRACT_SUBREG (v8i64 VR512:$src), sub_ymm)),
@@ -3068,7 +3068,7 @@ def : Pat<(Narrow.KVT (and Narrow.KRC:$mask,
addr:$src2, (X86cmpm_imm_commute timm:$cc)), Narrow.KRC)>;
 }
 
-let Predicates = [HasAVX512, NoVLX] in {
+let Predicates = [HasAVX512, NoVLX, HasEVEX512] in {
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
 
@@ -3099,7 +3099,7 @@ let Predicates = [HasAVX512, NoVLX] in {
   defm : axv512_cmp_packed_cc_no_vlx_lowering<"VCMPPD", v2f64x_info, 
v8f64_info>;
 }
 
-let Predicates = [HasBWI, NoVLX] in {
+let Predicates = [HasBWI, NoVLX, HasEVEX512] in {
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
 
@@ -3493,7 +3493,7 @@ multiclass mask_move_lowering;
   defm : mask_move_lowering<"VMOVDQA32Z", v4i32x_info, v16i32_info>;
   defm : mask_move_lowering<"VMOVAPSZ", v8f32x_info, v16f32_info>;
@@ -3505,7 +3505,7 @@ let Predicates = [HasAVX512, NoVLX] in {
   defm : mask_move_lowering<"VMOVDQA64Z", v4i64x_info, v8i64_info>;
 }
 
-let Predicates = [HasBWI, NoVLX] in {
+let Predicates = [HasBWI, NoVLX, HasEVEX512] in {
   defm : mask_move_lowering<"VMOVDQU8Z", v16i8x_info, v64i8_info>;
   defm : mask_move_lowering<"VMOVDQU8Z", v32i8x_info, v64i8_info>;
 
@@ -4998,8 +4998,8 @@ defm VPMINUD : avx512_binop_rm_vl_d<0x3B, "vpminud", umin,
 defm VPMINUQ : avx512_binop_rm_vl_q<0x3B, "vpminuq", umin,
 SchedWriteVecALU, HasAVX512, 1>, T8;
 
-// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX.
-let Predicates = [HasDQI, NoVLX] in {
+// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX, 
HasEVEX512.
+let Predicates = [HasDQI, NoVLX, HasEVEX512] in {
   def : Pat<(v4i64 (mul (v4i64 VR256X:$src1), (v4i64 VR256X:$src2))),
 (EXTRACT_SUBREG
 (VPMULLQZrr
@@ -5055,7 +5055,7 @@ multiclass avx512_min_max_lowering {
  sub_xmm)>;
 }
 
-let Predicates = [HasAVX512, NoVLX] in {
+let Predicates = [HasAVX512, NoVLX, HasEVEX512] in {
   defm : avx512_min_max_lowering<"VPMAXUQZ", umax>;
   defm : avx512_min_max_lowering<"VPMINUQZ", umin>;
   defm : avx512_min_max_lowering<"VPMAXSQZ", smax>;
@@ -6032,7 +6032,7 @@ defm VPSRL : avx512_shift_types<0xD2, 0xD3, 0xD1, 
"vpsrl", X86vsrl,
 SchedWriteVecShift>;
 
 // Use 512bit VPSRA/VPSRAI version to implement v2i64/v4i64 in case NoVLX.
-let Predicates = [HasAVX512, NoVLX] in {
+let Predicates = [HasAVX512, NoVLX, HasEVEX512] in {
   def : Pat<(v4i64 (X86vsra (v4i64 VR256X:$src1), (v2i64 VR128X:$src2))),
 (EXTRACT_SUBREG (v8i64
   (VPSRAQZrr
@@ -6161,14 +6161,14 @@ defm VPSRLV : avx512_var_shift_types<0x45, "vpsrlv", 
X86vsrlv, SchedWriteVarVecS
 defm VPRORV :

[llvm-branch-commits] [llvm] release/18.x: [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) (PR #91118)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91118

>From 047cd915b86a4f35543ad4e691953aaa5a91c4fe Mon Sep 17 00:00:00 2001
From: Phoebe Wang 
Date: Sun, 5 May 2024 18:40:27 +0800
Subject: [PATCH] [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit
 patterns (#91106)

With KNL/KNC being deprecated, we don't need to care about such no VLX
cases anymore. We may remove such patterns in the future.

Fixes #90844

(cherry picked from commit 7963d9a2b3c20561278a85b19e156e013231342c)
---
 llvm/lib/Target/X86/X86ISelLowering.cpp |  4 ++-
 llvm/lib/Target/X86/X86InstrAVX512.td   | 42 -
 llvm/test/CodeGen/X86/pr90844.ll| 19 +++
 3 files changed, 43 insertions(+), 22 deletions(-)
 create mode 100644 llvm/test/CodeGen/X86/pr90844.ll

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp 
b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 71fc6b5047eaa..c572b27fe401e 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -29841,7 +29841,9 @@ static SDValue LowerRotate(SDValue Op, const 
X86Subtarget ,
 return R;
 
   // AVX512 implicitly uses modulo rotation amounts.
-  if (Subtarget.hasAVX512() && 32 <= EltSizeInBits) {
+  if ((Subtarget.hasVLX() ||
+   (Subtarget.hasAVX512() && Subtarget.hasEVEX512())) &&
+  32 <= EltSizeInBits) {
 // Attempt to rotate by immediate.
 if (IsCstSplat) {
   unsigned RotOpc = IsROTL ? X86ISD::VROTLI : X86ISD::VROTRI;
diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td 
b/llvm/lib/Target/X86/X86InstrAVX512.td
index bb5e22c714279..0564f2167d8ee 100644
--- a/llvm/lib/Target/X86/X86InstrAVX512.td
+++ b/llvm/lib/Target/X86/X86InstrAVX512.td
@@ -814,7 +814,7 @@ defm : vextract_for_size_lowering<"VEXTRACTF64x4Z", 
v32f16_info, v16f16x_info,
 
 // A 128-bit extract from bits [255:128] of a 512-bit vector should use a
 // smaller extract to enable EVEX->VEX.
-let Predicates = [NoVLX] in {
+let Predicates = [NoVLX, HasEVEX512] in {
 def : Pat<(v2i64 (extract_subvector (v8i64 VR512:$src), (iPTR 2))),
   (v2i64 (VEXTRACTI128rr
   (v4i64 (EXTRACT_SUBREG (v8i64 VR512:$src), sub_ymm)),
@@ -3068,7 +3068,7 @@ def : Pat<(Narrow.KVT (and Narrow.KRC:$mask,
addr:$src2, (X86cmpm_imm_commute timm:$cc)), Narrow.KRC)>;
 }
 
-let Predicates = [HasAVX512, NoVLX] in {
+let Predicates = [HasAVX512, NoVLX, HasEVEX512] in {
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
 
@@ -3099,7 +3099,7 @@ let Predicates = [HasAVX512, NoVLX] in {
   defm : axv512_cmp_packed_cc_no_vlx_lowering<"VCMPPD", v2f64x_info, 
v8f64_info>;
 }
 
-let Predicates = [HasBWI, NoVLX] in {
+let Predicates = [HasBWI, NoVLX, HasEVEX512] in {
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
   defm : axv512_icmp_packed_cc_no_vlx_lowering;
 
@@ -3493,7 +3493,7 @@ multiclass mask_move_lowering;
   defm : mask_move_lowering<"VMOVDQA32Z", v4i32x_info, v16i32_info>;
   defm : mask_move_lowering<"VMOVAPSZ", v8f32x_info, v16f32_info>;
@@ -3505,7 +3505,7 @@ let Predicates = [HasAVX512, NoVLX] in {
   defm : mask_move_lowering<"VMOVDQA64Z", v4i64x_info, v8i64_info>;
 }
 
-let Predicates = [HasBWI, NoVLX] in {
+let Predicates = [HasBWI, NoVLX, HasEVEX512] in {
   defm : mask_move_lowering<"VMOVDQU8Z", v16i8x_info, v64i8_info>;
   defm : mask_move_lowering<"VMOVDQU8Z", v32i8x_info, v64i8_info>;
 
@@ -4998,8 +4998,8 @@ defm VPMINUD : avx512_binop_rm_vl_d<0x3B, "vpminud", umin,
 defm VPMINUQ : avx512_binop_rm_vl_q<0x3B, "vpminuq", umin,
 SchedWriteVecALU, HasAVX512, 1>, T8;
 
-// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX.
-let Predicates = [HasDQI, NoVLX] in {
+// PMULLQ: Use 512bit version to implement 128/256 bit in case NoVLX, 
HasEVEX512.
+let Predicates = [HasDQI, NoVLX, HasEVEX512] in {
   def : Pat<(v4i64 (mul (v4i64 VR256X:$src1), (v4i64 VR256X:$src2))),
 (EXTRACT_SUBREG
 (VPMULLQZrr
@@ -5055,7 +5055,7 @@ multiclass avx512_min_max_lowering {
  sub_xmm)>;
 }
 
-let Predicates = [HasAVX512, NoVLX] in {
+let Predicates = [HasAVX512, NoVLX, HasEVEX512] in {
   defm : avx512_min_max_lowering<"VPMAXUQZ", umax>;
   defm : avx512_min_max_lowering<"VPMINUQZ", umin>;
   defm : avx512_min_max_lowering<"VPMAXSQZ", smax>;
@@ -6032,7 +6032,7 @@ defm VPSRL : avx512_shift_types<0xD2, 0xD3, 0xD1, 
"vpsrl", X86vsrl,
 SchedWriteVecShift>;
 
 // Use 512bit VPSRA/VPSRAI version to implement v2i64/v4i64 in case NoVLX.
-let Predicates = [HasAVX512, NoVLX] in {
+let Predicates = [HasAVX512, NoVLX, HasEVEX512] in {
   def : Pat<(v4i64 (X86vsra (v4i64 VR256X:$src1), (v2i64 VR128X:$src2))),
 (EXTRACT_SUBREG (v8i64
   (VPSRAQZrr
@@ -6161,14 +6161,14 @@ defm VPSRLV : avx512_var_shift_types<0x45, "vpsrlv", 
X86vsrlv, SchedWriteVarVecS
 defm VPRORV : avx512_var_shift_types<0x14, "vprorv", rotr,

[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/90719
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] 58e44d3 - [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: David Stuttard
Date: 2024-05-08T20:08:59-07:00
New Revision: 58e44d3c6f67d5402ec38913d4262b94e73ac123

URL: 
https://github.com/llvm/llvm-project/commit/58e44d3c6f67d5402ec38913d4262b94e73ac123
DIFF: 
https://github.com/llvm/llvm-project/commit/58e44d3c6f67d5402ec38913d4262b94e73ac123.diff

LOG: [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595)

Code to determine if a waitcnt is required before a barrier instruction
only
considered S_BARRIER.
gfx12 adds barrier_signal/wait so need to enhance the existing code to
look for
a barrier start (which is just an S_BARRIER for earlier architectures).

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
llvm/lib/Target/AMDGPU/SIInstrInfo.h
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1d..7a3198612f86f 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1832,7 +1832,7 @@ bool 
SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr ,
   // not, we need to ensure the subtarget is capable of backing off barrier
   // instructions in case there are any outstanding memory operations that may
   // cause an exception. Otherwise, insert an explicit S_WAITCNT 0 here.
-  if (MI.getOpcode() == AMDGPU::S_BARRIER &&
+  if (TII->isBarrierStart(MI.getOpcode()) &&
   !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
 Wait = Wait.combined(
 AMDGPU::Waitcnt::allZero(ST->hasExtendedWaitCounts(), ST->hasVscnt()));

diff  --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 1c9dacc09f815..626d903c0c695 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -908,6 +908,17 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 return MI.getDesc().TSFlags & SIInstrFlags::IsNeverUniform;
   }
 
+  // Check to see if opcode is for a barrier start. Pre gfx12 this is just the
+  // S_BARRIER, but after support for S_BARRIER_SIGNAL* / S_BARRIER_WAIT we 
want
+  // to check for the barrier start (S_BARRIER_SIGNAL*)
+  bool isBarrierStart(unsigned Opcode) const {
+return Opcode == AMDGPU::S_BARRIER ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_M0 ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_M0 ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_IMM ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM;
+  }
+
   static bool doesNotReadTiedSource(const MachineInstr ) {
 return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;
   }

diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
index a7d3115af29bf..47c021769aa56 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
@@ -96,6 +96,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) 
%out, i32 %size) #0 {
 ; VARIANT4-NEXT:s_wait_kmcnt 0x0
 ; VARIANT4-NEXT:v_xad_u32 v1, v0, -1, s2
 ; VARIANT4-NEXT:global_store_b32 v3, v0, s[0:1]
+; VARIANT4-NEXT:s_wait_storecnt 0x0
 ; VARIANT4-NEXT:s_barrier_signal -1
 ; VARIANT4-NEXT:s_barrier_wait -1
 ; VARIANT4-NEXT:v_ashrrev_i32_e32 v2, 31, v1
@@ -142,6 +143,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) 
%out, i32 %size) #0 {
 ; VARIANT6-NEXT:v_dual_mov_b32 v4, s1 :: v_dual_mov_b32 v3, s0
 ; VARIANT6-NEXT:v_sub_nc_u32_e32 v1, s2, v0
 ; VARIANT6-NEXT:global_store_b32 v5, v0, s[0:1]
+; VARIANT6-NEXT:s_wait_storecnt 0x0
 ; VARIANT6-NEXT:s_barrier_signal -1
 ; VARIANT6-NEXT:s_barrier_wait -1
 ; VARIANT6-NEXT:v_ashrrev_i32_e32 v2, 31, v1

diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
index 4ab5e97964a85..38a34ec6daf73 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
@@ -12,6 +12,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr 
addrspace(1) %out) #0 {
 ; GCN-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GCN-NEXT:s_wait_kmcnt 0x0
 ; GCN-NEXT:global_store_b32 v3, v2, s[0:1]
+; GCN-NEXT:s_wait_storecnt 0x0
 ; GCN-NEXT:s_barrier_signal -1
 ; GCN-NEXT:s_barrier_wait -1
 ; GCN-NEXT:global_store_b32 v3, v0, s[0:1]
@@ -28,6 +29,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr 
addrspace(1) %out) #0 {
 ; GLOBAL-ISEL-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GLOBAL-ISEL-NEXT:s_wait_kmcnt 0x0
 ; GLOBAL-ISEL-NEXT:global_store_b32 v3, v2, s[0:1]
+; GLOBAL-ISEL-NEXT:s_wait_storecnt 0x0
 ; GLOBAL-ISEL-NEXT:s_barrier_signal -1
 ; GLOBAL-ISEL-NEXT:

[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/90719

>From 58e44d3c6f67d5402ec38913d4262b94e73ac123 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Wed, 1 May 2024 11:37:13 +0100
Subject: [PATCH] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12
 (#90595)

Code to determine if a waitcnt is required before a barrier instruction
only
considered S_BARRIER.
gfx12 adds barrier_signal/wait so need to enhance the existing code to
look for
a barrier start (which is just an S_BARRIER for earlier architectures).
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  | 11 ++
 .../CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll   |  2 ++
 .../AMDGPU/llvm.amdgcn.s.barrier.wait.ll  | 22 +++
 4 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1d..7a3198612f86f 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1832,7 +1832,7 @@ bool 
SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr ,
   // not, we need to ensure the subtarget is capable of backing off barrier
   // instructions in case there are any outstanding memory operations that may
   // cause an exception. Otherwise, insert an explicit S_WAITCNT 0 here.
-  if (MI.getOpcode() == AMDGPU::S_BARRIER &&
+  if (TII->isBarrierStart(MI.getOpcode()) &&
   !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
 Wait = Wait.combined(
 AMDGPU::Waitcnt::allZero(ST->hasExtendedWaitCounts(), ST->hasVscnt()));
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 1c9dacc09f815..626d903c0c695 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -908,6 +908,17 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 return MI.getDesc().TSFlags & SIInstrFlags::IsNeverUniform;
   }
 
+  // Check to see if opcode is for a barrier start. Pre gfx12 this is just the
+  // S_BARRIER, but after support for S_BARRIER_SIGNAL* / S_BARRIER_WAIT we 
want
+  // to check for the barrier start (S_BARRIER_SIGNAL*)
+  bool isBarrierStart(unsigned Opcode) const {
+return Opcode == AMDGPU::S_BARRIER ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_M0 ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_M0 ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_IMM ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM;
+  }
+
   static bool doesNotReadTiedSource(const MachineInstr ) {
 return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;
   }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
index a7d3115af29bf..47c021769aa56 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
@@ -96,6 +96,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) 
%out, i32 %size) #0 {
 ; VARIANT4-NEXT:s_wait_kmcnt 0x0
 ; VARIANT4-NEXT:v_xad_u32 v1, v0, -1, s2
 ; VARIANT4-NEXT:global_store_b32 v3, v0, s[0:1]
+; VARIANT4-NEXT:s_wait_storecnt 0x0
 ; VARIANT4-NEXT:s_barrier_signal -1
 ; VARIANT4-NEXT:s_barrier_wait -1
 ; VARIANT4-NEXT:v_ashrrev_i32_e32 v2, 31, v1
@@ -142,6 +143,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) 
%out, i32 %size) #0 {
 ; VARIANT6-NEXT:v_dual_mov_b32 v4, s1 :: v_dual_mov_b32 v3, s0
 ; VARIANT6-NEXT:v_sub_nc_u32_e32 v1, s2, v0
 ; VARIANT6-NEXT:global_store_b32 v5, v0, s[0:1]
+; VARIANT6-NEXT:s_wait_storecnt 0x0
 ; VARIANT6-NEXT:s_barrier_signal -1
 ; VARIANT6-NEXT:s_barrier_wait -1
 ; VARIANT6-NEXT:v_ashrrev_i32_e32 v2, 31, v1
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
index 4ab5e97964a85..38a34ec6daf73 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
@@ -12,6 +12,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr 
addrspace(1) %out) #0 {
 ; GCN-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GCN-NEXT:s_wait_kmcnt 0x0
 ; GCN-NEXT:global_store_b32 v3, v2, s[0:1]
+; GCN-NEXT:s_wait_storecnt 0x0
 ; GCN-NEXT:s_barrier_signal -1
 ; GCN-NEXT:s_barrier_wait -1
 ; GCN-NEXT:global_store_b32 v3, v0, s[0:1]
@@ -28,6 +29,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr 
addrspace(1) %out) #0 {
 ; GLOBAL-ISEL-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GLOBAL-ISEL-NEXT:s_wait_kmcnt 0x0
 ; GLOBAL-ISEL-NEXT:global_store_b32 v3, v2, s[0:1]
+; GLOBAL-ISEL-NEXT:s_wait_storecnt 0x0
 ; GLOBAL-ISEL-NEXT:s_barrier_signal -1
 ; GLOBAL-ISEL-NEXT:s_barrier_wait -1
 ; GLOBAL-ISEL-NEXT:global_store_b32 v3, v0, s[0:1]
@@ -56,6 +58,7 @@ define amdgpu_kernel

[llvm-branch-commits] [clang] [llvm] Backport some fixes for building the release binaries (PR #91095)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/91095
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] ce88e86 - [CMake][Release] Refactor cache file and use two stages for non-PGO builds (#89812)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: Tom Stellard
Date: 2024-05-08T19:47:50-07:00
New Revision: ce88e86e428be7eea517201ddee8d62150ae8de4

URL: 
https://github.com/llvm/llvm-project/commit/ce88e86e428be7eea517201ddee8d62150ae8de4
DIFF: 
https://github.com/llvm/llvm-project/commit/ce88e86e428be7eea517201ddee8d62150ae8de4.diff

LOG: [CMake][Release] Refactor cache file and use two stages for non-PGO builds 
(#89812)

Completely refactor the cache file to simplify it and remove unnecessary
variables. The main functional change here is that the non-PGO builds
now use two stages, so `ninja -C build stage2-package` can be used with
both PGO and non-PGO builds.

(cherry picked from commit 6473fbf2d68c8486d168f29afc35d3e8a6fabe69)

Added: 


Modified: 
clang/cmake/caches/Release.cmake

Removed: 




diff  --git a/clang/cmake/caches/Release.cmake 
b/clang/cmake/caches/Release.cmake
index fa972636553f1..c164d5497275f 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -1,95 +1,93 @@
 # Plain options configure the first build.
 # BOOTSTRAP_* options configure the second build.
 # BOOTSTRAP_BOOTSTRAP_* options configure the third build.
+# PGO Builds have 3 stages (stage1, stage2-instrumented, stage2)
+# non-PGO Builds have 2 stages (stage1, stage2)
 
-# General Options
+
+function (set_final_stage_var name value type)
+  if (LLVM_RELEASE_ENABLE_PGO)
+set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  else()
+set(BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  endif()
+endfunction()
+
+function (set_instrument_and_final_stage_var name value type)
+  # This sets the varaible for the final stage in non-PGO builds and in
+  # the stage2-instrumented stage for PGO builds.
+  set(BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  if (LLVM_RELEASE_ENABLE_PGO)
+# Set the variable in the final stage for PGO builds.
+set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  endif()
+endfunction()
+
+# General Options:
+# If you want to override any of the LLVM_RELEASE_* variables you can set them
+# on the command line via -D, but you need to do this before you pass this
+# cache file to CMake via -C. e.g.
+#
+# cmake -D LLVM_RELEASE_ENABLE_PGO=ON -C Release.cmake
 set(LLVM_RELEASE_ENABLE_LTO THIN CACHE STRING "")
 set(LLVM_RELEASE_ENABLE_PGO OFF CACHE BOOL "")
-
+set(LLVM_RELEASE_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" 
CACHE STRING "")
+set(LLVM_RELEASE_ENABLE_PROJECTS 
"clang;lld;lldb;clang-tools-extra;bolt;polly;mlir;flang" CACHE STRING "")
+# Note we don't need to add install here, since it is one of the pre-defined
+# steps.
+set(LLVM_RELEASE_FINAL_STAGE_TARGETS 
"clang;package;check-all;check-llvm;check-clang" CACHE STRING "")
 set(CMAKE_BUILD_TYPE RELEASE CACHE STRING "")
 
-# Stage 1 Bootstrap Setup
+# Stage 1 Options
+set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
+
+set(STAGE1_PROJECTS "clang")
+set(STAGE1_RUNTIMES "")
+
 if (LLVM_RELEASE_ENABLE_PGO)
+  list(APPEND STAGE1_PROJECTS "lld")
+  list(APPEND STAGE1_RUNTIMES "compiler-rt")
   set(CLANG_BOOTSTRAP_TARGETS
 generate-profdata
-stage2
 stage2-package
 stage2-clang
-stage2-distribution
 stage2-install
-stage2-install-distribution
-stage2-install-distribution-toolchain
 stage2-check-all
 stage2-check-llvm
-stage2-check-clang
-stage2-test-suite CACHE STRING "")
-else()
-  set(CLANG_BOOTSTRAP_TARGETS
-clang
-check-all
-check-llvm
-check-clang
-test-suite
-stage3
-stage3-clang
-stage3-check-all
-stage3-check-llvm
-stage3-check-clang
-stage3-install
-stage3-test-suite CACHE STRING "")
-endif()
+stage2-check-clang CACHE STRING "")
 
-# Stage 1 Options
-set(STAGE1_PROJECTS "clang")
-set(STAGE1_RUNTIMES "")
+  # Configuration for stage2-instrumented
+  set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "")
+  # This enables the build targets for the final stage which is called stage2.
+  set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS ${LLVM_RELEASE_FINAL_STAGE_TARGETS} 
CACHE STRING "")
+  set(BOOTSTRAP_LLVM_BUILD_INSTRUMENTED IR CACHE STRING "")
+  set(BOOTSTRAP_LLVM_ENABLE_RUNTIMES "compiler-rt" CACHE STRING "")
+  set(BOOTSTRAP_LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")
 
-if (LLVM_RELEASE_ENABLE_PGO)
-  list(APPEND STAGE1_PROJECTS "lld")
-  list(APPEND STAGE1_RUNTIMES "compiler-rt")
+else()
+  if (LLVM_RELEASE_ENABLE_LTO)
+list(APPEND STAGE1_PROJECTS "lld")
+  endif()
+  # Any targets added here will be given the target name stage2-${target}, so
+  # if you want to run them you can just use:
+  # ninja -C $BUILDDIR stage2-${target}
+  set(CLANG_BOOTSTRAP_TARGETS ${LLVM_RELEASE_FINAL_STAGE_TARGETS} CACHE STRING 
"")
 endif()
 
+# Stage 1 Common Config
 set(LLVM_ENABLE_RUNTIMES ${STAGE1_RUNTIMES} CACHE STRING "")
 set(LLVM_ENABLE_PROJECTS ${STAGE1_PROJECTS} CACHE STRING "")

[llvm-branch-commits] [clang] b7e2397 - [CMake][Release] Enable CMAKE_POSITION_INDEPENDENT_CODE (#90139)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: Tom Stellard
Date: 2024-05-08T19:47:50-07:00
New Revision: b7e2397c54b7cddac8fa188e68073f78e895a57a

URL: 
https://github.com/llvm/llvm-project/commit/b7e2397c54b7cddac8fa188e68073f78e895a57a
DIFF: 
https://github.com/llvm/llvm-project/commit/b7e2397c54b7cddac8fa188e68073f78e895a57a.diff

LOG: [CMake][Release] Enable CMAKE_POSITION_INDEPENDENT_CODE (#90139)

Set this in the cache file directly instead of via the test-release.sh
script so that the release builds can be reproduced with just the cache
file.

(cherry picked from commit 53ff002c6f7ec64a75ab0990b1314cc6b4bb67cf)

Added: 


Modified: 
clang/cmake/caches/Release.cmake
llvm/utils/release/test-release.sh

Removed: 




diff  --git a/clang/cmake/caches/Release.cmake 
b/clang/cmake/caches/Release.cmake
index c164d5497275f..c0bfcbdfc1c2a 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -82,6 +82,7 @@ set(LLVM_ENABLE_PROJECTS ${STAGE1_PROJECTS} CACHE STRING "")
 # stage2-instrumented and Final Stage Config:
 # Options that need to be set in both the instrumented stage (if we are doing
 # a pgo build) and the final stage.
+set_instrument_and_final_stage_var(CMAKE_POSITION_INDEPENDENT_CODE "ON" STRING)
 set_instrument_and_final_stage_var(LLVM_ENABLE_LTO 
"${LLVM_RELEASE_ENABLE_LTO}" STRING)
 if (LLVM_RELEASE_ENABLE_LTO)
   set_instrument_and_final_stage_var(LLVM_ENABLE_LLD "ON" BOOL)

diff  --git a/llvm/utils/release/test-release.sh 
b/llvm/utils/release/test-release.sh
index 4314b565e11b0..050004aa08c49 100755
--- a/llvm/utils/release/test-release.sh
+++ b/llvm/utils/release/test-release.sh
@@ -353,8 +353,7 @@ function build_with_cmake_cache() {
   env CC="$c_compiler" CXX="$cxx_compiler" \
   cmake -G "$generator" -B $CMakeBuildDir -S $SrcDir/llvm \
 -C $SrcDir/clang/cmake/caches/Release.cmake \
-   
-DCLANG_BOOTSTRAP_PASSTHROUGH="CMAKE_POSITION_INDEPENDENT_CODE;LLVM_LIT_ARGS" \
--DCMAKE_POSITION_INDEPENDENT_CODE=ON \
+   -DCLANG_BOOTSTRAP_PASSTHROUGH="LLVM_LIT_ARGS" \
 -DLLVM_LIT_ARGS="-j $NumJobs $LitVerbose" \
 $ExtraConfigureFlags
 2>&1 | tee $LogDir/llvm.configure-$Flavor.log



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] f2c5a10 - [CMake][Release] Add stage2-package target (#89517)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


Author: Tom Stellard
Date: 2024-05-08T19:47:50-07:00
New Revision: f2c5a10e1f27768b031b8b54cb056fd4e261ad8f

URL: 
https://github.com/llvm/llvm-project/commit/f2c5a10e1f27768b031b8b54cb056fd4e261ad8f
DIFF: 
https://github.com/llvm/llvm-project/commit/f2c5a10e1f27768b031b8b54cb056fd4e261ad8f.diff

LOG: [CMake][Release] Add stage2-package target (#89517)

This target will be used to generate the release binary package for
uploading to GitHub.

(cherry picked from commit a38f201f1ec70c2b1f3cf46e7f291c53bb16753e)

Added: 


Modified: 
clang/cmake/caches/Release.cmake

Removed: 




diff  --git a/clang/cmake/caches/Release.cmake 
b/clang/cmake/caches/Release.cmake
index bd1f688d61a7e..fa972636553f1 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -14,6 +14,7 @@ if (LLVM_RELEASE_ENABLE_PGO)
   set(CLANG_BOOTSTRAP_TARGETS
 generate-profdata
 stage2
+stage2-package
 stage2-clang
 stage2-distribution
 stage2-install
@@ -57,6 +58,7 @@ set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "")
 set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS
   clang
+  package
   check-all
   check-llvm
   check-clang CACHE STRING "")



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [llvm] Backport some fixes for building the release binaries (PR #91095)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91095

>From f2c5a10e1f27768b031b8b54cb056fd4e261ad8f Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Wed, 24 Apr 2024 07:47:42 -0700
Subject: [PATCH 1/7] [CMake][Release] Add stage2-package target (#89517)

This target will be used to generate the release binary package for
uploading to GitHub.

(cherry picked from commit a38f201f1ec70c2b1f3cf46e7f291c53bb16753e)
---
 clang/cmake/caches/Release.cmake | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake
index bd1f688d61a7e..fa972636553f1 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -14,6 +14,7 @@ if (LLVM_RELEASE_ENABLE_PGO)
   set(CLANG_BOOTSTRAP_TARGETS
 generate-profdata
 stage2
+stage2-package
 stage2-clang
 stage2-distribution
 stage2-install
@@ -57,6 +58,7 @@ set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "")
 set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS
   clang
+  package
   check-all
   check-llvm
   check-clang CACHE STRING "")

>From ce88e86e428be7eea517201ddee8d62150ae8de4 Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Thu, 25 Apr 2024 15:32:08 -0700
Subject: [PATCH 2/7] [CMake][Release] Refactor cache file and use two stages
 for non-PGO builds (#89812)

Completely refactor the cache file to simplify it and remove unnecessary
variables. The main functional change here is that the non-PGO builds
now use two stages, so `ninja -C build stage2-package` can be used with
both PGO and non-PGO builds.

(cherry picked from commit 6473fbf2d68c8486d168f29afc35d3e8a6fabe69)
---
 clang/cmake/caches/Release.cmake | 134 +++
 1 file changed, 66 insertions(+), 68 deletions(-)

diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake
index fa972636553f1..c164d5497275f 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -1,95 +1,93 @@
 # Plain options configure the first build.
 # BOOTSTRAP_* options configure the second build.
 # BOOTSTRAP_BOOTSTRAP_* options configure the third build.
+# PGO Builds have 3 stages (stage1, stage2-instrumented, stage2)
+# non-PGO Builds have 2 stages (stage1, stage2)
 
-# General Options
+
+function (set_final_stage_var name value type)
+  if (LLVM_RELEASE_ENABLE_PGO)
+set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  else()
+set(BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  endif()
+endfunction()
+
+function (set_instrument_and_final_stage_var name value type)
+  # This sets the varaible for the final stage in non-PGO builds and in
+  # the stage2-instrumented stage for PGO builds.
+  set(BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  if (LLVM_RELEASE_ENABLE_PGO)
+# Set the variable in the final stage for PGO builds.
+set(BOOTSTRAP_BOOTSTRAP_${name} ${value} CACHE ${type} "")
+  endif()
+endfunction()
+
+# General Options:
+# If you want to override any of the LLVM_RELEASE_* variables you can set them
+# on the command line via -D, but you need to do this before you pass this
+# cache file to CMake via -C. e.g.
+#
+# cmake -D LLVM_RELEASE_ENABLE_PGO=ON -C Release.cmake
 set(LLVM_RELEASE_ENABLE_LTO THIN CACHE STRING "")
 set(LLVM_RELEASE_ENABLE_PGO OFF CACHE BOOL "")
-
+set(LLVM_RELEASE_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" 
CACHE STRING "")
+set(LLVM_RELEASE_ENABLE_PROJECTS 
"clang;lld;lldb;clang-tools-extra;bolt;polly;mlir;flang" CACHE STRING "")
+# Note we don't need to add install here, since it is one of the pre-defined
+# steps.
+set(LLVM_RELEASE_FINAL_STAGE_TARGETS 
"clang;package;check-all;check-llvm;check-clang" CACHE STRING "")
 set(CMAKE_BUILD_TYPE RELEASE CACHE STRING "")
 
-# Stage 1 Bootstrap Setup
+# Stage 1 Options
+set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
+
+set(STAGE1_PROJECTS "clang")
+set(STAGE1_RUNTIMES "")
+
 if (LLVM_RELEASE_ENABLE_PGO)
+  list(APPEND STAGE1_PROJECTS "lld")
+  list(APPEND STAGE1_RUNTIMES "compiler-rt")
   set(CLANG_BOOTSTRAP_TARGETS
 generate-profdata
-stage2
 stage2-package
 stage2-clang
-stage2-distribution
 stage2-install
-stage2-install-distribution
-stage2-install-distribution-toolchain
 stage2-check-all
 stage2-check-llvm
-stage2-check-clang
-stage2-test-suite CACHE STRING "")
-else()
-  set(CLANG_BOOTSTRAP_TARGETS
-clang
-check-all
-check-llvm
-check-clang
-test-suite
-stage3
-stage3-clang
-stage3-check-all
-stage3-check-llvm
-stage3-check-clang
-stage3-install
-stage3-test-suite CACHE STRING "")
-endif()
+stage2-check-clang CACHE STRING "")
 
-# Stage 1 Options
-set(STAGE1_PROJECTS "clang")
-set(STAGE1_RUNTIMES "")
+  # Configuration for stage2-instrumented
+  set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE STRING "")
+  # This

[llvm-branch-commits] [clang] [llvm] Backport some fixes for building the release binaries (PR #91095)

2024-05-08 Thread Tom Stellard via llvm-branch-commits



@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 5)
+  set(LLVM_VERSION_PATCH 6)

tstellar wrote:

I just merged this commit in another PR.

https://github.com/llvm/llvm-project/pull/91095
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] Bump version to 18.1.6 (PR #91094)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/91094
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] dd3aa6d - Bump version to 18.1.6 (#91094)

2024-05-08 Thread via llvm-branch-commits


Author: Tom Stellard
Date: 2024-05-08T19:41:30-07:00
New Revision: dd3aa6d0e9a8355c14d86b4c607fa89b30c52ec0

URL: 
https://github.com/llvm/llvm-project/commit/dd3aa6d0e9a8355c14d86b4c607fa89b30c52ec0
DIFF: 
https://github.com/llvm/llvm-project/commit/dd3aa6d0e9a8355c14d86b4c607fa89b30c52ec0.diff

LOG: Bump version to 18.1.6 (#91094)

Added: 


Modified: 
llvm/CMakeLists.txt
llvm/utils/lit/lit/__init__.py

Removed: 




diff  --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index f82be164ac9c4..26b7b01bb1f8d 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 5)
+  set(LLVM_VERSION_PATCH 6)
 endif()
 if(NOT DEFINED LLVM_VERSION_SUFFIX)
   set(LLVM_VERSION_SUFFIX)

diff  --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index 1cfcc7d37813b..d8b0e3bd1c69e 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (18, 1, 5)
+__versioninfo__ = (18, 1, 6)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91550

>From 8ea4c39bef000973979cc75a39006e5f87481ee2 Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Fri, 16 Feb 2024 21:34:02 +
Subject: [PATCH 1/3] [workflows] Rework pre-commit CI for the release branch

This rewrites the pre-commit CI for the release branch so that it
behaves almost exactly like the current buildkite builders.  It builds
every project and uses a better filtering method for selecting which
projects to build.

In addition, with this change we drop the Linux and Windows test
configs, since these are already covered by buildkite and add a
config for macos/aarch64.
---
 .github/workflows/ci-tests.yml| 156 +
 .../compute-projects-to-test/action.yml   |  21 ++
 .../compute-projects-to-test.sh   | 221 ++
 .github/workflows/continue-timeout-job.yml|  75 ++
 .github/workflows/get-job-id/action.yml   |  30 +++
 .github/workflows/lld-tests.yml   |  38 ---
 .../workflows/pr-sccache-restore/action.yml   |  26 +++
 .github/workflows/pr-sccache-save/action.yml  |  50 
 .github/workflows/timeout-restore/action.yml  |  33 +++
 .github/workflows/timeout-save/action.yml |  94 
 .../unprivileged-download-artifact/action.yml |  77 ++
 11 files changed, 783 insertions(+), 38 deletions(-)
 create mode 100644 .github/workflows/ci-tests.yml
 create mode 100644 .github/workflows/compute-projects-to-test/action.yml
 create mode 100755 
.github/workflows/compute-projects-to-test/compute-projects-to-test.sh
 create mode 100644 .github/workflows/continue-timeout-job.yml
 create mode 100644 .github/workflows/get-job-id/action.yml
 delete mode 100644 .github/workflows/lld-tests.yml
 create mode 100644 .github/workflows/pr-sccache-restore/action.yml
 create mode 100644 .github/workflows/pr-sccache-save/action.yml
 create mode 100644 .github/workflows/timeout-restore/action.yml
 create mode 100644 .github/workflows/timeout-save/action.yml
 create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml

diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
new file mode 100644
index 0..e1d1c02755939
--- /dev/null
+++ b/.github/workflows/ci-tests.yml
@@ -0,0 +1,156 @@
+name: "CI Tests"
+
+permissions:
+  contents: read
+
+on:
+  pull_request:
+types:
+  - opened
+  - synchronize
+  - reopened
+  # When a PR is closed, we still start this workflow, but then skip
+  # all the jobs, which makes it effectively a no-op.  The reason to
+  # do this is that it allows us to take advantage of concurrency groups
+  # to cancel in progress CI jobs whenever the PR is closed.
+  - closed
+branches:
+  - 'release/**'
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
+  cancel-in-progress: True
+
+jobs:
+  compute-test-configs:
+name: "Compute Configurations to Test"
+if: >-
+  github.repository_owner == 'llvm' &&
+  github.event.action != 'closed'
+runs-on: ubuntu-22.04
+outputs:
+  projects: ${{ steps.vars.outputs.projects }}
+  check-targets: ${{ steps.vars.outputs.check-targets }}
+  test-build: ${{ steps.vars.outputs.check-targets != '' }}
+  test-platforms: ${{ steps.platforms.outputs.result }}
+steps:
+  - name: Fetch LLVM sources
+uses: actions/checkout@v4
+with:
+  fetch-depth: 2
+
+  - name: Compute projects to test
+id: vars
+uses: ./.github/workflows/compute-projects-to-test
+
+  - name: Compute platforms to test
+uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea 
#v7.0.1
+id: platforms
+with:
+  script: |
+linuxConfig = {
+  name: "linux-x86_64",
+  runs_on: "ubuntu-22.04"
+}
+windowsConfig = {
+  name: "windows-x86_64",
+  runs_on: "windows-2022"
+}
+macConfig = {
+  name: "macos-x86_64",
+  runs_on: "macos-13"
+}
+macArmConfig = {
+  name: "macos-aarch64",
+  runs_on: "macos-14"
+}
+
+configs = []
+
+const base_ref = process.env.GITHUB_BASE_REF;
+if (base_ref.startsWith('release/')) {
+  // This is a pull request against a release branch.
+  configs.push(macConfig)
+  configs.push(macArmConfig)
+}
+
+return configs;
+
+  ci-build-test:
+# If this job name is changed, then we need to update the job-name
+# paramater for the timeout-save step below.
+name: "Build"
+needs:
+  - compute-test-configs
+permissions:
+  actions: write #pr-sccache-save may delete artifacts.
+runs-on: ${{ matrix.runs_on }}
+strategy:
+  fail-fast: false
+

[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91550

>From 8ea4c39bef000973979cc75a39006e5f87481ee2 Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Fri, 16 Feb 2024 21:34:02 +
Subject: [PATCH 1/2] [workflows] Rework pre-commit CI for the release branch

This rewrites the pre-commit CI for the release branch so that it
behaves almost exactly like the current buildkite builders.  It builds
every project and uses a better filtering method for selecting which
projects to build.

In addition, with this change we drop the Linux and Windows test
configs, since these are already covered by buildkite and add a
config for macos/aarch64.
---
 .github/workflows/ci-tests.yml| 156 +
 .../compute-projects-to-test/action.yml   |  21 ++
 .../compute-projects-to-test.sh   | 221 ++
 .github/workflows/continue-timeout-job.yml|  75 ++
 .github/workflows/get-job-id/action.yml   |  30 +++
 .github/workflows/lld-tests.yml   |  38 ---
 .../workflows/pr-sccache-restore/action.yml   |  26 +++
 .github/workflows/pr-sccache-save/action.yml  |  50 
 .github/workflows/timeout-restore/action.yml  |  33 +++
 .github/workflows/timeout-save/action.yml |  94 
 .../unprivileged-download-artifact/action.yml |  77 ++
 11 files changed, 783 insertions(+), 38 deletions(-)
 create mode 100644 .github/workflows/ci-tests.yml
 create mode 100644 .github/workflows/compute-projects-to-test/action.yml
 create mode 100755 
.github/workflows/compute-projects-to-test/compute-projects-to-test.sh
 create mode 100644 .github/workflows/continue-timeout-job.yml
 create mode 100644 .github/workflows/get-job-id/action.yml
 delete mode 100644 .github/workflows/lld-tests.yml
 create mode 100644 .github/workflows/pr-sccache-restore/action.yml
 create mode 100644 .github/workflows/pr-sccache-save/action.yml
 create mode 100644 .github/workflows/timeout-restore/action.yml
 create mode 100644 .github/workflows/timeout-save/action.yml
 create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml

diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
new file mode 100644
index 0..e1d1c02755939
--- /dev/null
+++ b/.github/workflows/ci-tests.yml
@@ -0,0 +1,156 @@
+name: "CI Tests"
+
+permissions:
+  contents: read
+
+on:
+  pull_request:
+types:
+  - opened
+  - synchronize
+  - reopened
+  # When a PR is closed, we still start this workflow, but then skip
+  # all the jobs, which makes it effectively a no-op.  The reason to
+  # do this is that it allows us to take advantage of concurrency groups
+  # to cancel in progress CI jobs whenever the PR is closed.
+  - closed
+branches:
+  - 'release/**'
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
+  cancel-in-progress: True
+
+jobs:
+  compute-test-configs:
+name: "Compute Configurations to Test"
+if: >-
+  github.repository_owner == 'llvm' &&
+  github.event.action != 'closed'
+runs-on: ubuntu-22.04
+outputs:
+  projects: ${{ steps.vars.outputs.projects }}
+  check-targets: ${{ steps.vars.outputs.check-targets }}
+  test-build: ${{ steps.vars.outputs.check-targets != '' }}
+  test-platforms: ${{ steps.platforms.outputs.result }}
+steps:
+  - name: Fetch LLVM sources
+uses: actions/checkout@v4
+with:
+  fetch-depth: 2
+
+  - name: Compute projects to test
+id: vars
+uses: ./.github/workflows/compute-projects-to-test
+
+  - name: Compute platforms to test
+uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea 
#v7.0.1
+id: platforms
+with:
+  script: |
+linuxConfig = {
+  name: "linux-x86_64",
+  runs_on: "ubuntu-22.04"
+}
+windowsConfig = {
+  name: "windows-x86_64",
+  runs_on: "windows-2022"
+}
+macConfig = {
+  name: "macos-x86_64",
+  runs_on: "macos-13"
+}
+macArmConfig = {
+  name: "macos-aarch64",
+  runs_on: "macos-14"
+}
+
+configs = []
+
+const base_ref = process.env.GITHUB_BASE_REF;
+if (base_ref.startsWith('release/')) {
+  // This is a pull request against a release branch.
+  configs.push(macConfig)
+  configs.push(macArmConfig)
+}
+
+return configs;
+
+  ci-build-test:
+# If this job name is changed, then we need to update the job-name
+# paramater for the timeout-save step below.
+name: "Build"
+needs:
+  - compute-test-configs
+permissions:
+  actions: write #pr-sccache-save may delete artifacts.
+runs-on: ${{ matrix.runs_on }}
+strategy:
+  fail-fast: false
+

[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/91550

>From 8ea4c39bef000973979cc75a39006e5f87481ee2 Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Fri, 16 Feb 2024 21:34:02 +
Subject: [PATCH] [workflows] Rework pre-commit CI for the release branch

This rewrites the pre-commit CI for the release branch so that it
behaves almost exactly like the current buildkite builders.  It builds
every project and uses a better filtering method for selecting which
projects to build.

In addition, with this change we drop the Linux and Windows test
configs, since these are already covered by buildkite and add a
config for macos/aarch64.
---
 .github/workflows/ci-tests.yml| 156 +
 .../compute-projects-to-test/action.yml   |  21 ++
 .../compute-projects-to-test.sh   | 221 ++
 .github/workflows/continue-timeout-job.yml|  75 ++
 .github/workflows/get-job-id/action.yml   |  30 +++
 .github/workflows/lld-tests.yml   |  38 ---
 .../workflows/pr-sccache-restore/action.yml   |  26 +++
 .github/workflows/pr-sccache-save/action.yml  |  50 
 .github/workflows/timeout-restore/action.yml  |  33 +++
 .github/workflows/timeout-save/action.yml |  94 
 .../unprivileged-download-artifact/action.yml |  77 ++
 11 files changed, 783 insertions(+), 38 deletions(-)
 create mode 100644 .github/workflows/ci-tests.yml
 create mode 100644 .github/workflows/compute-projects-to-test/action.yml
 create mode 100755 
.github/workflows/compute-projects-to-test/compute-projects-to-test.sh
 create mode 100644 .github/workflows/continue-timeout-job.yml
 create mode 100644 .github/workflows/get-job-id/action.yml
 delete mode 100644 .github/workflows/lld-tests.yml
 create mode 100644 .github/workflows/pr-sccache-restore/action.yml
 create mode 100644 .github/workflows/pr-sccache-save/action.yml
 create mode 100644 .github/workflows/timeout-restore/action.yml
 create mode 100644 .github/workflows/timeout-save/action.yml
 create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml

diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
new file mode 100644
index 0..e1d1c02755939
--- /dev/null
+++ b/.github/workflows/ci-tests.yml
@@ -0,0 +1,156 @@
+name: "CI Tests"
+
+permissions:
+  contents: read
+
+on:
+  pull_request:
+types:
+  - opened
+  - synchronize
+  - reopened
+  # When a PR is closed, we still start this workflow, but then skip
+  # all the jobs, which makes it effectively a no-op.  The reason to
+  # do this is that it allows us to take advantage of concurrency groups
+  # to cancel in progress CI jobs whenever the PR is closed.
+  - closed
+branches:
+  - 'release/**'
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
+  cancel-in-progress: True
+
+jobs:
+  compute-test-configs:
+name: "Compute Configurations to Test"
+if: >-
+  github.repository_owner == 'llvm' &&
+  github.event.action != 'closed'
+runs-on: ubuntu-22.04
+outputs:
+  projects: ${{ steps.vars.outputs.projects }}
+  check-targets: ${{ steps.vars.outputs.check-targets }}
+  test-build: ${{ steps.vars.outputs.check-targets != '' }}
+  test-platforms: ${{ steps.platforms.outputs.result }}
+steps:
+  - name: Fetch LLVM sources
+uses: actions/checkout@v4
+with:
+  fetch-depth: 2
+
+  - name: Compute projects to test
+id: vars
+uses: ./.github/workflows/compute-projects-to-test
+
+  - name: Compute platforms to test
+uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea 
#v7.0.1
+id: platforms
+with:
+  script: |
+linuxConfig = {
+  name: "linux-x86_64",
+  runs_on: "ubuntu-22.04"
+}
+windowsConfig = {
+  name: "windows-x86_64",
+  runs_on: "windows-2022"
+}
+macConfig = {
+  name: "macos-x86_64",
+  runs_on: "macos-13"
+}
+macArmConfig = {
+  name: "macos-aarch64",
+  runs_on: "macos-14"
+}
+
+configs = []
+
+const base_ref = process.env.GITHUB_BASE_REF;
+if (base_ref.startsWith('release/')) {
+  // This is a pull request against a release branch.
+  configs.push(macConfig)
+  configs.push(macArmConfig)
+}
+
+return configs;
+
+  ci-build-test:
+# If this job name is changed, then we need to update the job-name
+# paramater for the timeout-save step below.
+name: "Build"
+needs:
+  - compute-test-configs
+permissions:
+  actions: write #pr-sccache-save may delete artifacts.
+runs-on: ${{ matrix.runs_on }}
+strategy:
+  fail-fast: false
+

[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)

2024-05-08 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-github-workflow

Author: Tom Stellard (tstellar)


Changes

This rewrites the pre-commit CI for the release branch so that it behaves 
almost exactly like the current buildkite builders.  It builds every project 
and uses a better filtering method for selecting which projects to build.

In addition, with this change we drop the Linux and Windows test configs, since 
these are already covered by buildkite and add a config for macos/aarch64.

---

Patch is 25.87 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/91550.diff


10 Files Affected:

- (added) .github/workflows/ci-tests.yml (+154) 
- (added) .github/workflows/compute-projects-to-test/action.yml (+21) 
- (added) 
.github/workflows/compute-projects-to-test/compute-projects-to-test.sh (+221) 
- (added) .github/workflows/continue-timeout-job.yml (+75) 
- (added) .github/workflows/get-job-id/action.yml (+30) 
- (added) .github/workflows/pr-sccache-restore/action.yml (+26) 
- (added) .github/workflows/pr-sccache-save/action.yml (+50) 
- (added) .github/workflows/timeout-restore/action.yml (+33) 
- (added) .github/workflows/timeout-save/action.yml (+94) 
- (added) .github/workflows/unprivileged-download-artifact/action.yml (+77) 


``diff
diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
new file mode 100644
index 0..22e39174abee7
--- /dev/null
+++ b/.github/workflows/ci-tests.yml
@@ -0,0 +1,154 @@
+name: "CI Tests"
+
+permissions:
+  contents: read
+
+on:
+  pull_request:
+types:
+  - opened
+  - synchronize
+  - reopened
+  # When a PR is closed, we still start this workflow, but then skip
+  # all the jobs, which makes it effectively a no-op.  The reason to
+  # do this is that it allows us to take advantage of concurrency groups
+  # to cancel in progress CI jobs whenever the PR is closed.
+  - closed
+branches:
+  - main
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
+  cancel-in-progress: True
+
+jobs:
+  compute-test-configs:
+name: "Compute Configurations to Test"
+if: github.event.action != 'closed'
+runs-on: ubuntu-22.04
+outputs:
+  projects: ${{ steps.vars.outputs.projects }}
+  check-targets: ${{ steps.vars.outputs.check-targets }}
+  test-build: ${{ steps.vars.outputs.check-targets != '' }}
+  test-platforms: ${{ steps.platforms.outputs.result }}
+steps:
+  - name: Fetch LLVM sources
+uses: actions/checkout@v4
+with:
+  fetch-depth: 2
+
+  - name: Compute projects to test
+id: vars
+uses: ./.github/workflows/compute-projects-to-test
+
+  - name: Compute platforms to test
+uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea 
#v7.0.1
+id: platforms
+with:
+  script: |
+linuxConfig = {
+  name: "linux-x86_64",
+  runs_on: "ubuntu-22.04"
+}
+windowsConfig = {
+  name: "windows-x86_64",
+  runs_on: "windows-2022"
+}
+macConfig = {
+  name: "macos-x86_64",
+  runs_on: "macos-13"
+}
+macArmConfig = {
+  name: "macos-aarch64",
+  runs_on: "macos-14"
+}
+
+configs = []
+
+const base_ref = process.env.GITHUB_BASE_REF;
+if (base_ref.startsWith('release/')) {
+  // This is a pull request against a release branch.
+  configs.push(macConfig)
+  configs.push(macArmConfig)
+}
+
+return configs;
+
+  ci-build-test:
+# If this job name is changed, then we need to update the job-name
+# paramater for the timeout-save step below.
+name: "Build"
+needs:
+  - compute-test-configs
+permissions:
+  actions: write #pr-sccache-save may delete artifacts.
+runs-on: ${{ matrix.runs_on }}
+strategy:
+  fail-fast: false
+  matrix:
+include: ${{ 
fromJson(needs.compute-test-configs.outputs.test-platforms) }}
+if: needs.compute-test-configs.outputs.test-build == 'true'
+steps:
+  - name: Fetch LLVM sources
+uses: actions/checkout@v4
+
+  - name: Timeout Restore
+id: timeout
+uses: ./.github/workflows/timeout-restore
+with:
+  artifact-name-suffix: ${{ matrix.name }}
+
+  - name: Setup Windows
+uses: llvm/actions/setup-windows@main
+if: ${{ runner.os == 'Windows' }}
+with:
+  arch: amd64
+
+  - name: Install Ninja
+uses: llvm/actions/install-ninja@main
+
+  - name: Setup sccache
+uses: hendrikmuhs/ccache-action@v1
+with:
+  max-size: 2G
+  variant: sccache
+  key: ci-${{ matrix.name }}
+
+  - name: Restore sccache from previous PR run
+

[llvm-branch-commits] [llvm] [workflows] Rework pre-commit CI for the release branch (PR #91550)

2024-05-08 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar created 
https://github.com/llvm/llvm-project/pull/91550

This rewrites the pre-commit CI for the release branch so that it behaves 
almost exactly like the current buildkite builders.  It builds every project 
and uses a better filtering method for selecting which projects to build.

In addition, with this change we drop the Linux and Windows test configs, since 
these are already covered by buildkite and add a config for macos/aarch64.

>From a590088cbdf37d3c4d274c5ab9d6d4e4de9c922c Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Fri, 16 Feb 2024 21:34:02 +
Subject: [PATCH] [workflows] Rework pre-commit CI for the release branch

This rewrites the pre-commit CI for the release branch so that it
behaves almost exactly like the current buildkite builders.  It builds
every project and uses a better filtering method for selecting which
projects to build.

In addition, with this change we drop the Linux and Windows test
configs, since these are already covered by buildkite and add a
config for macos/aarch64.
---
 .github/workflows/ci-tests.yml| 154 
 .../compute-projects-to-test/action.yml   |  21 ++
 .../compute-projects-to-test.sh   | 221 ++
 .github/workflows/continue-timeout-job.yml|  75 ++
 .github/workflows/get-job-id/action.yml   |  30 +++
 .../workflows/pr-sccache-restore/action.yml   |  26 +++
 .github/workflows/pr-sccache-save/action.yml  |  50 
 .github/workflows/timeout-restore/action.yml  |  33 +++
 .github/workflows/timeout-save/action.yml |  94 
 .../unprivileged-download-artifact/action.yml |  77 ++
 10 files changed, 781 insertions(+)
 create mode 100644 .github/workflows/ci-tests.yml
 create mode 100644 .github/workflows/compute-projects-to-test/action.yml
 create mode 100755 
.github/workflows/compute-projects-to-test/compute-projects-to-test.sh
 create mode 100644 .github/workflows/continue-timeout-job.yml
 create mode 100644 .github/workflows/get-job-id/action.yml
 create mode 100644 .github/workflows/pr-sccache-restore/action.yml
 create mode 100644 .github/workflows/pr-sccache-save/action.yml
 create mode 100644 .github/workflows/timeout-restore/action.yml
 create mode 100644 .github/workflows/timeout-save/action.yml
 create mode 100644 .github/workflows/unprivileged-download-artifact/action.yml

diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
new file mode 100644
index 0..22e39174abee7
--- /dev/null
+++ b/.github/workflows/ci-tests.yml
@@ -0,0 +1,154 @@
+name: "CI Tests"
+
+permissions:
+  contents: read
+
+on:
+  pull_request:
+types:
+  - opened
+  - synchronize
+  - reopened
+  # When a PR is closed, we still start this workflow, but then skip
+  # all the jobs, which makes it effectively a no-op.  The reason to
+  # do this is that it allows us to take advantage of concurrency groups
+  # to cancel in progress CI jobs whenever the PR is closed.
+  - closed
+branches:
+  - main
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
+  cancel-in-progress: True
+
+jobs:
+  compute-test-configs:
+name: "Compute Configurations to Test"
+if: github.event.action != 'closed'
+runs-on: ubuntu-22.04
+outputs:
+  projects: ${{ steps.vars.outputs.projects }}
+  check-targets: ${{ steps.vars.outputs.check-targets }}
+  test-build: ${{ steps.vars.outputs.check-targets != '' }}
+  test-platforms: ${{ steps.platforms.outputs.result }}
+steps:
+  - name: Fetch LLVM sources
+uses: actions/checkout@v4
+with:
+  fetch-depth: 2
+
+  - name: Compute projects to test
+id: vars
+uses: ./.github/workflows/compute-projects-to-test
+
+  - name: Compute platforms to test
+uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea 
#v7.0.1
+id: platforms
+with:
+  script: |
+linuxConfig = {
+  name: "linux-x86_64",
+  runs_on: "ubuntu-22.04"
+}
+windowsConfig = {
+  name: "windows-x86_64",
+  runs_on: "windows-2022"
+}
+macConfig = {
+  name: "macos-x86_64",
+  runs_on: "macos-13"
+}
+macArmConfig = {
+  name: "macos-aarch64",
+  runs_on: "macos-14"
+}
+
+configs = []
+
+const base_ref = process.env.GITHUB_BASE_REF;
+if (base_ref.startsWith('release/')) {
+  // This is a pull request against a release branch.
+  configs.push(macConfig)
+  configs.push(macArmConfig)
+}
+
+return configs;
+
+  ci-build-test:
+# If this job name is changed, then we need to update the job-name
+# paramater for the timeout-save step below.
+name: "Build"
+needs:

[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)

2024-05-08 Thread Paul Kirth via llvm-branch-commits


ilovepi wrote:

This tends to bite anyone using LTO with RISCV. In particular I’m concerned 
about the impact on Rust, since they’ll pin LLVM until  the LLVM 19 release. 
About 60% of Fuchsia is implemented in rust. More if you count only count 
userland. 

We’re hoping to avoid a situation where we can’t use LTO on RISCV Fuchsia 
targets, as we’re starting to rely more on LTO configurations, to enable 
features like control flow integrity. 

https://github.com/llvm/llvm-project/pull/91514
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [InstSimplify] Do not simplify freeze in `simplifyWithOpReplaced` (#91215) (PR #91419)

2024-05-08 Thread Nikita Popov via llvm-branch-commits


https://github.com/nikic approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/91419
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits

teresajohnson wrote:

> #87600 is a functional change and the diffbase of this patch, and 
> `llvm/test/ThinLTO/X86/import_callee_declaration.ll` should be a test case 
> for both patches.
> 
> In the [diffbase](https://github.com/llvm/llvm-project/pull/87600), bitcode 
> writer takes maps as additional parameters to populate import status, and 
> it's not straightforward to construct regression tests there without this 
> patch. I wonder if I shall introduce `cl::list` in 
> llvm-lto/llvm-lto2 (as a repeated arg) to specify `filename:GUID` to test the 
> diffbase alone.

Rather than add an option just for testing that one alone, I have a suggestion 
for splitting up the PRs slightly differently. What if you submitted this one 
first, minus the modified calls to writeIndexToFile and the part of the test 
that checks the disassembled index (just have the testing for this one check 
the number of declarations imported and other debug messages). Then move the 
modified calls to writeIndexToFile and the index disassembly checking to 
PR87600 that can be committed as a follow on? That way each change comes with a 
test.

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [AArch64][GISEL] Consider fcmp true and fcmp false in cond code selection (#86972) (PR #91126)

2024-05-08 Thread Marc Auberer via llvm-branch-commits


marcauberer wrote:

@arsenm How are lit test failures handled in case of cherry-picks? Seems like 
GISEL behaves a bit different on `release/18.x`.
I have a fix prepared locally, but how should I push it? I can't push to a 
llvmbot branch, can I?

https://github.com/llvm/llvm-project/pull/91126
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits



@@ -1670,11 +1798,15 @@ Expected FunctionImporter::importFunctions(
   if (!GV.hasName())
 continue;
   auto GUID = GV.getGUID();
-  auto Import = ImportGUIDs.count(GUID);
-  LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing global "
-<< GUID << " " << GV.getName() << " from "
-<< SrcModule->getSourceFileName() << "\n");
-  if (Import) {
+  auto ImportType = maybeGetImportType(ImportGUIDs, GUID);
+  if (!ImportType)

teresajohnson wrote:

Or do what I suggested above which goes back to only needing one LLVM_DEBUG

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits



@@ -245,8 +256,10 @@ static auto qualifyCalleeCandidates(
 }
 
 /// Given a list of possible callee implementation for a call site, select one
-/// that fits the \p Threshold. If none are found, the Reason will give the 
last
-/// reason for the failure (last, in the order of CalleeSummaryList entries).
+/// that fits the \p Threshold for function definition import. If none are
+/// found, the Reason will give the last reason for the failure (last, in the
+/// order of CalleeSummaryList entries). If caller wants to select eligible
+/// summary

teresajohnson wrote:

dangling sentence?

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits


https://github.com/teresajohnson commented:

I only had time for a cursory review, some comments / suggestions below. I also 
have a suggestion for the testing issue wrt to the other patch, will note that 
separately

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits



@@ -1634,17 +1752,27 @@ Expected FunctionImporter::importFunctions(
   return std::move(Err);
 
 auto  = FunctionsToImportPerModule->second;
+
 // Find the globals to import
 SetVector GlobalsToImport;
 for (Function  : *SrcModule) {
   if (!F.hasName())
 continue;
   auto GUID = F.getGUID();
-  auto Import = ImportGUIDs.count(GUID);
-  LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing function "
-<< GUID << " " << F.getName() << " from "
-<< SrcModule->getSourceFileName() << "\n");
-  if (Import) {
+  auto ImportType = maybeGetImportType(ImportGUIDs, GUID);
+
+  if (!ImportType) {

teresajohnson wrote:

You could combine this with the below ImportDefinition checking to keep the 
same flow as before with one debug message, e.g.:

```
auto ImportType = maybeGetImportType(...);
auto ImportDefinition = false;
if (ImportType) {
   ImportDefinition = ...;
}
LLVM_DEBUG(dbgs() << (ImportDefinition ...
if (ImportDefinition) {
...
```

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits



@@ -1634,17 +1752,27 @@ Expected FunctionImporter::importFunctions(
   return std::move(Err);
 
 auto  = FunctionsToImportPerModule->second;
+
 // Find the globals to import
 SetVector GlobalsToImport;
 for (Function  : *SrcModule) {
   if (!F.hasName())
 continue;
   auto GUID = F.getGUID();
-  auto Import = ImportGUIDs.count(GUID);
-  LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing function "
-<< GUID << " " << F.getName() << " from "
-<< SrcModule->getSourceFileName() << "\n");
-  if (Import) {
+  auto ImportType = maybeGetImportType(ImportGUIDs, GUID);
+
+  if (!ImportType) {

teresajohnson wrote:

Also consider indicating which are imported as declarations in the debug 
message?

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits



@@ -158,7 +158,7 @@ void llvm::computeLTOCacheKey(
 
   std::vector ExportsGUID;
   ExportsGUID.reserve(ExportList.size());
-  for (const auto  : ExportList) {
+  for (const auto &[VI, UnusedImportType] : ExportList) {

teresajohnson wrote:

We should probably include the new import type result in the cache key. Because 
if that changes then presumably the cached object should be invalidated as it 
would be different?

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits


https://github.com/teresajohnson edited 
https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [ThinLTO] Generate import status in per-module combined summary (PR #88024)

2024-05-08 Thread Teresa Johnson via llvm-branch-commits



@@ -1670,11 +1798,15 @@ Expected FunctionImporter::importFunctions(
   if (!GV.hasName())
 continue;
   auto GUID = GV.getGUID();
-  auto Import = ImportGUIDs.count(GUID);
-  LLVM_DEBUG(dbgs() << (Import ? "Is" : "Not") << " importing global "
-<< GUID << " " << GV.getName() << " from "
-<< SrcModule->getSourceFileName() << "\n");
-  if (Import) {
+  auto ImportType = maybeGetImportType(ImportGUIDs, GUID);
+  if (!ImportType)

teresajohnson wrote:

Do we need to emit a debug message in this case like you are doing for 
functions above? Ditto for aliases below

https://github.com/llvm/llvm-project/pull/88024
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [BOLT][NFCI] Use heuristic for matching split global functions (PR #90429)

2024-05-08 Thread Amir Ayupov via llvm-branch-commits


https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/90429
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [BOLT][BAT] Fix translate for branches added by BOLT (PR #90811)

2024-05-08 Thread Amir Ayupov via llvm-branch-commits


https://github.com/aaupov closed https://github.com/llvm/llvm-project/pull/90811
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [BOLT][NFCI] Allow non-simple functions to be in disassembled state (PR #90806)

2024-05-08 Thread Amir Ayupov via llvm-branch-commits


https://github.com/aaupov closed https://github.com/llvm/llvm-project/pull/90806
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)

2024-05-08 Thread Andrew Kelley via llvm-branch-commits


andrewrk wrote:

Thanks!

https://github.com/llvm/llvm-project/pull/91425
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)

2024-05-08 Thread Craig Topper via llvm-branch-commits

topperc wrote:

> Can you briefly summarize why this is important to backport? At first glance, 
> this is only relevant for LTO with mixed architecture specifications, 
> which... I can see someone might want it, I guess, but it seems pretty easy 
> to work around not having it.

It's not just mixed architecture specifications. Even in a non-mixed situation 
the Compressed instruction flag in the ELF header doesn't get set correctly for 
LTO. Prior to these patches, the flag is set using the subtarget features from 
the TargetMachine which are empty in an LTO build. The linker needs this flag 
to do linker relaxation for alignment correctly. The workaround is to pass 
`-Wl,-plugin-opt=-mattr=+c`.

CC @ilovepi who asked me to try to backport it.

https://github.com/llvm/llvm-project/pull/91514
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)

2024-05-08 Thread Eli Friedman via llvm-branch-commits


efriedma-quic wrote:

Can you briefly summarize why this is important to backport?  At first glance, 
this is only relevant for LTO with mixed architecture specifications, which... 
I can see someone might want it, I guess, but it seems pretty easy to work 
around not having it.

https://github.com/llvm/llvm-project/pull/91514
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)

2024-05-08 Thread Amir Ayupov via llvm-branch-commits


https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/90807

>From acf58ceb37d2aa917e8d84d243faadc58f5f3a7d Mon Sep 17 00:00:00 2001
From: Amir Ayupov 
Date: Mon, 6 May 2024 13:35:04 -0700
Subject: [PATCH 1/3] Simplify IsReturn check

Created using spr 1.3.4
---
 bolt/lib/Profile/DataAggregator.cpp | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/bolt/lib/Profile/DataAggregator.cpp 
b/bolt/lib/Profile/DataAggregator.cpp
index e4a7324c38175..d02e4499014ed 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -778,13 +778,13 @@ bool DataAggregator::doBranch(uint64_t From, uint64_t To, 
uint64_t Count,
 if (BinaryFunction *Func = getBinaryFunctionContainingAddress(Addr)) {
   Addr -= Func->getAddress();
   if (IsFrom) {
-if (Func->hasInstructions()) {
-  if (MCInst *Inst = Func->getInstructionAtOffset(Addr))
-IsReturn = BC->MIB->isReturn(*Inst);
-} else if (std::optional Inst =
-Func->disassembleInstructionAtOffset(Addr)) {
-  IsReturn = BC->MIB->isReturn(*Inst);
-}
+auto checkReturn = [&](auto MaybeInst) {
+  IsReturn = MaybeInst && BC->MIB->isReturn(*MaybeInst);
+};
+if (Func->hasInstructions())
+  checkReturn(Func->getInstructionAtOffset(Addr));
+else
+  checkReturn(Func->disassembleInstructionAtOffset(Addr));
   }
 
   if (BAT)

>From 22052e461e5671f376fe2dcb733446b0a63e956d Mon Sep 17 00:00:00 2001
From: Amir Ayupov 
Date: Tue, 7 May 2024 18:30:48 -0700
Subject: [PATCH 2/3] drop const from disassembleInstructionAtOffset

Created using spr 1.3.4
---
 bolt/include/bolt/Core/BinaryFunction.h | 3 +--
 bolt/lib/Core/BinaryFunction.cpp| 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/include/bolt/Core/BinaryFunction.h 
b/bolt/include/bolt/Core/BinaryFunction.h
index b21312f92c485..3c641581e247a 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -930,8 +930,7 @@ class BinaryFunction {
 return const_cast(this)->getInstructionAtOffset(Offset);
   }
 
-  const std::optional
-  disassembleInstructionAtOffset(uint64_t Offset) const;
+  std::optional disassembleInstructionAtOffset(uint64_t Offset) const;
 
   /// Return offset for the first instruction. If there is data at the
   /// beginning of a function then offset of the first instruction could
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 5f3c0cb1ad754..fb81fc3f2ba7b 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1167,7 +1167,7 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst 
,
   }
 }
 
-const std::optional
+std::optional
 BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
   assert(CurrentState == State::Empty);
   assert(Offset < MaxSize && "invalid offset");

>From 63725510bf85c9e3862800830f5881099ab4b21f Mon Sep 17 00:00:00 2001
From: Amir Ayupov 
Date: Wed, 8 May 2024 11:59:59 -0700
Subject: [PATCH 3/3] Assert messages

Created using spr 1.3.4
---
 bolt/lib/Core/BinaryFunction.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index fb81fc3f2ba7b..4721a247ee2e2 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1169,10 +1169,10 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst 
,
 
 std::optional
 BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
-  assert(CurrentState == State::Empty);
-  assert(Offset < MaxSize && "invalid offset");
+  assert(CurrentState == State::Empty && "Function should not be 
disassembled");
+  assert(Offset < MaxSize && "Invalid offset");
   ErrorOr> FunctionData = getData();
-  assert(FunctionData && "cannot get function as data");
+  assert(FunctionData && "Cannot get function as data");
   MCInst Instr;
   uint64_t InstrSize = 0;
   const uint64_t InstrAddress = getAddress() + Offset;

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)

2024-05-08 Thread Craig Topper via llvm-branch-commits


https://github.com/topperc updated 
https://github.com/llvm/llvm-project/pull/91514

>From ee109e3627e5b93297bfc7908f684eedb5feb5ec Mon Sep 17 00:00:00 2001
From: Craig Topper 
Date: Tue, 13 Feb 2024 16:17:50 -0800
Subject: [PATCH 1/3] [RISCV] Add canonical ISA string as Module metadata in
 IR. (#80760)

In an LTO build, we don't set the ELF attributes to indicate what
extensions were compiled with. The target CPU/Attrs in
RISCVTargetMachine do not get set for an LTO build. Each function gets a
target-cpu/feature attribute, but this isn't usable to set ELF attributs
since we wouldn't know what function to use. We can't just once since it
might have been compiler with an attribute likes target_verson.

This patch adds the ISA as Module metadata so we can retrieve it in the
backend. Individual translation units can still be compiled with
different strings so we need to collect the unique set when Modules are
merged.

The backend will need to combine the unique ISA strings to produce a
single value for the ELF attributes. This will be done in a separate
patch.
---
 clang/lib/CodeGen/CodeGenModule.cpp   |  14 +
 .../RISCV/ntlh-intrinsics/riscv32-zihintntl.c | 350 +-
 .../test/CodeGen/RISCV/riscv-metadata-arch.c  |  20 +
 3 files changed, 209 insertions(+), 175 deletions(-)
 create mode 100644 clang/test/CodeGen/RISCV/riscv-metadata-arch.c

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 1280bcd36de94..eb13cd40eb8a2 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -67,6 +67,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/RISCVISAInfo.h"
 #include "llvm/Support/TimeProfiler.h"
 #include "llvm/Support/xxhash.h"
 #include "llvm/TargetParser/Triple.h"
@@ -1059,6 +1060,19 @@ void CodeGenModule::Release() {
 llvm::LLVMContext  = TheModule.getContext();
 getModule().addModuleFlag(llvm::Module::Error, "target-abi",
   llvm::MDString::get(Ctx, ABIStr));
+
+// Add the canonical ISA string as metadata so the backend can set the ELF
+// attributes correctly. We use AppendUnique so LTO will keep all of the
+// unique ISA strings that were linked together.
+const std::vector  =
+getTarget().getTargetOpts().Features;
+auto ParseResult = llvm::RISCVISAInfo::parseFeatures(
+Arch == llvm::Triple::riscv64 ? 64 : 32, Features);
+if (!errorToBool(ParseResult.takeError()))
+  getModule().addModuleFlag(
+  llvm::Module::AppendUnique, "riscv-isa",
+  llvm::MDNode::get(
+  Ctx, llvm::MDString::get(Ctx, (*ParseResult)->toString(;
   }
 
   if (CodeGenOpts.SanitizeCfiCrossDso) {
diff --git a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c 
b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
index 897edbc6450af..b11c2ca010e7c 100644
--- a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
+++ b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
@@ -28,190 +28,190 @@ vint8m1_t *scvc1, *scvc2;
 
 // clang-format off
 void ntl_all_sizes() {   // CHECK-LABEL: 
ntl_all_sizes
-  uc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  sc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  us = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ss = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ui = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  si = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  ull = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load 
i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  sll = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load 
i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  h1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
half{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  f1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
float{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  d1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
double{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  v4si1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // 
CHECK: load <4 x i32>{{.*}}align 16, !nontemporal !4,

[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)

2024-05-08 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)


Changes

Resolves #91513

---

Patch is 57.83 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/91514.diff


10 Files Affected:

- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+14) 
- (modified) clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c 
(+175-175) 
- (added) clang/test/CodeGen/RISCV/riscv-metadata-arch.c (+20) 
- (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVELFStreamer.cpp (+4-4) 
- (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVELFStreamer.h (-1) 
- (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.cpp (+5) 
- (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.h (+5) 
- (modified) llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp (+28-4) 
- (added) llvm/test/CodeGen/RISCV/attributes-module-flag.ll (+17) 
- (added) llvm/test/CodeGen/RISCV/module-elf-flags.ll (+13) 


``diff
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 1280bcd36de94..f576cd8b853c2 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -67,6 +67,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/RISCVISAInfo.h"
 #include "llvm/Support/TimeProfiler.h"
 #include "llvm/Support/xxhash.h"
 #include "llvm/TargetParser/Triple.h"
@@ -1059,6 +1060,19 @@ void CodeGenModule::Release() {
 llvm::LLVMContext  = TheModule.getContext();
 getModule().addModuleFlag(llvm::Module::Error, "target-abi",
   llvm::MDString::get(Ctx, ABIStr));
+
+// Add the canonical ISA string as metadata so the backend can set the ELF
+// attributes correctly. We use AppendUnique so LTO will keep all of the
+// unique ISA strings that were linked together.
+const std::vector  =
+getTarget().getTargetOpts().Features;
+auto ParseResult =
+llvm::RISCVISAInfo::parseFeatures(T.isRISCV64() ? 64 : 32, Features);
+if (!errorToBool(ParseResult.takeError()))
+  getModule().addModuleFlag(
+  llvm::Module::AppendUnique, "riscv-isa",
+  llvm::MDNode::get(
+  Ctx, llvm::MDString::get(Ctx, (*ParseResult)->toString(;
   }
 
   if (CodeGenOpts.SanitizeCfiCrossDso) {
diff --git a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c 
b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
index 897edbc6450af..b11c2ca010e7c 100644
--- a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
+++ b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
@@ -28,190 +28,190 @@ vint8m1_t *scvc1, *scvc2;
 
 // clang-format off
 void ntl_all_sizes() {   // CHECK-LABEL: 
ntl_all_sizes
-  uc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  sc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  us = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ss = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ui = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  si = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  ull = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load 
i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  sll = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load 
i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  h1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
half{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  f1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
float{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  d1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
double{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  v4si1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // 
CHECK: load <4 x i32>{{.*}}align 16, !nontemporal !4, !riscv-nontemporal-domain 
!5
-  v8ss1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // 
CHECK: load <8 x i16>{{.*}}align 16, !nontemporal !4, !riscv-nontemporal-domain 
!5
-  v16sc1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // 
CHECK: load <16 x i8>{{.*}}align 16, !nontemporal !4, !riscv-nontemporal-domain 
!5
-  *scvi1 = __riscv_ntl_load(scvi2, __RISCV_NTLH_INNERMOST_PRIVATE);   // 
CHECK: load

[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)

2024-05-08 Thread Craig Topper via llvm-branch-commits


https://github.com/topperc created 
https://github.com/llvm/llvm-project/pull/91514

Resolves #91513

>From f45df1cf1b74957e2f9609b982e964654f9af824 Mon Sep 17 00:00:00 2001
From: Craig Topper 
Date: Tue, 13 Feb 2024 16:17:50 -0800
Subject: [PATCH 1/3] [RISCV] Add canonical ISA string as Module metadata in
 IR. (#80760)

In an LTO build, we don't set the ELF attributes to indicate what
extensions were compiled with. The target CPU/Attrs in
RISCVTargetMachine do not get set for an LTO build. Each function gets a
target-cpu/feature attribute, but this isn't usable to set ELF attributs
since we wouldn't know what function to use. We can't just once since it
might have been compiler with an attribute likes target_verson.

This patch adds the ISA as Module metadata so we can retrieve it in the
backend. Individual translation units can still be compiled with
different strings so we need to collect the unique set when Modules are
merged.

The backend will need to combine the unique ISA strings to produce a
single value for the ELF attributes. This will be done in a separate
patch.
---
 clang/lib/CodeGen/CodeGenModule.cpp   |  14 +
 .../RISCV/ntlh-intrinsics/riscv32-zihintntl.c | 350 +-
 .../test/CodeGen/RISCV/riscv-metadata-arch.c  |  20 +
 3 files changed, 209 insertions(+), 175 deletions(-)
 create mode 100644 clang/test/CodeGen/RISCV/riscv-metadata-arch.c

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp 
b/clang/lib/CodeGen/CodeGenModule.cpp
index 1280bcd36de94..f576cd8b853c2 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -67,6 +67,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/RISCVISAInfo.h"
 #include "llvm/Support/TimeProfiler.h"
 #include "llvm/Support/xxhash.h"
 #include "llvm/TargetParser/Triple.h"
@@ -1059,6 +1060,19 @@ void CodeGenModule::Release() {
 llvm::LLVMContext  = TheModule.getContext();
 getModule().addModuleFlag(llvm::Module::Error, "target-abi",
   llvm::MDString::get(Ctx, ABIStr));
+
+// Add the canonical ISA string as metadata so the backend can set the ELF
+// attributes correctly. We use AppendUnique so LTO will keep all of the
+// unique ISA strings that were linked together.
+const std::vector  =
+getTarget().getTargetOpts().Features;
+auto ParseResult =
+llvm::RISCVISAInfo::parseFeatures(T.isRISCV64() ? 64 : 32, Features);
+if (!errorToBool(ParseResult.takeError()))
+  getModule().addModuleFlag(
+  llvm::Module::AppendUnique, "riscv-isa",
+  llvm::MDNode::get(
+  Ctx, llvm::MDString::get(Ctx, (*ParseResult)->toString(;
   }
 
   if (CodeGenOpts.SanitizeCfiCrossDso) {
diff --git a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c 
b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
index 897edbc6450af..b11c2ca010e7c 100644
--- a/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
+++ b/clang/test/CodeGen/RISCV/ntlh-intrinsics/riscv32-zihintntl.c
@@ -28,190 +28,190 @@ vint8m1_t *scvc1, *scvc2;
 
 // clang-format off
 void ntl_all_sizes() {   // CHECK-LABEL: 
ntl_all_sizes
-  uc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  sc = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i8{{.*}}align 1, !nontemporal !4, !riscv-nontemporal-domain !5
-  us = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ss = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i16{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  ui = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  si = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
i32{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  ull = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load 
i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  sll = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE); // CHECK: load 
i64{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  h1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
half{{.*}}align 2, !nontemporal !4, !riscv-nontemporal-domain !5
-  f1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
float{{.*}}align 4, !nontemporal !4, !riscv-nontemporal-domain !5
-  d1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // CHECK: load 
double{{.*}}align 8, !nontemporal !4, !riscv-nontemporal-domain !5
-  v4si1 = __riscv_ntl_load(, __RISCV_NTLH_INNERMOST_PRIVATE);   // 
CHECK: load <4 x i32>{{.*}}align 16, !nontemporal !4,

[llvm-branch-commits] [clang] [llvm] Backport "riscv-isa" module metadata to 18.x (PR #91514)

2024-05-08 Thread Craig Topper via llvm-branch-commits


https://github.com/topperc milestoned 
https://github.com/llvm/llvm-project/pull/91514
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)

2024-05-08 Thread Maksim Panchenko via llvm-branch-commits


https://github.com/maksfb edited https://github.com/llvm/llvm-project/pull/90807
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)

2024-05-08 Thread Maksim Panchenko via llvm-branch-commits



@@ -1167,6 +1167,21 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst 
,
   }
 }
 
+std::optional
+BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
+  assert(CurrentState == State::Empty);
+  assert(Offset < MaxSize && "invalid offset");

maksfb wrote:

nit: capitalize the message.

https://github.com/llvm/llvm-project/pull/90807
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)

2024-05-08 Thread Maksim Panchenko via llvm-branch-commits



@@ -1167,6 +1167,21 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst 
,
   }
 }
 
+std::optional
+BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
+  assert(CurrentState == State::Empty);
+  assert(Offset < MaxSize && "invalid offset");
+  ErrorOr> FunctionData = getData();
+  assert(FunctionData && "cannot get function as data");

maksfb wrote:

nit: ditto.

https://github.com/llvm/llvm-project/pull/90807
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)

2024-05-08 Thread Maksim Panchenko via llvm-branch-commits



@@ -1167,6 +1167,21 @@ void BinaryFunction::handleAArch64IndirectCall(MCInst 
,
   }
 }
 
+std::optional
+BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
+  assert(CurrentState == State::Empty);

maksfb wrote:

nit: add message.

https://github.com/llvm/llvm-project/pull/90807
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [BOLT] Ignore returns in DataAggregator (PR #90807)

2024-05-08 Thread Maksim Panchenko via llvm-branch-commits


https://github.com/maksfb approved this pull request.

Please address the nits. Otherwise - good to go.

https://github.com/llvm/llvm-project/pull/90807
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Michael Kruse via llvm-branch-commits



@@ -4991,3 +4971,38 @@ OMPClause 
*Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc);
   return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data);
 }
+
+bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+   SourceLocation ,
+   SourceLocation ,
+   SourceLocation ,
+   SmallVectorImpl ,
+   bool ReqIntConst) {
+  assert(getOpenMPClauseName(Kind) == PP.getSpelling(Tok) &&
+ "Expected parsing to start at clause name");
+  ClauseNameLoc = ConsumeToken();
+
+  // Parse inside of '(' and ')'.
+  BalancedDelimiterTracker T(*this, tok::l_paren, 
tok::annot_pragma_openmp_end);
+  if (T.consumeOpen()) {
+Diag(Tok, diag::err_expected) << tok::l_paren;
+return true;
+  }
+
+  // Parse the list with interleaved commas.
+  do {
+ExprResult Val =
+ReqIntConst ? ParseConstantExpression() : ParseAssignmentExpression();
+if (!Val.isUsable()) {
+  // Encountered something other than an expression; abort to ')'.
+  T.skipToEnd();
+  return true;

Meinersbur wrote:

Callers should not use the output parameters when returning true. This seems to 
be common for `Parse...` methods, such as `ParseOpenACCIntExprList`, 
`parseOpenMPDeclareMapperVarDecl`, `ParseOpenMPParensExpr`, ...

https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Michael Kruse via llvm-branch-commits



@@ -17432,16 +17457,54 @@ OMPClause 
*SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef SizeExprs,
   SourceLocation StartLoc,
   SourceLocation LParenLoc,
   SourceLocation EndLoc) {
-  for (Expr *SizeExpr : SizeExprs) {
-ExprResult NumForLoopsResult = VerifyPositiveIntegerConstantInClause(
-SizeExpr, OMPC_sizes, /*StrictlyPositive=*/true);
-if (!NumForLoopsResult.isUsable())
-  return nullptr;
+  SmallVector SanitizedSizeExprs;
+  llvm::append_range(SanitizedSizeExprs, SizeExprs);
+
+  for (Expr * : SanitizedSizeExprs) {
+// Skip if already sanitized, e.g. during a partial template instantiation.
+if (!SizeExpr)
+  continue;
+
+bool IsValid = isNonNegativeIntegerValue(SizeExpr, SemaRef, OMPC_sizes,
+ /*StrictlyPositive=*/true);
+
+// isNonNegativeIntegerValue returns true for non-integral types (but still
+// emits error diagnostic), so check for the expected type explicitly.
+QualType SizeTy = SizeExpr->getType();
+if (!SizeTy->isIntegerType())
+  IsValid = false;
+
+// Handling in templates is tricky. There are four possibilities to
+// consider:
+//
+// 1a. The expression is valid and we are in a instantiated template or not
+// in a template:
+//   Pass valid expression to be further analysed later in Sema.
+// 1b. The expression is valid and we are in a template (including partial
+// instantiation):
+//   isNonNegativeIntegerValue skipped any checks so there is no
+//   guarantee it will be correct after instantiation.
+//   ActOnOpenMPSizesClause will be called again at instantiation when
+//   it is not in a dependent context anymore. This may cause warnings
+//   to be emitted multiple times.
+// 2a. The expression is invalid and we are in an instantiated template or
+// not in a template:
+//   Invalidate the expression with a clearly wrong value (nullptr) so
+//   later in Sema we do not have to do the same validity analysis 
again
+//   or crash from unexpected data. Error diagnostics have already been
+//   emitted.
+// 2b. The expression is invalid and we are in a template (including 
partial
+// instantiation):
+//   Pass the invalid expression as-is, template instantiation may
+//   replace unexpected types/values with valid ones. The directives
+//   with this clause must not try to use these expressions in 
dependent
+//   contexts.

Meinersbur wrote:

Case 2b is already adhered to in `ActOnOpenMPTileDirective`:
```
  // Delay tiling to when template is completely instantiated.
  if (SemaRef.CurContext->isDependentContext())
return OMPTileDirective::Create(Context, StartLoc, EndLoc, Clauses,
NumLoops, AStmt, nullptr, nullptr);
```
Delaying further analysis seems generally to be what OpenMP does, e.g. 
```
isNonNegativeIntegerValue(...) {
  if (!ValExpr->isTypeDependent() && !ValExpr->isValueDependent() &&
  !ValExpr->isInstantiationDependent()) {
```

https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Michael Kruse via llvm-branch-commits



@@ -15197,6 +15202,36 @@ StmtResult 
SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef Clauses,
   // Once the original iteration values are set, append the innermost body.
   Stmt *Inner = Body;
 
+  auto MakeDimTileSize = [ = this->SemaRef, , ,
+  SizesClause, CurScope](int I) -> Expr * {
+Expr *DimTileSizeExpr = SizesClause->getSizesRefs()[I];
+if (isa(DimTileSizeExpr))
+  return AssertSuccess(CopyTransformer.TransformExpr(DimTileSizeExpr));
+
+// When the tile size is not a constant but a variable, it is possible to
+// pass non-positive numbers. To preserve the invariant that every loop

Meinersbur wrote:

```
int a = 0;
#pragma omp tile sizes(a)
for (int i = 0; i < 42; ++i)
  body(i);
```
While I don't think it can be expected this gives any useful tiling, it should 
still execute `body` 42 times.

https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Michael Kruse via llvm-branch-commits


https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/91345

>From a2aa6950ce3880b8e669025d95ac9e72245e26a7 Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Tue, 7 May 2024 16:42:41 +0200
Subject: [PATCH 1/3] Allow non-constant tile sizes

---
 clang/include/clang/Parse/Parser.h|  17 ++
 clang/lib/Parse/ParseOpenMP.cpp   |  65 --
 clang/lib/Sema/SemaOpenMP.cpp | 113 +++--
 clang/test/OpenMP/tile_ast_print.cpp  |  17 ++
 clang/test/OpenMP/tile_codegen.cpp| 216 --
 clang/test/OpenMP/tile_messages.cpp   |  50 +++-
 openmp/runtime/test/transform/tile/intfor.c   | 187 +++
 .../test/transform/tile/negtile_intfor.c  |  44 
 .../tile/parallel-wsloop-collapse-intfor.cpp  | 100 
 9 files changed, 737 insertions(+), 72 deletions(-)
 create mode 100644 openmp/runtime/test/transform/tile/intfor.c
 create mode 100644 openmp/runtime/test/transform/tile/negtile_intfor.c
 create mode 100644 
openmp/runtime/test/transform/tile/parallel-wsloop-collapse-intfor.cpp

diff --git a/clang/include/clang/Parse/Parser.h 
b/clang/include/clang/Parse/Parser.h
index daefd4f28f011..1b500c11457f4 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -3553,6 +3553,23 @@ class Parser : public CodeCompletionHandler {
   OMPClause *ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OpenMPClauseKind Kind, bool ParseOnly);
 
+  /// Parses a clause consisting of a list of expressions.
+  ///
+  /// \param Kind  The clause to parse.
+  /// \param ClauseNameLoc [out] The location of the clause name.
+  /// \param OpenLoc   [out] The location of '('.
+  /// \param CloseLoc  [out] The location of ')'.
+  /// \param Exprs [out] The parsed expressions.
+  /// \param ReqIntConst   If true, each expression must be an integer 
constant.
+  ///
+  /// \return Whether the clause was parsed successfully.
+  bool ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+ SourceLocation ,
+ SourceLocation ,
+ SourceLocation ,
+ SmallVectorImpl ,
+ bool ReqIntConst = false);
+
   /// Parses and creates OpenMP 5.0 iterators expression:
   ///  = 'iterator' '(' { [  ] identifier =
   ///  }+ ')'
diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 18ba1185ee8de..b8b32f9546c4f 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -3107,34 +3107,14 @@ bool Parser::ParseOpenMPSimpleVarList(
 }
 
 OMPClause *Parser::ParseOpenMPSizesClause() {
-  SourceLocation ClauseNameLoc = ConsumeToken();
+  SourceLocation ClauseNameLoc, OpenLoc, CloseLoc;
   SmallVector ValExprs;
-
-  BalancedDelimiterTracker T(*this, tok::l_paren, 
tok::annot_pragma_openmp_end);
-  if (T.consumeOpen()) {
-Diag(Tok, diag::err_expected) << tok::l_paren;
+  if (ParseOpenMPExprListClause(OMPC_sizes, ClauseNameLoc, OpenLoc, CloseLoc,
+ValExprs))
 return nullptr;
-  }
-
-  while (true) {
-ExprResult Val = ParseConstantExpression();
-if (!Val.isUsable()) {
-  T.skipToEnd();
-  return nullptr;
-}
-
-ValExprs.push_back(Val.get());
-
-if (Tok.is(tok::r_paren) || Tok.is(tok::annot_pragma_openmp_end))
-  break;
-
-ExpectAndConsume(tok::comma);
-  }
-
-  T.consumeClose();
 
-  return Actions.OpenMP().ActOnOpenMPSizesClause(
-  ValExprs, ClauseNameLoc, T.getOpenLocation(), T.getCloseLocation());
+  return Actions.OpenMP().ActOnOpenMPSizesClause(ValExprs, ClauseNameLoc,
+ OpenLoc, CloseLoc);
 }
 
 OMPClause *Parser::ParseOpenMPUsesAllocatorClause(OpenMPDirectiveKind DKind) {
@@ -4991,3 +4971,38 @@ OMPClause 
*Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc);
   return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data);
 }
+
+bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+   SourceLocation ,
+   SourceLocation ,
+   SourceLocation ,
+   SmallVectorImpl ,
+   bool ReqIntConst) {
+  assert(getOpenMPClauseName(Kind) == PP.getSpelling(Tok) &&
+ "Expected parsing to start at clause name");
+  ClauseNameLoc = ConsumeToken();
+
+  // Parse inside of '(' and ')'.
+  BalancedDelimiterTracker T(*this, tok::l_paren, 
tok::annot_pragma_openmp_end);
+  if (T.consumeOpen()) {
+Diag(Tok, diag::err_expected) << tok::l_paren;
+return true;
+  }
+
+  // Parse the list with interleaved commas.
+  do {
+ExprResult Val =
+ReqIntConst ?

[llvm-branch-commits] [openmp] release/18.x: [OpenMP] Fix child processes to use affinity_none (#91391) (PR #91479)

2024-05-08 Thread via llvm-branch-commits


llvmbot wrote:

@jhuber6 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/91479
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [openmp] release/18.x: [OpenMP] Fix child processes to use affinity_none (#91391) (PR #91479)

2024-05-08 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/91479
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [openmp] release/18.x: [OpenMP] Fix child processes to use affinity_none (#91391) (PR #91479)

2024-05-08 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/91479

Backport 73bb8d9

Requested by: @jpeyton52

>From 8665ddef7954319a892cc7ce46099d1d31f59a1c Mon Sep 17 00:00:00 2001
From: Jonathan Peyton 
Date: Wed, 8 May 2024 09:23:50 -0500
Subject: [PATCH] [OpenMP] Fix child processes to use affinity_none (#91391)

When a child process is forked with OpenMP already initialized, the
child process resets its affinity mask and sets proc-bind-var to false
so that the entire original affinity mask is used. This patch corrects
an issue with the affinity initialization code setting affinity to
compact instead of none for this special case of forked children.

The test trying to catch this only testing explicit setting of
KMP_AFFINITY=none. Add test run for no KMP_AFFINITY setting.

Fixes: #91098
(cherry picked from commit 73bb8d9d92f689863c94d48517e89d35dae0ebcf)
---
 openmp/runtime/src/kmp_settings.cpp | 2 ++
 openmp/runtime/test/affinity/redetect.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/openmp/runtime/src/kmp_settings.cpp 
b/openmp/runtime/src/kmp_settings.cpp
index ec86ee07472c1..58f19ea5b8ab7 100644
--- a/openmp/runtime/src/kmp_settings.cpp
+++ b/openmp/runtime/src/kmp_settings.cpp
@@ -6426,6 +6426,8 @@ void __kmp_env_initialize(char const *string) {
 }
 if ((__kmp_nested_proc_bind.bind_types[0] != proc_bind_intel) &&
 (__kmp_nested_proc_bind.bind_types[0] != proc_bind_default)) {
+  if (__kmp_nested_proc_bind.bind_types[0] == proc_bind_false)
+__kmp_affinity.type = affinity_none;
   if (__kmp_affinity.type == affinity_default) {
 __kmp_affinity.type = affinity_compact;
 __kmp_affinity.flags.dups = FALSE;
diff --git a/openmp/runtime/test/affinity/redetect.c 
b/openmp/runtime/test/affinity/redetect.c
index dba83b72cc42e..4b96d1bd92ee7 100644
--- a/openmp/runtime/test/affinity/redetect.c
+++ b/openmp/runtime/test/affinity/redetect.c
@@ -1,4 +1,5 @@
 // RUN: %libomp-compile
+// RUN: %libomp-run
 // RUN: env KMP_AFFINITY=none %libomp-run
 // REQUIRES: linux
 

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Alexey Bataev via llvm-branch-commits



@@ -15197,6 +15202,36 @@ StmtResult 
SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef Clauses,
   // Once the original iteration values are set, append the innermost body.
   Stmt *Inner = Body;
 
+  auto MakeDimTileSize = [ = this->SemaRef, , ,
+  SizesClause, CurScope](int I) -> Expr * {
+Expr *DimTileSizeExpr = SizesClause->getSizesRefs()[I];
+if (isa(DimTileSizeExpr))
+  return AssertSuccess(CopyTransformer.TransformExpr(DimTileSizeExpr));
+
+// When the tile size is not a constant but a variable, it is possible to
+// pass non-positive numbers. To preserve the invariant that every loop

alexey-bataev wrote:

```suggestion
// pass positive numbers. To preserve the invariant that every loop
```


https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Alexey Bataev via llvm-branch-commits



@@ -17432,16 +17457,54 @@ OMPClause 
*SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef SizeExprs,
   SourceLocation StartLoc,
   SourceLocation LParenLoc,
   SourceLocation EndLoc) {
-  for (Expr *SizeExpr : SizeExprs) {
-ExprResult NumForLoopsResult = VerifyPositiveIntegerConstantInClause(
-SizeExpr, OMPC_sizes, /*StrictlyPositive=*/true);
-if (!NumForLoopsResult.isUsable())
-  return nullptr;
+  SmallVector SanitizedSizeExprs;
+  llvm::append_range(SanitizedSizeExprs, SizeExprs);

alexey-bataev wrote:

```suggestion
  SmallVector SanitizedSizeExprs(SizeExprs.begin(), SizeExprs.end());
```


https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Alexey Bataev via llvm-branch-commits



@@ -17432,16 +17457,54 @@ OMPClause 
*SemaOpenMP::ActOnOpenMPSizesClause(ArrayRef SizeExprs,
   SourceLocation StartLoc,
   SourceLocation LParenLoc,
   SourceLocation EndLoc) {
-  for (Expr *SizeExpr : SizeExprs) {
-ExprResult NumForLoopsResult = VerifyPositiveIntegerConstantInClause(
-SizeExpr, OMPC_sizes, /*StrictlyPositive=*/true);
-if (!NumForLoopsResult.isUsable())
-  return nullptr;
+  SmallVector SanitizedSizeExprs;
+  llvm::append_range(SanitizedSizeExprs, SizeExprs);
+
+  for (Expr * : SanitizedSizeExprs) {
+// Skip if already sanitized, e.g. during a partial template instantiation.
+if (!SizeExpr)
+  continue;
+
+bool IsValid = isNonNegativeIntegerValue(SizeExpr, SemaRef, OMPC_sizes,
+ /*StrictlyPositive=*/true);
+
+// isNonNegativeIntegerValue returns true for non-integral types (but still
+// emits error diagnostic), so check for the expected type explicitly.
+QualType SizeTy = SizeExpr->getType();
+if (!SizeTy->isIntegerType())
+  IsValid = false;
+
+// Handling in templates is tricky. There are four possibilities to
+// consider:
+//
+// 1a. The expression is valid and we are in a instantiated template or not
+// in a template:
+//   Pass valid expression to be further analysed later in Sema.
+// 1b. The expression is valid and we are in a template (including partial
+// instantiation):
+//   isNonNegativeIntegerValue skipped any checks so there is no
+//   guarantee it will be correct after instantiation.
+//   ActOnOpenMPSizesClause will be called again at instantiation when
+//   it is not in a dependent context anymore. This may cause warnings
+//   to be emitted multiple times.
+// 2a. The expression is invalid and we are in an instantiated template or
+// not in a template:
+//   Invalidate the expression with a clearly wrong value (nullptr) so
+//   later in Sema we do not have to do the same validity analysis 
again
+//   or crash from unexpected data. Error diagnostics have already been
+//   emitted.
+// 2b. The expression is invalid and we are in a template (including 
partial
+// instantiation):
+//   Pass the invalid expression as-is, template instantiation may
+//   replace unexpected types/values with valid ones. The directives
+//   with this clause must not try to use these expressions in 
dependent
+//   contexts.

alexey-bataev wrote:

This must be fixed, even if tricky

https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Alexey Bataev via llvm-branch-commits



@@ -15111,13 +15111,11 @@ StmtResult 
SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef Clauses,
   ASTContext  = getASTContext();
   Scope *CurScope = SemaRef.getCurScope();
 
-  auto SizesClauses =
-  OMPExecutableDirective::getClausesOfKind(Clauses);
-  if (SizesClauses.empty()) {
-// A missing 'sizes' clause is already reported by the parser.
+  const OMPSizesClause *SizesClause =

alexey-bataev wrote:

```suggestion
  const auto *SizesClause =
```


https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Alexey Bataev via llvm-branch-commits



@@ -15197,6 +15202,36 @@ StmtResult 
SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef Clauses,
   // Once the original iteration values are set, append the innermost body.
   Stmt *Inner = Body;
 
+  auto MakeDimTileSize = [ = this->SemaRef, , ,
+  SizesClause, CurScope](int I) -> Expr * {
+Expr *DimTileSizeExpr = SizesClause->getSizesRefs()[I];
+if (isa(DimTileSizeExpr))
+  return AssertSuccess(CopyTransformer.TransformExpr(DimTileSizeExpr));
+
+// When the tile size is not a constant but a variable, it is possible to
+// pass non-positive numbers. To preserve the invariant that every loop

alexey-bataev wrote:

Or what do you mean here?

https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Alexey Bataev via llvm-branch-commits



@@ -4991,3 +4971,38 @@ OMPClause 
*Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc);
   return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data);
 }
+
+bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+   SourceLocation ,
+   SourceLocation ,
+   SourceLocation ,
+   SmallVectorImpl ,
+   bool ReqIntConst) {
+  assert(getOpenMPClauseName(Kind) == PP.getSpelling(Tok) &&
+ "Expected parsing to start at clause name");
+  ClauseNameLoc = ConsumeToken();
+
+  // Parse inside of '(' and ')'.
+  BalancedDelimiterTracker T(*this, tok::l_paren, 
tok::annot_pragma_openmp_end);
+  if (T.consumeOpen()) {
+Diag(Tok, diag::err_expected) << tok::l_paren;
+return true;
+  }
+
+  // Parse the list with interleaved commas.
+  do {
+ExprResult Val =
+ReqIntConst ? ParseConstantExpression() : ParseAssignmentExpression();
+if (!Val.isUsable()) {
+  // Encountered something other than an expression; abort to ')'.
+  T.skipToEnd();
+  return true;

alexey-bataev wrote:

Do you need to clear output parameters if early exited?

https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [mlir] [MLIR][Mem2Reg] Change API to always retry promotion after changes (PR #91464)

2024-05-08 Thread Tobias Gysi via llvm-branch-commits



@@ -636,20 +636,36 @@ LogicalResult mlir::tryToPromoteMemorySlots(
   // lazily and cached to avoid expensive recomputation.
   BlockIndexCache blockIndexCache;
 
-  for (PromotableAllocationOpInterface allocator : allocators) {
-for (MemorySlot slot : allocator.getPromotableSlots()) {
-  if (slot.ptr.use_empty())
-continue;
-
-  MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
-  std::optional info = analyzer.computeInfo();
-  if (info) {
-MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
-   std::move(*info), statistics, blockIndexCache)
-.promoteSlot();
-promotedAny = true;
+  SmallVector workList(allocators.begin(),
+allocators.end());
+
+  SmallVector newWorkList;
+  newWorkList.reserve(workList.size());
+  while (true) {
+for (PromotableAllocationOpInterface allocator : workList) {
+  for (MemorySlot slot : allocator.getPromotableSlots()) {
+if (slot.ptr.use_empty())
+  continue;
+
+MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
+std::optional info = analyzer.computeInfo();
+if (info) {
+  MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
+ std::move(*info), statistics, blockIndexCache)
+  .promoteSlot();
+  promotedAny = true;
+  continue;
+}
+newWorkList.push_back(allocator);

gysit wrote:

I think we may add the same allocator multiple times here, if the allocator 
returns multiple slots?

https://github.com/llvm/llvm-project/pull/91464
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [mlir] [MLIR][Mem2Reg] Change API to always retry promotion after changes (PR #91464)

2024-05-08 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-mlir-core

Author: Christian Ulmann (Dinistro)


Changes

This commit modifies Mem2Reg's API to always attempt a full promotion on all 
the passed in "allocators". This ensures that the pass does not require 
unnecessary walks over the regions and improves caching benefits.

---
Full diff: https://github.com/llvm/llvm-project/pull/91464.diff


2 Files Affected:

- (modified) mlir/include/mlir/Transforms/Mem2Reg.h (+3-3) 
- (modified) mlir/lib/Transforms/Mem2Reg.cpp (+36-26) 


``diff
diff --git a/mlir/include/mlir/Transforms/Mem2Reg.h 
b/mlir/include/mlir/Transforms/Mem2Reg.h
index fee7fb312750..6986cad9ae12 100644
--- a/mlir/include/mlir/Transforms/Mem2Reg.h
+++ b/mlir/include/mlir/Transforms/Mem2Reg.h
@@ -9,7 +9,6 @@
 #ifndef MLIR_TRANSFORMS_MEM2REG_H
 #define MLIR_TRANSFORMS_MEM2REG_H
 
-#include "mlir/IR/PatternMatch.h"
 #include "mlir/Interfaces/MemorySlotInterfaces.h"
 #include "llvm/ADT/Statistic.h"
 
@@ -23,8 +22,9 @@ struct Mem2RegStatistics {
   llvm::Statistic *newBlockArgumentAmount = nullptr;
 };
 
-/// Attempts to promote the memory slots of the provided allocators. Succeeds 
if
-/// at least one memory slot was promoted.
+/// Attempts to promote the memory slots of the provided allocators. 
Iteratively
+/// retries the promotion of all slots as promoting one slot might enable
+/// subsequent promotions. Succeeds if at least one memory slot was promoted.
 LogicalResult
 tryToPromoteMemorySlots(ArrayRef allocators,
 OpBuilder , const DataLayout ,
diff --git a/mlir/lib/Transforms/Mem2Reg.cpp b/mlir/lib/Transforms/Mem2Reg.cpp
index 8adbbcd01cb4..390d2a3f54b6 100644
--- a/mlir/lib/Transforms/Mem2Reg.cpp
+++ b/mlir/lib/Transforms/Mem2Reg.cpp
@@ -636,20 +636,36 @@ LogicalResult mlir::tryToPromoteMemorySlots(
   // lazily and cached to avoid expensive recomputation.
   BlockIndexCache blockIndexCache;
 
-  for (PromotableAllocationOpInterface allocator : allocators) {
-for (MemorySlot slot : allocator.getPromotableSlots()) {
-  if (slot.ptr.use_empty())
-continue;
-
-  MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
-  std::optional info = analyzer.computeInfo();
-  if (info) {
-MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
-   std::move(*info), statistics, blockIndexCache)
-.promoteSlot();
-promotedAny = true;
+  SmallVector workList(allocators.begin(),
+allocators.end());
+
+  SmallVector newWorkList;
+  newWorkList.reserve(workList.size());
+  while (true) {
+for (PromotableAllocationOpInterface allocator : workList) {
+  for (MemorySlot slot : allocator.getPromotableSlots()) {
+if (slot.ptr.use_empty())
+  continue;
+
+MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
+std::optional info = analyzer.computeInfo();
+if (info) {
+  MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
+ std::move(*info), statistics, blockIndexCache)
+  .promoteSlot();
+  promotedAny = true;
+  continue;
+}
+newWorkList.push_back(allocator);
   }
 }
+if (workList.size() == newWorkList.size())
+  break;
+
+// Swap the vector's backing memory and clear the entries in newWorkList
+// afterwards. This ensures that additional heap allocations can be 
avoided.
+workList.swap(newWorkList);
+newWorkList.clear();
   }
 
   return success(promotedAny);
@@ -677,22 +693,16 @@ struct Mem2Reg : impl::Mem2RegBase {
 
   OpBuilder builder((), region.front().begin());
 
-  // Promoting a slot can allow for further promotion of other slots,
-  // promotion is tried until no promotion succeeds.
-  while (true) {
-SmallVector allocators;
-// Build a list of allocators to attempt to promote the slots of.
-region.walk([&](PromotableAllocationOpInterface allocator) {
-  allocators.emplace_back(allocator);
-});
-
-// Attempt promoting until no promotion succeeds.
-if (failed(tryToPromoteMemorySlots(allocators, builder, dataLayout,
-   dominance, statistics)))
-  break;
+  SmallVector allocators;
+  // Build a list of allocators to attempt to promote the slots of.
+  region.walk([&](PromotableAllocationOpInterface allocator) {
+allocators.emplace_back(allocator);
+  });
 
+  // Attempt promoting as many of the slots as possible.
+  if (succeeded(tryToPromoteMemorySlots(allocators, builder, dataLayout,
+dominance, statistics)))
 changed = true;
-  }
 }
 if (!changed)
   markAllAnalysesPreserved();

``




https://github.com/llvm/llvm-project/pull/91464

[llvm-branch-commits] [mlir] [MLIR][Mem2Reg] Change API to always retry promotion after changes (PR #91464)

2024-05-08 Thread Christian Ulmann via llvm-branch-commits


https://github.com/Dinistro created 
https://github.com/llvm/llvm-project/pull/91464

This commit modifies Mem2Reg's API to always attempt a full promotion on all 
the passed in "allocators". This ensures that the pass does not require 
unnecessary walks over the regions and improves caching benefits.

>From c5a6fd716c09d3445db41337c1bfbc9d6626e4da Mon Sep 17 00:00:00 2001
From: Christian Ulmann 
Date: Wed, 8 May 2024 12:03:56 +
Subject: [PATCH] [MLIR][Mem2Reg] Change API to always retry promotion after
 changes

This commit modifies the Mem2Reg's API to always attempt a full
promotion on all the passed in "allocators". This ensures that the pass
does not require unnecessary walks over the regions and improves caching
benefits.
---
 mlir/include/mlir/Transforms/Mem2Reg.h |  6 +--
 mlir/lib/Transforms/Mem2Reg.cpp| 62 +++---
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/mlir/include/mlir/Transforms/Mem2Reg.h 
b/mlir/include/mlir/Transforms/Mem2Reg.h
index fee7fb312750..6986cad9ae12 100644
--- a/mlir/include/mlir/Transforms/Mem2Reg.h
+++ b/mlir/include/mlir/Transforms/Mem2Reg.h
@@ -9,7 +9,6 @@
 #ifndef MLIR_TRANSFORMS_MEM2REG_H
 #define MLIR_TRANSFORMS_MEM2REG_H
 
-#include "mlir/IR/PatternMatch.h"
 #include "mlir/Interfaces/MemorySlotInterfaces.h"
 #include "llvm/ADT/Statistic.h"
 
@@ -23,8 +22,9 @@ struct Mem2RegStatistics {
   llvm::Statistic *newBlockArgumentAmount = nullptr;
 };
 
-/// Attempts to promote the memory slots of the provided allocators. Succeeds 
if
-/// at least one memory slot was promoted.
+/// Attempts to promote the memory slots of the provided allocators. 
Iteratively
+/// retries the promotion of all slots as promoting one slot might enable
+/// subsequent promotions. Succeeds if at least one memory slot was promoted.
 LogicalResult
 tryToPromoteMemorySlots(ArrayRef allocators,
 OpBuilder , const DataLayout ,
diff --git a/mlir/lib/Transforms/Mem2Reg.cpp b/mlir/lib/Transforms/Mem2Reg.cpp
index 8adbbcd01cb4..390d2a3f54b6 100644
--- a/mlir/lib/Transforms/Mem2Reg.cpp
+++ b/mlir/lib/Transforms/Mem2Reg.cpp
@@ -636,20 +636,36 @@ LogicalResult mlir::tryToPromoteMemorySlots(
   // lazily and cached to avoid expensive recomputation.
   BlockIndexCache blockIndexCache;
 
-  for (PromotableAllocationOpInterface allocator : allocators) {
-for (MemorySlot slot : allocator.getPromotableSlots()) {
-  if (slot.ptr.use_empty())
-continue;
-
-  MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
-  std::optional info = analyzer.computeInfo();
-  if (info) {
-MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
-   std::move(*info), statistics, blockIndexCache)
-.promoteSlot();
-promotedAny = true;
+  SmallVector workList(allocators.begin(),
+allocators.end());
+
+  SmallVector newWorkList;
+  newWorkList.reserve(workList.size());
+  while (true) {
+for (PromotableAllocationOpInterface allocator : workList) {
+  for (MemorySlot slot : allocator.getPromotableSlots()) {
+if (slot.ptr.use_empty())
+  continue;
+
+MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
+std::optional info = analyzer.computeInfo();
+if (info) {
+  MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
+ std::move(*info), statistics, blockIndexCache)
+  .promoteSlot();
+  promotedAny = true;
+  continue;
+}
+newWorkList.push_back(allocator);
   }
 }
+if (workList.size() == newWorkList.size())
+  break;
+
+// Swap the vector's backing memory and clear the entries in newWorkList
+// afterwards. This ensures that additional heap allocations can be 
avoided.
+workList.swap(newWorkList);
+newWorkList.clear();
   }
 
   return success(promotedAny);
@@ -677,22 +693,16 @@ struct Mem2Reg : impl::Mem2RegBase {
 
   OpBuilder builder((), region.front().begin());
 
-  // Promoting a slot can allow for further promotion of other slots,
-  // promotion is tried until no promotion succeeds.
-  while (true) {
-SmallVector allocators;
-// Build a list of allocators to attempt to promote the slots of.
-region.walk([&](PromotableAllocationOpInterface allocator) {
-  allocators.emplace_back(allocator);
-});
-
-// Attempt promoting until no promotion succeeds.
-if (failed(tryToPromoteMemorySlots(allocators, builder, dataLayout,
-   dominance, statistics)))
-  break;
+  SmallVector allocators;
+  // Build a list of allocators to attempt to promote the slots of.
+  region.walk([&](PromotableAllocationOpInterface allocator) {
+allocators.emplace_back(allocator);
+  });
 
+  // Attempt

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Michael Kruse (Meinersbur)


Changes

Allow non-constants in the `sizes` clause such as
```
#pragma omp tile sizes(a)
for (int i = 0; i  n; ++i)
```
This is permitted since tile was introduced in [OpenMP 
5.1](https://www.openmp.org/spec-html/5.1/openmpsu53.html#x78-860002.11.9).

It is possible to sneak-in negative numbers at runtime as in
```
int a = -1;
#pragma omp tile sizes(a)
```
Even though it is not well-formed, it should still result in every loop 
iteration to be executed exactly once, an invariant of the tile construct that 
we should ensure. `ParseOpenMPExprListClause` is extracted-out to be reused by 
the `permutation` clause if the `interchange` construct. Some care was put in 
to ensure correct behavior in template contexts.

This patch also adds end-to-end tests. This is to avoid errors like the 
off-by-one error I caused with the initial implementation of the unroll 
construct.


---

Patch is 41.44 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/91345.diff


9 Files Affected:

- (modified) clang/include/clang/Parse/Parser.h (+17) 
- (modified) clang/lib/Parse/ParseOpenMP.cpp (+40-25) 
- (modified) clang/lib/Sema/SemaOpenMP.cpp (+88-25) 
- (modified) clang/test/OpenMP/tile_ast_print.cpp (+17) 
- (modified) clang/test/OpenMP/tile_codegen.cpp (+201-15) 
- (modified) clang/test/OpenMP/tile_messages.cpp (+43-7) 
- (added) openmp/runtime/test/transform/tile/intfor.c (+191) 
- (added) openmp/runtime/test/transform/tile/negtile_intfor.c (+44) 
- (added) 
openmp/runtime/test/transform/tile/parallel-wsloop-collapse-intfor.cpp (+100) 


``diff
diff --git a/clang/include/clang/Parse/Parser.h 
b/clang/include/clang/Parse/Parser.h
index daefd4f28f011..1b500c11457f4 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -3553,6 +3553,23 @@ class Parser : public CodeCompletionHandler {
   OMPClause *ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OpenMPClauseKind Kind, bool ParseOnly);
 
+  /// Parses a clause consisting of a list of expressions.
+  ///
+  /// \param Kind  The clause to parse.
+  /// \param ClauseNameLoc [out] The location of the clause name.
+  /// \param OpenLoc   [out] The location of '('.
+  /// \param CloseLoc  [out] The location of ')'.
+  /// \param Exprs [out] The parsed expressions.
+  /// \param ReqIntConst   If true, each expression must be an integer 
constant.
+  ///
+  /// \return Whether the clause was parsed successfully.
+  bool ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+ SourceLocation ,
+ SourceLocation ,
+ SourceLocation ,
+ SmallVectorImpl ,
+ bool ReqIntConst = false);
+
   /// Parses and creates OpenMP 5.0 iterators expression:
   ///  = 'iterator' '(' { [  ] identifier =
   ///  }+ ')'
diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 18ba1185ee8de..b8b32f9546c4f 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -3107,34 +3107,14 @@ bool Parser::ParseOpenMPSimpleVarList(
 }
 
 OMPClause *Parser::ParseOpenMPSizesClause() {
-  SourceLocation ClauseNameLoc = ConsumeToken();
+  SourceLocation ClauseNameLoc, OpenLoc, CloseLoc;
   SmallVector ValExprs;
-
-  BalancedDelimiterTracker T(*this, tok::l_paren, 
tok::annot_pragma_openmp_end);
-  if (T.consumeOpen()) {
-Diag(Tok, diag::err_expected) << tok::l_paren;
+  if (ParseOpenMPExprListClause(OMPC_sizes, ClauseNameLoc, OpenLoc, CloseLoc,
+ValExprs))
 return nullptr;
-  }
-
-  while (true) {
-ExprResult Val = ParseConstantExpression();
-if (!Val.isUsable()) {
-  T.skipToEnd();
-  return nullptr;
-}
-
-ValExprs.push_back(Val.get());
-
-if (Tok.is(tok::r_paren) || Tok.is(tok::annot_pragma_openmp_end))
-  break;
-
-ExpectAndConsume(tok::comma);
-  }
-
-  T.consumeClose();
 
-  return Actions.OpenMP().ActOnOpenMPSizesClause(
-  ValExprs, ClauseNameLoc, T.getOpenLocation(), T.getCloseLocation());
+  return Actions.OpenMP().ActOnOpenMPSizesClause(ValExprs, ClauseNameLoc,
+ OpenLoc, CloseLoc);
 }
 
 OMPClause *Parser::ParseOpenMPUsesAllocatorClause(OpenMPDirectiveKind DKind) {
@@ -4991,3 +4971,38 @@ OMPClause 
*Parser::ParseOpenMPVarListClause(OpenMPDirectiveKind DKind,
   OMPVarListLocTy Locs(Loc, LOpen, Data.RLoc);
   return Actions.OpenMP().ActOnOpenMPVarListClause(Kind, Vars, Locs, Data);
 }
+
+bool Parser::ParseOpenMPExprListClause(OpenMPClauseKind Kind,
+   SourceLocation ,
+   SourceLocation ,
+   SourceLocation ,
+

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Michael Kruse via llvm-branch-commits


https://github.com/Meinersbur ready_for_review 
https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [openmp] [Clang][OpenMP][Tile] Allow non-constant tile sizes. (PR #91345)

2024-05-08 Thread Michael Kruse via llvm-branch-commits


Meinersbur wrote:

Test failure is from unrelated `DataFlowSanitizer-x86_64 :: 
release_shadow_space.c`

https://github.com/llvm/llvm-project/pull/91345
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)

2024-05-08 Thread Simon Pilgrim via llvm-branch-commits


https://github.com/RKSimon approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/91425
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

87 matches

Mail list logo