[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm edited 
https://github.com/llvm/llvm-project/pull/159234
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/159234

AMDGPU: Ensure both wavesize features are not set

Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.

This breaks the assembler test which enables both, but this
behavior is not really useful. Maybe it's better to just delete
the test.

Convert wave_any test to update_mc_test_checks

update wave_any test

From 41365e5cc69b3732c8bc8f1d138c3b6984e08e41 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 17 Sep 2025 02:00:48 +0900
Subject: [PATCH 1/3] AMDGPU: Ensure both wavesize features are not set

Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.

This breaks the assembler test which enables both, but this
behavior is not really useful. Maybe it's better to just delete
the test.
---
 .../MCTargetDesc/AMDGPUMCTargetDesc.cpp   | 16 ++++++++++++++--
 .../wavesize-feature-unsupported-target.s | 23 +++++++++++++++++++
 .../AMDGPU/gfx1250_wave64_feature.s   | 13 +++++++++++++
 .../AMDGPU/gfx9_wave32_feature.txt| 13 +++++++++++++
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s
 create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s
 create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx9_wave32_feature.txt

diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp 
b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
index f2e2d0ed3f8a6..0ea5ad7ccaea4 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
@@ -82,20 +82,32 @@ createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef 
CPU, StringRef FS) {
   MCSubtargetInfo *STI =
   createAMDGPUMCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
 
+  bool IsWave64 = STI->hasFeature(AMDGPU::FeatureWavefrontSize64);
+  bool IsWave32 = STI->hasFeature(AMDGPU::FeatureWavefrontSize32);
+
   // FIXME: We should error for the default target.
   if (STI->getFeatureBits().none())
 STI->ToggleFeature(AMDGPU::FeatureSouthernIslands);
 
-  if (!STI->hasFeature(AMDGPU::FeatureWavefrontSize64) &&
-  !STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) {
+  if (!IsWave64 && !IsWave32) {
 // If there is no default wave size it must be a generation before gfx10,
 // these have FeatureWavefrontSize64 in their definition already. For 
gfx10+
 // set wave32 as a default.
 STI->ToggleFeature(AMDGPU::isGFX10Plus(*STI)
? AMDGPU::FeatureWavefrontSize32
: AMDGPU::FeatureWavefrontSize64);
+  } else if (IsWave64 && IsWave32) {
+// The wavesize features are mutually exclusive. If both somehow end up
+// set, wave64 wins.
+//
+// FIXME: This should really just be an error.
+STI->ToggleFeature(AMDGPU::FeatureWavefrontSize32);
   }
 
+  assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
+  STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
+ "wavesize features are mutually exclusive");
+
   return STI;
 }
 
diff --git a/llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s 
b/llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s
new file mode 100644
index 0..8fc7b7fb05f0c
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s
@@ -0,0 +1,23 @@
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=+wavefrontsize64 -o - %s | 
FileCheck -check-prefix=GFX1250 %s
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -mattr=+wavefrontsize32 -o - %s | 
FileCheck -check-prefix=GFX900 %s
+
+// Both are supported, but not at the same time
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1010 
-mattr=+wavefrontsize32,+wavefrontsize64 %s | FileCheck -check-prefixes=GFX10 %s
+
+// Test that there is no assertion when using an explicit
+// wavefrontsize attribute on a target which does not support it.
+
+// GFX1250: v_add_f64_e32 v[0:1], 1.0, v[0:1]
+// GFX900: v_add_f64 v[0:1], 1.0, v[0:1]
+// GFX10: v_add_f64 v[0:1], 1.0, v[0:1]
+v_add_f64 v[0:1], 1.0, v[0:1]
+
+// GFX1250: v_cmp_eq_u32_e64 s[0:1], 1.0, s1
+// GFX900: v_cmp_eq_u32_e64 s[0:1], 1.0, s1
+// GFX10: v_cmp_eq_u32_e64 s[0:1], 1.0, s1
+v_cmp_eq_u32_e64 s[0:1], 1.0, s1
+
+// GFX1250: v_cndmask_b32_e64 v1, v2, v3, s[0:1]
+// GFX900: v_cndmask_b32_e64 v1, v2, v3, s[0:1]
+// GFX10: v_cndmask_b32_e64 v1, v2, v3, s[0:1]
+v_cndmask_b32 v1, v2, v3, s[0:1]
diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s 
b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s
new file mode 100644
index 0..bdea636a9efe3
--- /dev/null
+++ b/llvm/

[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)

2025-09-16 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.

This breaks the assembler test which enables both, but this
behavior is not really useful. Maybe it's better to just delete
the test.

---

Patch is 24.16 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/159234.diff


5 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp (+14-2) 
- (modified) llvm/test/MC/AMDGPU/wave_any.s (+62-60) 
- (added) llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s (+23) 
- (added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s (+13) 
- (added) llvm/test/MC/Disassembler/AMDGPU/gfx9_wave32_feature.txt (+13) 


diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp 
b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
index f2e2d0ed3f8a6..0ea5ad7ccaea4 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
@@ -82,20 +82,32 @@ createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef 
CPU, StringRef FS) {
   MCSubtargetInfo *STI =
   createAMDGPUMCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
 
+  bool IsWave64 = STI->hasFeature(AMDGPU::FeatureWavefrontSize64);
+  bool IsWave32 = STI->hasFeature(AMDGPU::FeatureWavefrontSize32);
+
   // FIXME: We should error for the default target.
   if (STI->getFeatureBits().none())
 STI->ToggleFeature(AMDGPU::FeatureSouthernIslands);
 
-  if (!STI->hasFeature(AMDGPU::FeatureWavefrontSize64) &&
-  !STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) {
+  if (!IsWave64 && !IsWave32) {
 // If there is no default wave size it must be a generation before gfx10,
 // these have FeatureWavefrontSize64 in their definition already. For 
gfx10+
 // set wave32 as a default.
 STI->ToggleFeature(AMDGPU::isGFX10Plus(*STI)
? AMDGPU::FeatureWavefrontSize32
: AMDGPU::FeatureWavefrontSize64);
+  } else if (IsWave64 && IsWave32) {
+// The wavesize features are mutually exclusive. If both somehow end up
+// set, wave64 wins.
+//
+// FIXME: This should really just be an error.
+STI->ToggleFeature(AMDGPU::FeatureWavefrontSize32);
   }
 
+  assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
+  STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
+ "wavesize features are mutually exclusive");
+
   return STI;
 }
 
diff --git a/llvm/test/MC/AMDGPU/wave_any.s b/llvm/test/MC/AMDGPU/wave_any.s
index 27502eff89bfc..15b235a92d68e 100644
--- a/llvm/test/MC/AMDGPU/wave_any.s
+++ b/llvm/test/MC/AMDGPU/wave_any.s
@@ -1,229 +1,231 @@
-// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1010 
-mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck 
--check-prefix=GFX10 %s
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py 
UTC_ARGS: --version 6
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 
-mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck 
--check-prefixes=GFX10 %s
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 
-mattr=+wavefrontsize32,+wavefrontsize64 -filetype=null %s 2>&1 | FileCheck 
-implicit-check-not=error: --check-prefixes=GFX10-ERR %s
 
 v_cmp_ge_i32_e32 s0, v0
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x0c,0x7d]
+// GFX10: v_cmp_ge_i32_e32 vcc, s0, v0; encoding: 
[0x00,0x00,0x0c,0x7d]
 
 v_cmp_ge_i32_e32 vcc_lo, s0, v1
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v1 ; encoding: [0x00,0x02,0x0c,0x7d]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or 
mode
 
 v_cmp_ge_i32_e32 vcc, s0, v2
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v2 ; encoding: [0x00,0x04,0x0c,0x7d]
+// GFX10: v_cmp_ge_i32_e32 vcc, s0, v2; encoding: 
[0x00,0x04,0x0c,0x7d]
 
 v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD
-// GFX10: v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; 
encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06]
+// GFX10-ERR: :[[@LINE-1]]:19: error: invalid operand for instruction
 
 v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD
 // GFX10: v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; 
encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06]
 
 v_cmp_class_f32_e32 vcc_lo, s0, v0
-// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or 
mode
 
 v_cmp_class_f32_e32 vcc, s0, v0
-// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d]
+// GFX10: v_cmp_class_f32_e32 vcc, s0, v0 ; encoding: 
[0x00,0x00,0x10,0x7d]
 
 v_cmp_class_f16_sdw

[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#159234** 👈 (View in Graphite)
* **#159227**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/159234


[llvm-branch-commits] [llvm] [DA] Add test where WeakCrossingSIV misses dependency due to overflow (NFC) (PR #158281)

2025-09-16 Thread Ryotaro Kasuga via llvm-branch-commits

https://github.com/kasuga-fj edited 
https://github.com/llvm/llvm-project/pull/158281


[llvm-branch-commits] [llvm] [DA] Add overflow check in ExactSIV (PR #157086)

2025-09-16 Thread Ryotaro Kasuga via llvm-branch-commits


@@ -815,8 +815,8 @@ for.end:  ; preds = 
%for.body
 ;; A[3*i - 2] = 1;
 ;; }
 ;;
-;; FIXME: DependencyAnalsysis currently detects no dependency between
-;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`, but it does exist. For example,
+;; There exists dependency between `A[-6*i + INT64_MAX]` and `A[3*i - 2]`.
+;; For example,

kasuga-fj wrote:

It is intentional. I think it's non-trivial that the dependency exists between 
them.

https://github.com/llvm/llvm-project/pull/157086


[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies when no runtime (PR #157754)

2025-09-16 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl updated 
https://github.com/llvm/llvm-project/pull/157754

From 75a8df62df2ef7e8c02d7a76120e57e2dd1a1539 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" 
Date: Tue, 9 Sep 2025 17:33:38 -0400
Subject: [PATCH 1/2] [LoopUnroll] Fix block frequencies when no runtime

This patch implements the LoopUnroll changes discussed in [[RFC] Fix
Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a runtime,
this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the
  sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
  `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the
  original estimated trip count (e.g., 10) divided by the unroll count
  (e.g., 4) leaves a remainder (e.g., 2).

There are loop unrolling cases this patch does not fully fix, such as
partial unrolling with a runtime and complete unrolling, and there are
two associated tests this patch marks as XFAIL.  They will be
addressed in future patches that should land with this patch.
---
 llvm/lib/Transforms/Utils/LoopUnroll.cpp  | 36 --
 .../peel.ll}  |  0
 .../branch-weights-freq/unroll-partial.ll | 68 +++
 .../LoopUnroll/runtime-loop-branchweight.ll   |  1 +
 .../LoopUnroll/unroll-heuristics-pgo.ll   |  1 +
 5 files changed, 100 insertions(+), 6 deletions(-)
 rename llvm/test/Transforms/LoopUnroll/{peel-branch-weights-freq.ll => 
branch-weights-freq/peel.ll} (100%)
 create mode 100644 
llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp 
b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 8a6c7789d1372..93c43396c54b6 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -499,9 +499,8 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo 
*LI,
 
   const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L);
   const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L);
-  unsigned EstimatedLoopInvocationWeight = 0;
   std::optional OriginalTripCount =
-  llvm::getLoopEstimatedTripCount(L, &EstimatedLoopInvocationWeight);
+  llvm::getLoopEstimatedTripCount(L);
 
   // Effectively "DCE" unrolled iterations that are beyond the max tripcount
   // and will never be executed.
@@ -1130,10 +1129,35 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, 
LoopInfo *LI,
 // We shouldn't try to use `L` anymore.
 L = nullptr;
   } else if (OriginalTripCount) {
-// Update the trip count. Note that the remainder has already logic
-// computing it in `UnrollRuntimeLoopRemainder`.
-setLoopEstimatedTripCount(L, *OriginalTripCount / ULO.Count,
-  EstimatedLoopInvocationWeight);
+// Update metadata for the estimated trip count.
+//
+// If ULO.Runtime, UnrollRuntimeLoopRemainder handles branch weights for 
the
+// remainder loop it creates, and the unrolled loop's branch weights are
+// adjusted below.  Otherwise, if unrolled loop iterations' latches become
+// unconditional, branch weights are adjusted above.  Otherwise, the
+// original loop's branch weights are correct for the unrolled loop, so do
+// not adjust them.
+// FIXME: Actually handle such unconditional latches and ULO.Runtime.
+//
+// For example, consider what happens if the unroll count is 4 for a loop
+// with an estimated trip count of 10 when we do not create a remainder 
loop
+// and all iterations' latches remain conditional.  Each unrolled
+// iteration's latch still has the same probability of exiting the loop as
+// it did when in the original loop, and thus it should still have the same
+// branch weights.  Each unrolled iteration's non-zero probability of
+// exiting already appropriately reduces the probability of reaching the
+// remaining iterations just as it did in the original loop.  Trying to 
also
+// adjust the branch weights of the final unrolled iteration's latch (i.e.,
+// the backedge for the unrolled loop as a whole) to reflect its new trip
+// count of 3 will erroneously further reduce its block frequencies.
+// However, in case an analysis later needs to estimate the trip count of
+// the unrolled loop as a whole without considering the branch weights for
+// each unrolled iteration's latch within it, we store the new trip count 
as
+// separate metadata.
+unsigned NewTripCount = *OriginalTripCount / ULO.Count;
+if (!ULO.Runtime && *OriginalTripCount % ULO.Count)
+  NewTripCount += 1;
+setLoopEstima

[llvm-branch-commits] [llvm] [DA] Add test where WeakCrossingSIV misses dependency due to overflow (NFC) (PR #158281)

2025-09-16 Thread Ryotaro Kasuga via llvm-branch-commits

https://github.com/kasuga-fj updated 
https://github.com/llvm/llvm-project/pull/158281

From a42c8002548c97d6c7755b1db821a5c682881efe Mon Sep 17 00:00:00 2001
From: Ryotaro Kasuga 
Date: Fri, 12 Sep 2025 11:06:39 +
Subject: [PATCH] [DA] Add test where WeakCrossingSIV misses dependency due to
 overflow

---
 .../DependenceAnalysis/WeakCrossingSIV.ll | 224 ++
 1 file changed, 224 insertions(+)

diff --git a/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll 
b/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll
index cd044032e34f1..58dded965de27 100644
--- a/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by 
utils/update_analyze_test_checks.py UTC_ARGS: --version 5
 ; RUN: opt < %s -disable-output "-passes=print" -aa-pipeline=basic-aa 2>&1 
\
 ; RUN: | FileCheck %s
+; RUN: opt < %s -disable-output "-passes=print" -da-run-siv-routines-only 
2>&1 \
+; RUN: | FileCheck %s --check-prefix=CHECK-SIV-ONLY
 
 ; ModuleID = 'WeakCrossingSIV.bc'
 target datalayout = 
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
@@ -26,6 +28,20 @@ define void @weakcrossing0(ptr %A, ptr %B, i64 %n) nounwind 
uwtable ssp {
 ; CHECK-NEXT:  Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: store i32 
%0, ptr %B.addr.02, align 4
 ; CHECK-NEXT:da analyze - none!
 ;
+; CHECK-SIV-ONLY-LABEL: 'weakcrossing0'
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
store i32 %conv, ptr %arrayidx, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
%0 = load i32, ptr %arrayidx2, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - flow [0|<]!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
store i32 %0, ptr %B.addr.02, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - confused!
+; CHECK-SIV-ONLY-NEXT:  Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: 
%0 = load i32, ptr %arrayidx2, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: 
store i32 %0, ptr %B.addr.02, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - confused!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: 
store i32 %0, ptr %B.addr.02, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+;
 entry:
   %cmp1 = icmp eq i64 %n, 0
   br i1 %cmp1, label %for.end, label %for.body.preheader
@@ -79,6 +95,21 @@ define void @weakcrossing1(ptr %A, ptr %B, i64 %n) nounwind 
uwtable ssp {
 ; CHECK-NEXT:  Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: store i32 
%0, ptr %B.addr.02, align 4
 ; CHECK-NEXT:da analyze - none!
 ;
+; CHECK-SIV-ONLY-LABEL: 'weakcrossing1'
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
store i32 %conv, ptr %arrayidx, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
%0 = load i32, ptr %arrayidx2, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - flow [<>] splitable!
+; CHECK-SIV-ONLY-NEXT:da analyze - split level = 1, iteration = 0!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
store i32 %0, ptr %B.addr.02, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - confused!
+; CHECK-SIV-ONLY-NEXT:  Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: 
%0 = load i32, ptr %arrayidx2, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: 
store i32 %0, ptr %B.addr.02, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - confused!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: 
store i32 %0, ptr %B.addr.02, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+;
 entry:
   %cmp1 = icmp eq i64 %n, 0
   br i1 %cmp1, label %for.end, label %for.body.preheader
@@ -130,6 +161,20 @@ define void @weakcrossing2(ptr %A, ptr %B, i64 %n) 
nounwind uwtable ssp {
 ; CHECK-NEXT:  Src: store i32 %0, ptr %B.addr.01, align 4 --> Dst: store i32 
%0, ptr %B.addr.01, align 4
 ; CHECK-NEXT:da analyze - none!
 ;
+; CHECK-SIV-ONLY-LABEL: 'weakcrossing2'
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
store i32 %conv, ptr %arrayidx, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
%0 = load i32, ptr %arrayidx1, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: 
store i32 %0, ptr %B.addr.01, align 4
+; CHECK-SIV-ONLY-NEXT:da analyze - confused!
+; CHECK-SIV-ONLY-NEXT:  Src: %0 = load i32, ptr %arrayidx1, align 4 --> Dst: 
%0 = load i32

[llvm-branch-commits] [llvm] [DA] Add overflow check in ExactSIV (PR #157086)

2025-09-16 Thread Ryotaro Kasuga via llvm-branch-commits

https://github.com/kasuga-fj updated 
https://github.com/llvm/llvm-project/pull/157086

From 9f8794a071e152cf128dc03d9994c884fecf5d12 Mon Sep 17 00:00:00 2001
From: Ryotaro Kasuga 
Date: Fri, 5 Sep 2025 11:41:29 +
Subject: [PATCH 1/2] [DA] Add overflow check in ExactSIV

---
 llvm/lib/Analysis/DependenceAnalysis.cpp  | 14 +-
 llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll |  2 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp 
b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 0f77a1410e83b..6e576e866b310 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -1170,6 +1170,15 @@ const SCEVConstant 
*DependenceInfo::collectConstantUpperBound(const Loop *L,
   return nullptr;
 }
 
+/// Returns \p A - \p B if it is guaranteed not to signed wrap. Otherwise
+/// returns nullptr. \p A and \p B must have the same integer type.
+static const SCEV *minusSCEVNoSignedOverflow(const SCEV *A, const SCEV *B,
+ ScalarEvolution &SE) {
+  if (SE.willNotOverflow(Instruction::Sub, /*Signed=*/true, A, B))
+return SE.getMinusSCEV(A, B);
+  return nullptr;
+}
+
 // testZIV -
 // When we have a pair of subscripts of the form [c1] and [c2],
 // where c1 and c2 are both loop invariant, we attack it using
@@ -1626,7 +1635,9 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff, 
const SCEV *DstCoeff,
   assert(0 < Level && Level <= CommonLevels && "Level out of range");
   Level--;
   Result.Consistent = false;
-  const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
+  const SCEV *Delta = minusSCEVNoSignedOverflow(DstConst, SrcConst, *SE);
+  if (!Delta)
+return false;
   LLVM_DEBUG(dbgs() << "\tDelta = " << *Delta << "\n");
   NewConstraint.setLine(SrcCoeff, SE->getNegativeSCEV(DstCoeff), Delta,
 CurLoop);
@@ -1716,6 +1727,7 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff, 
const SCEV *DstCoeff,
   // explore directions
   unsigned NewDirection = Dependence::DVEntry::NONE;
   APInt LowerDistance, UpperDistance;
+  // TODO: Overflow check may be needed.
   if (TA.sgt(TB)) {
 LowerDistance = (TY - TX) + (TA - TB) * TL;
 UpperDistance = (TY - TX) + (TA - TB) * TU;
diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll 
b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
index 2a809c32d7d21..e8e7cb11bb23e 100644
--- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
@@ -841,7 +841,7 @@ define void @exact14(ptr %A) {
 ; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 
0, ptr %idx.0, align 1
 ; CHECK-SIV-ONLY-NEXT:da analyze - none!
 ; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 
1, ptr %idx.1, align 1
-; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:da analyze - output [*|<]!
 ; CHECK-SIV-ONLY-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 
1, ptr %idx.1, align 1
 ; CHECK-SIV-ONLY-NEXT:da analyze - none!
 ;

>From a34c3208d903906caf5b9435f1705f695a68277e Mon Sep 17 00:00:00 2001
From: Ryotaro Kasuga 
Date: Tue, 16 Sep 2025 13:12:16 +
Subject: [PATCH 2/2] fix comment

---
 llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll 
b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
index e8e7cb11bb23e..6f33e2314ffba 100644
--- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
@@ -815,8 +815,8 @@ for.end:  ; preds = 
%for.body
 ;; A[3*i - 2] = 1;
 ;; }
 ;;
-;; FIXME: DependencyAnalsysis currently detects no dependency between
-;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`, but it does exist. For example,
+;; There exists dependency between `A[-6*i + INT64_MAX]` and `A[3*i - 2]`.
+;; For example,
 ;;
 ;; | memory location| -6*i + INT64_MAX   | 3*i - 2
 ;; |||---



[llvm-branch-commits] [llvm] [DA] Add test where ExactSIV misses dependency due to overflow (NFC) (PR #157085)

2025-09-16 Thread Ryotaro Kasuga via llvm-branch-commits

https://github.com/kasuga-fj updated 
https://github.com/llvm/llvm-project/pull/157085

From 4e43533b48aa613b05fb0753ac290809da8f28d1 Mon Sep 17 00:00:00 2001
From: Ryotaro Kasuga 
Date: Fri, 5 Sep 2025 11:32:54 +
Subject: [PATCH 1/2] [DA] Add test where ExactSIV misses dependency due to
 overflow (NFC)

---
 .../Analysis/DependenceAnalysis/ExactSIV.ll   | 120 ++
 1 file changed, 120 insertions(+)

diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll 
b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
index 0fe62991fede9..a16751397c487 100644
--- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
@@ -807,3 +807,123 @@ for.body: ; preds 
= %entry, %for.body
 for.end:  ; preds = %for.body
   ret void
 }
+
+;; max_i = INT64_MAX/6  // 1537228672809129301
+;; for (long long i = 0; i <= max_i; i++) {
+;;   A[-6*i + INT64_MAX] = 0;
+;;   if (i)
+;; A[3*i - 2] = 1;
+;; }
+;;
+;; FIXME: There is a loop-carried dependency between
+;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`. For example,
+;;
+;; | memory location| -6*i + INT64_MAX   | 3*i - 2
+;; |||---
+;; | A[1]   | i = max_i  | i = 1
+;; | A[4611686018427387901] | i = 768614336404564651 | i = max_i
+;;
+;; Actually,
+;;  * 1   = -6*max_i  + INT64_MAX = 3*1 - 2
+;;  * 4611686018427387901 = -6*768614336404564651 + INT64_MAX = 3*max_i - 2
+;;
+
+define void @exact14(ptr %A) {
+; CHECK-LABEL: 'exact14'
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr 
%idx.0, align 1
+; CHECK-NEXT:da analyze - none!
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr 
%idx.1, align 1
+; CHECK-NEXT:da analyze - none!
+; CHECK-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr 
%idx.1, align 1
+; CHECK-NEXT:da analyze - none!
+;
+; CHECK-SIV-ONLY-LABEL: 'exact14'
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 
0, ptr %idx.0, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 
1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 
1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %subscript.0 = phi i64 [ 9223372036854775807, %entry ], [ %subscript.0.next, 
%loop.latch ]
+  %subscript.1 = phi i64 [ -2, %entry ], [ %subscript.1.next, %loop.latch ]
+  %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0
+  store i8 0, ptr %idx.0
+  %cond.store = icmp ne i64 %i, 0
+  br i1 %cond.store, label %if.store, label %loop.latch
+
+if.store:
+  %idx.1 = getelementptr inbounds i8, ptr %A, i64 %subscript.1
+  store i8 1, ptr %idx.1
+  br label %loop.latch
+
+loop.latch:
+  %i.inc = add nuw nsw i64 %i, 1
+  %subscript.0.next = add nsw i64 %subscript.0, -6
+  %subscript.1.next = add nsw i64 %subscript.1, 3
+  %exitcond = icmp sgt i64 %i.inc, 1537228672809129301
+  br i1 %exitcond, label %exit, label %loop.header
+
+exit:
+  ret void
+}
+
+;; A generalized version of @exact14.
+;;
+;; for (long long i = 0; i <= n / 6; i++) {
+;;   A[-6*i + n] = 0;
+;;   if (i)
+;; A[3*i - 2] = 1;
+;; }
+
+define void @exact15(ptr %A, i64 %n) {
+; CHECK-LABEL: 'exact15'
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr 
%idx.0, align 1
+; CHECK-NEXT:da analyze - none!
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr 
%idx.1, align 1
+; CHECK-NEXT:da analyze - output [*|<]!
+; CHECK-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr 
%idx.1, align 1
+; CHECK-NEXT:da analyze - none!
+;
+; CHECK-SIV-ONLY-LABEL: 'exact15'
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 
0, ptr %idx.0, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 
1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - output [*|<]!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 
1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+;
+entry:
+  %bound = sdiv i64 %n, 6
+  %guard = icmp sgt i64 %n, 0
+  br i1 %guard, label %loop.header, label %exit
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %subscript.0 = phi i64 [ %n, %entry ], [ %subscript.0.next, %loop.latch ]
+  %subscript.1 = phi i64 [ -2, %entry ], [ %subscript.1.next, %loop.latch ]
+  %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0
+  store i8 0, ptr %idx.0
+  %cond.store = icmp 

[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-16 Thread Tobias Stadler via llvm-branch-commits




tobias-stadler wrote:

It would be good to change the testing methodology here. Currently all the 
dsymutil tests are blobs. We should be able to get remarks and .o files from 
llc. However, we need to link the .o files into a binary. Do you know of a way 
to do this with the available llvm tools?

https://github.com/llvm/llvm-project/pull/156715
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-16 Thread Tobias Stadler via llvm-branch-commits




tobias-stadler wrote:

Until we figure out a better testing methodology for dsymutil, I'd like to land 
this with the blob tests to unblock further work on the remarks infra.

https://github.com/llvm/llvm-project/pull/156715


[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)

2025-09-16 Thread Sergei Barannikov via llvm-branch-commits


@@ -902,7 +908,9 @@ def memri34_pcrel : Operand { // memri, imm is a 34-bit value.
 def PPCRegGxRCOperand : AsmOperandClass {
   let Name = "RegGxRC"; let PredicateMethod = "isRegNumber";
 }
-def ptr_rc_idx : Operand, PointerLikeRegClass<0> {
+def ptr_rc_idx : Operand,

s-barannikov wrote:

This one is still using double inheritance.

https://github.com/llvm/llvm-project/pull/158777


[llvm-branch-commits] [llvm] c4a5c58 - Revert "AMDGPU/GlobalISel: Import D16 load patterns and add combines for them…"

2025-09-16 Thread via llvm-branch-commits

Author: Petar Avramovic
Date: 2025-09-11T12:48:18+02:00
New Revision: c4a5c5809defb97fd1b757694d71bb7aa0978544

URL: 
https://github.com/llvm/llvm-project/commit/c4a5c5809defb97fd1b757694d71bb7aa0978544
DIFF: 
https://github.com/llvm/llvm-project/commit/c4a5c5809defb97fd1b757694d71bb7aa0978544.diff

LOG: Revert "AMDGPU/GlobalISel: Import D16 load patterns and add combines for them…"

This reverts commit b97010865caa0439d4cedc45e9582e645816519f.

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AMDGPUCombine.td
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
llvm/lib/Target/AMDGPU/SIInstructions.td
llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll
llvm/test/CodeGen/AMDGPU/global-saddr-load.ll

Removed: 
llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll



diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td 
b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index e8b211f7866ad..b5dac95b57a2d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -71,12 +71,6 @@ def int_minmax_to_med3 : GICombineRule<
  [{ return matchIntMinMaxToMed3(*${min_or_max}, ${matchinfo}); }]),
   (apply [{ applyMed3(*${min_or_max}, ${matchinfo}); }])>;
 
-let Predicates = [Predicate<"Subtarget->d16PreservesUnusedBits()">] in
-def d16_load : GICombineRule<
-  (defs root:$bitcast),
-  (combine (G_BITCAST $dst, $src):$bitcast,
-   [{ return combineD16Load(*${bitcast} ); }])>;
-
 def fp_minmax_to_med3 : GICombineRule<
   (defs root:$min_or_max, med3_matchdata:$matchinfo),
   (match (wip_match_opcode G_FMAXNUM,
@@ -225,6 +219,5 @@ def AMDGPURegBankCombiner : GICombiner<
zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp,
identity_combines, redundant_and, constant_fold_cast_op,
-   cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines,
-   d16_load]> {
+   cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> {
 }

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td 
b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
index bb4bf742fb861..0c112d1787c1a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
@@ -315,13 +315,6 @@ def : GINodeEquiv;
 def : GINodeEquiv;
 def : GINodeEquiv;
 
-def : GINodeEquiv;
-def : GINodeEquiv;
-def : GINodeEquiv;
-def : GINodeEquiv;
-def : GINodeEquiv;
-def : GINodeEquiv;
-
 def : GINodeEquiv;
 // G_AMDGPU_WHOLE_WAVE_FUNC_RETURN is simpler than AMDGPUwhole_wave_return,
 // so we don't mark it as equivalent.

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
index fd604e1b19cd4..ee324a5e93f0f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
@@ -89,10 +89,6 @@ class AMDGPURegBankCombinerImpl : public Combiner {
 
   void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const;
 
-  bool combineD16Load(MachineInstr &MI) const;
-  bool applyD16Load(unsigned D16Opc, MachineInstr &DstMI,
-MachineInstr *SmallLoad, Register ToOverwriteD16) const;
-
 private:
   SIModeRegisterDefaults getMode() const;
   bool getIEEE() const;
@@ -396,88 +392,6 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt(
   MI.eraseFromParent();
 }
 
-bool AMDGPURegBankCombinerImpl::combineD16Load(MachineInstr &MI) const {
-  Register Dst;
-  MachineInstr *Load, *SextLoad;
-  const int64_t CleanLo16 = 0x;
-  const int64_t CleanHi16 = 0x;
-
-  // Load lo
-  if (mi_match(MI.getOperand(1).getReg(), MRI,
-   m_GOr(m_GAnd(m_GBitcast(m_Reg(Dst)),
-m_Copy(m_SpecificICst(CleanLo16))),
- m_MInstr(Load {
-
-if (Load->getOpcode() == AMDGPU::G_ZEXTLOAD) {
-  const MachineMemOperand *MMO = *Load->memoperands_begin();
-  unsigned LoadSize = MMO->getSizeInBits().getValue();
-  if (LoadSize == 8)
-return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO_U8, MI, Load, Dst);
-  if (LoadSize == 16)
-return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO, MI, Load, Dst);
-  return false;
-}
-
-if (mi_match(
-Load, MRI,
-m_GAnd(m_MInstr(SextLoad), m_Copy(m_SpecificICst(CleanHi16) {
-  if (SextLoad->getOpcode() != AMDGPU::G_SEXTLOAD)
-return false;
-
-  const MachineMemOperand *MMO = *SextLoad->memoperands_begin();
-  if (MMO->getSizeInBits().getValue() != 8)
-return false;
-
-  return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO_I8, MI, SextLoad, Dst);
-}
-
-return false;
-  }
-
-  // Load hi
-  if (mi_match(MI.getOperand(1).getR

[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)

2025-09-16 Thread S. VenkataKeerthy via llvm-branch-commits


@@ -144,6 +145,73 @@ struct Embedding {
 using InstEmbeddingsMap = DenseMap;
 using BBEmbeddingsMap = DenseMap;
 
+/// Generic storage class for section-based vocabularies.
+/// VocabStorage provides a generic foundation for storing and accessing
+/// embeddings organized into sections.
+class VocabStorage {
+private:
+  /// Section-based storage
+  std::vector> Sections;
+
+  size_t TotalSize = 0;
+  unsigned Dimension = 0;
+
+public:
+  /// Default constructor creates empty storage (invalid state)
+  VocabStorage() : Sections(), TotalSize(0), Dimension(0) {}
+
+  /// Create a VocabStorage with pre-organized section data
+  VocabStorage(std::vector> &&SectionData);
+
+  VocabStorage(VocabStorage &&) = default;
+  VocabStorage &operator=(VocabStorage &&Other);
+
+  VocabStorage(const VocabStorage &) = delete;
+  VocabStorage &operator=(const VocabStorage &) = delete;
+
+  /// Get total number of entries across all sections
+  size_t size() const { return TotalSize; }
+
+  /// Get number of sections
+  unsigned getNumSections() const {
+return static_cast(Sections.size());
+  }
+
+  /// Section-based access: Storage[sectionId][localIndex]
+  const std::vector &operator[](unsigned SectionId) const {
+assert(SectionId < Sections.size() && "Invalid section ID");
+return Sections[SectionId];
+  }
+
+  /// Get vocabulary dimension
+  unsigned getDimension() const { return Dimension; }
+
+  /// Check if vocabulary is valid (has data)
+  bool isValid() const { return TotalSize > 0; }
+
+  /// Iterator support for section-based access

svkeerthy wrote:

Having this iterator makes it easy to iterate over the vocabulary (we iterate over the vocabulary in the tool).

https://github.com/llvm/llvm-project/pull/158376


[llvm-branch-commits] [llvm] c5b5583 - Revert "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional up…"

2025-09-16 Thread via llvm-branch-commits

Author: Mingming Liu
Date: 2025-09-16T12:51:22-07:00
New Revision: c5b558385b956faf99348b3f0de91926061afcfb

URL: 
https://github.com/llvm/llvm-project/commit/c5b558385b956faf99348b3f0de91926061afcfb
DIFF: 
https://github.com/llvm/llvm-project/commit/c5b558385b956faf99348b3f0de91926061afcfb.diff

LOG: Revert "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional up…"

This reverts commit 027bccc4692923d0f1ba3d4d970071f747c2255c.

Added: 


Modified: 
llvm/include/llvm/IR/GlobalObject.h
llvm/lib/CodeGen/CodeGenPrepare.cpp
llvm/lib/CodeGen/StaticDataAnnotator.cpp
llvm/lib/IR/Globals.cpp
llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
llvm/unittests/IR/CMakeLists.txt

Removed: 
llvm/unittests/IR/GlobalObjectTest.cpp



diff  --git a/llvm/include/llvm/IR/GlobalObject.h 
b/llvm/include/llvm/IR/GlobalObject.h
index e273387807cf6..08a02b42bdc14 100644
--- a/llvm/include/llvm/IR/GlobalObject.h
+++ b/llvm/include/llvm/IR/GlobalObject.h
@@ -121,10 +121,8 @@ class GlobalObject : public GlobalValue {
   /// appropriate default object file section.
   LLVM_ABI void setSection(StringRef S);
 
-  /// If existing prefix is different from \p Prefix, set it to \p Prefix. If \p
-  /// Prefix is empty, the set clears the existing metadata. Returns true if
-  /// section prefix changed and false otherwise.
-  LLVM_ABI bool setSectionPrefix(StringRef Prefix);
+  /// Set the section prefix for this global object.
+  LLVM_ABI void setSectionPrefix(StringRef Prefix);
 
   /// Get the section prefix for this global object.
   LLVM_ABI std::optional getSectionPrefix() const;

diff  --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp 
b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 92d87681c9adc..9db4c9e5e2807 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -583,23 +583,23 @@ bool CodeGenPrepare::_run(Function &F) {
   // if requested.
   if (BBSectionsGuidedSectionPrefix && BBSectionsProfileReader &&
   BBSectionsProfileReader->isFunctionHot(F.getName())) {
-EverMadeChange |= F.setSectionPrefix("hot");
+F.setSectionPrefix("hot");
   } else if (ProfileGuidedSectionPrefix) {
 // The hot attribute overwrites profile count based hotness while profile
 // counts based hotness overwrite the cold attribute.
 // This is a conservative behabvior.
 if (F.hasFnAttribute(Attribute::Hot) ||
 PSI->isFunctionHotInCallGraph(&F, *BFI))
-  EverMadeChange |= F.setSectionPrefix("hot");
+  F.setSectionPrefix("hot");
 // If PSI shows this function is not hot, we will placed the function
 // into unlikely section if (1) PSI shows this is a cold function, or
 // (2) the function has a attribute of cold.
 else if (PSI->isFunctionColdInCallGraph(&F, *BFI) ||
  F.hasFnAttribute(Attribute::Cold))
-  EverMadeChange |= F.setSectionPrefix("unlikely");
+  F.setSectionPrefix("unlikely");
 else if (ProfileUnknownInSpecialSection && PSI->hasPartialSampleProfile() 
&&
  PSI->isFunctionHotnessUnknown(F))
-  EverMadeChange |= F.setSectionPrefix("unknown");
+  F.setSectionPrefix("unknown");
   }
 
   /// This optimization identifies DIV instructions that can be

diff  --git a/llvm/lib/CodeGen/StaticDataAnnotator.cpp 
b/llvm/lib/CodeGen/StaticDataAnnotator.cpp
index 53a9ab4dbda02..2d9b489a80acb 100644
--- a/llvm/lib/CodeGen/StaticDataAnnotator.cpp
+++ b/llvm/lib/CodeGen/StaticDataAnnotator.cpp
@@ -91,7 +91,8 @@ bool StaticDataAnnotator::runOnModule(Module &M) {
 if (SectionPrefix.empty())
   continue;
 
-Changed |= GV.setSectionPrefix(SectionPrefix);
+GV.setSectionPrefix(SectionPrefix);
+Changed = true;
   }
 
   return Changed;

diff  --git a/llvm/lib/IR/Globals.cpp b/llvm/lib/IR/Globals.cpp
index 1a7a5c5fbad6b..11d33e262fecb 100644
--- a/llvm/lib/IR/Globals.cpp
+++ b/llvm/lib/IR/Globals.cpp
@@ -288,22 +288,10 @@ void GlobalObject::setSection(StringRef S) {
   setGlobalObjectFlag(HasSectionHashEntryBit, !S.empty());
 }
 
-bool GlobalObject::setSectionPrefix(StringRef Prefix) {
-  StringRef ExistingPrefix;
-  if (std::optional MaybePrefix = getSectionPrefix())
-ExistingPrefix = *MaybePrefix;
-
-  if (ExistingPrefix == Prefix)
-return false;
-
-  if (Prefix.empty()) {
-setMetadata(LLVMContext::MD_section_prefix, nullptr);
-return true;
-  }
+void GlobalObject::setSectionPrefix(StringRef Prefix) {
   MDBuilder MDB(getContext());
   setMetadata(LLVMContext::MD_section_prefix,
   MDB.createGlobalObjectSectionPrefix(Prefix));
-  return true;
 }
 
 std::optional GlobalObject::getSectionPrefix() const {

diff  --git a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp 
b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
index c86092bd51eda..ecb2f2dbc552b 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProf

[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-16 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler edited 
https://github.com/llvm/llvm-project/pull/156715


[llvm-branch-commits] [llvm] [DA] Add test where ExactSIV misses dependency due to overflow (NFC) (PR #157085)

2025-09-16 Thread Ryotaro Kasuga via llvm-branch-commits

https://github.com/kasuga-fj updated 
https://github.com/llvm/llvm-project/pull/157085

>From 4e43533b48aa613b05fb0753ac290809da8f28d1 Mon Sep 17 00:00:00 2001
From: Ryotaro Kasuga 
Date: Fri, 5 Sep 2025 11:32:54 +
Subject: [PATCH 1/2] [DA] Add test where ExactSIV misses dependency due to
 overflow (NFC)

---
 .../Analysis/DependenceAnalysis/ExactSIV.ll   | 120 ++
 1 file changed, 120 insertions(+)

diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll 
b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
index 0fe62991fede9..a16751397c487 100644
--- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
@@ -807,3 +807,123 @@ for.body: ; preds = %entry, %for.body
 for.end:  ; preds = %for.body
   ret void
 }
+
+;; max_i = INT64_MAX/6  // 1537228672809129301
+;; for (long long i = 0; i <= max_i; i++) {
+;;   A[-6*i + INT64_MAX] = 0;
+;;   if (i)
+;; A[3*i - 2] = 1;
+;; }
+;;
+;; FIXME: There is a loop-carried dependency between
+;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`. For example,
+;;
+;; | memory location| -6*i + INT64_MAX   | 3*i - 2
+;; |||---
+;; | A[1]   | i = max_i  | i = 1
+;; | A[4611686018427387901] | i = 768614336404564651 | i = max_i
+;;
+;; Actually,
+;;  * 1   = -6*max_i  + INT64_MAX = 3*1 - 2
+;;  * 4611686018427387901 = -6*768614336404564651 + INT64_MAX = 3*max_i - 2
+;;
+
+define void @exact14(ptr %A) {
+; CHECK-LABEL: 'exact14'
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-NEXT:da analyze - none!
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-NEXT:da analyze - none!
+; CHECK-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-NEXT:da analyze - none!
+;
+; CHECK-SIV-ONLY-LABEL: 'exact14'
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %subscript.0 = phi i64 [ 9223372036854775807, %entry ], [ %subscript.0.next, %loop.latch ]
+  %subscript.1 = phi i64 [ -2, %entry ], [ %subscript.1.next, %loop.latch ]
+  %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0
+  store i8 0, ptr %idx.0
+  %cond.store = icmp ne i64 %i, 0
+  br i1 %cond.store, label %if.store, label %loop.latch
+
+if.store:
+  %idx.1 = getelementptr inbounds i8, ptr %A, i64 %subscript.1
+  store i8 1, ptr %idx.1
+  br label %loop.latch
+
+loop.latch:
+  %i.inc = add nuw nsw i64 %i, 1
+  %subscript.0.next = add nsw i64 %subscript.0, -6
+  %subscript.1.next = add nsw i64 %subscript.1, 3
+  %exitcond = icmp sgt i64 %i.inc, 1537228672809129301
+  br i1 %exitcond, label %exit, label %loop.header
+
+exit:
+  ret void
+}
+
+;; A generalized version of @exact14.
+;;
+;; for (long long i = 0; i <= n / 6; i++) {
+;;   A[-6*i + n] = 0;
+;;   if (i)
+;; A[3*i - 2] = 1;
+;; }
+
+define void @exact15(ptr %A, i64 %n) {
+; CHECK-LABEL: 'exact15'
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-NEXT:da analyze - none!
+; CHECK-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-NEXT:da analyze - output [*|<]!
+; CHECK-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-NEXT:da analyze - none!
+;
+; CHECK-SIV-ONLY-LABEL: 'exact15'
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - output [*|<]!
+; CHECK-SIV-ONLY-NEXT:  Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
+; CHECK-SIV-ONLY-NEXT:da analyze - none!
+;
+entry:
+  %bound = sdiv i64 %n, 6
+  %guard = icmp sgt i64 %n, 0
+  br i1 %guard, label %loop.header, label %exit
+
+loop.header:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ]
+  %subscript.0 = phi i64 [ %n, %entry ], [ %subscript.0.next, %loop.latch ]
+  %subscript.1 = phi i64 [ -2, %entry ], [ %subscript.1.next, %loop.latch ]
+  %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0
+  store i8 0, ptr %idx.0
+  %cond.store = icmp 

[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-16 Thread Abid Qadeer via llvm-branch-commits

https://github.com/abidh approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/156837


[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)

2025-09-16 Thread Sergei Barannikov via llvm-branch-commits


@@ -868,10 +868,16 @@ def crbitm: Operand {
 def PPCRegGxRCNoR0Operand : AsmOperandClass {
   let Name = "RegGxRCNoR0"; let PredicateMethod = "isRegNumber";
 }
-def ptr_rc_nor0 : Operand, PointerLikeRegClass<1> {
+
+def ptr_rc_nor0 : RegClassByHwMode<
+  [PPC32, PPC64],
+  [GPRC_NOR0, G8RC_NOX0]>;
+
+def PtrOpNoR0 : RegisterOperand {

s-barannikov wrote:

Maybe swap the names of RegClassByHwMode / RegisterOperand to reduce diff?

https://github.com/llvm/llvm-project/pull/158777


[llvm-branch-commits] [clang] port 5b4819e to release (PR #159209)

2025-09-16 Thread David Blaikie via llvm-branch-commits

https://github.com/dwblaikie edited 
https://github.com/llvm/llvm-project/pull/159209


[llvm-branch-commits] [clang] port 5b4819e to release (PR #159209)

2025-09-16 Thread David Blaikie via llvm-branch-commits

https://github.com/dwblaikie edited 
https://github.com/llvm/llvm-project/pull/159209


[llvm-branch-commits] [clang] port 5b4819e to release (PR #159209)

2025-09-16 Thread David Blaikie via llvm-branch-commits

https://github.com/dwblaikie ready_for_review 
https://github.com/llvm/llvm-project/pull/159209


[llvm-branch-commits] [NFC][CodeGe][CFI] Pre-commit transparent_union tests (PR #158192)

2025-09-16 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka edited 
https://github.com/llvm/llvm-project/pull/158192


[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)

2025-09-16 Thread S. VenkataKeerthy via llvm-branch-commits

https://github.com/svkeerthy updated 
https://github.com/llvm/llvm-project/pull/158376

>From 763b16710251eb055b0b192051069cbc838dd7d4 Mon Sep 17 00:00:00 2001
From: svkeerthy 
Date: Fri, 12 Sep 2025 22:06:44 +
Subject: [PATCH] VocabStorage

---
 llvm/include/llvm/Analysis/IR2Vec.h   | 145 +++--
 llvm/lib/Analysis/IR2Vec.cpp  | 215 +
 llvm/lib/Analysis/InlineAdvisor.cpp   |   2 +-
 llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp|   6 +-
 .../FunctionPropertiesAnalysisTest.cpp|  13 +-
 llvm/unittests/Analysis/IR2VecTest.cpp| 294 +++---
 6 files changed, 541 insertions(+), 134 deletions(-)

diff --git a/llvm/include/llvm/Analysis/IR2Vec.h 
b/llvm/include/llvm/Analysis/IR2Vec.h
index 4a6db5d895a62..1d3f87e47d269 100644
--- a/llvm/include/llvm/Analysis/IR2Vec.h
+++ b/llvm/include/llvm/Analysis/IR2Vec.h
@@ -45,6 +45,7 @@
 #include "llvm/Support/JSON.h"
 #include 
 #include 
+#include 
 
 namespace llvm {
 
@@ -144,6 +145,73 @@ struct Embedding {
 using InstEmbeddingsMap = DenseMap;
 using BBEmbeddingsMap = DenseMap;
 
+/// Generic storage class for section-based vocabularies.
+/// VocabStorage provides a generic foundation for storing and accessing
+/// embeddings organized into sections.
+class VocabStorage {
+private:
+  /// Section-based storage
+  std::vector> Sections;
+
+  const size_t TotalSize = 0;
+  const unsigned Dimension = 0;
+
+public:
+  /// Default constructor creates empty storage (invalid state)
+  VocabStorage() : Sections(), TotalSize(0), Dimension(0) {}
+
+  /// Create a VocabStorage with pre-organized section data
+  VocabStorage(std::vector> &&SectionData);
+
+  VocabStorage(VocabStorage &&) = default;
+  VocabStorage &operator=(VocabStorage &&) = delete;
+
+  VocabStorage(const VocabStorage &) = delete;
+  VocabStorage &operator=(const VocabStorage &) = delete;
+
+  /// Get total number of entries across all sections
+  size_t size() const { return TotalSize; }
+
+  /// Get number of sections
+  unsigned getNumSections() const {
+return static_cast(Sections.size());
+  }
+
+  /// Section-based access: Storage[sectionId][localIndex]
+  const std::vector &operator[](unsigned SectionId) const {
+assert(SectionId < Sections.size() && "Invalid section ID");
+return Sections[SectionId];
+  }
+
+  /// Get vocabulary dimension
+  unsigned getDimension() const { return Dimension; }
+
+  /// Check if vocabulary is valid (has data)
+  bool isValid() const { return TotalSize > 0; }
+
+  /// Iterator support for section-based access
+  class const_iterator {
+const VocabStorage *Storage;
+unsigned SectionId = 0;
+size_t LocalIndex = 0;
+
+  public:
+const_iterator(const VocabStorage *Storage, unsigned SectionId,
+   size_t LocalIndex)
+: Storage(Storage), SectionId(SectionId), LocalIndex(LocalIndex) {}
+
+LLVM_ABI const Embedding &operator*() const;
+LLVM_ABI const_iterator &operator++();
+LLVM_ABI bool operator==(const const_iterator &Other) const;
+LLVM_ABI bool operator!=(const const_iterator &Other) const;
+  };
+
+  const_iterator begin() const { return const_iterator(this, 0, 0); }
+  const_iterator end() const {
+return const_iterator(this, getNumSections(), 0);
+  }
+};
+
 /// Class for storing and accessing the IR2Vec vocabulary.
 /// The Vocabulary class manages seed embeddings for LLVM IR entities. The
 /// seed embeddings are the initial learned representations of the entities
@@ -164,7 +232,7 @@ using BBEmbeddingsMap = DenseMap;
 class Vocabulary {
   friend class llvm::IR2VecVocabAnalysis;
 
-  // Vocabulary Slot Layout:
+  // Vocabulary Layout:
   // ++--+
   // | Entity Type| Index Range  |
   // ++--+
@@ -175,8 +243,16 @@ class Vocabulary {
   // Note: "Similar" LLVM Types are grouped/canonicalized together.
   //   Operands include Comparison predicates (ICmp/FCmp).
   //   This can be extended to include other specializations in future.
-  using VocabVector = std::vector;
-  VocabVector Vocab;
+  enum class Section : unsigned {
+Opcodes = 0,
+CanonicalTypes = 1,
+Operands = 2,
+Predicates = 3,
+MaxSections
+  };
+
+  // Use section-based storage for better organization and efficiency
+  VocabStorage Storage;
 
   static constexpr unsigned NumICmpPredicates =
   static_cast(CmpInst::LAST_ICMP_PREDICATE) -
@@ -228,9 +304,18 @@ class Vocabulary {
   NumICmpPredicates + NumFCmpPredicates;
 
   Vocabulary() = default;
-  LLVM_ABI Vocabulary(VocabVector &&Vocab) : Vocab(std::move(Vocab)) {}
+  LLVM_ABI Vocabulary(VocabStorage &&Storage) : Storage(std::move(Storage)) {}
+
+  Vocabulary(const Vocabulary &) = delete;
+  Vocabulary &operator=(const Vocabulary &) = delete;
+
+  Vocabulary(Vocabulary &&) = default;
+  Vocabulary &op

[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-16 Thread Jon Roelofs via llvm-branch-commits




jroelofs wrote:

likewise. I’ll leave this “unresolved” so it doesn’t get hidden

https://github.com/llvm/llvm-project/pull/156715


[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/159234


[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)

2025-09-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/156610

>From bdd9ab29d7c0c57edc5b8848c7e4be5626b5f57e Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 2 Sep 2025 08:36:34 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device

Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to omp `reduction` clauses. The changes
attach 2 `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wsloop` part.
---
 .../OpenMP/DoConcurrentConversion.cpp | 117 ++
 .../DoConcurrent/reduce_device.mlir   |  53 
 2 files changed, 121 insertions(+), 49 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir

diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp 
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index d00a4fdd2cf2e..6e308499100fa 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
 
   for (mlir::Value local : loop.getLocalVars())
 liveIns.push_back(local);
+
+  for (mlir::Value reduce : loop.getReduceVars())
+liveIns.push_back(reduce);
 }
 
 /// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -319,7 +322,7 @@ class DoConcurrentConversion
   targetOp =
   genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns,
   targetClauseOps, loopNestClauseOps, liveInShapeInfoMap);
-  genTeamsOp(doLoop.getLoc(), rewriter);
+  genTeamsOp(rewriter, loop, mapper);
 }
 
 mlir::omp::ParallelOp parallelOp =
@@ -492,46 +495,7 @@ class DoConcurrentConversion
 if (!mapToDevice)
   genPrivatizers(rewriter, mapper, loop, wsloopClauseOps);
 
-if (!loop.getReduceVars().empty()) {
-  for (auto [op, byRef, sym, arg] : llvm::zip_equal(
-   loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(),
-   loop.getReduceSymsAttr().getAsRange(),
-   loop.getRegionReduceArgs())) {
-auto firReducer = moduleSymbolTable.lookup(
-sym.getLeafReference());
-
-mlir::OpBuilder::InsertionGuard guard(rewriter);
-rewriter.setInsertionPointAfter(firReducer);
-std::string ompReducerName = sym.getLeafReference().str() + ".omp";
-
-auto ompReducer =
-moduleSymbolTable.lookup(
-rewriter.getStringAttr(ompReducerName));
-
-if (!ompReducer) {
-  ompReducer = mlir::omp::DeclareReductionOp::create(
-  rewriter, firReducer.getLoc(), ompReducerName,
-  firReducer.getTypeAttr().getValue());
-
-  cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(),
-  ompReducer.getAllocRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(),
-  ompReducer.getInitializerRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(),
-  ompReducer.getReductionRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(),
-  ompReducer.getAtomicReductionRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(),
-  ompReducer.getCleanupRegion());
-  moduleSymbolTable.insert(ompReducer);
-}
-
-wsloopClauseOps.reductionVars.push_back(op);
-wsloopClauseOps.reductionByref.push_back(byRef);
-wsloopClauseOps.reductionSyms.push_back(
-mlir::SymbolRefAttr::get(ompReducer));
-  }
-}
+genReductions(rewriter, mapper, loop, wsloopClauseOps);
 
 auto wsloopOp =
 mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps);
@@ -553,8 +517,6 @@ class DoConcurrentConversion
 
 rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back());
 mlir::omp::YieldOp::create(rewriter, loop->getLoc());
-loop->getParentOfType().print(
-llvm::errs(), mlir::OpPrintingFlags().assumeVerified());
 
 return {loopNestOp, wsloopOp};
   }
@@ -778,15 +740,26 @@ class DoConcurrentConversion
 liveInName, shape);
   }
 
-  mlir::omp::TeamsOp
-  genTeamsOp(mlir::Location loc,
- mlir::ConversionPatternRewriter &rewriter) const {
-auto teamsOp = rewriter.create(
-loc, /*clauses=*/mlir::omp::TeamsOperands{});
+  mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter,
+fir::DoConcurrentLoopOp loop,
+mlir::IRMapping &mapper) const {
+mlir::omp::TeamsOperands teamsOps;
+genReductions(rewriter, mapper, loop, teamsOps);
+
+mlir::Location loc = loop.getLoc();
+aut

[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/156837

>From c5dde7cbcece549d0996a6671d1ae1b53b9cd63b Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Thu, 4 Sep 2025 01:06:21 -0500
Subject: [PATCH 1/3] [flang][OpenMP] Support multi-block reduction combiner 
 regions on the GPU

Fixes a bug related to insertion points when inlining multi-block
combiner reduction regions. The IP at the end of the inlined region was
not used, resulting in basic blocks being emitted with multiple terminators.
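
A toy model of the bug (plain Python, not LLVM code; all names are
illustrative): if the builder's insertion point is not restored to the
point returned after inlining the combiner region, the final store is
appended past a block's terminator.

```python
class Block:
    def __init__(self, name):
        self.name = name
        self.instrs = []

    def well_formed(self):
        # Well formed iff nothing follows the terminator (a 'br' here).
        terms = [i for i, ins in enumerate(self.instrs) if ins.startswith("br")]
        return not terms or terms[0] == len(self.instrs) - 1

class Builder:
    def __init__(self, block):
        self.block = block

    def emit(self, instr):
        self.block.instrs.append(instr)

    def set_insert_point(self, block):
        self.block = block

entry = Block("entry")
cont = Block("omp.region.cont")

builder = Builder(entry)
# Inlining the multi-block combiner region ends 'entry' with a branch and
# reports 'cont' as the insertion point after inlining (AfterIP).
builder.emit("call @bar()")
builder.emit("br label %omp.region.cont")
after_ip = cont

# Buggy: the builder still points at 'entry', so the store lands after
# the branch, leaving a block with an instruction past its terminator.
builder.emit("store %reduced, %lhs")
assert not entry.well_formed()
entry.instrs.pop()  # undo, to show the fixed variant

# Fixed: restore the insertion point to AfterIP before emitting the store.
builder.set_insert_point(after_ip)
builder.emit("store %reduced, %lhs")
assert entry.well_formed()
assert cont.instrs == ["store %reduced, %lhs"]
```

The actual fix mirrors the last two steps: call
`Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint())` before
`CreateStore`, as the diff below shows.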
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 +
 .../omptarget-multi-block-reduction.mlir  | 85 +++
 2 files changed, 88 insertions(+)
 create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 220eee3cb8b087..b516c3c3f4efee 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -3507,6 +3507,8 @@ Expected 
OpenMPIRBuilder::createReductionFunction(
 return AfterIP.takeError();
   if (!Builder.GetInsertBlock())
 return ReductionFunc;
+
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHSPtr);
 }
   }
@@ -3751,6 +3753,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createReductionsGPU(
   RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced);
   if (!AfterIP)
 return AfterIP.takeError();
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHS, false);
 }
   }
diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir 
b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
new file mode 100644
index 00..aaf06d2d0e0c22
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
@@ -0,0 +1,85 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Verifies that the IR builder can handle reductions with multi-block combiner
+// regions on the GPU.
+
+module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 
: ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = 
"amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} {
+  llvm.func @bar() {}
+  llvm.func @baz() {}
+
+  omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc {
+%0 = llvm.mlir.constant(1 : i64) : i64
+%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 
x array<3 x i64>>)> : (i64) -> !llvm.ptr<5>
+%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr
+omp.yield(%2 : !llvm.ptr)
+  } init {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+omp.yield(%arg1 : !llvm.ptr)
+  } combiner {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+llvm.call @bar() : () -> ()
+llvm.br ^bb3
+
+  ^bb3:  // pred: ^bb1
+llvm.call @baz() : () -> ()
+omp.yield(%arg0 : !llvm.ptr)
+  }
+  llvm.func @foo_() {
+%c1 = llvm.mlir.constant(1 : i64) : i64
+%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> 
!llvm.ptr<5>
+%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr
+%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) 
map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"}
+omp.target map_entries(%74 -> %arg0 : !llvm.ptr) {
+  %c1_2 = llvm.mlir.constant(1 : i32) : i32
+  %c10 = llvm.mlir.constant(10 : i32) : i32
+  omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 
: !llvm.ptr) {
+omp.parallel {
+  omp.distribute {
+omp.wsloop {
+  omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step 
(%c1_2) {
+omp.yield
+  }
+} {omp.composite}
+  } {omp.composite}
+  omp.terminator
+} {omp.composite}
+omp.terminator
+  }
+  omp.terminator
+}
+llvm.return
+  }
+}
+
+// CHECK:  call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1,
+// CHECK-SAME:   ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1)
+
+// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} {
+// CHECK:   .omp.reduction.then:
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   omp.reduction.nonatomic.body:
+// CHECK: call void @bar()
+// CHECK: br label %[[BODY_2ND_BB:.*]]
+
+// CHECK:   [[BODY_2ND_BB]]:
+// CHECK: call void @baz()
+// CHECK: br label %[[CONT_BB:.*]]
+
+// CHECK:   [[CONT_BB]]:
+// CHECK: br label %.omp.reduction.done
+// CHECK: }
+
+// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef 
%0, ptr noundef %1) #0 {
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   [[BODY_2ND_BB:.*]]:
+// CHECK: call void @baz()
+// CHECK: br label %omp.region.cont
+
+
+// CHECK: omp.reduction.nonatomic.body:
+// CHECK:   call voi

[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)

2025-09-16 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)


Changes

Should not do anything.

---

Patch is 85.72 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/158823.diff


10 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+5-3) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll 
(+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir 
(+115-115) 
- (modified) 
llvm/test/CodeGen/AMDGPU/inflate-reg-class-vgpr-mfma-to-av-with-load-source.mir 
(+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll (+12-12) 
- (modified) 
llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll (+8-8) 
- (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir 
(+4-4) 
- (modified) 
llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir 
(+6-6) 
- (modified) 
llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir 
(+14-14) 
- (modified) llvm/test/CodeGen/AMDGPU/spill-vector-superclass.ll (+1-1) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index 7eccaafefc893..4e1876db41d3d 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -1131,7 +1131,8 @@ def VS_32_Lo256 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16, v2
   let Size = 32;
 }
 
-def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64, 
SReg_64)> {
+def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32,
+(add VReg_64, SReg_64_Encodable)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
@@ -1139,7 +1140,7 @@ def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 
32, (add VReg_64, SReg_6
 }
 
 def VS_64_Align2 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32,
-   (add VReg_64_Align2, SReg_64)> {
+   (add VReg_64_Align2, SReg_64_Encodable)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
@@ -1153,7 +1154,8 @@ def AV_32 : SIRegisterClass<"AMDGPU", VGPR_32.RegTypes, 
32, (add VGPR_32, AGPR_3
   let Size = 32;
 }
 
-def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add 
VReg_64_Lo256_Align2, SReg_64)> {
+def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32,
+  (add VReg_64_Lo256_Align2, 
SReg_64_Encodable)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
index f9d11cb23fa4e..2cde060529bec 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
@@ -136,7 +136,7 @@ define float @test_multiple_register_outputs_same() #0 {
 define double @test_multiple_register_outputs_mixed() #0 {
   ; CHECK-LABEL: name: test_multiple_register_outputs_mixed
   ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* 
attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3670026 /* regdef:VReg_64 
*/, def %9
+  ; CHECK-NEXT:   INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* 
attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3735562 /* regdef:VReg_64 
*/, def %9
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY %8
   ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s64) = COPY %9
   ; CHECK-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = 
G_UNMERGE_VALUES [[COPY1]](s64)
diff --git a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir 
b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
index 04cb0b14679bb..029aa3957d32b 100644
--- a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
+++ b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
@@ -20,13 +20,13 @@ body: |
 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
 ; CHECK-NEXT: undef [[COPY2:%[0-9]+]].sub0:areg_64 = COPY [[COPY]]
 ; CHECK-NEXT: [[COPY2:%[0-9]+]].sub1:areg_64 = COPY [[COPY1]]
-; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* 
reguse:AReg_64 */, [[COPY2]]
+; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* 
reguse:AReg_64 */, [[COPY2]]
 ; CHECK-NEXT: SI_RETURN
 %0:vgpr_32 = COPY $vgpr0
 %1:vgpr_32 = COPY $vgpr1
 undef %2.sub0:areg_64 = COPY %0
 %2.sub1:areg_64 = COPY %1
-INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* reguse:AReg_64 */, 
killed %2
+INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* reguse:AReg_64 */, 
killed %2
 SI_RETURN
 
 ...
@@ -45,13 +45,13 @@ body: |
 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = C

[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm commented:

This probably does add inline asm support for this usage.

https://github.com/llvm/llvm-project/pull/158823


[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)

2025-09-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/158823

Should not do anything.

>From 2e363048d0f6ec969e6824bdaa062fee3907d853 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Tue, 16 Sep 2025 00:28:29 -0700
Subject: [PATCH] [AMDGPU] Add aperture classes to VS_64

Should not do anything.
---
 llvm/lib/Target/AMDGPU/SIRegisterInfo.td  |   8 +-
 .../GlobalISel/irtranslator-inline-asm.ll |   2 +-
 .../coalesce-copy-to-agpr-to-av-registers.mir | 230 +-
 ...class-vgpr-mfma-to-av-with-load-source.mir |  12 +-
 llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll   |  24 +-
 ...al-regcopy-and-spill-missed-at-regalloc.ll |  16 +-
 .../rewrite-vgpr-mfma-to-agpr-copy-from.mir   |   8 +-
 ...gpr-mfma-to-agpr-subreg-insert-extract.mir |  12 +-
 ...te-vgpr-mfma-to-agpr-subreg-src2-chain.mir |  28 +--
 .../CodeGen/AMDGPU/spill-vector-superclass.ll |   2 +-
 10 files changed, 172 insertions(+), 170 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index 7eccaafefc893..4e1876db41d3d 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -1131,7 +1131,8 @@ def VS_32_Lo256 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16, v2
   let Size = 32;
 }
 
-def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64, 
SReg_64)> {
+def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32,
+(add VReg_64, SReg_64_Encodable)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
@@ -1139,7 +1140,7 @@ def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 
32, (add VReg_64, SReg_6
 }
 
 def VS_64_Align2 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32,
-   (add VReg_64_Align2, SReg_64)> {
+   (add VReg_64_Align2, SReg_64_Encodable)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
@@ -1153,7 +1154,8 @@ def AV_32 : SIRegisterClass<"AMDGPU", VGPR_32.RegTypes, 
32, (add VGPR_32, AGPR_3
   let Size = 32;
 }
 
-def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add 
VReg_64_Lo256_Align2, SReg_64)> {
+def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32,
+  (add VReg_64_Lo256_Align2, 
SReg_64_Encodable)> {
   let isAllocatable = 0;
   let HasVGPR = 1;
   let HasSGPR = 1;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
index f9d11cb23fa4e..2cde060529bec 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
@@ -136,7 +136,7 @@ define float @test_multiple_register_outputs_same() #0 {
 define double @test_multiple_register_outputs_mixed() #0 {
   ; CHECK-LABEL: name: test_multiple_register_outputs_mixed
   ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* 
attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3670026 /* regdef:VReg_64 
*/, def %9
+  ; CHECK-NEXT:   INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* 
attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3735562 /* regdef:VReg_64 
*/, def %9
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY %8
   ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s64) = COPY %9
   ; CHECK-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = 
G_UNMERGE_VALUES [[COPY1]](s64)
diff --git a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir 
b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
index 04cb0b14679bb..029aa3957d32b 100644
--- a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
+++ b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
@@ -20,13 +20,13 @@ body: |
 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
 ; CHECK-NEXT: undef [[COPY2:%[0-9]+]].sub0:areg_64 = COPY [[COPY]]
 ; CHECK-NEXT: [[COPY2:%[0-9]+]].sub1:areg_64 = COPY [[COPY1]]
-; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* 
reguse:AReg_64 */, [[COPY2]]
+; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* 
reguse:AReg_64 */, [[COPY2]]
 ; CHECK-NEXT: SI_RETURN
 %0:vgpr_32 = COPY $vgpr0
 %1:vgpr_32 = COPY $vgpr1
 undef %2.sub0:areg_64 = COPY %0
 %2.sub1:areg_64 = COPY %1
-INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* reguse:AReg_64 */, 
killed %2
+INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* reguse:AReg_64 */, 
killed %2
 SI_RETURN
 
 ...
@@ -45,13 +45,13 @@ body: |
 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
 ; CHECK-NEXT: undef [[COPY2:%[0-9]+]].sub0:areg_64_align2 = COPY [[COPY]]
 ; CHECK-NEXT: [[COPY2:%[0-9]+]].sub1:areg_64_align2 = COPY [[COPY1]]
-; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4

[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)

2025-09-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/158823
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#158823** 👈 (this PR)
* **#158754**
* **#158725**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/158823


[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/158777

>From 0821bf6b56fbcf9aebc2eea8b4e1af02f9f2d1f9 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 5 Sep 2025 18:03:59 +0900
Subject: [PATCH 1/2] PPC: Replace PointerLikeRegClass with RegClassByHwMode

---
 .../PowerPC/Disassembler/PPCDisassembler.cpp  |  3 --
 llvm/lib/Target/PowerPC/PPC.td|  6 
 llvm/lib/Target/PowerPC/PPCInstrInfo.cpp  | 28 ++-
 llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 10 +--
 4 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp 
b/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp
index 47586c417cfe3..70e619cc22b19 100644
--- a/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp
+++ b/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp
@@ -185,9 +185,6 @@ DecodeG8RC_NOX0RegisterClass(MCInst &Inst, uint64_t RegNo, 
uint64_t Address,
   return decodeRegisterClass(Inst, RegNo, XRegsNoX0);
 }
 
-#define DecodePointerLikeRegClass0 DecodeGPRCRegisterClass
-#define DecodePointerLikeRegClass1 DecodeGPRC_NOR0RegisterClass
-
 static DecodeStatus DecodeSPERCRegisterClass(MCInst &Inst, uint64_t RegNo,
  uint64_t Address,
  const MCDisassembler *Decoder) {
diff --git a/llvm/lib/Target/PowerPC/PPC.td b/llvm/lib/Target/PowerPC/PPC.td
index 386d0f65d1ed1..d491e88b66ad8 100644
--- a/llvm/lib/Target/PowerPC/PPC.td
+++ b/llvm/lib/Target/PowerPC/PPC.td
@@ -394,6 +394,12 @@ def NotAIX : Predicate<"!Subtarget->isAIXABI()">;
 def IsISAFuture : Predicate<"Subtarget->isISAFuture()">;
 def IsNotISAFuture : Predicate<"!Subtarget->isISAFuture()">;
 
+//===--===//
+// HwModes
+//===--===//
+
+defvar PPC32 = DefaultMode;
+def PPC64 : HwMode<[In64BitMode]>;
 
 // Since new processors generally contain a superset of features of those that
 // came before them, the idea is to make implementations of new processors
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp 
b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
index db066bc4b7bdd..55e38bcf4afc9 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
@@ -2142,33 +2142,23 @@ bool PPCInstrInfo::onlyFoldImmediate(MachineInstr 
&UseMI, MachineInstr &DefMI,
   assert(UseIdx < UseMI.getNumOperands() && "Cannot find Reg in UseMI");
   assert(UseIdx < UseMCID.getNumOperands() && "No operand description for 
Reg");
 
-  const MCOperandInfo *UseInfo = &UseMCID.operands()[UseIdx];
-
   // We can fold the zero if this register requires a GPRC_NOR0/G8RC_NOX0
   // register (which might also be specified as a pointer class kind).
-  if (UseInfo->isLookupPtrRegClass()) {
-if (UseInfo->RegClass /* Kind */ != 1)
-  return false;
-  } else {
-if (UseInfo->RegClass != PPC::GPRC_NOR0RegClassID &&
-UseInfo->RegClass != PPC::G8RC_NOX0RegClassID)
-  return false;
-  }
+
+  const MCOperandInfo &UseInfo = UseMCID.operands()[UseIdx];
+  int16_t RegClass = getOpRegClassID(UseInfo);
+  if (UseInfo.RegClass != PPC::GPRC_NOR0RegClassID &&
+  UseInfo.RegClass != PPC::G8RC_NOX0RegClassID)
+return false;
 
   // Make sure this is not tied to an output register (or otherwise
   // constrained). This is true for ST?UX registers, for example, which
   // are tied to their output registers.
-  if (UseInfo->Constraints != 0)
+  if (UseInfo.Constraints != 0)
 return false;
 
-  MCRegister ZeroReg;
-  if (UseInfo->isLookupPtrRegClass()) {
-bool isPPC64 = Subtarget.isPPC64();
-ZeroReg = isPPC64 ? PPC::ZERO8 : PPC::ZERO;
-  } else {
-ZeroReg = UseInfo->RegClass == PPC::G8RC_NOX0RegClassID ?
-  PPC::ZERO8 : PPC::ZERO;
-  }
+  MCRegister ZeroReg =
+  RegClass == PPC::G8RC_NOX0RegClassID ? PPC::ZERO8 : PPC::ZERO;
 
   LLVM_DEBUG(dbgs() << "Folded immediate zero for: ");
   LLVM_DEBUG(UseMI.dump());
diff --git a/llvm/lib/Target/PowerPC/PPCRegisterInfo.td 
b/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
index 8b690b7b833b3..adda91786d19c 100644
--- a/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
+++ b/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
@@ -868,7 +868,11 @@ def crbitm: Operand {
 def PPCRegGxRCNoR0Operand : AsmOperandClass {
   let Name = "RegGxRCNoR0"; let PredicateMethod = "isRegNumber";
 }
-def ptr_rc_nor0 : Operand, PointerLikeRegClass<1> {
+
+def ptr_rc_nor0 : Operand,
+  RegClassByHwMode<
+[PPC32, PPC64],
+[GPRC_NOR0, G8RC_NOX0]> {
   let ParserMatchClass = PPCRegGxRCNoR0Operand;
 }
 
@@ -902,7 +906,9 @@ def memri34_pcrel : Operand { // memri, imm is a 
34-bit value.
 def PPCRegGxRCOperand : AsmOperandClass {
   let Name = "RegGxRC"; let PredicateMethod = "isRegNumber";
 }
-def ptr_rc_idx : Operand, PointerLikeRegClass<0> {
+def ptr_rc_idx : Operand,

[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/158777


[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)

2025-09-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/158823


[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/158823


[llvm-branch-commits] [llvm] [AMDGPU] Fix codegen to emit COPY instead of S_MOV_B64 for aperture regs (PR #158754)

2025-09-16 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/158754


[llvm-branch-commits] [clang] [PAC][Driver] Support ptrauth flags only on ARM64 Darwin or with pauthtest ABI (PR #113152)

2025-09-16 Thread Daniil Kovalev via llvm-branch-commits

https://github.com/kovdan01 updated 
https://github.com/llvm/llvm-project/pull/113152

>From 64489c9dd71e9ff5b0b05130e73b8e7d2ba1fde7 Mon Sep 17 00:00:00 2001
From: Daniil Kovalev 
Date: Mon, 21 Oct 2024 12:18:56 +0300
Subject: [PATCH 1/8] [PAC][Driver] Support ptrauth flags only on ARM64 Darwin

Most ptrauth flags are ABI-affecting, so they should not be exposed to
end users. Under certain conditions, some ptrauth driver flags are intended
to be used for ARM64 Darwin, so allow them in this case.

Leave `-faarch64-jump-table-hardening` available for all AArch64 targets
since it's not ABI-affecting.
---
 clang/lib/Driver/ToolChains/Clang.cpp |  28 -
 clang/lib/Driver/ToolChains/Linux.cpp |  53 ++---
 clang/test/Driver/aarch64-ptrauth.c   | 164 --
 3 files changed, 135 insertions(+), 110 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index f9e6031522134..08ee45856e5e1 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1662,34 +1662,6 @@ void Clang::AddAArch64TargetArgs(const ArgList &Args,
 
   AddUnalignedAccessWarning(CmdArgs);
 
-  Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_intrinsics,
-options::OPT_fno_ptrauth_intrinsics);
-  Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_calls,
-options::OPT_fno_ptrauth_calls);
-  Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_returns,
-options::OPT_fno_ptrauth_returns);
-  Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_auth_traps,
-options::OPT_fno_ptrauth_auth_traps);
-  Args.addOptInFlag(
-  CmdArgs, options::OPT_fptrauth_vtable_pointer_address_discrimination,
-  options::OPT_fno_ptrauth_vtable_pointer_address_discrimination);
-  Args.addOptInFlag(
-  CmdArgs, options::OPT_fptrauth_vtable_pointer_type_discrimination,
-  options::OPT_fno_ptrauth_vtable_pointer_type_discrimination);
-  Args.addOptInFlag(
-  CmdArgs, options::OPT_fptrauth_type_info_vtable_pointer_discrimination,
-  options::OPT_fno_ptrauth_type_info_vtable_pointer_discrimination);
-  Args.addOptInFlag(
-  CmdArgs, options::OPT_fptrauth_function_pointer_type_discrimination,
-  options::OPT_fno_ptrauth_function_pointer_type_discrimination);
-
-  Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_indirect_gotos,
-options::OPT_fno_ptrauth_indirect_gotos);
-  Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_init_fini,
-options::OPT_fno_ptrauth_init_fini);
-  Args.addOptInFlag(CmdArgs,
-options::OPT_fptrauth_init_fini_address_discrimination,
-options::OPT_fno_ptrauth_init_fini_address_discrimination);
   Args.addOptInFlag(CmdArgs, options::OPT_faarch64_jump_table_hardening,
 options::OPT_fno_aarch64_jump_table_hardening);
 
diff --git a/clang/lib/Driver/ToolChains/Linux.cpp 
b/clang/lib/Driver/ToolChains/Linux.cpp
index 04a8ad1d165d4..1e93b3aafbf47 100644
--- a/clang/lib/Driver/ToolChains/Linux.cpp
+++ b/clang/lib/Driver/ToolChains/Linux.cpp
@@ -484,49 +484,16 @@ std::string Linux::ComputeEffectiveClangTriple(const 
llvm::opt::ArgList &Args,
 // options represent the default signing schema.
 static void handlePAuthABI(const Driver &D, const ArgList &DriverArgs,
ArgStringList &CC1Args) {
-  if (!DriverArgs.hasArg(options::OPT_fptrauth_intrinsics,
- options::OPT_fno_ptrauth_intrinsics))
-CC1Args.push_back("-fptrauth-intrinsics");
-
-  if (!DriverArgs.hasArg(options::OPT_fptrauth_calls,
- options::OPT_fno_ptrauth_calls))
-CC1Args.push_back("-fptrauth-calls");
-
-  if (!DriverArgs.hasArg(options::OPT_fptrauth_returns,
- options::OPT_fno_ptrauth_returns))
-CC1Args.push_back("-fptrauth-returns");
-
-  if (!DriverArgs.hasArg(options::OPT_fptrauth_auth_traps,
- options::OPT_fno_ptrauth_auth_traps))
-CC1Args.push_back("-fptrauth-auth-traps");
-
-  if (!DriverArgs.hasArg(
-  options::OPT_fptrauth_vtable_pointer_address_discrimination,
-  options::OPT_fno_ptrauth_vtable_pointer_address_discrimination))
-CC1Args.push_back("-fptrauth-vtable-pointer-address-discrimination");
-
-  if (!DriverArgs.hasArg(
-  options::OPT_fptrauth_vtable_pointer_type_discrimination,
-  options::OPT_fno_ptrauth_vtable_pointer_type_discrimination))
-CC1Args.push_back("-fptrauth-vtable-pointer-type-discrimination");
-
-  if (!DriverArgs.hasArg(
-  options::OPT_fptrauth_type_info_vtable_pointer_discrimination,
-  options::OPT_fno_ptrauth_type_info_vtable_pointer_discrimination))
-CC1Args.push_back("-fptrauth-type-info-vtable-pointer-discrimination");
-
-  if (!DriverArgs.hasArg(options::OPT_fptrauth_indirect_gotos,
- options::OPT_fno_ptrauth_indirect_gotos))
-CC1Args.

[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `local` on device (PR #157638)

2025-09-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/157638

>From 509959568c433d7745ca1f5387edd7654b3e1c2a Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 2 Sep 2025 05:54:00 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `local` on device

Extends support for mapping `do concurrent` to the device by adding
support for `local` specifiers. The changes in this PR map the local
variable to the `omp.target` op and use the mapped value as the
`private` clause operand of the nested `omp.parallel` op.
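
The resulting IR shape can be sketched as follows (hand-written
illustration, not taken from the patch's tests; the privatizer symbol and
types are made up):

```mlir
// Illustrative only: a `local` variable %x is mapped onto the target op,
// and the mapped block argument feeds the `private` clause of the nested
// parallel op instead of the original host value.
%map = omp.map.info var_ptr(%x : !fir.ref<i32>, i32)
         map_clauses(to) capture(ByRef) -> !fir.ref<i32> {name = "x"}
omp.target map_entries(%map -> %arg0 : !fir.ref<i32>) {
  omp.teams {
    omp.parallel private(@x_localizer %arg0 -> %priv : !fir.ref<i32>) {
      // the loop nest below uses %priv
      omp.terminator
    }
    omp.terminator
  }
  omp.terminator
}
```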
---
 .../include/flang/Optimizer/Dialect/FIROps.td |  12 ++
 .../OpenMP/DoConcurrentConversion.cpp | 192 +++---
 .../Transforms/DoConcurrent/local_device.mlir |  49 +
 3 files changed, 175 insertions(+), 78 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/local_device.mlir

diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td 
b/flang/include/flang/Optimizer/Dialect/FIROps.td
index bc971e8fd6600..fc6eedc6ed4c6 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3894,6 +3894,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
   return getReduceVars().size();
 }
 
+unsigned getInductionVarsStart() {
+  return 0;
+}
+
+unsigned getLocalOperandsStart() {
+  return getNumInductionVars();
+}
+
+unsigned getReduceOperandsStart() {
+  return getLocalOperandsStart() + getNumLocalOperands();
+}
+
 mlir::Block::BlockArgListType getInductionVars() {
   return getBody()->getArguments().slice(0, getNumInductionVars());
 }
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp 
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index 6c71924000842..d00a4fdd2cf2e 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -138,6 +138,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
 
 liveIns.push_back(operand->get());
   });
+
+  for (mlir::Value local : loop.getLocalVars())
+liveIns.push_back(local);
 }
 
 /// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -298,8 +301,7 @@ class DoConcurrentConversion
   .getIsTargetDevice();
 
   mlir::omp::TargetOperands targetClauseOps;
-  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
-   loopNestClauseOps,
+  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps,
isTargetDevice ? nullptr : &targetClauseOps);
 
   LiveInShapeInfoMap liveInShapeInfoMap;
@@ -321,14 +323,13 @@ class DoConcurrentConversion
 }
 
 mlir::omp::ParallelOp parallelOp =
-genParallelOp(doLoop.getLoc(), rewriter, ivInfos, mapper);
+genParallelOp(rewriter, loop, ivInfos, mapper);
 
 // Only set as composite when part of `distribute parallel do`.
 parallelOp.setComposite(mapToDevice);
 
 if (!mapToDevice)
-  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
-   loopNestClauseOps);
+  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps);
 
 for (mlir::Value local : locals)
   looputils::localizeLoopLocalValue(local, parallelOp.getRegion(),
@@ -337,10 +338,38 @@ class DoConcurrentConversion
 if (mapToDevice)
   genDistributeOp(doLoop.getLoc(), rewriter).setComposite(/*val=*/true);
 
-mlir::omp::LoopNestOp ompLoopNest =
+auto [loopNestOp, wsLoopOp] =
 genWsLoopOp(rewriter, loop, mapper, loopNestClauseOps,
 /*isComposite=*/mapToDevice);
 
+// `local` region arguments are transferred/cloned from the `do concurrent`
+// loop to the loopnest op when the region is cloned above. Instead, these
+// region arguments should be on the workshare loop's region.
+if (mapToDevice) {
+  for (auto [parallelArg, loopNestArg] : llvm::zip_equal(
+   parallelOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().slice(
+   loop.getLocalOperandsStart(), loop.getNumLocalOperands(
+rewriter.replaceAllUsesWith(loopNestArg, parallelArg);
+
+  for (auto [wsloopArg, loopNestArg] : llvm::zip_equal(
+   wsLoopOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().slice(
+   loop.getReduceOperandsStart(), 
loop.getNumReduceOperands(
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+} else {
+  for (auto [wsloopArg, loopNestArg] :
+   llvm::zip_equal(wsLoopOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().drop_front(
+   loopNestClauseOps.loopLowerBounds.size(
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+}
+
+for (unsigned i = 0;
+ i 

[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)

2025-09-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/156610

>From 5b9f17606b95f689a7ffb0187d103b2a4bd62e24 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 2 Sep 2025 08:36:34 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device

Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to omp `reduction` clauses. The changes
attach two `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wsloop` part.
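As a rough illustration of the two-level scheme described in the commit message (the dataflow only, not the actual MLIR lowering, which emits `omp.teams`/`omp.wsloop` ops carrying `reduction` clauses), a sum reduction split across teams and then across each team's worksharing-loop iterations can be sketched as:

```python
def two_level_sum(values, num_teams):
    """Hypothetical model: each team reduces its own chunk (the wsloop-level
    reduction), then the team partials are combined (the teams-level one)."""
    # Split the iteration space across teams.
    chunks = [values[i::num_teams] for i in range(num_teams)]
    # Level 1: wsloop-style reduction inside each team.
    team_partials = [sum(chunk) for chunk in chunks]
    # Level 2: teams-level reduction over the partials.
    return sum(team_partials)

print(two_level_sum(list(range(10)), num_teams=3))  # 45, same as sum(range(10))
```

Any associative/commutative combiner can replace `sum` here; that property is what makes attaching the clause at both levels sound.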
---
 .../OpenMP/DoConcurrentConversion.cpp | 117 ++
 .../DoConcurrent/reduce_device.mlir   |  53 
 2 files changed, 121 insertions(+), 49 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir

diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp 
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index d00a4fdd2cf2e..6e308499100fa 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
 
   for (mlir::Value local : loop.getLocalVars())
 liveIns.push_back(local);
+
+  for (mlir::Value reduce : loop.getReduceVars())
+liveIns.push_back(reduce);
 }
 
 /// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -319,7 +322,7 @@ class DoConcurrentConversion
   targetOp =
   genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns,
   targetClauseOps, loopNestClauseOps, liveInShapeInfoMap);
-  genTeamsOp(doLoop.getLoc(), rewriter);
+  genTeamsOp(rewriter, loop, mapper);
 }
 
 mlir::omp::ParallelOp parallelOp =
@@ -492,46 +495,7 @@ class DoConcurrentConversion
 if (!mapToDevice)
   genPrivatizers(rewriter, mapper, loop, wsloopClauseOps);
 
-if (!loop.getReduceVars().empty()) {
-  for (auto [op, byRef, sym, arg] : llvm::zip_equal(
-   loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(),
-   loop.getReduceSymsAttr().getAsRange(),
-   loop.getRegionReduceArgs())) {
-auto firReducer = moduleSymbolTable.lookup(
-sym.getLeafReference());
-
-mlir::OpBuilder::InsertionGuard guard(rewriter);
-rewriter.setInsertionPointAfter(firReducer);
-std::string ompReducerName = sym.getLeafReference().str() + ".omp";
-
-auto ompReducer =
-moduleSymbolTable.lookup(
-rewriter.getStringAttr(ompReducerName));
-
-if (!ompReducer) {
-  ompReducer = mlir::omp::DeclareReductionOp::create(
-  rewriter, firReducer.getLoc(), ompReducerName,
-  firReducer.getTypeAttr().getValue());
-
-  cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(),
-  ompReducer.getAllocRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(),
-  ompReducer.getInitializerRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(),
-  ompReducer.getReductionRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(),
-  ompReducer.getAtomicReductionRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(),
-  ompReducer.getCleanupRegion());
-  moduleSymbolTable.insert(ompReducer);
-}
-
-wsloopClauseOps.reductionVars.push_back(op);
-wsloopClauseOps.reductionByref.push_back(byRef);
-wsloopClauseOps.reductionSyms.push_back(
-mlir::SymbolRefAttr::get(ompReducer));
-  }
-}
+genReductions(rewriter, mapper, loop, wsloopClauseOps);
 
 auto wsloopOp =
 mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps);
@@ -553,8 +517,6 @@ class DoConcurrentConversion
 
 rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back());
 mlir::omp::YieldOp::create(rewriter, loop->getLoc());
-loop->getParentOfType().print(
-llvm::errs(), mlir::OpPrintingFlags().assumeVerified());
 
 return {loopNestOp, wsloopOp};
   }
@@ -778,15 +740,26 @@ class DoConcurrentConversion
 liveInName, shape);
   }
 
-  mlir::omp::TeamsOp
-  genTeamsOp(mlir::Location loc,
- mlir::ConversionPatternRewriter &rewriter) const {
-auto teamsOp = rewriter.create(
-loc, /*clauses=*/mlir::omp::TeamsOperands{});
+  mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter,
+fir::DoConcurrentLoopOp loop,
+mlir::IRMapping &mapper) const {
+mlir::omp::TeamsOperands teamsOps;
+genReductions(rewriter, mapper, loop, teamsOps);
+
+mlir::Location loc = loop.getLoc();
+aut

[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)

2025-09-16 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn milestoned 
https://github.com/llvm/llvm-project/pull/158918
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)

2025-09-16 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn created 
https://github.com/llvm/llvm-project/pull/158918

MaxPtrDiff + Offset may wrap, leading to incorrect results. Use uadd_ov to 
check for overflow.

(cherry picked from commit cf444ac2adc45c1079856087b8ba9a04466f78db)
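The fix relies on `APInt::uadd_ov`, which returns the wrapped sum together with an overflow flag. A minimal Python model of that checked unsigned add (the bit width and names are illustrative, not LLVM's API):

```python
def uadd_ov(a, b, bits=64):
    """Model of an unsigned add with overflow detection: returns the
    wrapped result and whether the true sum exceeded the bit width."""
    mask = (1 << bits) - 1
    total = a + b
    return total & mask, total > mask

# Without the check, the wrapped value silently understates the access size.
res, ov = uadd_ov((1 << 64) - 8, 16)
print(res, ov)  # 8 True -> the caller must bail out instead of using 8
```

This mirrors the patched code path: when the flag is set, `isDereferenceableAndAlignedInLoop` now returns false rather than reasoning about a wrapped `AccessSize`.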

>From 89c5e7e99f08f6f79aafa2ab91b0e224194f95b6 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Tue, 2 Sep 2025 09:37:19 +0100
Subject: [PATCH] [Loads] Check for overflow when adding MaxPtrDiff + Offset.

MaxPtrDiff + Offset may wrap, leading to incorrect results. Use uadd_ov
to check for overflow.

(cherry picked from commit cf444ac2adc45c1079856087b8ba9a04466f78db)
---
 llvm/lib/Analysis/Loads.cpp   |   5 +-
 .../LoopVectorize/load-deref-pred-align.ll| 130 ++
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index 393f2648de3c9..fcc2cf2f7e8e7 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -382,7 +382,10 @@ bool llvm::isDereferenceableAndAlignedInLoop(
 if (Offset->getAPInt().urem(Alignment.value()) != 0)
   return false;
 
-AccessSize = MaxPtrDiff + Offset->getAPInt();
+bool Overflow = false;
+AccessSize = MaxPtrDiff.uadd_ov(Offset->getAPInt(), Overflow);
+if (Overflow)
+  return false;
 AccessSizeSCEV = SE.getAddExpr(PtrDiff, Offset);
 Base = NewBase->getValue();
   } else
diff --git a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll 
b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll
index 8a326c9d0c083..7c2c3883e1dc7 100644
--- a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll
+++ b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll
@@ -753,3 +753,133 @@ exit:
   call void @llvm.memcpy.p0.p0.i64(ptr %dest, ptr %local_dest, i64 1024, i1 
false)
   ret void
 }
+
+define void @adding_offset_overflows(i32 %n, ptr %A) {
+; CHECK-LABEL: @adding_offset_overflows(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[B:%.*]] = alloca [62 x i32], align 4
+; CHECK-NEXT:[[C:%.*]] = alloca [144 x i32], align 4
+; CHECK-NEXT:call void @init(ptr [[B]])
+; CHECK-NEXT:call void @init(ptr [[C]])
+; CHECK-NEXT:[[PRE:%.*]] = icmp slt i32 [[N:%.*]], 1
+; CHECK-NEXT:br i1 [[PRE]], label [[EXIT:%.*]], label [[PH:%.*]]
+; CHECK:   ph:
+; CHECK-NEXT:[[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
+; CHECK-NEXT:[[TMP0:%.*]] = add nsw i64 [[WIDE_TRIP_COUNT]], -1
+; CHECK-NEXT:[[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 2
+; CHECK-NEXT:br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label 
[[VECTOR_PH:%.*]]
+; CHECK:   vector.ph:
+; CHECK-NEXT:[[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 2
+; CHECK-NEXT:[[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; CHECK-NEXT:[[TMP1:%.*]] = add i64 1, [[N_VEC]]
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   vector.body:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE3:%.*]] ]
+; CHECK-NEXT:[[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
+; CHECK-NEXT:[[TMP2:%.*]] = getelementptr i32, ptr [[A:%.*]], i64 
[[OFFSET_IDX]]
+; CHECK-NEXT:[[TMP23:%.*]] = getelementptr i32, ptr [[TMP2]], i32 0
+; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <2 x i32>, ptr [[TMP23]], align 4
+; CHECK-NEXT:[[TMP3:%.*]] = icmp ne <2 x i32> [[WIDE_LOAD]], 
zeroinitializer
+; CHECK-NEXT:[[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
+; CHECK-NEXT:br i1 [[TMP4]], label [[PRED_LOAD_IF:%.*]], label 
[[PRED_LOAD_CONTINUE:%.*]]
+; CHECK:   pred.load.if:
+; CHECK-NEXT:[[TMP15:%.*]] = add i64 [[OFFSET_IDX]], 0
+; CHECK-NEXT:[[TMP16:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP15]]
+; CHECK-NEXT:[[TMP17:%.*]] = load i32, ptr [[TMP16]], align 4
+; CHECK-NEXT:[[TMP18:%.*]] = insertelement <2 x i32> poison, i32 
[[TMP17]], i32 0
+; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE]]
+; CHECK:   pred.load.continue:
+; CHECK-NEXT:[[TMP19:%.*]] = phi <2 x i32> [ poison, [[VECTOR_BODY]] ], [ 
[[TMP18]], [[PRED_LOAD_IF]] ]
+; CHECK-NEXT:[[TMP20:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1
+; CHECK-NEXT:br i1 [[TMP20]], label [[PRED_LOAD_IF1:%.*]], label 
[[PRED_LOAD_CONTINUE2:%.*]]
+; CHECK:   pred.load.if1:
+; CHECK-NEXT:[[TMP21:%.*]] = add i64 [[OFFSET_IDX]], 1
+; CHECK-NEXT:[[TMP22:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP21]]
+; CHECK-NEXT:[[TMP13:%.*]] = load i32, ptr [[TMP22]], align 4
+; CHECK-NEXT:[[TMP14:%.*]] = insertelement <2 x i32> [[TMP19]], i32 
[[TMP13]], i32 1
+; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE2]]
+; CHECK:   pred.load.continue2:
+; CHECK-NEXT:[[WIDE_LOAD1:%.*]] = phi <2 x i32> [ [[TMP19]], 
[[PRED_LOAD_CONTINUE]] ], [ [[TMP14]], [[PRED_LOAD_IF1]] ]
+; CHECK-NEXT:[[TMP5:%.*]] = sext <2 x i32> [[WIDE_LOAD1]] to <2 x i64>
+; CHECK-NEXT:[[TMP6:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
+; CHECK-NEXT:br i1 [[TMP6]], la

[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)

2025-09-16 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/158918


[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)

2025-09-16 Thread via llvm-branch-commits

llvmbot wrote:

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

MaxPtrDiff + Offset may wrap, leading to incorrect results. Use uadd_ov to 
check for overflow.

(cherry picked from commit cf444ac2adc45c1079856087b8ba9a04466f78db)

---
Full diff: https://github.com/llvm/llvm-project/pull/158918.diff


2 Files Affected:

- (modified) llvm/lib/Analysis/Loads.cpp (+4-1) 
- (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+130) 


``diff
diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index 393f2648de3c9..fcc2cf2f7e8e7 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -382,7 +382,10 @@ bool llvm::isDereferenceableAndAlignedInLoop(
 if (Offset->getAPInt().urem(Alignment.value()) != 0)
   return false;
 
-AccessSize = MaxPtrDiff + Offset->getAPInt();
+bool Overflow = false;
+AccessSize = MaxPtrDiff.uadd_ov(Offset->getAPInt(), Overflow);
+if (Overflow)
+  return false;
 AccessSizeSCEV = SE.getAddExpr(PtrDiff, Offset);
 Base = NewBase->getValue();
   } else
diff --git a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll 
b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll
index 8a326c9d0c083..7c2c3883e1dc7 100644
--- a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll
+++ b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll
@@ -753,3 +753,133 @@ exit:
   call void @llvm.memcpy.p0.p0.i64(ptr %dest, ptr %local_dest, i64 1024, i1 
false)
   ret void
 }
+
+define void @adding_offset_overflows(i32 %n, ptr %A) {
+; CHECK-LABEL: @adding_offset_overflows(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[B:%.*]] = alloca [62 x i32], align 4
+; CHECK-NEXT:[[C:%.*]] = alloca [144 x i32], align 4
+; CHECK-NEXT:call void @init(ptr [[B]])
+; CHECK-NEXT:call void @init(ptr [[C]])
+; CHECK-NEXT:[[PRE:%.*]] = icmp slt i32 [[N:%.*]], 1
+; CHECK-NEXT:br i1 [[PRE]], label [[EXIT:%.*]], label [[PH:%.*]]
+; CHECK:   ph:
+; CHECK-NEXT:[[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
+; CHECK-NEXT:[[TMP0:%.*]] = add nsw i64 [[WIDE_TRIP_COUNT]], -1
+; CHECK-NEXT:[[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 2
+; CHECK-NEXT:br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label 
[[VECTOR_PH:%.*]]
+; CHECK:   vector.ph:
+; CHECK-NEXT:[[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 2
+; CHECK-NEXT:[[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; CHECK-NEXT:[[TMP1:%.*]] = add i64 1, [[N_VEC]]
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   vector.body:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE3:%.*]] ]
+; CHECK-NEXT:[[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
+; CHECK-NEXT:[[TMP2:%.*]] = getelementptr i32, ptr [[A:%.*]], i64 
[[OFFSET_IDX]]
+; CHECK-NEXT:[[TMP23:%.*]] = getelementptr i32, ptr [[TMP2]], i32 0
+; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <2 x i32>, ptr [[TMP23]], align 4
+; CHECK-NEXT:[[TMP3:%.*]] = icmp ne <2 x i32> [[WIDE_LOAD]], 
zeroinitializer
+; CHECK-NEXT:[[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
+; CHECK-NEXT:br i1 [[TMP4]], label [[PRED_LOAD_IF:%.*]], label 
[[PRED_LOAD_CONTINUE:%.*]]
+; CHECK:   pred.load.if:
+; CHECK-NEXT:[[TMP15:%.*]] = add i64 [[OFFSET_IDX]], 0
+; CHECK-NEXT:[[TMP16:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP15]]
+; CHECK-NEXT:[[TMP17:%.*]] = load i32, ptr [[TMP16]], align 4
+; CHECK-NEXT:[[TMP18:%.*]] = insertelement <2 x i32> poison, i32 
[[TMP17]], i32 0
+; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE]]
+; CHECK:   pred.load.continue:
+; CHECK-NEXT:[[TMP19:%.*]] = phi <2 x i32> [ poison, [[VECTOR_BODY]] ], [ 
[[TMP18]], [[PRED_LOAD_IF]] ]
+; CHECK-NEXT:[[TMP20:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1
+; CHECK-NEXT:br i1 [[TMP20]], label [[PRED_LOAD_IF1:%.*]], label 
[[PRED_LOAD_CONTINUE2:%.*]]
+; CHECK:   pred.load.if1:
+; CHECK-NEXT:[[TMP21:%.*]] = add i64 [[OFFSET_IDX]], 1
+; CHECK-NEXT:[[TMP22:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP21]]
+; CHECK-NEXT:[[TMP13:%.*]] = load i32, ptr [[TMP22]], align 4
+; CHECK-NEXT:[[TMP14:%.*]] = insertelement <2 x i32> [[TMP19]], i32 
[[TMP13]], i32 1
+; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE2]]
+; CHECK:   pred.load.continue2:
+; CHECK-NEXT:[[WIDE_LOAD1:%.*]] = phi <2 x i32> [ [[TMP19]], 
[[PRED_LOAD_CONTINUE]] ], [ [[TMP14]], [[PRED_LOAD_IF1]] ]
+; CHECK-NEXT:[[TMP5:%.*]] = sext <2 x i32> [[WIDE_LOAD1]] to <2 x i64>
+; CHECK-NEXT:[[TMP6:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
+; CHECK-NEXT:br i1 [[TMP6]], label [[PRED_STORE_IF:%.*]], label 
[[PRED_STORE_CONTINUE:%.*]]
+; CHECK:   pred.store.if:
+; CHECK-NEXT:[[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
+; CHECK-NEXT:[[TMP8:%.*]] = getelementptr i32, ptr [[C]], i64 [[TMP7]]
+; CHECK-NEXT:store i32 0, ptr [[TMP8]], align 4
+; CHECK-NEX

[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)

2025-09-16 Thread Florian Hahn via llvm-branch-commits

fhahn wrote:

This fixes a mis-compile when bootstrapping Clang with sanitizers on macOS

https://github.com/llvm/llvm-project/pull/158918


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-16 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah commented:

LGTM once the existing comments are addressed.

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-16 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/156837

>From ccf3696848367835c15e973c7a7b0d76297be31c Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Thu, 4 Sep 2025 01:06:21 -0500
Subject: [PATCH 1/2] [flang][OpenMP] Support multi-block reduction combiner 
 regions on the GPU

Fixes a bug related to insertion points when inlining multi-block
combiner reduction regions. The insertion point at the end of the inlined
region was not used, resulting in emitting basic blocks with multiple
terminators.
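A toy model of the failure mode (illustrative only; the class and instruction names are not LLVM's API): if the builder's insertion point is not moved to the block where the inlined region ends, the follow-up store is appended to a block that already has a terminator.

```python
class Block:
    """Toy IR block: invalid if anything follows its (single) terminator."""
    def __init__(self, name):
        self.name, self.instrs = name, []
    def append(self, instr):
        self.instrs.append(instr)
    def valid(self):
        terms = [i for i, x in enumerate(self.instrs) if x.startswith("br")]
        return len(terms) <= 1 and (not terms or terms[0] == len(self.instrs) - 1)

def inline_combiner(entry, reset_ip):
    """Inline a two-block combiner region, then emit the reduction store."""
    cont = Block("omp.region.cont")
    entry.append("call @bar")
    entry.append("br %omp.region.cont")  # multi-block region branches onward
    cont.append("call @baz")
    # The bug: without resetting the insertion point, we keep appending to
    # `entry`, which already ends in a branch.
    target = cont if reset_ip else entry
    target.append("store %reduced")
    return entry, cont

bad_entry, _ = inline_combiner(Block("entry"), reset_ip=False)
good_entry, good_cont = inline_combiner(Block("entry"), reset_ip=True)
print(bad_entry.valid(), good_entry.valid() and good_cont.valid())  # False True
```

The patch's `Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint())` calls play the role of `reset_ip=True` here.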
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 +
 .../omptarget-multi-block-reduction.mlir  | 85 +++
 2 files changed, 88 insertions(+)
 create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index d1f78c32596ba..f4acb60a99bf0 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -3507,6 +3507,8 @@ Expected 
OpenMPIRBuilder::createReductionFunction(
 return AfterIP.takeError();
   if (!Builder.GetInsertBlock())
 return ReductionFunc;
+
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHSPtr);
 }
   }
@@ -3751,6 +3753,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createReductionsGPU(
   RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced);
   if (!AfterIP)
 return AfterIP.takeError();
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHS, false);
 }
   }
diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir 
b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
new file mode 100644
index 0..aaf06d2d0e0c2
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
@@ -0,0 +1,85 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Verifies that the IR builder can handle reductions with multi-block combiner
+// regions on the GPU.
+
+module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 
: ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = 
"amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} {
+  llvm.func @bar() {}
+  llvm.func @baz() {}
+
+  omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc {
+%0 = llvm.mlir.constant(1 : i64) : i64
+%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 
x array<3 x i64>>)> : (i64) -> !llvm.ptr<5>
+%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr
+omp.yield(%2 : !llvm.ptr)
+  } init {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+omp.yield(%arg1 : !llvm.ptr)
+  } combiner {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+llvm.call @bar() : () -> ()
+llvm.br ^bb3
+
+  ^bb3:  // pred: ^bb1
+llvm.call @baz() : () -> ()
+omp.yield(%arg0 : !llvm.ptr)
+  }
+  llvm.func @foo_() {
+%c1 = llvm.mlir.constant(1 : i64) : i64
+%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> 
!llvm.ptr<5>
+%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr
+%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) 
map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"}
+omp.target map_entries(%74 -> %arg0 : !llvm.ptr) {
+  %c1_2 = llvm.mlir.constant(1 : i32) : i32
+  %c10 = llvm.mlir.constant(10 : i32) : i32
+  omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 
: !llvm.ptr) {
+omp.parallel {
+  omp.distribute {
+omp.wsloop {
+  omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step 
(%c1_2) {
+omp.yield
+  }
+} {omp.composite}
+  } {omp.composite}
+  omp.terminator
+} {omp.composite}
+omp.terminator
+  }
+  omp.terminator
+}
+llvm.return
+  }
+}
+
+// CHECK:  call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1,
+// CHECK-SAME:   ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1)
+
+// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} {
+// CHECK:   .omp.reduction.then:
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   omp.reduction.nonatomic.body:
+// CHECK: call void @bar()
+// CHECK: br label %[[BODY_2ND_BB:.*]]
+
+// CHECK:   [[BODY_2ND_BB]]:
+// CHECK: call void @baz()
+// CHECK: br label %[[CONT_BB:.*]]
+
+// CHECK:   [[CONT_BB]]:
+// CHECK: br label %.omp.reduction.done
+// CHECK: }
+
+// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef 
%0, ptr noundef %1) #0 {
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   [[BODY_2ND_BB:.*]]:
+// CHECK: call void @baz()
+// CHECK: br label %omp.region.cont
+
+
+// CHECK: omp.reduction.nonatomic.body:
+// CHECK:   call void @b

[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-16 Thread Abid Qadeer via llvm-branch-commits

abidh wrote:

Thanks for handling my comments. It looks good to me, but I have one question. 
This patch sets the insertion point so that the store instruction gets generated 
at the correct place, but the test does not have any store instruction. I was 
just wondering whether the test is checking the right thing.

https://github.com/llvm/llvm-project/pull/156837


[llvm-branch-commits] [clang] release/21.x: [RISCV] Reduce RISCV code generation build time (PR #158164)

2025-09-16 Thread Saleem Abdulrasool via llvm-branch-commits

compnerd wrote:

> I do not know what this error means or how to fix it:
> 
> ```
> error: Expected version 21.1.2 but found version 21.1.1
> ```

This just needs to be updated in CMakeLists.txt

https://github.com/llvm/llvm-project/pull/158164


[llvm-branch-commits] [CodeGen][CFI] Generalize transparent union in args of args of functions (PR #158194)

2025-09-16 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka converted_to_draft 
https://github.com/llvm/llvm-project/pull/158194


[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies for epilogue (PR #159163)

2025-09-16 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl updated 
https://github.com/llvm/llvm-project/pull/159163

>From 5a9959313c0aebc1c707d19e30055cb925be7760 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" 
Date: Tue, 16 Sep 2025 16:03:11 -0400
Subject: [PATCH 1/2] [LoopUnroll] Fix block frequencies for epilogue

As another step in issue #135812, this patch fixes block frequencies
for partial loop unrolling with an epilogue remainder loop.  It does
not fully handle the case when the epilogue loop itself is unrolled.
That will be handled in the next patch.

For the guard and latch of each of the unrolled loop and epilogue
loop, this patch sets branch weights derived directly from the
original loop latch branch weights.  The total frequency of the
original loop body, summed across all its occurrences in the unrolled
loop and epilogue loop, is the same as in the original loop.  This
patch also sets `llvm.loop.estimated_trip_count` for the epilogue loop
instead of relying on the epilogue's latch branch weights to imply it.

This patch removes the XFAIL directives that PR #157754 added to the
test suite.
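The derivation the commit message describes rests on the usual mapping between latch branch weights, the continue probability, and the estimated trip count: with weights `W_continue : W_exit` on the latch, `P = W_continue / (W_continue + W_exit)` and the expected trip count of a geometric model is `1 / (1 - P)`. A sketch of that arithmetic (illustrative, not the LLVM implementation):

```python
def latch_probability(w_continue, w_exit):
    """Probability that the latch starts another iteration, from branch weights."""
    return w_continue / (w_continue + w_exit)

def estimated_trip_count(p):
    """Expected iterations under a geometric trip-count model with continue prob p."""
    return 1.0 / (1.0 - p)

# e.g. !prof branch weights 3:1 on the latch -> P = 0.75 -> ~4 iterations
p = latch_probability(3, 1)
print(p, estimated_trip_count(p))  # 0.75 4.0
```

The patch's new `getLoopProbability`/`setLoopProbability` helpers work in terms of this `P`, so the unrolled loop and epilogue loop can each be assigned latch weights whose combined body frequency matches the original loop's.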
---
 .../include/llvm/Transforms/Utils/LoopUtils.h |  32 
 .../llvm/Transforms/Utils/UnrollLoop.h|   4 +-
 llvm/lib/Transforms/Utils/LoopUnroll.cpp  |  31 ++--
 .../Transforms/Utils/LoopUnrollRuntime.cpp|  94 --
 llvm/lib/Transforms/Utils/LoopUtils.cpp   |  48 ++
 .../branch-weights-freq/unroll-epilog.ll  | 160 ++
 .../runtime-exit-phi-scev-invalidation.ll |   4 +-
 .../LoopUnroll/runtime-loop-branchweight.ll   |  56 +-
 .../Transforms/LoopUnroll/runtime-loop.ll |   9 +-
 .../LoopUnroll/unroll-heuristics-pgo.ll   |  64 +--
 10 files changed, 448 insertions(+), 54 deletions(-)
 create mode 100644 
llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll

diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h 
b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index c5dbb2bdd1dd8..71754b8f62a16 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -365,6 +365,38 @@ LLVM_ABI bool setLoopEstimatedTripCount(
 Loop *L, unsigned EstimatedTripCount,
 std::optional EstimatedLoopInvocationWeight = std::nullopt);
 
+/// Based on branch weight metadata, return either:
+/// - \c std::nullopt if the implementation is unable to handle the loop form
+///   of \p L (e.g., \p L must have a latch block that controls the loop exit).
+/// - Else, the estimated probability that, at the end of any iteration, the
+///   latch of \p L will start another iteration.  The result \c P is such that
+///   `0 <= P <= 1`, and `1 - P` is the probability of exiting the loop.
+std::optional getLoopProbability(Loop *L);
+
+/// Set branch weight metadata for the latch of \p L to indicate that, at the
+/// end of any iteration, its estimated probability of starting another
+/// iteration is \p P.  Return false if the implementation is unable to handle
+/// the loop form of \p L (e.g., \p L must have a latch block that controls the
+/// loop exit).  Otherwise, return true.
+bool setLoopProbability(Loop *L, double P);
+
+/// Based on branch weight metadata, return either:
+/// - \c std::nullopt if the implementation cannot extract the probability
+///   (e.g., \p B must have exactly two target labels, so it must be a
+///   conditional branch).
+/// - The probability \c P that control flows from \p B to its first target
+///   label such that `1 - P` is the probability of control flowing to its
+///   second target label, or vice-versa if \p ForFirstTarget is false.
+std::optional getBranchProbability(BranchInst *B, bool ForFirstTarget);
+
+/// Set branch weight metadata for \p B to indicate that \p P and `1 - P` are
+/// the probabilities of control flowing to its first and second target labels,
+/// respectively, or vice-versa if \p ForFirstTarget is false.  Return false if
+/// the implementation cannot set the probability (e.g., \p B must have exactly
+/// two target labels, so it must be a conditional branch).  Otherwise, return
+/// true.
+bool setBranchProbability(BranchInst *B, double P, bool ForFirstTarget);
+
 /// Check inner loop (L) backedge count is known to be invariant on all
 /// iterations of its outer loop. If the loop has no parent, this is trivially
 /// true.
diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h 
b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
index 871c13d972470..571a0af6fd0db 100644
--- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
+++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
@@ -97,7 +97,9 @@ LLVM_ABI bool UnrollRuntimeLoopRemainder(
 LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
 const TargetTransformInfo *TTI, bool PreserveLCSSA,
 unsigned SCEVExpansionBudget, bool RuntimeUnrollMultiExit,
-Loop **ResultLoop = nullptr);
+Loop **ResultLoop = nullptr,
+std::optional OriginalTripCount = std::nullopt,

[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-16 Thread Florian Hahn via llvm-branch-commits

fhahn wrote:

Sounds good to me!

https://github.com/llvm/llvm-project/pull/156715


[llvm-branch-commits] [clang] [HLSL] Use static create methods to initialize resources in arrays (PR #157005)

2025-09-16 Thread Chris B via llvm-branch-commits

https://github.com/llvm-beanz approved this pull request.


https://github.com/llvm/llvm-project/pull/157005


[llvm-branch-commits] [llvm] AMDGPU: Stop using aligned VGPR classes for addRegisterClass (PR #158278)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/158278

>From f6208fe1d18e2406ca9b6e84adbb35051b6ce94d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 12 Sep 2025 20:45:56 +0900
Subject: [PATCH] AMDGPU: Stop using aligned VGPR classes for addRegisterClass

This is unnecessary. At use emission time, InstrEmitter will
use the common subclass of the value type's register class and
the use instruction's register classes. This removes one of the
obstacles to avoiding overly conservative treatment of special-case
instructions that do not have the alignment requirement.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 32 +++
 llvm/test/CodeGen/AMDGPU/mfma-loop.ll | 14 +-
 2 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 6a4df5eeb9779..4369b40e65103 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -111,52 +111,52 @@ SITargetLowering::SITargetLowering(const TargetMachine 
&TM,
   addRegisterClass(MVT::Untyped, V64RegClass);
 
   addRegisterClass(MVT::v3i32, &AMDGPU::SGPR_96RegClass);
-  addRegisterClass(MVT::v3f32, TRI->getVGPRClassForBitWidth(96));
+  addRegisterClass(MVT::v3f32, &AMDGPU::VReg_96RegClass);
 
   addRegisterClass(MVT::v2i64, &AMDGPU::SGPR_128RegClass);
   addRegisterClass(MVT::v2f64, &AMDGPU::SGPR_128RegClass);
 
[llvm-branch-commits] [llvm] AMDGPU: Stop using aligned VGPR classes for addRegisterClass (PR #158278)

2025-09-16 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/158278

>From f6208fe1d18e2406ca9b6e84adbb35051b6ce94d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 12 Sep 2025 20:45:56 +0900
Subject: [PATCH] AMDGPU: Stop using aligned VGPR classes for addRegisterClass

This is unnecessary. At use emission time, InstrEmitter will
use the common subclass of the value type's register class and
the use instruction register classes. This removes one of the
obstacles to treating special case instructions that do not have
the alignment requirement overly conservatively.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 32 +++
 llvm/test/CodeGen/AMDGPU/mfma-loop.ll | 14 +-
 2 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 6a4df5eeb9779..4369b40e65103 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -111,52 +111,52 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
   addRegisterClass(MVT::Untyped, V64RegClass);
 
   addRegisterClass(MVT::v3i32, &AMDGPU::SGPR_96RegClass);
-  addRegisterClass(MVT::v3f32, TRI->getVGPRClassForBitWidth(96));
+  addRegisterClass(MVT::v3f32, &AMDGPU::VReg_96RegClass);
 
   addRegisterClass(MVT::v2i64, &AMDGPU::SGPR_128RegClass);
   addRegisterClass(MVT::v2f64, &AMDGPU::SGPR_128RegClass);
 
   addRegisterClass(MVT::v4i32, &AMDGPU::SGPR_128RegClass);
-  addRegisterClass(MVT::v4f32, TRI->getVGPRClassForBitWidth(128));
+  addRegisterClass(MVT::v4f32, &AMDGPU::VReg_128RegClass);
 
   addRegisterClass(MVT::v5i32, &AMDGPU::SGPR_160RegClass);
-  addRegisterClass(MVT::v5f32, TRI->getVGPRClassForBitWidth(160));
+  addRegisterClass(MVT::v5f32, &AMDGPU::VReg_160RegClass);
 
   addRegisterClass(MVT::v6i32, &AMDGPU::SGPR_192RegClass);
-  addRegisterClass(MVT::v6f32, TRI->getVGPRClassForBitWidth(192));
+  addRegisterClass(MVT::v6f32, &AMDGPU::VReg_192RegClass);
 
   addRegisterClass(MVT::v3i64, &AMDGPU::SGPR_192RegClass);
-  addRegisterClass(MVT::v3f64, TRI->getVGPRClassForBitWidth(192));
+  addRegisterClass(MVT::v3f64, &AMDGPU::VReg_192RegClass);
 
   addRegisterClass(MVT::v7i32, &AMDGPU::SGPR_224RegClass);
-  addRegisterClass(MVT::v7f32, TRI->getVGPRClassForBitWidth(224));
+  addRegisterClass(MVT::v7f32, &AMDGPU::VReg_224RegClass);
 
   addRegisterClass(MVT::v8i32, &AMDGPU::SGPR_256RegClass);
-  addRegisterClass(MVT::v8f32, TRI->getVGPRClassForBitWidth(256));
+  addRegisterClass(MVT::v8f32, &AMDGPU::VReg_256RegClass);
 
   addRegisterClass(MVT::v4i64, &AMDGPU::SGPR_256RegClass);
-  addRegisterClass(MVT::v4f64, TRI->getVGPRClassForBitWidth(256));
+  addRegisterClass(MVT::v4f64, &AMDGPU::VReg_256RegClass);
 
   addRegisterClass(MVT::v9i32, &AMDGPU::SGPR_288RegClass);
-  addRegisterClass(MVT::v9f32, TRI->getVGPRClassForBitWidth(288));
+  addRegisterClass(MVT::v9f32, &AMDGPU::VReg_288RegClass);
 
   addRegisterClass(MVT::v10i32, &AMDGPU::SGPR_320RegClass);
-  addRegisterClass(MVT::v10f32, TRI->getVGPRClassForBitWidth(320));
+  addRegisterClass(MVT::v10f32, &AMDGPU::VReg_320RegClass);
 
   addRegisterClass(MVT::v11i32, &AMDGPU::SGPR_352RegClass);
-  addRegisterClass(MVT::v11f32, TRI->getVGPRClassForBitWidth(352));
+  addRegisterClass(MVT::v11f32, &AMDGPU::VReg_352RegClass);
 
   addRegisterClass(MVT::v12i32, &AMDGPU::SGPR_384RegClass);
-  addRegisterClass(MVT::v12f32, TRI->getVGPRClassForBitWidth(384));
+  addRegisterClass(MVT::v12f32, &AMDGPU::VReg_384RegClass);
 
   addRegisterClass(MVT::v16i32, &AMDGPU::SGPR_512RegClass);
-  addRegisterClass(MVT::v16f32, TRI->getVGPRClassForBitWidth(512));
+  addRegisterClass(MVT::v16f32, &AMDGPU::VReg_512RegClass);
 
   addRegisterClass(MVT::v8i64, &AMDGPU::SGPR_512RegClass);
-  addRegisterClass(MVT::v8f64, TRI->getVGPRClassForBitWidth(512));
+  addRegisterClass(MVT::v8f64, &AMDGPU::VReg_512RegClass);
 
   addRegisterClass(MVT::v16i64, &AMDGPU::SGPR_1024RegClass);
-  addRegisterClass(MVT::v16f64, TRI->getVGPRClassForBitWidth(1024));
+  addRegisterClass(MVT::v16f64, &AMDGPU::VReg_1024RegClass);
 
   if (Subtarget->has16BitInsts()) {
 if (Subtarget->useRealTrue16Insts()) {
@@ -188,7 +188,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
   }
 
   addRegisterClass(MVT::v32i32, &AMDGPU::VReg_1024RegClass);
-  addRegisterClass(MVT::v32f32, TRI->getVGPRClassForBitWidth(1024));
+  addRegisterClass(MVT::v32f32, &AMDGPU::VReg_1024RegClass);
 
   computeRegisterProperties(Subtarget->getRegisterInfo());
 
diff --git a/llvm/test/CodeGen/AMDGPU/mfma-loop.ll b/llvm/test/CodeGen/AMDGPU/mfma-loop.ll
index d39daaade677f..3657a6b1b7415 100644
--- a/llvm/test/CodeGen/AMDGPU/mfma-loop.ll
+++ b/llvm/test/CodeGen/AMDGPU/mfma-loop.ll
@@ -2430,8 +2430,9 @@ define amdgpu_kernel void @test_mfma_nested_loop_zeroinit(ptr addrspace(1) %arg)
 ; GFX90A-NEXT:v_accvgpr_mov_b32 a29, a0
 ; GFX90A-NEXT:v_accvgpr_mov_b32 a30, a0
 ; GFX90A-NEXT
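
The commit message's premise — that InstrEmitter constrains a value to the common subclass of its type's register class and the using instruction's operand class — can be illustrated with a rough sketch. The register-class model below is a toy (set intersection over made-up register tuple names), not the real AMDGPU or InstrEmitter code:

```python
# Toy model of use-time register-class constraining: the class finally
# assigned to a value is the common subclass (here, the set intersection)
# of the value type's class and the class demanded by the use operand.
# The class contents below are illustrative stand-ins.

def common_subclass(a: frozenset, b: frozenset) -> frozenset:
    """Largest class whose registers belong to both input classes."""
    return a & b

# Hypothetical 128-bit VGPR tuples, and the even-aligned subset of them.
VREG_128 = frozenset({"v[0:3]", "v[1:4]", "v[2:5]", "v[3:6]"})
VREG_128_ALIGN2 = frozenset({"v[0:3]", "v[2:5]"})

# Registering v4f32 with the unaligned class stays safe: a use that
# requires the aligned class still narrows the value to aligned tuples.
constrained = common_subclass(VREG_128, VREG_128_ALIGN2)
print(sorted(constrained))  # ['v[0:3]', 'v[2:5]']
```

In this model, starting from the wider (unaligned) class loses nothing, since any stricter use narrows the class at emission time — which is why the patch can drop `getVGPRClassForBitWidth` in favor of the plain `VReg_*` classes.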

[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-16 Thread Kareem Ergawy via llvm-branch-commits

ergawy wrote:

> Thanks for handling my comments. It looks good to me but I have one question. 
> This patch sets the insertion point so that the store instruction gets 
> generated in the correct place. But the test does not check for any store 
> instruction. I was just wondering if the test is checking the right thing.

Without the changes in this PR, the test crashes flang. However, I agree that 
the test should be expanded a bit. I have added more checks to better capture 
the codegen of the reduction.

https://github.com/llvm/llvm-project/pull/156837
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DA] Add test where ExactSIV misses dependency due to overflow (NFC) (PR #157085)

2025-09-16 Thread Ryotaro Kasuga via llvm-branch-commits


@@ -807,3 +807,123 @@ for.body:                 ; preds = %entry, %for.body
 for.end:  ; preds = %for.body
   ret void
 }
+
+;; max_i = INT64_MAX/6  // 1537228672809129301
+;; for (long long i = 0; i <= max_i; i++) {
+;;   A[-6*i + INT64_MAX] = 0;
+;;   if (i)
+;; A[3*i - 2] = 1;
+;; }
+;;
+;; FIXME: There is a loop-carried dependency between
+;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`. For example,

kasuga-fj wrote:

Thanks, fixed

https://github.com/llvm/llvm-project/pull/157085
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits