[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver edited 
https://github.com/llvm/llvm-project/pull/156842
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits


@@ -3352,10 +3352,15 @@ class CodeGenFunction : public CodeGenTypeCache {
   SanitizerAnnotateDebugInfo(ArrayRef<SanitizerKind::SanitizerOrdinal> Ordinals,
                              SanitizerHandler Handler);
 
-  /// Emit additional metadata used by the AllocToken instrumentation.
+  /// Emit metadata used by the AllocToken instrumentation.
+  llvm::MDNode *EmitAllocTokenHint(QualType AllocType);

melver wrote:

Yes, LLVM permits sharing MD nodes: `MDNode::get` interns (uniques) nodes. If you
want to skip the whole computation that produces the node, though, that would
require introducing a separate lookup table.

Changing it to buildAllocToken.

https://github.com/llvm/llvm-project/pull/156842


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits


@@ -5760,6 +5764,24 @@ bool Sema::BuiltinAllocaWithAlign(CallExpr *TheCall) {
   return false;
 }
 
+bool Sema::BuiltinAllocTokenInfer(CallExpr *TheCall) {

melver wrote:

I'm indifferent here. Switching to a static function.

https://github.com/llvm/llvm-project/pull/156842


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Mehdi Amini via llvm-branch-commits

joker-eph wrote:

> > That isn't in MLIR right now, so that's not generally usable.
> 
> I've added `complex.powi -> complex.pow` conversion to the 
> `ComplexToStandard` MLIR pass.

Thanks, LG!

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver edited 
https://github.com/llvm/llvm-project/pull/156841


[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)

2025-09-18 Thread Deák Lajos via llvm-branch-commits

deaklajos wrote:

@vitalybuka 

https://github.com/llvm/llvm-project/pull/159551


[llvm-branch-commits] [llvm] release/21.x: MC: Better handle backslash-escaped symbols (PR #159420)

2025-09-18 Thread Nikita Popov via llvm-branch-commits

nikic wrote:

The diff here is fairly large, but also very mechanical. This fixes a 
regression for the Rust defmt crate with LLVM 21.

https://github.com/llvm/llvm-project/pull/159420


[llvm-branch-commits] [lld] CodeGen: Emit .prefalign directives based on the prefalign attribute. (PR #155529)

2025-09-18 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic commented:

Can you split "implement basic codegen support for prefalign" (the parts that
don't depend on the .prefalign directive) into a separate patch? It's not
clear what is causing the test changes here.

https://github.com/llvm/llvm-project/pull/155529


[llvm-branch-commits] [clang] release/21.x: [clang][docs] Fix implicit-int-conversion-on-negation typos (PR #156815)

2025-09-18 Thread via llvm-branch-commits

github-actions[bot] wrote:

@correctmost (or anyone else): if you would like to add a note about this fix
to the release notes (completely optional), please reply to this comment with
a one- or two-sentence description of the fix. When you are done, please add
the release:note label to this PR.

https://github.com/llvm/llvm-project/pull/156815


[llvm-branch-commits] [llvm] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-18 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler updated 
https://github.com/llvm/llvm-project/pull/156715

>From d33b31f01aeeb9005581b0a2a1f21c898463aa02 Mon Sep 17 00:00:00 2001
From: Tobias Stadler 
Date: Thu, 18 Sep 2025 12:34:55 +0100
Subject: [PATCH 1/2] Replace bitstream blobs by yaml

Created using spr 1.3.7-wip
---
 llvm/lib/Remarks/BitstreamRemarkParser.cpp|   5 +-
 .../dsymutil/ARM/remarks-linking-bundle.test  |  13 +-
 .../basic1.macho.remarks.arm64.opt.bitstream  | Bin 824 -> 0 bytes
 .../basic1.macho.remarks.arm64.opt.yaml   |  47 +
 ...c1.macho.remarks.empty.arm64.opt.bitstream |   0
 .../basic2.macho.remarks.arm64.opt.bitstream  | Bin 1696 -> 0 bytes
 .../basic2.macho.remarks.arm64.opt.yaml   | 194 ++
 ...c2.macho.remarks.empty.arm64.opt.bitstream |   0
 .../basic3.macho.remarks.arm64.opt.bitstream  | Bin 1500 -> 0 bytes
 .../basic3.macho.remarks.arm64.opt.yaml   | 181 
 ...c3.macho.remarks.empty.arm64.opt.bitstream |   0
 .../fat.macho.remarks.x86_64.opt.bitstream| Bin 820 -> 0 bytes
 .../remarks/fat.macho.remarks.x86_64.opt.yaml |  53 +
 .../fat.macho.remarks.x86_64h.opt.bitstream   | Bin 820 -> 0 bytes
 .../fat.macho.remarks.x86_64h.opt.yaml|  53 +
 .../X86/remarks-linking-fat-bundle.test   |   8 +-
 16 files changed, 543 insertions(+), 11 deletions(-)
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.yaml

diff --git a/llvm/lib/Remarks/BitstreamRemarkParser.cpp 
b/llvm/lib/Remarks/BitstreamRemarkParser.cpp
index 63b16bd2df0ec..2b27a0f661d88 100644
--- a/llvm/lib/Remarks/BitstreamRemarkParser.cpp
+++ b/llvm/lib/Remarks/BitstreamRemarkParser.cpp
@@ -411,9 +411,8 @@ Error BitstreamRemarkParser::processExternalFilePath() {
 return E;
 
   if (ContainerType != BitstreamRemarkContainerType::RemarksFile)
-return error(
-"Error while parsing external file's BLOCK_META: wrong container "
-"type.");
+return ParserHelper->MetaHelper.error(
+"Wrong container type in external file.");
 
   return Error::success();
 }
diff --git a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test 
b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
index 09a60d7d044c6..e1b04455b0d9d 100644
--- a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
+++ b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
@@ -1,22 +1,25 @@
 RUN: rm -rf %t
-RUN: mkdir -p %t
+RUN: mkdir -p %t/private/tmp/remarks
 RUN: cat %p/../Inputs/remarks/basic.macho.remarks.arm64> 
%t/basic.macho.remarks.arm64
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream
 
-RUN: dsymutil -oso-prepend-path=%p/../Inputs 
-remarks-prepend-path=%p/../Inputs %t/basic.macho.remarks.arm64
+RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%t 
%t/basic.macho.remarks.arm64
 
 Check that the remark file in the bundle exists and is sane:
 RUN: llvm-bcanalyzer -dump 
%t/basic.macho.remarks.arm64.dSYM/Contents/Resources/Remarks/basic.macho.remarks.arm64
 | FileCheck %s
 
-RUN: dsymutil --linker parallel -oso-prepend-path=%p/../Inputs 
-remarks-prepend-path=%p/../Inputs %t/basic.macho.r

[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)

2025-09-18 Thread Benjamin Maxwell via llvm-branch-commits

https://github.com/MacDue updated 
https://github.com/llvm/llvm-project/pull/142391

>From 0dfb0725e2a4f82af47821946bfbbfcd7ed08e10 Mon Sep 17 00:00:00 2001
From: Benjamin Maxwell 
Date: Thu, 8 May 2025 17:38:27 +
Subject: [PATCH] [AArch64] Prepare for split ZPR and PPR area allocation
 (NFCI)

This patch attempts to refactor AArch64FrameLowering to allow the size
of the ZPR and PPR areas to be calculated separately. This will be used
by a subsequent patch to support allocating ZPRs and PPRs to separate
areas. This patch should be an NFC and is split out to make later
functional changes easier to spot.
---
 .../Target/AArch64/AArch64FrameLowering.cpp   | 220 ++
 .../lib/Target/AArch64/AArch64FrameLowering.h |  20 +-
 .../AArch64/AArch64MachineFunctionInfo.cpp|  20 +-
 .../AArch64/AArch64MachineFunctionInfo.h  |  63 ++---
 .../AArch64/AArch64PrologueEpilogue.cpp   | 128 ++
 .../Target/AArch64/AArch64RegisterInfo.cpp|   4 +-
 .../DebugInfo/AArch64/asan-stack-vars.mir |   3 +-
 .../compiler-gen-bbs-livedebugvalues.mir  |   3 +-
 8 files changed, 288 insertions(+), 173 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 20b0d697827c5..f5f7b6522ddec 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -324,6 +324,36 @@ 
AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
 static bool produceCompactUnwindFrame(const AArch64FrameLowering &,
   MachineFunction &MF);
 
+enum class AssignObjectOffsets { No, Yes };
+/// Process all the SVE stack objects and determine the SVE stack size and
+/// offsets for each object. If AssignOffsets is "Yes", the offsets get
+/// assigned (and SVE stack sizes set). Returns the size of the SVE stack.
+static SVEStackSizes determineSVEStackSizes(MachineFunction &MF,
+AssignObjectOffsets AssignOffsets,
+bool SplitSVEObjects = false);
+
+static unsigned getStackHazardSize(const MachineFunction &MF) {
+  return MF.getSubtarget().getStreamingHazardSize();
+}
+
+/// Returns true if PPRs are spilled as ZPRs.
+static bool arePPRsSpilledAsZPR(const MachineFunction &MF) {
+  return MF.getSubtarget().getRegisterInfo()->getSpillSize(
+ AArch64::PPRRegClass) == 16;
+}
+
+StackOffset
+AArch64FrameLowering::getZPRStackSize(const MachineFunction &MF) const {
+  const AArch64FunctionInfo *AFI = MF.getInfo();
+  return StackOffset::getScalable(AFI->getStackSizeZPR());
+}
+
+StackOffset
+AArch64FrameLowering::getPPRStackSize(const MachineFunction &MF) const {
+  const AArch64FunctionInfo *AFI = MF.getInfo();
+  return StackOffset::getScalable(AFI->getStackSizePPR());
+}
+
 // Conservatively, returns true if the function is likely to have SVE vectors
 // on the stack. This function is safe to be called before callee-saves or
 // object offsets have been determined.
@@ -482,13 +512,6 @@ AArch64FrameLowering::getFixedObjectSize(const 
MachineFunction &MF,
   }
 }
 
-/// Returns the size of the entire SVE stackframe (calleesaves + spills).
-StackOffset
-AArch64FrameLowering::getSVEStackSize(const MachineFunction &MF) const {
-  const AArch64FunctionInfo *AFI = MF.getInfo();
-  return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
-}
-
 bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
   if (!EnableRedZone)
 return false;
@@ -514,7 +537,7 @@ bool AArch64FrameLowering::canUseRedZone(const 
MachineFunction &MF) const {
  !Subtarget.hasSVE();
 
   return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
-   getSVEStackSize(MF) || LowerQRegCopyThroughMem);
+   AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
 }
 
 /// hasFPImpl - Return true if the specified function should have a dedicated
@@ -557,7 +580,7 @@ bool AArch64FrameLowering::hasFPImpl(const MachineFunction 
&MF) const {
   // CFA in either of these cases.
   if (AFI.needsDwarfUnwindInfo(MF) &&
   ((requiresSaveVG(MF) || AFI.getSMEFnAttrs().hasStreamingBody()) &&
-   (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0)))
+   (!AFI.hasCalculatedStackSizeSVE() || AFI.hasSVEStackSize())))
 return true;
   // With large callframes around we may need to use FP to access the 
scavenging
   // emergency spillslot.
@@ -1126,10 +1149,6 @@ static bool isTargetWindows(const MachineFunction &MF) {
   return MF.getSubtarget().isTargetWindows();
 }
 
-static unsigned getStackHazardSize(const MachineFunction &MF) {
-  return MF.getSubtarget().getStreamingHazardSize();
-}
-
 void AArch64FrameLowering::emitPacRetPlusLeafHardening(
 MachineFunction &MF) const {
   const AArch64Subtarget &Subtarget = MF.getSubtarget();
@@ -1212,7 +1231,9 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const 

[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Slava Zakharin via llvm-branch-commits

https://github.com/vzakhari commented:

LGTM with some final comments.

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Slava Zakharin via llvm-branch-commits


@@ -1272,7 +1272,18 @@ mlir::Value genMathOp(fir::FirOpBuilder &builder, 
mlir::Location loc,
 LLVM_DEBUG(llvm::dbgs() << "Generating '" << mathLibFuncName
 << "' operation with type ";
mathLibFuncType.dump(); llvm::dbgs() << "\n");
-result = T::create(builder, loc, args);
+if constexpr (std::is_same_v) {
+  auto resultType = mathLibFuncType.getResult(0);
+  result = T::create(builder, loc, resultType, args);
+} else if constexpr (std::is_same_v) {
+  auto resultType = mathLibFuncType.getResult(0);
+  auto fmfAttr = mlir::arith::FastMathFlagsAttr::get(
+  builder.getContext(), builder.getFastMathFlags());
+  result = builder.create(loc, resultType, args[0],
+ args[1], fmfAttr);
+} else {

vzakhari wrote:

Do we really need all this code?  I believe just a simple `T::create(builder, 
loc, args)` should work, because of the type constraints in the operations' 
definitions.

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Slava Zakharin via llvm-branch-commits


@@ -175,12 +176,20 @@ PowIStrengthReduction::matchAndRewrite(
 
   Value one;
   Type opType = getElementTypeOrSelf(op.getType());
-  if constexpr (std::is_same_v)
+  if constexpr (std::is_same_v) {
 one = arith::ConstantOp::create(rewriter, loc,
 rewriter.getFloatAttr(opType, 1.0));
-  else
+  } else if constexpr (std::is_same_v) {
+auto complexTy = cast(opType);
+Type elementType = complexTy.getElementType();
+auto realPart = rewriter.getFloatAttr(elementType, 1.0);
+auto imagPart = rewriter.getFloatAttr(elementType, 0.0);
+one = rewriter.create(

vzakhari wrote:

I believe all the `create` methods of the rewriter will become deprecated soon, 
so `complex::ConstantOp::create` is a better alternative.  There are other 
cases below.

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Slava Zakharin via llvm-branch-commits

https://github.com/vzakhari edited 
https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [llvm] AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass (PR #158246)

2025-09-18 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/158246

This is special for the same reason av_mov_b64_imm_pseudo is special.

>From e5032294b4979c4b7f2367cee30c24d42901714b Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 5 Sep 2025 17:27:37 +0900
Subject: [PATCH] AMDGPU: Move spill pseudo special case out of
 adjustAllocatableRegClass

This is special for the same reason av_mov_b64_imm_pseudo is special.
---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 8 +++-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h   | 6 --
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5c3340703ba3b..b1a61886802f4 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -5976,8 +5976,7 @@ SIInstrInfo::getWholeWaveFunctionSetup(MachineFunction 
&MF) const {
 static const TargetRegisterClass *
 adjustAllocatableRegClass(const GCNSubtarget &ST, const SIRegisterInfo &RI,
   const MCInstrDesc &TID, unsigned RCID) {
-  if (!ST.hasGFX90AInsts() && (((TID.mayLoad() || TID.mayStore()) &&
-                                !(TID.TSFlags & SIInstrFlags::Spill)))) {
+  if (!ST.hasGFX90AInsts() && (TID.mayLoad() || TID.mayStore())) {
 switch (RCID) {
 case AMDGPU::AV_32RegClassID:
   RCID = AMDGPU::VGPR_32RegClassID;
@@ -6012,10 +6011,9 @@ const TargetRegisterClass 
*SIInstrInfo::getRegClass(const MCInstrDesc &TID,
   if (OpNum >= TID.getNumOperands())
 return nullptr;
   auto RegClass = TID.operands()[OpNum].RegClass;
-  if (TID.getOpcode() == AMDGPU::AV_MOV_B64_IMM_PSEUDO) {
-// Special pseudos have no alignment requirement
+  // Special pseudos have no alignment requirement
+  if (TID.getOpcode() == AMDGPU::AV_MOV_B64_IMM_PSEUDO || isSpill(TID))
 return RI.getRegClass(RegClass);
-  }
 
   return adjustAllocatableRegClass(ST, RI, TID, RegClass);
 }
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index f7dde2b90b68e..e0373e7768435 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -797,10 +797,12 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 return get(Opcode).TSFlags & SIInstrFlags::Spill;
   }
 
-  static bool isSpill(const MachineInstr &MI) {
-return MI.getDesc().TSFlags & SIInstrFlags::Spill;
+  static bool isSpill(const MCInstrDesc &Desc) {
+return Desc.TSFlags & SIInstrFlags::Spill;
   }
 
+  static bool isSpill(const MachineInstr &MI) { return isSpill(MI.getDesc()); }
+
   static bool isWWMRegSpillOpcode(uint16_t Opcode) {
 return Opcode == AMDGPU::SI_SPILL_WWM_V32_SAVE ||
Opcode == AMDGPU::SI_SPILL_WWM_AV32_SAVE ||



[llvm-branch-commits] [llvm] CodeGen: Keep reference to TargetRegisterInfo in TargetInstrInfo (PR #158224)

2025-09-18 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/158224

Both conceptually belong to the same subtarget, so it should not
be necessary to pass in the context TargetRegisterInfo to any
TargetInstrInfo member. Add this reference so those superfluous
arguments can be removed.

Most targets placed their TargetRegisterInfo as a member
in TargetInstrInfo. A few had this owned by the TargetSubtargetInfo,
so unify all targets to look the same.

>From 532af14dba99fbaf1ccfbd4ac63e22fce9aa371b Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 12 Sep 2025 14:11:48 +0900
Subject: [PATCH] CodeGen: Keep reference to TargetRegisterInfo in
 TargetInstrInfo

Both conceptually belong to the same subtarget, so it should not
be necessary to pass in the context TargetRegisterInfo to any
TargetInstrInfo member. Add this reference so those superfluous
arguments can be removed.

Most targets placed their TargetRegisterInfo as a member
in TargetInstrInfo. A few had this owned by the TargetSubtargetInfo,
so unify all targets to look the same.
---
 llvm/include/llvm/CodeGen/TargetInstrInfo.h   | 11 ++-
 llvm/lib/CodeGen/TargetInstrInfo.cpp  | 68 ---
 llvm/lib/Target/AArch64/AArch64InstrInfo.cpp  |  2 +-
 llvm/lib/Target/AMDGPU/R600InstrInfo.cpp  |  2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|  3 +-
 llvm/lib/Target/ARC/ARCInstrInfo.cpp  |  3 +-
 llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp  |  5 +-
 llvm/lib/Target/ARM/ARMBaseInstrInfo.h|  9 ++-
 llvm/lib/Target/ARM/ARMInstrInfo.cpp  |  3 +-
 llvm/lib/Target/ARM/ARMInstrInfo.h|  2 +-
 llvm/lib/Target/ARM/Thumb1InstrInfo.cpp   |  2 +-
 llvm/lib/Target/ARM/Thumb1InstrInfo.h |  2 +-
 llvm/lib/Target/ARM/Thumb2InstrInfo.cpp   |  2 +-
 llvm/lib/Target/ARM/Thumb2InstrInfo.h |  2 +-
 llvm/lib/Target/AVR/AVRInstrInfo.cpp  |  4 +-
 llvm/lib/Target/BPF/BPFInstrInfo.cpp  |  2 +-
 llvm/lib/Target/CSKY/CSKYInstrInfo.cpp|  2 +-
 llvm/lib/Target/DirectX/DirectXInstrInfo.cpp  |  2 +-
 llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp  |  4 +-
 llvm/lib/Target/Hexagon/HexagonInstrInfo.h|  5 ++
 llvm/lib/Target/Hexagon/HexagonSubtarget.cpp  |  3 +-
 llvm/lib/Target/Hexagon/HexagonSubtarget.h|  3 +-
 llvm/lib/Target/Lanai/LanaiInstrInfo.cpp  |  3 +-
 .../Target/LoongArch/LoongArchInstrInfo.cpp   |  4 +-
 .../lib/Target/LoongArch/LoongArchInstrInfo.h |  4 ++
 .../Target/LoongArch/LoongArchSubtarget.cpp   |  2 +-
 .../lib/Target/LoongArch/LoongArchSubtarget.h |  3 +-
 llvm/lib/Target/MSP430/MSP430InstrInfo.cpp|  3 +-
 llvm/lib/Target/Mips/Mips16InstrInfo.cpp  |  6 +-
 llvm/lib/Target/Mips/Mips16InstrInfo.h|  2 +-
 llvm/lib/Target/Mips/MipsInstrInfo.cpp|  5 +-
 llvm/lib/Target/Mips/MipsInstrInfo.h  |  8 ++-
 llvm/lib/Target/Mips/MipsSEInstrInfo.cpp  |  6 +-
 llvm/lib/Target/Mips/MipsSEInstrInfo.h|  2 +-
 llvm/lib/Target/NVPTX/NVPTXInstrInfo.cpp  |  2 +-
 llvm/lib/Target/PowerPC/PPCInstrInfo.cpp  |  2 +-
 llvm/lib/Target/RISCV/RISCVInstrInfo.cpp  |  5 +-
 llvm/lib/Target/RISCV/RISCVInstrInfo.h|  3 +
 llvm/lib/Target/RISCV/RISCVSubtarget.cpp  |  2 +-
 llvm/lib/Target/RISCV/RISCVSubtarget.h|  3 +-
 llvm/lib/Target/SPIRV/SPIRVInstrInfo.cpp  |  2 +-
 llvm/lib/Target/Sparc/SparcInstrInfo.cpp  |  4 +-
 llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp  |  2 +-
 llvm/lib/Target/VE/VEInstrInfo.cpp|  2 +-
 .../WebAssembly/WebAssemblyInstrInfo.cpp  |  2 +-
 llvm/lib/Target/X86/X86InstrInfo.cpp  |  2 +-
 llvm/lib/Target/XCore/XCoreInstrInfo.cpp  |  2 +-
 llvm/lib/Target/Xtensa/XtensaInstrInfo.cpp|  3 +-
 llvm/unittests/CodeGen/MFCommon.inc   |  4 +-
 llvm/utils/TableGen/InstrInfoEmitter.cpp  | 12 ++--
 50 files changed, 127 insertions(+), 114 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h 
b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 6a624a7052cdd..802cca6022074 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -113,9 +113,12 @@ struct ExtAddrMode {
 ///
 class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
 protected:
-  TargetInstrInfo(unsigned CFSetupOpcode = ~0u, unsigned CFDestroyOpcode = ~0u,
-  unsigned CatchRetOpcode = ~0u, unsigned ReturnOpcode = ~0u)
-  : CallFrameSetupOpcode(CFSetupOpcode),
+  const TargetRegisterInfo &TRI;
+
+  TargetInstrInfo(const TargetRegisterInfo &TRI, unsigned CFSetupOpcode = ~0u,
+  unsigned CFDestroyOpcode = ~0u, unsigned CatchRetOpcode = 
~0u,
+  unsigned ReturnOpcode = ~0u)
+  : TRI(TRI), CallFrameSetupOpcode(CFSetupOpcode),
 CallFrameDestroyOpcode(CFDestroyOpcode), 
CatchRetOpcode(CatchRetOpcode),
 ReturnOpcode(ReturnOpcode) {}
 
@@ -124,6 +127,8 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
   TargetInstrInfo &operator=(co

[llvm-branch-commits] [compiler-rt] Backport AArch64 sanitizer fixes to 21.x. (PR #157848)

2025-09-18 Thread Michał Górny via llvm-branch-commits

https://github.com/mgorny milestoned 
https://github.com/llvm/llvm-project/pull/157848


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/158722

>From 6976910364aa2fe18603aefcb27b10bd0120513d Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Mon, 15 Sep 2025 20:35:29 +0100
Subject: [PATCH 1/7] Add complex.powi op.

---
 flang/lib/Optimizer/Builder/IntrinsicCall.cpp | 20 ++--
 .../Transforms/ConvertComplexPow.cpp  | 94 +--
 flang/test/Lower/HLFIR/binary-ops.f90 |  2 +-
 .../test/Lower/Intrinsics/pow_complex16i.f90  |  2 +-
 .../test/Lower/Intrinsics/pow_complex16k.f90  |  2 +-
 flang/test/Lower/amdgcn-complex.f90   |  9 ++
 flang/test/Lower/power-operator.f90   |  9 +-
 .../mlir/Dialect/Complex/IR/ComplexOps.td | 26 +
 .../ComplexToROCDLLibraryCalls.cpp| 41 +++-
 .../Transforms/AlgebraicSimplification.cpp| 24 +++--
 .../Dialect/Math/Transforms/CMakeLists.txt|  1 +
 .../complex-to-rocdl-library-calls.mlir   | 14 +++
 mlir/test/Dialect/Complex/powi-simplify.mlir  | 20 
 13 files changed, 188 insertions(+), 76 deletions(-)
 create mode 100644 mlir/test/Dialect/Complex/powi-simplify.mlir

diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp 
b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
index 466458c05dba7..74a4e8f85c8ff 100644
--- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
@@ -1331,14 +1331,20 @@ mlir::Value genComplexPow(fir::FirOpBuilder &builder, 
mlir::Location loc,
 return genLibCall(builder, loc, mathOp, mathLibFuncType, args);
   auto complexTy = mlir::cast(mathLibFuncType.getInput(0));
   mlir::Value exp = args[1];
-  if (!mlir::isa(exp.getType())) {
-auto realTy = complexTy.getElementType();
-mlir::Value realExp = builder.createConvert(loc, realTy, exp);
-mlir::Value zero = builder.createRealConstant(loc, realTy, 0);
-exp =
-builder.create(loc, complexTy, realExp, zero);
+  mlir::Value result;
+  if (mlir::isa(exp.getType()) ||
+  mlir::isa(exp.getType())) {
+result = builder.create(loc, args[0], exp);
+  } else {
+if (!mlir::isa(exp.getType())) {
+  auto realTy = complexTy.getElementType();
+  mlir::Value realExp = builder.createConvert(loc, realTy, exp);
+  mlir::Value zero = builder.createRealConstant(loc, realTy, 0);
+  exp = builder.create(loc, complexTy, realExp,
+zero);
+}
+result = builder.create(loc, args[0], exp);
   }
-  mlir::Value result = builder.create(loc, args[0], exp);
   result = builder.createConvert(loc, mathLibFuncType.getResult(0), result);
   return result;
 }
diff --git a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp 
b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp
index 78f9d9e4f639a..d76451459def9 100644
--- a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp
+++ b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp
@@ -58,63 +58,57 @@ void ConvertComplexPowPass::runOnOperation() {
   ModuleOp mod = getOperation();
   fir::FirOpBuilder builder(mod, fir::getKindMapping(mod));
 
-  mod.walk([&](complex::PowOp op) {
+  mod.walk([&](complex::PowiOp op) {
 builder.setInsertionPoint(op);
 Location loc = op.getLoc();
 auto complexTy = cast(op.getType());
 auto elemTy = complexTy.getElementType();
-
 Value base = op.getLhs();
-Value rhs = op.getRhs();
-
-Value intExp;
-if (auto create = rhs.getDefiningOp()) {
-  if (isZero(create.getImaginary())) {
-if (auto conv = create.getReal().getDefiningOp()) {
-  if (auto intTy = dyn_cast(conv.getValue().getType()))
-intExp = conv.getValue();
-}
-  }
-}
-
+Value intExp = op.getRhs();
 func::FuncOp callee;
-SmallVector args;
-if (intExp) {
-  unsigned realBits = cast(elemTy).getWidth();
-  unsigned intBits = cast(intExp.getType()).getWidth();
-  auto funcTy = builder.getFunctionType(
-  {complexTy, builder.getIntegerType(intBits)}, {complexTy});
-  if (realBits == 32 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy);
-  else if (realBits == 32 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy);
-  else if (realBits == 64 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy);
-  else if (realBits == 64 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy);
-  else if (realBits == 128 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy);
-  else if (realBits == 128 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy);
-  else
-return;
-  args = {base, intExp};
-} else {
-  unsigned realBits = cast(elemTy).getWidth();
-  auto funcTy =
-  builder.getFunctionType({complexTy, complexTy}, {complexTy});
-  

[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Akash Banerjee via llvm-branch-commits


@@ -1272,7 +1272,18 @@ mlir::Value genMathOp(fir::FirOpBuilder &builder, 
mlir::Location loc,
 LLVM_DEBUG(llvm::dbgs() << "Generating '" << mathLibFuncName
 << "' operation with type ";
mathLibFuncType.dump(); llvm::dbgs() << "\n");
-result = T::create(builder, loc, args);
+if constexpr (std::is_same_v) {
+  auto resultType = mathLibFuncType.getResult(0);
+  result = T::create(builder, loc, resultType, args);
+} else if constexpr (std::is_same_v) {
+  auto resultType = mathLibFuncType.getResult(0);
+  auto fmfAttr = mlir::arith::FastMathFlagsAttr::get(
+  builder.getContext(), builder.getFastMathFlags());
+  result = builder.create(loc, resultType, args[0],
+ args[1], fmfAttr);
+} else {

TIFitis wrote:

You're right, I've simplified it. Thanks for catching.

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Akash Banerjee via llvm-branch-commits


@@ -175,12 +176,20 @@ PowIStrengthReduction::matchAndRewrite(
 
   Value one;
   Type opType = getElementTypeOrSelf(op.getType());
-  if constexpr (std::is_same_v)
+  if constexpr (std::is_same_v) {
 one = arith::ConstantOp::create(rewriter, loc,
 rewriter.getFloatAttr(opType, 1.0));
-  else
+  } else if constexpr (std::is_same_v) {
+auto complexTy = cast(opType);
+Type elementType = complexTy.getElementType();
+auto realPart = rewriter.getFloatAttr(elementType, 1.0);
+auto imagPart = rewriter.getFloatAttr(elementType, 0.0);
+one = rewriter.create(

TIFitis wrote:

Done.

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies when no runtime (PR #157754)

2025-09-18 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl edited 
https://github.com/llvm/llvm-project/pull/157754


[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)

2025-09-18 Thread via llvm-branch-commits

github-actions[bot] wrote:

⚠️ We detected that you are using a GitHub private e-mail address to contribute 
to the repo. Please turn off [Keep my email addresses 
private](https://github.com/settings/emails) setting in your account. See 
[LLVM Developer 
Policy](https://llvm.org/docs/DeveloperPolicy.html#email-addresses) and [LLVM 
Discourse](https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it)
 for more information.

https://github.com/llvm/llvm-project/pull/159551


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits


@@ -1274,6 +1274,12 @@ def AllocaWithAlignUninitialized : Builtin {
   let Prototype = "void*(size_t, _Constant size_t)";
 }
 
+def AllocTokenInfer : Builtin {
+  let Spellings = ["__builtin_alloc_token_infer"];

melver wrote:

Renaming to __builtin_infer_alloc_token.

https://github.com/llvm/llvm-project/pull/156842


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver updated 
https://github.com/llvm/llvm-project/pull/156840

>From 14c75441e84aa32e4f5876598b9a2c59d4ecbe65 Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Mon, 8 Sep 2025 21:32:21 +0200
Subject: [PATCH 1/2] fixup! fix for incomplete types

Created using spr 1.3.8-beta.1
---
 clang/lib/CodeGen/CGExpr.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 288b41bc42203..455de644daf00 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1289,6 +1289,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
   // Check if QualType contains a pointer. Implements a simple DFS to
   // recursively check if a type contains a pointer type.
   llvm::SmallPtrSet VisitedRD;
+  bool IncompleteType = false;
   auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool {
 QualType CanonicalType = T.getCanonicalType();
 if (CanonicalType->isPointerType())
@@ -1312,6 +1313,10 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
   return self(self, AT->getElementType());
 // The type is a struct, class, or union.
 if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) {
+  if (!RD->isCompleteDefinition()) {
+IncompleteType = true;
+return false;
+  }
   if (!VisitedRD.insert(RD).second)
 return false; // already visited
   // Check all fields.
@@ -1333,6 +1338,8 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
 return false;
   };
   const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
+  if (!ContainsPtr && IncompleteType)
+return nullptr;
   auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
   auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
 

>From 7f706618ddc40375d4085bc2ebe03f02ec78823a Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Mon, 8 Sep 2025 21:58:01 +0200
Subject: [PATCH 2/2] fixup!

Created using spr 1.3.8-beta.1
---
 clang/lib/CodeGen/CGExpr.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 455de644daf00..e7a0e7696e204 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1339,7 +1339,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
   };
   const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
   if (!ContainsPtr && IncompleteType)
-return nullptr;
+return;
   auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
   auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
 



[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver updated 
https://github.com/llvm/llvm-project/pull/156842

>From 48227c8f7712b2dc807b252d18353c91905b1fb5 Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Mon, 8 Sep 2025 17:19:04 +0200
Subject: [PATCH] fixup!

Created using spr 1.3.8-beta.1
---
 llvm/lib/Transforms/Instrumentation/AllocToken.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp 
b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
index d5ac3035df71b..3a28705d87523 100644
--- a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
@@ -151,7 +151,8 @@ STATISTIC(NumAllocations, "Allocations found");
 /// Expected format is: !{, }
 MDNode *getAllocTokenHintMetadata(const CallBase &CB) {
   MDNode *Ret = nullptr;
-  if (auto *II = dyn_cast(&CB)) {
+  if (auto *II = dyn_cast(&CB);
+  II && II->getIntrinsicID() == Intrinsic::alloc_token_id) {
 auto *MDV = cast(II->getArgOperand(0));
 Ret = cast(MDV->getMetadata());
 // If the intrinsic has an empty MDNode, type inference failed.
@@ -358,7 +359,7 @@ bool AllocToken::instrumentFunction(Function &F) {
   // Collect all allocation calls to avoid iterator invalidation.
   for (Instruction &I : instructions(F)) {
 // Collect all alloc_token_* intrinsics.
-if (IntrinsicInst *II = dyn_cast(&I);
+if (auto *II = dyn_cast(&I);
 II && II->getIntrinsicID() == Intrinsic::alloc_token_id) {
   IntrinsicInsts.emplace_back(II);
   continue;



[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver updated 
https://github.com/llvm/llvm-project/pull/156841




[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver updated 
https://github.com/llvm/llvm-project/pull/156839

>From b3653330c2c39ebaa094670f11afb0f9d36b9de2 Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Thu, 4 Sep 2025 12:07:26 +0200
Subject: [PATCH] fixup! Insert AllocToken into index.rst

Created using spr 1.3.8-beta.1
---
 clang/docs/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index be654af57f890..aa2b3a73dc11b 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -40,6 +40,7 @@ Using Clang as a Compiler
SanitizerCoverage
SanitizerStats
SanitizerSpecialCaseList
+   AllocToken
BoundsSafety
BoundsSafetyAdoptionGuide
BoundsSafetyImplPlans



[llvm-branch-commits] [llvm] [LoongArch] Generate [x]vldi instructions with special constant splats (PR #159258)

2025-09-18 Thread Zhaoxin Yang via llvm-branch-commits

https://github.com/ylzsx updated 
https://github.com/llvm/llvm-project/pull/159258

>From e1a23dd6e31734b05af239bb827a280d403564ee Mon Sep 17 00:00:00 2001
From: yangzhaoxin 
Date: Wed, 17 Sep 2025 10:20:46 +0800
Subject: [PATCH 1/3] [LoongArch] Generate [x]vldi instructions with special
 constant splats

---
 .../LoongArch/LoongArchISelDAGToDAG.cpp   | 52 +++
 .../LoongArch/LoongArchISelLowering.cpp   | 87 ++-
 .../Target/LoongArch/LoongArchISelLowering.h  |  5 ++
 .../CodeGen/LoongArch/lasx/build-vector.ll| 80 +
 .../lasx/fdiv-reciprocal-estimate.ll  | 87 +++
 .../lasx/fsqrt-reciprocal-estimate.ll | 39 +++--
 llvm/test/CodeGen/LoongArch/lasx/fsqrt.ll |  3 +-
 .../LoongArch/lasx/ir-instruction/fdiv.ll |  3 +-
 llvm/test/CodeGen/LoongArch/lasx/vselect.ll   | 31 +++
 .../CodeGen/LoongArch/lsx/build-vector.ll | 77 +---
 .../LoongArch/lsx/fdiv-reciprocal-estimate.ll | 87 +++
 .../lsx/fsqrt-reciprocal-estimate.ll  | 70 +--
 llvm/test/CodeGen/LoongArch/lsx/fsqrt.ll  |  3 +-
 .../LoongArch/lsx/ir-instruction/fdiv.ll  |  3 +-
 llvm/test/CodeGen/LoongArch/lsx/vselect.ll| 31 +++
 15 files changed, 289 insertions(+), 369 deletions(-)

diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp 
b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
index 07e722b9a6591..fda313e693760 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp
@@ -113,10 +113,11 @@ void LoongArchDAGToDAGISel::Select(SDNode *Node) {
 APInt SplatValue, SplatUndef;
 unsigned SplatBitSize;
 bool HasAnyUndefs;
-unsigned Op;
+unsigned Op = 0;
 EVT ResTy = BVN->getValueType(0);
 bool Is128Vec = BVN->getValueType(0).is128BitVector();
 bool Is256Vec = BVN->getValueType(0).is256BitVector();
+SDNode *Res;
 
 if (!Subtarget->hasExtLSX() || (!Is128Vec && !Is256Vec))
   break;
@@ -124,26 +125,25 @@ void LoongArchDAGToDAGISel::Select(SDNode *Node) {
   HasAnyUndefs, 8))
   break;
 
-switch (SplatBitSize) {
-default:
-  break;
-case 8:
-  Op = Is256Vec ? LoongArch::PseudoXVREPLI_B : LoongArch::PseudoVREPLI_B;
-  break;
-case 16:
-  Op = Is256Vec ? LoongArch::PseudoXVREPLI_H : LoongArch::PseudoVREPLI_H;
-  break;
-case 32:
-  Op = Is256Vec ? LoongArch::PseudoXVREPLI_W : LoongArch::PseudoVREPLI_W;
-  break;
-case 64:
-  Op = Is256Vec ? LoongArch::PseudoXVREPLI_D : LoongArch::PseudoVREPLI_D;
-  break;
-}
-
-SDNode *Res;
 // If we have a signed 10 bit integer, we can splat it directly.
 if (SplatValue.isSignedIntN(10)) {
+  switch (SplatBitSize) {
+  default:
+break;
+  case 8:
+Op = Is256Vec ? LoongArch::PseudoXVREPLI_B : LoongArch::PseudoVREPLI_B;
+break;
+  case 16:
+Op = Is256Vec ? LoongArch::PseudoXVREPLI_H : LoongArch::PseudoVREPLI_H;
+break;
+  case 32:
+Op = Is256Vec ? LoongArch::PseudoXVREPLI_W : LoongArch::PseudoVREPLI_W;
+break;
+  case 64:
+Op = Is256Vec ? LoongArch::PseudoXVREPLI_D : LoongArch::PseudoVREPLI_D;
+break;
+  }
+
   EVT EleType = ResTy.getVectorElementType();
   APInt Val = SplatValue.sextOrTrunc(EleType.getSizeInBits());
   SDValue Imm = CurDAG->getTargetConstant(Val, DL, EleType);
@@ -151,6 +151,20 @@ void LoongArchDAGToDAGISel::Select(SDNode *Node) {
   ReplaceNode(Node, Res);
   return;
 }
+
+// Select appropriate [x]vldi instructions for some special constant 
splats,
+// where the immediate value `imm[12] == 1` for used [x]vldi instructions.
+std::pair ConvertVLDI =
+LoongArchTargetLowering::isImmVLDILegalForMode1(SplatValue,
+SplatBitSize);
+if (ConvertVLDI.first) {
+  Op = Is256Vec ? LoongArch::XVLDI : LoongArch::VLDI;
+  SDValue Imm = CurDAG->getSignedTargetConstant(
+  SignExtend32<13>(ConvertVLDI.second), DL, MVT::i32);
+  Res = CurDAG->getMachineNode(Op, DL, ResTy, Imm);
+  ReplaceNode(Node, Res);
+  return;
+}
 break;
   }
   }
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp 
b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index e8668860c2b38..460e2d7c87af7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -2679,9 +2679,10 @@ SDValue 
LoongArchTargetLowering::lowerBUILD_VECTOR(SDValue Op,
 
 if (SplatBitSize == 64 && !Subtarget.is64Bit()) {
   // We can only handle 64-bit elements that are within
-  // the signed 10-bit range on 32-bit targets.
+  // the signed 10-bit range or match vldi patterns on 32-bit targets.
   // See the BUILD_VECTOR case in LoongArchDAGToDAGISel::Select().
- 

[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver edited 
https://github.com/llvm/llvm-project/pull/156841


[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-09-18 Thread Marco Elver via llvm-branch-commits


@@ -1349,6 +1350,98 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
   CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN);
 }
 
+/// Infer type from a simple sizeof expression.
+static QualType inferTypeFromSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  if (const auto *UET = dyn_cast(Arg)) {
+if (UET->getKind() == UETT_SizeOf) {
+  if (UET->isArgumentType()) {
+return UET->getArgumentTypeInfo()->getType();
+  } else {
+return UET->getArgumentExpr()->getType();
+  }
+}
+  }
+  return QualType();
+}
+
+/// Infer type from an arithmetic expression involving a sizeof.
+static QualType inferTypeFromArithSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  // The argument is a lone sizeof expression.
+  QualType QT = inferTypeFromSizeofExpr(Arg);
+  if (!QT.isNull())
+return QT;
+  if (const auto *BO = dyn_cast(Arg)) {
+// Argument is an arithmetic expression. Cover common arithmetic patterns
+// involving sizeof.
+switch (BO->getOpcode()) {
+case BO_Add:
+case BO_Div:
+case BO_Mul:
+case BO_Shl:
+case BO_Shr:
+case BO_Sub:
+  QT = inferTypeFromArithSizeofExpr(BO->getLHS());

melver wrote:

The Linux kernel has structs with flexible array members, and it's not uncommon 
to see this:
```
struct A {
  int len;
  struct Foo *foo;
  int array[];
};

... = kmalloc(sizeof(struct A) + sizeof(int) * N, ...);
```

I'm willing to accept some degree of unsoundness in complex cases to get 
completeness here, but am assuming that in the majority of cases the first type 
is the one we want to pick.

https://github.com/llvm/llvm-project/pull/156841


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/158722

>From 6976910364aa2fe18603aefcb27b10bd0120513d Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Mon, 15 Sep 2025 20:35:29 +0100
Subject: [PATCH 1/6] Add complex.powi op.

---
 flang/lib/Optimizer/Builder/IntrinsicCall.cpp | 20 ++--
 .../Transforms/ConvertComplexPow.cpp  | 94 +--
 flang/test/Lower/HLFIR/binary-ops.f90 |  2 +-
 .../test/Lower/Intrinsics/pow_complex16i.f90  |  2 +-
 .../test/Lower/Intrinsics/pow_complex16k.f90  |  2 +-
 flang/test/Lower/amdgcn-complex.f90   |  9 ++
 flang/test/Lower/power-operator.f90   |  9 +-
 .../mlir/Dialect/Complex/IR/ComplexOps.td | 26 +
 .../ComplexToROCDLLibraryCalls.cpp| 41 +++-
 .../Transforms/AlgebraicSimplification.cpp| 24 +++--
 .../Dialect/Math/Transforms/CMakeLists.txt|  1 +
 .../complex-to-rocdl-library-calls.mlir   | 14 +++
 mlir/test/Dialect/Complex/powi-simplify.mlir  | 20 
 13 files changed, 188 insertions(+), 76 deletions(-)
 create mode 100644 mlir/test/Dialect/Complex/powi-simplify.mlir

diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp 
b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
index 466458c05dba7..74a4e8f85c8ff 100644
--- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
@@ -1331,14 +1331,20 @@ mlir::Value genComplexPow(fir::FirOpBuilder &builder, 
mlir::Location loc,
 return genLibCall(builder, loc, mathOp, mathLibFuncType, args);
   auto complexTy = mlir::cast(mathLibFuncType.getInput(0));
   mlir::Value exp = args[1];
-  if (!mlir::isa(exp.getType())) {
-auto realTy = complexTy.getElementType();
-mlir::Value realExp = builder.createConvert(loc, realTy, exp);
-mlir::Value zero = builder.createRealConstant(loc, realTy, 0);
-exp =
-builder.create(loc, complexTy, realExp, zero);
+  mlir::Value result;
+  if (mlir::isa(exp.getType()) ||
+  mlir::isa(exp.getType())) {
+result = builder.create(loc, args[0], exp);
+  } else {
+if (!mlir::isa(exp.getType())) {
+  auto realTy = complexTy.getElementType();
+  mlir::Value realExp = builder.createConvert(loc, realTy, exp);
+  mlir::Value zero = builder.createRealConstant(loc, realTy, 0);
+  exp = builder.create(loc, complexTy, realExp,
+zero);
+}
+result = builder.create(loc, args[0], exp);
   }
-  mlir::Value result = builder.create(loc, args[0], exp);
   result = builder.createConvert(loc, mathLibFuncType.getResult(0), result);
   return result;
 }
diff --git a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp 
b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp
index 78f9d9e4f639a..d76451459def9 100644
--- a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp
+++ b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp
@@ -58,63 +58,57 @@ void ConvertComplexPowPass::runOnOperation() {
   ModuleOp mod = getOperation();
   fir::FirOpBuilder builder(mod, fir::getKindMapping(mod));
 
-  mod.walk([&](complex::PowOp op) {
+  mod.walk([&](complex::PowiOp op) {
 builder.setInsertionPoint(op);
 Location loc = op.getLoc();
 auto complexTy = cast(op.getType());
 auto elemTy = complexTy.getElementType();
-
 Value base = op.getLhs();
-Value rhs = op.getRhs();
-
-Value intExp;
-if (auto create = rhs.getDefiningOp()) {
-  if (isZero(create.getImaginary())) {
-if (auto conv = create.getReal().getDefiningOp()) {
-  if (auto intTy = dyn_cast(conv.getValue().getType()))
-intExp = conv.getValue();
-}
-  }
-}
-
+Value intExp = op.getRhs();
 func::FuncOp callee;
-SmallVector args;
-if (intExp) {
-  unsigned realBits = cast(elemTy).getWidth();
-  unsigned intBits = cast(intExp.getType()).getWidth();
-  auto funcTy = builder.getFunctionType(
-  {complexTy, builder.getIntegerType(intBits)}, {complexTy});
-  if (realBits == 32 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy);
-  else if (realBits == 32 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy);
-  else if (realBits == 64 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy);
-  else if (realBits == 64 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy);
-  else if (realBits == 128 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy);
-  else if (realBits == 128 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy);
-  else
-return;
-  args = {base, intExp};
-} else {
-  unsigned realBits = cast(elemTy).getWidth();
-  auto funcTy =
-  builder.getFunctionType({complexTy, complexTy}, {complexTy});
-  

[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies for epilogue (PR #159163)

2025-09-18 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl edited 
https://github.com/llvm/llvm-project/pull/159163


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver edited 
https://github.com/llvm/llvm-project/pull/156842


[llvm-branch-commits] [llvm] [Offload] Add GenericPluginTy::get_mem_info (PR #157484)

2025-09-18 Thread Ross Brunton via llvm-branch-commits

https://github.com/RossBrunton converted_to_draft 
https://github.com/llvm/llvm-project/pull/157484


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)

2025-09-18 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146076

>From 3b0c210862015dc304004641990fea429f8e31c7 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 05:38:52 -0400
Subject: [PATCH 1/3] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by
 default

Also removes the command line option to control this feature.

There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected
  since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled
  some transformations that were sometimes harmful.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  10 +-
 .../identical-subrange-spill-infloop.ll   | 352 +++---
 .../AMDGPU/infer-addrspace-flat-atomic.ll |  14 +-
 llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll  |   8 +-
 .../AMDGPU/lower-module-lds-via-hybrid.ll |   4 +-
 .../AMDGPU/lower-module-lds-via-table.ll  |  16 +-
 .../match-perm-extract-vector-elt-bug.ll  |  22 +-
 llvm/test/CodeGen/AMDGPU/memmove-var-size.ll  |  16 +-
 .../AMDGPU/preload-implicit-kernargs.ll   |   6 +-
 .../AMDGPU/promote-constOffset-to-imm.ll  |   8 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |   7 +-
 .../AMDGPU/ptradd-sdag-optimizations.ll   |  94 ++---
 .../AMDGPU/ptradd-sdag-undef-poison.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   |  27 +-
 llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll |  29 +-
 15 files changed, 310 insertions(+), 309 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 78d608556f056..ac3d322ad65c3 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -64,14 +64,6 @@ static cl::opt UseDivergentRegisterIndexing(
 cl::desc("Use indirect register addressing for divergent indexes"),
 cl::init(false));
 
-// TODO: This option should be removed once we switch to always using PTRADD in
-// the SelectionDAG.
-static cl::opt UseSelectionDAGPTRADD(
-"amdgpu-use-sdag-ptradd", cl::Hidden,
-cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the "
- "SelectionDAG ISel"),
-cl::init(false));
-
 static bool denormalModeIsFlushAllF32(const MachineFunction &MF) {
   const SIMachineFunctionInfo *Info = MF.getInfo();
   return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign();
@@ -11473,7 +11465,7 @@ static bool isNoUnsignedWrap(SDValue Addr) {
 
 bool SITargetLowering::shouldPreservePtrArith(const Function &F,
   EVT PtrVT) const {
-  return UseSelectionDAGPTRADD && PtrVT == MVT::i64;
+  return PtrVT == MVT::i64;
 }
 
 bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F,
diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll 
b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
index 2c03113e8af47..805cdd37d6e70 100644
--- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
@@ -6,96 +6,150 @@ define void @main(i1 %arg) #0 {
 ; CHECK:   ; %bb.0: ; %bb
 ; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
-; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill
+; CHECK-NEXT:buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte 
Folded Spill
 ; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:v_writelane_b32 v5, s30, 0
-; CHECK-NEXT:v_writelane_b32 v5, s31, 1
-; CHECK-NEXT:v_writelane_b32 v5, s36, 2
-; CHECK-NEXT:v_writelane_b32 v5, s37, 3
-; CHECK-NEXT:v_writelane_b32 v5, s38, 4
-; CHECK-NEXT:v_writelane_b32 v5, s39, 5
-; CHECK-NEXT:v_writelane_b32 v5, s48, 6
-; CHECK-NEXT:v_writelane_b32 v5, s49, 7
-; CHECK-NEXT:v_writelane_b32 v5, s50, 8
-; CHECK-NEXT:v_writelane_b32 v5, s51, 9
-; CHECK-NEXT:v_writelane_b32 v5, s52, 10
-; CHECK-NEXT:v_writelane_b32 v5, s53, 11
-; CHECK-NEXT:v_writelane_b32 v5, s54, 12
-; CHECK-NEXT:v_writelane_b32 v5, s55, 13
-; CHECK-NEXT:s_getpc_b64 s[24:25]
-; CHECK-NEXT:v_writelane_b32 v5, s64, 14
-; CHECK-NEXT:s_movk_i32 s4, 0xf0
-; CHECK-NEXT:s_mov_b32 s5, s24
-; CHECK-NEXT:v_writelane_b32 v5, s65, 15
-; CHECK-NEXT:s_load_dwordx16 s[8:23], s[4:5], 0x0
-; CHECK-NEXT:s_mov_b64 s[4:5], 0
-; CHECK-NEXT:v_writelane_b32 v5, s66, 16
-; CHECK-NEXT:s_load_dwordx4 s[4:7], s[4:5], 0x0
-; CHECK-NEXT:v_writelane_b32 v5, s67, 17
-; CHECK-NEXT:s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:s_movk_i32 s6, 0x130
-; CHECK-NEXT:s_mov_b32 s7, s24
-; CHECK-NEXT:v_writelane_b32 v5

[llvm-branch-commits] [llvm] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (PR #146074)

2025-09-18 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146074

>From b484d75cff9bd4703dd2c90d041d4df0aefd0e3c Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 26 Jun 2025 06:10:35 -0400
Subject: [PATCH 1/2] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD
 transforms

This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds,
that targets can use to allow transformations to introduce out-of-bounds
pointer arithmetic. It also moves two such transformations from the
AMDGPU-specific DAG combines to the generic DAGCombiner.

This is motivated by target features like AArch64's checked pointer
arithmetic, CPA, which does not tolerate the introduction of
out-of-bounds pointer arithmetic.
---
 llvm/include/llvm/CodeGen/TargetLowering.h|   7 +
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 125 +++---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  59 ++---
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |   3 +
 4 files changed, 94 insertions(+), 100 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index 46be271320fdd..4c2d991308d30 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -3518,6 +3518,13 @@ class LLVM_ABI TargetLoweringBase {
 return false;
   }
 
+  /// True if the target allows transformations of in-bounds pointer
+  /// arithmetic that cause out-of-bounds intermediate results.
+  virtual bool canTransformPtrArithOutOfBounds(const Function &F,
+   EVT PtrVT) const {
+return false;
+  }
+
   /// Does this target support complex deinterleaving
   virtual bool isComplexDeinterleavingSupported() const { return false; }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 77bc47f28fc80..67db08c3f9bac 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -2696,59 +2696,82 @@ SDValue DAGCombiner::visitPTRADD(SDNode *N) {
   if (PtrVT == IntVT && isNullConstant(N0))
 return N1;
 
-  if (N0.getOpcode() != ISD::PTRADD ||
-  reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1))
-return SDValue();
-
-  SDValue X = N0.getOperand(0);
-  SDValue Y = N0.getOperand(1);
-  SDValue Z = N1;
-  bool N0OneUse = N0.hasOneUse();
-  bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y);
-  bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z);
-
-  // (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if:
-  //   * y is a constant and (ptradd x, y) has one use; or
-  //   * y and z are both constants.
-  if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) {
-// If both additions in the original were NUW, the new ones are as well.
-SDNodeFlags Flags =
-(N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap;
-SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags);
-AddToWorklist(Add.getNode());
-return DAG.getMemBasePlusOffset(X, Add, DL, Flags);
+  if (N0.getOpcode() == ISD::PTRADD &&
+  !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) 
{
+SDValue X = N0.getOperand(0);
+SDValue Y = N0.getOperand(1);
+SDValue Z = N1;
+bool N0OneUse = N0.hasOneUse();
+bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y);
+bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z);
+
+// (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if:
+//   * y is a constant and (ptradd x, y) has one use; or
+//   * y and z are both constants.
+if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) {
+  // If both additions in the original were NUW, the new ones are as well.
+  SDNodeFlags Flags =
+  (N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap;
+  SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags);
+  AddToWorklist(Add.getNode());
+  return DAG.getMemBasePlusOffset(X, Add, DL, Flags);
+}
+  }
+
+  // The following combines can turn in-bounds pointer arithmetic out of 
bounds.
+  // That is problematic for settings like AArch64's CPA, which checks that
+  // intermediate results of pointer arithmetic remain in bounds. The target
+  // therefore needs to opt-in to enable them.
+  if (!TLI.canTransformPtrArithOutOfBounds(
+  DAG.getMachineFunction().getFunction(), PtrVT))
+return SDValue();
+
+  if (N0.getOpcode() == ISD::PTRADD && N1.getOpcode() == ISD::Constant) {
+// Fold (ptradd (ptradd GA, v), c) -> (ptradd (ptradd GA, c) v) with
+// global address GA and constant c, such that c can be folded into GA.
+SDValue GAValue = N0.getOperand(0);
+if (const GlobalAddressSDNode *GA =
+dyn_cast(GAValue)) {
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  if (!LegalOperations && TLI.

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (PR #145330)

2025-09-18 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/145330

>From da5b337fef36cdee209845b51bba323e84272334 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Tue, 17 Jun 2025 04:03:53 -0400
Subject: [PATCH 1/2] [AMDGPU][SDAG] Handle ISD::PTRADD in various special
 cases

There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.

For SWDEV-516125.
---
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +-
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |  21 +-
 llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp |   6 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   7 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |  67 ++
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 196 ++
 6 files changed, 105 insertions(+), 194 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 93ddba93b8034..42d3b36f222d7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8600,7 +8600,7 @@ static bool isMemSrcFromConstant(SDValue Src, 
ConstantDataArraySlice &Slice) {
   GlobalAddressSDNode *G = nullptr;
   if (Src.getOpcode() == ISD::GlobalAddress)
 G = cast(Src);
-  else if (Src.getOpcode() == ISD::ADD &&
+  else if (Src->isAnyAdd() &&
Src.getOperand(0).getOpcode() == ISD::GlobalAddress &&
Src.getOperand(1).getOpcode() == ISD::Constant) {
 G = cast(Src.getOperand(0));
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp 
b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 177aa0d11ff90..7465c9b310cb9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -638,8 +638,14 @@ bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned 
BitWidth,
   // operands on the new node are also disjoint.
   SDNodeFlags Flags(Op->getFlags().hasDisjoint() ? SDNodeFlags::Disjoint
  : SDNodeFlags::None);
+  unsigned Opcode = Op.getOpcode();
+  if (Opcode == ISD::PTRADD) {
+// It isn't a ptradd anymore if it doesn't operate on the entire
+// pointer.
+Opcode = ISD::ADD;
+  }
   SDValue X = DAG.getNode(
-  Op.getOpcode(), dl, SmallVT,
+  Opcode, dl, SmallVT,
   DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(0)),
   DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(1)), Flags);
   assert(DemandedSize <= SmallVTBits && "Narrowed below demanded bits?");
@@ -2860,6 +2866,11 @@ bool TargetLowering::SimplifyDemandedBits(
   return TLO.CombineTo(Op, And1);
 }
 [[fallthrough]];
+  case ISD::PTRADD:
+if (Op.getOperand(0).getValueType() != Op.getOperand(1).getValueType())
+  break;
+// PTRADD behaves like ADD if pointers are represented as integers.
+[[fallthrough]];
   case ISD::ADD:
   case ISD::SUB: {
 // Add, Sub, and Mul don't demand any bits in positions beyond that
@@ -2969,10 +2980,10 @@ bool TargetLowering::SimplifyDemandedBits(
 
 if (Op.getOpcode() == ISD::MUL) {
   Known = KnownBits::mul(KnownOp0, KnownOp1);
-} else { // Op.getOpcode() is either ISD::ADD or ISD::SUB.
+} else { // Op.getOpcode() is either ISD::ADD, ISD::PTRADD, or ISD::SUB.
   Known = KnownBits::computeForAddSub(
-  Op.getOpcode() == ISD::ADD, Flags.hasNoSignedWrap(),
-  Flags.hasNoUnsignedWrap(), KnownOp0, KnownOp1);
+  Op->isAnyAdd(), Flags.hasNoSignedWrap(), Flags.hasNoUnsignedWrap(),
+  KnownOp0, KnownOp1);
 }
 break;
   }
@@ -5679,7 +5690,7 @@ bool TargetLowering::isGAPlusOffset(SDNode *WN, const 
GlobalValue *&GA,
 return true;
   }
 
-  if (N->getOpcode() == ISD::ADD) {
+  if (N->isAnyAdd()) {
 SDValue N1 = N->getOperand(0);
 SDValue N2 = N->getOperand(1);
 if (isGAPlusOffset(N1.getNode(), GA, Offset)) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index c2fca79979e1b..312de262490f4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -1531,7 +1531,7 @@ bool AMDGPUDAGToDAGISel::SelectMUBUF(SDValue Addr, 
SDValue &Ptr, SDValue &VAddr,
   C1 = nullptr;
   }
 
-  if (N0.getOpcode() == ISD::ADD) {
+  if (N0->isAnyAdd()) {
 // (add N2, N3) -> addr64, or
 // (add (add N2, N3), C1) -> addr64
 SDValue N2 = N0.getOperand(0);
@@ -1993,7 +1993,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, 
SDValue Addr,
   }
 
   // Match the variable offset.
-  if (Addr.getOpcode() == ISD::ADD) {
+  if (Addr->isAnyAdd()) {
 LHS = Addr.getOperand(0);
 
 if (!LHS

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-09-18 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146075

>From 7c417c4c1413a3807d476b7fc490256084a0ac62 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 04:23:50 -0400
Subject: [PATCH 1/5] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR

If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which we will soon need to fold offsets
into FLAT instructions. Currently, disjoint ORs can still be used for
offset folding, so that part of the logic can't be tested.

The PR contains a hacky workaround for a situation where an AssertAlign
operand of a PTRADD is not DAGCombined before the PTRADD, causing the
PTRADD to be turned into a disjoint OR although reassociating it with
the operand of the AssertAlign would be better. This wouldn't be a
problem if the DAGCombiner ensured that a node is only processed after
all its operands have been processed.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 56 ++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 78d608556f056..ffaaef65569ae 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16145,6 +16145,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode 
*N,
   return Folded;
   }
 
+  // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if
+  // that transformation can't block an offset folding at any use of the 
ptradd.
+  // This should be done late, after legalization, so that it doesn't block
+  // other ptradd combines that could enable more offset folding.
+  bool HasIntermediateAssertAlign =
+  N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd();
+  // This is a hack to work around an ordering problem for DAGs like this:
+  //   (ptradd (AssertAlign (ptradd p, c1), k), c2)
+  // If the outer ptradd is handled first by the DAGCombiner, it can be
+  // transformed into a disjoint or. Then, when the generic AssertAlign combine
+  // pushes the AssertAlign through the inner ptradd, it's too late for the
+  // ptradd reassociation to trigger.
+  if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign &&
+  DAG.haveNoCommonBitsSet(N0, N1)) {
+bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) {
+  if (auto *LoadStore = dyn_cast(User);
+  LoadStore && LoadStore->getBasePtr().getNode() == N) {
+unsigned AS = LoadStore->getAddressSpace();
+// Currently, we only really need ptradds to fold offsets into flat
+// memory instructions.
+if (AS != AMDGPUAS::FLAT_ADDRESS)
+  return false;
+TargetLoweringBase::AddrMode AM;
+AM.HasBaseReg = true;
+EVT VT = LoadStore->getMemoryVT();
+Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
+  }
+  return false;
+});
+
+if (!TransformCanBreakAddrMode)
+  return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint);
+  }
+
   if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse())
 return SDValue();
 
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index 199c1f61d2522..7d7fe141e5440 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) {
 
 ; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in 
the
 ; assertalign DAG combine.
-define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr)  #0 {
+define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) {
 ; GFX942-LABEL: llvm_amdgcn_queue_ptr:
 ; GFX942:   ; %bb.0:
 ; GFX942-NEXT:v_mov_b32_e32 v0, 0
@@ -415,6 +415,60 @@ entry:
   ret void
 }
 
+; Check that ptradds can be lowered to disjoint ORs.
+define ptr @gep_disjoint_or(ptr %base) {
+; GFX942-LABEL: gep_disjoint_or:
+; GFX942:   ; %bb.0:
+; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4
+; GFX942-NEXT:s_setpc_b64 s[30:31]
+  %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0)
+  %gep = getelementptr nuw inbounds i8, ptr %p, i64 4
+  ret ptr %gep
+}
+
+; Check that AssertAlign no

[llvm-branch-commits] [llvm] [Offload] Add olGetMemInfo with platform-less API (PR #159581)

2025-09-18 Thread Ross Brunton via llvm-branch-commits

https://github.com/RossBrunton created 
https://github.com/llvm/llvm-project/pull/159581

None

>From 149a8e88c447d10e9181ba0940c5d05ace6f0d5a Mon Sep 17 00:00:00 2001
From: Ross Brunton 
Date: Thu, 18 Sep 2025 15:23:45 +0100
Subject: [PATCH] [Offload] Add olGetMemInfo with platform-less API

---
 offload/liboffload/API/Memory.td  |  50 +++
 offload/liboffload/src/OffloadImpl.cpp|  54 
 offload/unittests/OffloadAPI/CMakeLists.txt   |   4 +-
 .../OffloadAPI/memory/olGetMemInfo.cpp| 130 ++
 .../OffloadAPI/memory/olGetMemInfoSize.cpp|  63 +
 5 files changed, 300 insertions(+), 1 deletion(-)
 create mode 100644 offload/unittests/OffloadAPI/memory/olGetMemInfo.cpp
 create mode 100644 offload/unittests/OffloadAPI/memory/olGetMemInfoSize.cpp

diff --git a/offload/liboffload/API/Memory.td b/offload/liboffload/API/Memory.td
index debda165d2b23..3e47b586edd23 100644
--- a/offload/liboffload/API/Memory.td
+++ b/offload/liboffload/API/Memory.td
@@ -45,6 +45,56 @@ def olMemFree : Function {
   let returns = [];
 }
 
+def ol_mem_info_t : Enum {
+  let desc = "Supported memory info.";
+  let is_typed = 1;
+  let etors = [
+TaggedEtor<"DEVICE", "ol_device_handle_t", "The handle of the device 
associated with the allocation.">,
+TaggedEtor<"BASE", "void *", "Base address of this allocation.">,
+TaggedEtor<"SIZE", "size_t", "Size of this allocation in bytes.">,
+TaggedEtor<"TYPE", "ol_alloc_type_t", "Type of this allocation.">,
+  ];
+}
+
+def olGetMemInfo : Function {
+  let desc = "Queries the given property of a memory allocation allocated with 
olMemAlloc.";
+  let details = [
+"`olGetMemInfoSize` can be used to query the storage size required for the 
given query.",
+"The provided pointer can point to any location inside the allocation.",
+  ];
+  let params = [
+Param<"const void *", "Ptr", "pointer to the allocated memory", PARAM_IN>,
+Param<"ol_mem_info_t", "PropName", "type of the info to retrieve", 
PARAM_IN>,
+Param<"size_t", "PropSize", "the number of bytes pointed to by 
PropValue.", PARAM_IN>,
+TypeTaggedParam<"void*", "PropValue", "array of bytes holding the info. "
+  "If Size is not equal to or greater to the real number of bytes needed 
to return the info "
+  "then the OL_ERRC_INVALID_SIZE error is returned and pPlatformInfo is 
not used.", PARAM_OUT,
+  TypeInfo<"PropName" , "PropSize">>
+  ];
+  let returns = [
+Return<"OL_ERRC_INVALID_SIZE", [
+  "`PropSize == 0`",
+  "If `PropSize` is less than the real number of bytes needed to return 
the info."
+]>,
+Return<"OL_ERRC_NOT_FOUND", ["memory was not allocated by this platform"]>
+  ];
+}
+
+def olGetMemInfoSize : Function {
+  let desc = "Returns the storage size of the given queue query.";
+  let details = [
+"The provided pointer can point to any location inside the allocation.",
+  ];
+  let params = [
+Param<"const void *", "Ptr", "pointer to the allocated memory", PARAM_IN>,
+Param<"ol_mem_info_t", "PropName", "type of the info to query", PARAM_IN>,
+Param<"size_t*", "PropSizeRet", "pointer to the number of bytes required 
to store the query", PARAM_OUT>
+  ];
+  let returns = [
+Return<"OL_ERRC_NOT_FOUND", ["memory was not allocated by this platform"]>
+  ];
+}
+
 def olMemcpy : Function {
 let desc = "Enqueue a memcpy operation.";
 let details = [
diff --git a/offload/liboffload/src/OffloadImpl.cpp 
b/offload/liboffload/src/OffloadImpl.cpp
index 4a253c61a657b..2a0e238125dd7 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -700,6 +700,60 @@ Error olMemFree_impl(void *Address) {
   return Error::success();
 }
 
+Error olGetMemInfoImplDetail(const void *Ptr, ol_mem_info_t PropName,
+ size_t PropSize, void *PropValue,
+ size_t *PropSizeRet) {
+  InfoWriter Info(PropSize, PropValue, PropSizeRet);
+  std::lock_guard Lock(OffloadContext::get().AllocInfoMapMutex);
+
+  auto &AllocBases = OffloadContext::get().AllocBases;
+  auto &AllocInfoMap = OffloadContext::get().AllocInfoMap;
+  const AllocInfo *Alloc = nullptr;
+  if (AllocInfoMap.contains(Ptr)) {
+// Fast case, we have been given the base pointer directly
+Alloc = &AllocInfoMap.at(Ptr);
+  } else {
+// Slower case, we need to look up the base pointer first
+// Find the first memory allocation whose end is after the target pointer,
+// and then check to see if it is in range
+auto Loc = std::lower_bound(AllocBases.begin(), AllocBases.end(), Ptr,
+[&](const void *Iter, const void *Val) {
+  return AllocInfoMap.at(Iter).End <= Val;
+});
+if (Loc == AllocBases.end() || Ptr < AllocInfoMap.at(*Loc).Start)
+  return Plugin::error(ErrorCode::NOT_FOUND,
+   "allocated memory information 

[llvm-branch-commits] [llvm] [Offload] Add GenericPluginTy::get_mem_info (PR #157484)

2025-09-18 Thread Ross Brunton via llvm-branch-commits

https://github.com/RossBrunton closed 
https://github.com/llvm/llvm-project/pull/157484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Offload] `olGetMemInfo` (PR #157651)

2025-09-18 Thread Ross Brunton via llvm-branch-commits

https://github.com/RossBrunton closed 
https://github.com/llvm/llvm-project/pull/157651
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-09-18 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146075

>From 7c417c4c1413a3807d476b7fc490256084a0ac62 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 04:23:50 -0400
Subject: [PATCH 1/5] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR

If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which we will soon need to fold offsets
into FLAT instructions. Currently, disjoint ORs can still be used for
offset folding, so that part of the logic can't be tested.

The PR contains a hacky workaround for a situation where an AssertAlign
operand of a PTRADD is not DAGCombined before the PTRADD, causing the
PTRADD to be turned into a disjoint OR although reassociating it with
the operand of the AssertAlign would be better. This wouldn't be a
problem if the DAGCombiner ensured that a node is only processed after
all its operands have been processed.

For SWDEV-516125.
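The equivalence this combine relies on — `a + b` equals `a | b` whenever `DAG.haveNoCommonBitsSet(a, b)` holds — can be illustrated with a small sketch (the helper name is illustrative):

```python
def can_use_disjoint_or(base: int, offset: int) -> bool:
    # Mirrors DAG.haveNoCommonBitsSet: no bit position is set in both operands.
    return base & offset == 0

# A 16-byte-aligned base and a small offset never share bits, so the
# addition and the bitwise OR compute the same address.
base, offset = 0xFF0, 0x4
assert can_use_disjoint_or(base, offset)
assert base + offset == (base | offset)

# With overlapping bits the rewrite would be unsound.
assert not can_use_disjoint_or(0x3, 0x1)
assert 0x3 + 0x1 != (0x3 | 0x1)
```

This is why the OR can carry the `disjoint` flag: it is a drop-in replacement for the add, while a 32-bit OR avoids the add-with-carry pair.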
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 56 ++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 78d608556f056..ffaaef65569ae 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16145,6 +16145,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode 
*N,
   return Folded;
   }
 
+  // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if
+  // that transformation can't block an offset folding at any use of the 
ptradd.
+  // This should be done late, after legalization, so that it doesn't block
+  // other ptradd combines that could enable more offset folding.
+  bool HasIntermediateAssertAlign =
+  N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd();
+  // This is a hack to work around an ordering problem for DAGs like this:
+  //   (ptradd (AssertAlign (ptradd p, c1), k), c2)
+  // If the outer ptradd is handled first by the DAGCombiner, it can be
+  // transformed into a disjoint or. Then, when the generic AssertAlign combine
+  // pushes the AssertAlign through the inner ptradd, it's too late for the
+  // ptradd reassociation to trigger.
+  if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign &&
+  DAG.haveNoCommonBitsSet(N0, N1)) {
+bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) {
+  if (auto *LoadStore = dyn_cast<MemSDNode>(User);
+  LoadStore && LoadStore->getBasePtr().getNode() == N) {
+unsigned AS = LoadStore->getAddressSpace();
+// Currently, we only really need ptradds to fold offsets into flat
+// memory instructions.
+if (AS != AMDGPUAS::FLAT_ADDRESS)
+  return false;
+TargetLoweringBase::AddrMode AM;
+AM.HasBaseReg = true;
+EVT VT = LoadStore->getMemoryVT();
+Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
+  }
+  return false;
+});
+
+if (!TransformCanBreakAddrMode)
+  return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint);
+  }
+
   if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse())
 return SDValue();
 
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index 199c1f61d2522..7d7fe141e5440 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) {
 
 ; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in 
the
 ; assertalign DAG combine.
-define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr)  #0 {
+define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) {
 ; GFX942-LABEL: llvm_amdgcn_queue_ptr:
 ; GFX942:   ; %bb.0:
 ; GFX942-NEXT:v_mov_b32_e32 v0, 0
@@ -415,6 +415,60 @@ entry:
   ret void
 }
 
+; Check that ptradds can be lowered to disjoint ORs.
+define ptr @gep_disjoint_or(ptr %base) {
+; GFX942-LABEL: gep_disjoint_or:
+; GFX942:   ; %bb.0:
+; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4
+; GFX942-NEXT:s_setpc_b64 s[30:31]
+  %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xfffffffffffffff0)
+  %gep = getelementptr nuw inbounds i8, ptr %p, i64 4
+  ret ptr %gep
+}
+
+; Check that AssertAlign no

[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver edited 
https://github.com/llvm/llvm-project/pull/156842
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-18 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler updated 
https://github.com/llvm/llvm-project/pull/156715

>From d33b31f01aeeb9005581b0a2a1f21c898463aa02 Mon Sep 17 00:00:00 2001
From: Tobias Stadler 
Date: Thu, 18 Sep 2025 12:34:55 +0100
Subject: [PATCH] Replace bitstream blobs by yaml

Created using spr 1.3.7-wip
---
 llvm/lib/Remarks/BitstreamRemarkParser.cpp|   5 +-
 .../dsymutil/ARM/remarks-linking-bundle.test  |  13 +-
 .../basic1.macho.remarks.arm64.opt.bitstream  | Bin 824 -> 0 bytes
 .../basic1.macho.remarks.arm64.opt.yaml   |  47 +
 ...c1.macho.remarks.empty.arm64.opt.bitstream |   0
 .../basic2.macho.remarks.arm64.opt.bitstream  | Bin 1696 -> 0 bytes
 .../basic2.macho.remarks.arm64.opt.yaml   | 194 ++
 ...c2.macho.remarks.empty.arm64.opt.bitstream |   0
 .../basic3.macho.remarks.arm64.opt.bitstream  | Bin 1500 -> 0 bytes
 .../basic3.macho.remarks.arm64.opt.yaml   | 181 
 ...c3.macho.remarks.empty.arm64.opt.bitstream |   0
 .../fat.macho.remarks.x86_64.opt.bitstream| Bin 820 -> 0 bytes
 .../remarks/fat.macho.remarks.x86_64.opt.yaml |  53 +
 .../fat.macho.remarks.x86_64h.opt.bitstream   | Bin 820 -> 0 bytes
 .../fat.macho.remarks.x86_64h.opt.yaml|  53 +
 .../X86/remarks-linking-fat-bundle.test   |   8 +-
 16 files changed, 543 insertions(+), 11 deletions(-)
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.yaml

diff --git a/llvm/lib/Remarks/BitstreamRemarkParser.cpp 
b/llvm/lib/Remarks/BitstreamRemarkParser.cpp
index 63b16bd2df0ec..2b27a0f661d88 100644
--- a/llvm/lib/Remarks/BitstreamRemarkParser.cpp
+++ b/llvm/lib/Remarks/BitstreamRemarkParser.cpp
@@ -411,9 +411,8 @@ Error BitstreamRemarkParser::processExternalFilePath() {
 return E;
 
   if (ContainerType != BitstreamRemarkContainerType::RemarksFile)
-return error(
-"Error while parsing external file's BLOCK_META: wrong container "
-"type.");
+return ParserHelper->MetaHelper.error(
+"Wrong container type in external file.");
 
   return Error::success();
 }
diff --git a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test 
b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
index 09a60d7d044c6..e1b04455b0d9d 100644
--- a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
+++ b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
@@ -1,22 +1,25 @@
 RUN: rm -rf %t
-RUN: mkdir -p %t
+RUN: mkdir -p %t/private/tmp/remarks
 RUN: cat %p/../Inputs/remarks/basic.macho.remarks.arm64> 
%t/basic.macho.remarks.arm64
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream
 
-RUN: dsymutil -oso-prepend-path=%p/../Inputs 
-remarks-prepend-path=%p/../Inputs %t/basic.macho.remarks.arm64
+RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%t 
%t/basic.macho.remarks.arm64
 
 Check that the remark file in the bundle exists and is sane:
 RUN: llvm-bcanalyzer -dump 
%t/basic.macho.remarks.arm64.dSYM/Contents/Resources/Remarks/basic.macho.remarks.arm64
 | FileCheck %s
 
-RUN: dsymutil --linker parallel -oso-prepend-path=%p/../Inputs 
-remarks-prepend-path=%p/../Inputs %t/basic.macho.remar

[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver edited 
https://github.com/llvm/llvm-project/pull/156842
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)

2025-09-18 Thread Deák Lajos via llvm-branch-commits

https://github.com/deaklajos created 
https://github.com/llvm/llvm-project/pull/159551

Backports: 3fc723ec2cf1965aa4eec8883957fbbe1b2e7027 (#136195)

Ran into the issue on Alpine when building with TSAN that `__sanitizer_msghdr` 
and the `msghdr` provided by musl did not match. This caused lots of tsan 
reports and an eventual termination of the application by the OOM killer during a `sendmsg`.
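For illustration only, a ctypes sketch of the 64-bit musl layout added by this patch; it assumes an LP64 target, where the explicit pads make the struct the same 56-byte size as glibc's `msghdr` even though the field widths differ:

```python
import ctypes

class MuslMsghdr64(ctypes.Structure):
    # Mirrors the SANITIZER_MUSL branch of __sanitizer_msghdr for
    # SANITIZER_WORDSIZE == 64, explicit pads included.
    _fields_ = [
        ("msg_name", ctypes.c_void_p),
        ("msg_namelen", ctypes.c_uint),
        ("msg_iov", ctypes.c_void_p),   # struct __sanitizer_iovec *
        ("msg_iovlen", ctypes.c_int),
        ("__pad1", ctypes.c_int),
        ("msg_control", ctypes.c_void_p),
        ("msg_controllen", ctypes.c_uint),
        ("__pad2", ctypes.c_int),
        ("msg_flags", ctypes.c_int),
    ]

# On LP64 the padded musl layout is 56 bytes, matching glibc's msghdr,
# which reaches the same total size via size_t-typed lengths instead of pads.
assert ctypes.sizeof(MuslMsghdr64) == 56
assert MuslMsghdr64.msg_control.offset == 32
```

Because the total sizes match while `msg_iovlen`/`msg_controllen` sit at different widths, an interceptor using the glibc shape reads garbage lengths on musl, which is consistent with the spurious TSan reports described above.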

From 60b10f56319e62415c61e69c67f9c713ed81172e Mon Sep 17 00:00:00 2001
From: Deák Lajos <36414743+deakla...@users.noreply.github.com>
Date: Tue, 22 Jul 2025 20:31:28 +0200
Subject: [PATCH] [compiler-rt][sanitizer] fix msghdr for musl (#136195)

Ran into the issue on Alpine when building with TSAN that
`__sanitizer_msghdr` and the `msghdr` provided by musl did not match.
This caused lots of tsan reports and an eventual termination of the
application by the OOM killer during a `sendmsg`.
---
 .../sanitizer_platform_limits_posix.h | 24 +++
 1 file changed, 24 insertions(+)

diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h 
b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
index f118d53f0df80..24966523f3a02 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
@@ -478,6 +478,30 @@ struct __sanitizer_cmsghdr {
   int cmsg_level;
   int cmsg_type;
 };
+#  elif SANITIZER_MUSL
+struct __sanitizer_msghdr {
+  void *msg_name;
+  unsigned msg_namelen;
+  struct __sanitizer_iovec *msg_iov;
+  int msg_iovlen;
+#if SANITIZER_WORDSIZE == 64
+  int __pad1;
+#endif
+  void *msg_control;
+  unsigned msg_controllen;
+#if SANITIZER_WORDSIZE == 64
+  int __pad2;
+#endif
+  int msg_flags;
+};
+struct __sanitizer_cmsghdr {
+  unsigned cmsg_len;
+#if SANITIZER_WORDSIZE == 64
+  int __pad1;
+#endif
+  int cmsg_level;
+  int cmsg_type;
+};
 #  else
 // In POSIX, int msg_iovlen; socklen_t msg_controllen; socklen_t cmsg_len; but
 // many implementations don't conform to the standard.

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)

2025-09-18 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Deák Lajos (deaklajos)


Changes

Backports: 3fc723ec2cf1965aa4eec8883957fbbe1b2e7027 (#136195)

Ran into the issue on Alpine when building with TSAN that `__sanitizer_msghdr` 
and the `msghdr` provided by musl did not match. This caused lots of tsan 
reports and an eventual termination of the application by the OOM killer during a `sendmsg`.

---
Full diff: https://github.com/llvm/llvm-project/pull/159551.diff


1 Files Affected:

- (modified) compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h 
(+24) 






https://github.com/llvm/llvm-project/pull/159551
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Slava Zakharin via llvm-branch-commits

https://github.com/vzakhari approved this pull request.

The `powi` part looks good to me.  Are you planning to merge it, and then 
rebase the other PR for the Flang changes for the final review?

https://github.com/llvm/llvm-project/pull/158722
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis edited 
https://github.com/llvm/llvm-project/pull/158722
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Akash Banerjee via llvm-branch-commits


@@ -47,74 +47,61 @@ static func::FuncOp getOrDeclare(fir::FirOpBuilder 
&builder, Location loc,
   return func;
 }
 
-static bool isZero(Value v) {
-  if (auto cst = v.getDefiningOp())
-if (auto attr = dyn_cast(cst.getValue()))
-  return attr.getValue().isZero();
-  return false;
-}
-
 void ConvertComplexPowPass::runOnOperation() {
   ModuleOp mod = getOperation();
   fir::FirOpBuilder builder(mod, fir::getKindMapping(mod));
 
-  mod.walk([&](complex::PowOp op) {
+  mod.walk([&](complex::PowiOp op) {
 builder.setInsertionPoint(op);
 Location loc = op.getLoc();
auto complexTy = cast<ComplexType>(op.getType());
 auto elemTy = complexTy.getElementType();
-
 Value base = op.getLhs();
-Value rhs = op.getRhs();
-
-Value intExp;
-if (auto create = rhs.getDefiningOp()) {
-  if (isZero(create.getImaginary())) {
-if (auto conv = create.getReal().getDefiningOp()) {
-  if (auto intTy = dyn_cast(conv.getValue().getType()))
-intExp = conv.getValue();
-}
-  }
-}
-
+Value intExp = op.getRhs();
 func::FuncOp callee;
-SmallVector args;
-if (intExp) {
-  unsigned realBits = cast(elemTy).getWidth();
-  unsigned intBits = cast(intExp.getType()).getWidth();
-  auto funcTy = builder.getFunctionType(
-  {complexTy, builder.getIntegerType(intBits)}, {complexTy});
-  if (realBits == 32 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy);
-  else if (realBits == 32 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy);
-  else if (realBits == 64 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy);
-  else if (realBits == 64 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy);
-  else if (realBits == 128 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy);
-  else if (realBits == 128 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy);
-  else
-return;
-  args = {base, intExp};
-} else {
-  unsigned realBits = cast(elemTy).getWidth();
-  auto funcTy =
-  builder.getFunctionType({complexTy, complexTy}, {complexTy});
-  if (realBits == 32)
-callee = getOrDeclare(builder, loc, "cpowf", funcTy);
-  else if (realBits == 64)
-callee = getOrDeclare(builder, loc, "cpow", funcTy);
-  else if (realBits == 128)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(CPowF128), funcTy);
-  else
-return;
-  args = {base, rhs};
-}
+unsigned realBits = cast<FloatType>(elemTy).getWidth();
+unsigned intBits = cast<IntegerType>(intExp.getType()).getWidth();
+auto funcTy = builder.getFunctionType(
+{complexTy, builder.getIntegerType(intBits)}, {complexTy});
+if (realBits == 32 && intBits == 32)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy);
+else if (realBits == 32 && intBits == 64)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy);
+else if (realBits == 64 && intBits == 32)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy);
+else if (realBits == 64 && intBits == 64)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy);
+else if (realBits == 128 && intBits == 32)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy);
+else if (realBits == 128 && intBits == 64)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy);
+else
+  return;
+auto call = fir::CallOp::create(builder, loc, callee, {base, intExp});
+if (auto fmf = op.getFastmathAttr())
+  call.setFastmathAttr(fmf);
+op.replaceAllUsesWith(call.getResult(0));
+op.erase();
+  });
 
-auto call = fir::CallOp::create(builder, loc, callee, args);
+  mod.walk([&](complex::PowOp op) {

TIFitis wrote:

I've updated this.

https://github.com/llvm/llvm-project/pull/158722
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
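The width-based dispatch in the pass quoted above amounts to a lookup table; a sketch for illustration (entries are the Flang runtime entry points named in the diff, the helper name is illustrative):

```python
# (real-type bits, exponent bits) -> Flang runtime entry point, following
# the if/else ladder in ConvertComplexPowPass quoted above.
POWI_RUNTIME = {
    (32, 32): "cpowi",
    (32, 64): "cpowk",
    (64, 32): "zpowi",
    (64, 64): "zpowk",
    (128, 32): "cqpowi",
    (128, 64): "cqpowk",
}

def select_powi_callee(real_bits: int, int_bits: int):
    # None models the early `return` that leaves the op untouched.
    return POWI_RUNTIME.get((real_bits, int_bits))

assert select_powi_callee(64, 32) == "zpowi"
assert select_powi_callee(16, 32) is None
```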


[llvm-branch-commits] [llvm] [AMDGPU] Improve StructurizeCFG pass performance by using SSAUpdaterBulk. (PR #150937)

2025-09-18 Thread Valery Pykhtin via llvm-branch-commits

https://github.com/vpykhtin updated 
https://github.com/llvm/llvm-project/pull/150937

>From ae3589e2c93351349cd1bbb5586c2dfcb075ea68 Mon Sep 17 00:00:00 2001
From: Valery Pykhtin 
Date: Thu, 10 Apr 2025 11:58:13 +
Subject: [PATCH] amdgpu_use_ssaupdaterbulk_in_structurizecfg

---
 llvm/lib/Transforms/Scalar/StructurizeCFG.cpp | 25 +++
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp 
b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
index 2ee91a9b40026..0f3978f56045e 100644
--- a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
+++ b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
@@ -47,6 +47,7 @@
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include "llvm/Transforms/Utils/SSAUpdater.h"
+#include "llvm/Transforms/Utils/SSAUpdaterBulk.h"
 #include 
 #include 
 
@@ -321,7 +322,7 @@ class StructurizeCFG {
 
   void collectInfos();
 
-  void insertConditions(bool Loops);
+  void insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter);
 
   void simplifyConditions();
 
@@ -671,10 +672,9 @@ void StructurizeCFG::collectInfos() {
 }
 
 /// Insert the missing branch conditions
-void StructurizeCFG::insertConditions(bool Loops) {
+void StructurizeCFG::insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter) 
{
   BranchVector &Conds = Loops ? LoopConds : Conditions;
   Value *Default = Loops ? BoolTrue : BoolFalse;
-  SSAUpdater PhiInserter;
 
   for (BranchInst *Term : Conds) {
 assert(Term->isConditional());
@@ -683,8 +683,9 @@ void StructurizeCFG::insertConditions(bool Loops) {
 BasicBlock *SuccTrue = Term->getSuccessor(0);
 BasicBlock *SuccFalse = Term->getSuccessor(1);
 
-PhiInserter.Initialize(Boolean, "");
-PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default);
+unsigned Variable = PhiInserter.AddVariable("", Boolean);
+PhiInserter.AddAvailableValue(Variable, Loops ? SuccFalse : Parent,
+  Default);
 
 BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue];
 
@@ -697,7 +698,7 @@ void StructurizeCFG::insertConditions(bool Loops) {
 ParentInfo = PI;
 break;
   }
-  PhiInserter.AddAvailableValue(BB, PI.Pred);
+  PhiInserter.AddAvailableValue(Variable, BB, PI.Pred);
   Dominator.addAndRememberBlock(BB);
 }
 
@@ -706,9 +707,9 @@ void StructurizeCFG::insertConditions(bool Loops) {
   CondBranchWeights::setMetadata(*Term, ParentInfo.Weights);
 } else {
   if (!Dominator.resultIsRememberedBlock())
-PhiInserter.AddAvailableValue(Dominator.result(), Default);
+PhiInserter.AddAvailableValue(Variable, Dominator.result(), Default);
 
-  Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent));
+  PhiInserter.AddUse(Variable, &Term->getOperandUse(0));
 }
   }
 }
@@ -1414,8 +1415,12 @@ bool StructurizeCFG::run(Region *R, DominatorTree *DT,
   orderNodes();
   collectInfos();
   createFlow();
-  insertConditions(false);
-  insertConditions(true);
+
+  SSAUpdaterBulk PhiInserter;
+  insertConditions(false, PhiInserter);
+  insertConditions(true, PhiInserter);
+  PhiInserter.RewriteAndOptimizeAllUses(*DT);
+
   setPhiValues();
   simplifyHoistedPhis();
   simplifyConditions();

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Akash Banerjee via llvm-branch-commits

TIFitis wrote:

> The `powi` part looks good to me. Are you planning to merge it, and then 
> rebase the other PR for the Flang changes for the final review?

I plan on landing both PRs at once. This PR depends on #158642, which should 
land first.
All the work should have been in a single PR but I split it up to make it 
easier to review.

https://github.com/llvm/llvm-project/pull/158722
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)

2025-09-18 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-x86

Author: Helena Kotas (hekota)


Changes

Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function 
calls a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated 
to LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`.

Depends on #159608

Closes #157923

---
Full diff: https://github.com/llvm/llvm-project/pull/159655.diff


5 Files Affected:

- (modified) clang/include/clang/Basic/Builtins.td (+6) 
- (modified) clang/lib/CodeGen/CGHLSLBuiltins.cpp (+7) 
- (modified) clang/lib/CodeGen/CGHLSLRuntime.h (+2) 
- (modified) clang/lib/Headers/hlsl/hlsl_intrinsics.h (+25) 
- (added) clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl (+38) 


````diff
diff --git a/clang/include/clang/Basic/Builtins.td 
b/clang/include/clang/Basic/Builtins.td
index 27639f06529cb..96676bd810631 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4933,6 +4933,12 @@ def HLSLResourceHandleFromImplicitBinding : 
LangBuiltin<"HLSL_LANG"> {
   let Prototype = "void(...)";
 }
 
+def HLSLResourceNonUniformIndex : LangBuiltin<"HLSL_LANG"> {
+  let Spellings = ["__builtin_hlsl_resource_nonuniformindex"];
+  let Attributes = [NoThrow];
+  let Prototype = "uint32_t(uint32_t)";
+}
+
 def HLSLAll : LangBuiltin<"HLSL_LANG"> {
   let Spellings = ["__builtin_hlsl_all"];
   let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp 
b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 7b5b924b1fe82..9f87afa5a8a3d 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -352,6 +352,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned 
BuiltinID,
SmallVector<Value *> Args{OrderID, SpaceOp, RangeOp, IndexOp, Name};
 return Builder.CreateIntrinsic(HandleTy, IntrinsicID, Args);
   }
+  case Builtin::BI__builtin_hlsl_resource_nonuniformindex: {
+Value *IndexOp = EmitScalarExpr(E->getArg(0));
+llvm::Type *RetTy = ConvertType(E->getType());
+return Builder.CreateIntrinsic(
+RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(),
+ArrayRef<Value *>{IndexOp});
+  }
   case Builtin::BI__builtin_hlsl_all: {
 Value *Op0 = EmitScalarExpr(E->getArg(0));
 return Builder.CreateIntrinsic(
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h 
b/clang/lib/CodeGen/CGHLSLRuntime.h
index 370f3d5c5d30d..f4b410664d60c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.h
+++ b/clang/lib/CodeGen/CGHLSLRuntime.h
@@ -129,6 +129,8 @@ class CGHLSLRuntime {
resource_handlefrombinding)
   GENERATE_HLSL_INTRINSIC_FUNCTION(CreateHandleFromImplicitBinding,
resource_handlefromimplicitbinding)
+  GENERATE_HLSL_INTRINSIC_FUNCTION(NonUniformResourceIndex,
+   resource_nonuniformindex)
   GENERATE_HLSL_INTRINSIC_FUNCTION(BufferUpdateCounter, resource_updatecounter)
   GENERATE_HLSL_INTRINSIC_FUNCTION(GroupMemoryBarrierWithGroupSync,
group_memory_barrier_with_group_sync)
diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h 
b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
index d9d87c827e6a4..0eab2ff56c519 100644
--- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h
+++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
@@ -422,6 +422,31 @@ constexpr int4 D3DCOLORtoUBYTE4(float4 V) {
   return __detail::d3d_color_to_ubyte4_impl(V);
 }
 
+//===--===//
+// NonUniformResourceIndex builtin
+//===--===//
+
+/// \fn uint NonUniformResourceIndex(uint I)
+/// \brief A compiler hint to indicate that a resource index varies across
+/// threads within a wave (i.e., it is non-uniform).
+/// \param I [in] Resource array index
+///
+/// The return value is the \Index parameter.
+///
+/// When indexing into an array of shader resources (e.g., textures, buffers),
+/// some GPU hardware and drivers require the compiler to know whether the 
index
+/// is uniform (same for all threads) or non-uniform (varies per thread).
+///
+/// Using NonUniformResourceIndex explicitly marks an index as non-uniform,
+/// disabling certain assumptions or optimizations that could lead to incorrect
+/// behavior when dynamically accessing resource arrays with non-uniform
+/// indices.
+
+constexpr uint32_t NonUniformResourceIndex(uint32_t Index) {
+  return __builtin_hlsl_resource_nonuniformindex(Index);
+}
+
 
//===--===//
 // reflect builtin
 
//===--===//
diff --git a/clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl 
b/clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl
new file mode 100644
index 0..ab512ce111d19
--- /dev/null
+++ b/clang/test/CodeGenHLS

[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)

2025-09-18 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-clang-codegen

Author: Helena Kotas (hekota)


Changes

Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function 
calls a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated 
to LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`.

Depends on #159608

Closes #157923

---
Full diff: https://github.com/llvm/llvm-project/pull/159655.diff


5 Files Affected:

- (modified) clang/include/clang/Basic/Builtins.td (+6) 
- (modified) clang/lib/CodeGen/CGHLSLBuiltins.cpp (+7) 
- (modified) clang/lib/CodeGen/CGHLSLRuntime.h (+2) 
- (modified) clang/lib/Headers/hlsl/hlsl_intrinsics.h (+25) 
- (added) clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl (+38) 


``diff
diff --git a/clang/include/clang/Basic/Builtins.td 
b/clang/include/clang/Basic/Builtins.td
index 27639f06529cb..96676bd810631 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4933,6 +4933,12 @@ def HLSLResourceHandleFromImplicitBinding : 
LangBuiltin<"HLSL_LANG"> {
   let Prototype = "void(...)";
 }
 
+def HLSLResourceNonUniformIndex : LangBuiltin<"HLSL_LANG"> {
+  let Spellings = ["__builtin_hlsl_resource_nonuniformindex"];
+  let Attributes = [NoThrow];
+  let Prototype = "uint32_t(uint32_t)";
+}
+
 def HLSLAll : LangBuiltin<"HLSL_LANG"> {
   let Spellings = ["__builtin_hlsl_all"];
   let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp 
b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 7b5b924b1fe82..9f87afa5a8a3d 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -352,6 +352,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned 
BuiltinID,
 SmallVector Args{OrderID, SpaceOp, RangeOp, IndexOp, Name};
 return Builder.CreateIntrinsic(HandleTy, IntrinsicID, Args);
   }
+  case Builtin::BI__builtin_hlsl_resource_nonuniformindex: {
+Value *IndexOp = EmitScalarExpr(E->getArg(0));
+llvm::Type *RetTy = ConvertType(E->getType());
+return Builder.CreateIntrinsic(
+RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(),
+ArrayRef{IndexOp});
+  }
   case Builtin::BI__builtin_hlsl_all: {
 Value *Op0 = EmitScalarExpr(E->getArg(0));
 return Builder.CreateIntrinsic(
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h 
b/clang/lib/CodeGen/CGHLSLRuntime.h
index 370f3d5c5d30d..f4b410664d60c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.h
+++ b/clang/lib/CodeGen/CGHLSLRuntime.h
@@ -129,6 +129,8 @@ class CGHLSLRuntime {
resource_handlefrombinding)
   GENERATE_HLSL_INTRINSIC_FUNCTION(CreateHandleFromImplicitBinding,
resource_handlefromimplicitbinding)
+  GENERATE_HLSL_INTRINSIC_FUNCTION(NonUniformResourceIndex,
+   resource_nonuniformindex)
   GENERATE_HLSL_INTRINSIC_FUNCTION(BufferUpdateCounter, resource_updatecounter)
   GENERATE_HLSL_INTRINSIC_FUNCTION(GroupMemoryBarrierWithGroupSync,
group_memory_barrier_with_group_sync)
diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h 
b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
index d9d87c827e6a4..0eab2ff56c519 100644
--- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h
+++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
@@ -422,6 +422,31 @@ constexpr int4 D3DCOLORtoUBYTE4(float4 V) {
   return __detail::d3d_color_to_ubyte4_impl(V);
 }
 
+//===--===//
+// NonUniformResourceIndex builtin
+//===--===//
+
+/// \fn uint NonUniformResourceIndex(uint Index)
+/// \brief A compiler hint to indicate that a resource index varies across
+/// threads within a wave (i.e., it is non-uniform).
+/// \param Index [in] Resource array index
+///
+/// The return value is the \p Index parameter.
+///
+/// When indexing into an array of shader resources (e.g., textures, buffers),
+/// some GPU hardware and drivers require the compiler to know whether the
+/// index is uniform (same for all threads) or non-uniform (varies per thread).
+///
+/// Using NonUniformResourceIndex explicitly marks an index as non-uniform,
+/// disabling certain assumptions or optimizations that could lead to incorrect
+/// behavior when dynamically accessing resource arrays with non-uniform
+/// indices.
+
+constexpr uint32_t NonUniformResourceIndex(uint32_t Index) {
+  return __builtin_hlsl_resource_nonuniformindex(Index);
+}
+
 
//===--===//
 // reflect builtin
 
//===--===//
diff --git a/clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl 
b/clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl
new file mode 100644
index 0..ab512ce111d19
--- /dev/null

[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)

2025-09-18 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota created 
https://github.com/llvm/llvm-project/pull/159655

Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function 
calls a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated 
to the LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`.

Depends on #159608

Closes #157923
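
A minimal usage sketch of the new function (hypothetical resource names and
registers, not taken from the patch's tests):

```hlsl
// Hypothetical example: each thread picks its own index into a resource array.
Texture2D Textures[8] : register(t0);
SamplerState Samp : register(s0);

float4 SampleDynamic(uint i, float2 uv) {
  // Without the hint, the compiler may assume 'i' is uniform across the wave.
  return Textures[NonUniformResourceIndex(i)].Sample(Samp, uv);
}
```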

>From 108bf356e743d36b4eb5d0217720cf47ab85f33f Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Thu, 18 Sep 2025 14:31:38 -0700
Subject: [PATCH] [HLSL] NonUniformResourceIndex implementation

Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function 
calls
a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated to
the LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`.

Depends on #159608

Closes #157923
---
 clang/include/clang/Basic/Builtins.td |  6 +++
 clang/lib/CodeGen/CGHLSLBuiltins.cpp  |  7 
 clang/lib/CodeGen/CGHLSLRuntime.h |  2 +
 clang/lib/Headers/hlsl/hlsl_intrinsics.h  | 25 
 .../resources/NonUniformResourceIndex.hlsl| 38 +++
 5 files changed, 78 insertions(+)
 create mode 100644 
clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl

diff --git a/clang/include/clang/Basic/Builtins.td 
b/clang/include/clang/Basic/Builtins.td
index 27639f06529cb..96676bd810631 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4933,6 +4933,12 @@ def HLSLResourceHandleFromImplicitBinding : 
LangBuiltin<"HLSL_LANG"> {
   let Prototype = "void(...)";
 }
 
+def HLSLResourceNonUniformIndex : LangBuiltin<"HLSL_LANG"> {
+  let Spellings = ["__builtin_hlsl_resource_nonuniformindex"];
+  let Attributes = [NoThrow];
+  let Prototype = "uint32_t(uint32_t)";
+}
+
 def HLSLAll : LangBuiltin<"HLSL_LANG"> {
   let Spellings = ["__builtin_hlsl_all"];
   let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp 
b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 7b5b924b1fe82..9f87afa5a8a3d 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -352,6 +352,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned 
BuiltinID,
 SmallVector Args{OrderID, SpaceOp, RangeOp, IndexOp, Name};
 return Builder.CreateIntrinsic(HandleTy, IntrinsicID, Args);
   }
+  case Builtin::BI__builtin_hlsl_resource_nonuniformindex: {
+Value *IndexOp = EmitScalarExpr(E->getArg(0));
+llvm::Type *RetTy = ConvertType(E->getType());
+return Builder.CreateIntrinsic(
+RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(),
+ArrayRef{IndexOp});
+  }
   case Builtin::BI__builtin_hlsl_all: {
 Value *Op0 = EmitScalarExpr(E->getArg(0));
 return Builder.CreateIntrinsic(
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h 
b/clang/lib/CodeGen/CGHLSLRuntime.h
index 370f3d5c5d30d..f4b410664d60c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.h
+++ b/clang/lib/CodeGen/CGHLSLRuntime.h
@@ -129,6 +129,8 @@ class CGHLSLRuntime {
resource_handlefrombinding)
   GENERATE_HLSL_INTRINSIC_FUNCTION(CreateHandleFromImplicitBinding,
resource_handlefromimplicitbinding)
+  GENERATE_HLSL_INTRINSIC_FUNCTION(NonUniformResourceIndex,
+   resource_nonuniformindex)
   GENERATE_HLSL_INTRINSIC_FUNCTION(BufferUpdateCounter, resource_updatecounter)
   GENERATE_HLSL_INTRINSIC_FUNCTION(GroupMemoryBarrierWithGroupSync,
group_memory_barrier_with_group_sync)
diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h 
b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
index d9d87c827e6a4..0eab2ff56c519 100644
--- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h
+++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
@@ -422,6 +422,31 @@ constexpr int4 D3DCOLORtoUBYTE4(float4 V) {
   return __detail::d3d_color_to_ubyte4_impl(V);
 }
 
+//===--===//
+// NonUniformResourceIndex builtin
+//===--===//
+
+/// \fn uint NonUniformResourceIndex(uint Index)
+/// \brief A compiler hint to indicate that a resource index varies across
+/// threads within a wave (i.e., it is non-uniform).
+/// \param Index [in] Resource array index
+///
+/// The return value is the \p Index parameter.
+///
+/// When indexing into an array of shader resources (e.g., textures, buffers),
+/// some GPU hardware and drivers require the compiler to know whether the
+/// index is uniform (same for all threads) or non-uniform (varies per thread).
+///
+/// Using NonUniformResourceIndex explicitly marks an index as non-uniform,
+/// disabling certain assumptions or optimizations that could lead to incorrect
+/// behavior when dynamically accessing resource arrays with non-uniform
+/// indices.
+
+constexpr uint32_t NonUniformResourceIndex(uint32_t Index) {
+  return __builtin_hlsl_resource

[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)

2025-09-18 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Helena Kotas (hekota)


Changes

Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function 
calls a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated 
to the LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`.

Depends on #159608

Closes #157923

---
Full diff: https://github.com/llvm/llvm-project/pull/159655.diff


5 Files Affected:

- (modified) clang/include/clang/Basic/Builtins.td (+6) 
- (modified) clang/lib/CodeGen/CGHLSLBuiltins.cpp (+7) 
- (modified) clang/lib/CodeGen/CGHLSLRuntime.h (+2) 
- (modified) clang/lib/Headers/hlsl/hlsl_intrinsics.h (+25) 
- (added) clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl (+38) 


``diff
diff --git a/clang/include/clang/Basic/Builtins.td 
b/clang/include/clang/Basic/Builtins.td
index 27639f06529cb..96676bd810631 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4933,6 +4933,12 @@ def HLSLResourceHandleFromImplicitBinding : 
LangBuiltin<"HLSL_LANG"> {
   let Prototype = "void(...)";
 }
 
+def HLSLResourceNonUniformIndex : LangBuiltin<"HLSL_LANG"> {
+  let Spellings = ["__builtin_hlsl_resource_nonuniformindex"];
+  let Attributes = [NoThrow];
+  let Prototype = "uint32_t(uint32_t)";
+}
+
 def HLSLAll : LangBuiltin<"HLSL_LANG"> {
   let Spellings = ["__builtin_hlsl_all"];
   let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp 
b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 7b5b924b1fe82..9f87afa5a8a3d 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -352,6 +352,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned 
BuiltinID,
 SmallVector Args{OrderID, SpaceOp, RangeOp, IndexOp, Name};
 return Builder.CreateIntrinsic(HandleTy, IntrinsicID, Args);
   }
+  case Builtin::BI__builtin_hlsl_resource_nonuniformindex: {
+Value *IndexOp = EmitScalarExpr(E->getArg(0));
+llvm::Type *RetTy = ConvertType(E->getType());
+return Builder.CreateIntrinsic(
+RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(),
+ArrayRef{IndexOp});
+  }
   case Builtin::BI__builtin_hlsl_all: {
 Value *Op0 = EmitScalarExpr(E->getArg(0));
 return Builder.CreateIntrinsic(
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h 
b/clang/lib/CodeGen/CGHLSLRuntime.h
index 370f3d5c5d30d..f4b410664d60c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.h
+++ b/clang/lib/CodeGen/CGHLSLRuntime.h
@@ -129,6 +129,8 @@ class CGHLSLRuntime {
resource_handlefrombinding)
   GENERATE_HLSL_INTRINSIC_FUNCTION(CreateHandleFromImplicitBinding,
resource_handlefromimplicitbinding)
+  GENERATE_HLSL_INTRINSIC_FUNCTION(NonUniformResourceIndex,
+   resource_nonuniformindex)
   GENERATE_HLSL_INTRINSIC_FUNCTION(BufferUpdateCounter, resource_updatecounter)
   GENERATE_HLSL_INTRINSIC_FUNCTION(GroupMemoryBarrierWithGroupSync,
group_memory_barrier_with_group_sync)
diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h 
b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
index d9d87c827e6a4..0eab2ff56c519 100644
--- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h
+++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
@@ -422,6 +422,31 @@ constexpr int4 D3DCOLORtoUBYTE4(float4 V) {
   return __detail::d3d_color_to_ubyte4_impl(V);
 }
 
+//===--===//
+// NonUniformResourceIndex builtin
+//===--===//
+
+/// \fn uint NonUniformResourceIndex(uint Index)
+/// \brief A compiler hint to indicate that a resource index varies across
+/// threads within a wave (i.e., it is non-uniform).
+/// \param Index [in] Resource array index
+///
+/// The return value is the \p Index parameter.
+///
+/// When indexing into an array of shader resources (e.g., textures, buffers),
+/// some GPU hardware and drivers require the compiler to know whether the
+/// index is uniform (same for all threads) or non-uniform (varies per thread).
+///
+/// Using NonUniformResourceIndex explicitly marks an index as non-uniform,
+/// disabling certain assumptions or optimizations that could lead to incorrect
+/// behavior when dynamically accessing resource arrays with non-uniform
+/// indices.
+
+constexpr uint32_t NonUniformResourceIndex(uint32_t Index) {
+  return __builtin_hlsl_resource_nonuniformindex(Index);
+}
+
 
//===--===//
 // reflect builtin
 
//===--===//
diff --git a/clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl 
b/clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl
new file mode 100644
index 0..ab512ce111d19
--- /dev/null
+++ b/clang/test/CodeGenHLSL/reso

[llvm-branch-commits] [llvm] X86: Switch to RegClassByHwMode (PR #158274)

2025-09-18 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/158274

>From 1a85c9cf7cdf944be302c00efd231eba5d46bdc6 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 9 Sep 2025 11:15:47 +0900
Subject: [PATCH] X86: Switch to RegClassByHwMode

Replace the target uses of PointerLikeRegClass with RegClassByHwMode
---
 .../X86/MCTargetDesc/X86MCTargetDesc.cpp  |  3 ++
 llvm/lib/Target/X86/X86.td|  2 ++
 llvm/lib/Target/X86/X86InstrInfo.td   |  8 ++---
 llvm/lib/Target/X86/X86InstrOperands.td   | 30 +++-
 llvm/lib/Target/X86/X86InstrPredicates.td | 14 
 llvm/lib/Target/X86/X86RegisterInfo.cpp   | 35 +--
 llvm/lib/Target/X86/X86Subtarget.h|  4 +--
 llvm/utils/TableGen/X86FoldTablesEmitter.cpp  |  4 +--
 8 files changed, 57 insertions(+), 43 deletions(-)

diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp 
b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
index bb1e716c33ed5..1d5ef8b0996dc 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
@@ -55,6 +55,9 @@ std::string X86_MC::ParseX86Triple(const Triple &TT) {
   else
 FS = "-64bit-mode,-32bit-mode,+16bit-mode";
 
+  if (TT.isX32())
+FS += ",+x32";
+
   return FS;
 }
 
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 7c9e821c02fda..3af8b3e060a16 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -25,6 +25,8 @@ def Is32Bit : SubtargetFeature<"32bit-mode", "Is32Bit", 
"true",
"32-bit mode (80386)">;
 def Is16Bit : SubtargetFeature<"16bit-mode", "Is16Bit", "true",
"16-bit mode (i8086)">;
+def IsX32 : SubtargetFeature<"x32", "IsX32", "true",
+ "64-bit with ILP32 programming model (e.g. x32 
ABI)">;
 
 
//===--===//
 // X86 Subtarget ISA features
diff --git a/llvm/lib/Target/X86/X86InstrInfo.td 
b/llvm/lib/Target/X86/X86InstrInfo.td
index 7f6c5614847e3..0c4abc2c400f6 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.td
+++ b/llvm/lib/Target/X86/X86InstrInfo.td
@@ -18,14 +18,14 @@ include "X86InstrFragments.td"
 include "X86InstrFragmentsSIMD.td"
 
 
//===--===//
-// X86 Operand Definitions.
+// X86 Predicate Definitions.
 //
-include "X86InstrOperands.td"
+include "X86InstrPredicates.td"
 
 
//===--===//
-// X86 Predicate Definitions.
+// X86 Operand Definitions.
 //
-include "X86InstrPredicates.td"
+include "X86InstrOperands.td"
 
 
//===--===//
 // X86 Instruction Format Definitions.
diff --git a/llvm/lib/Target/X86/X86InstrOperands.td 
b/llvm/lib/Target/X86/X86InstrOperands.td
index 80843f6bb80e6..5207ecad127a2 100644
--- a/llvm/lib/Target/X86/X86InstrOperands.td
+++ b/llvm/lib/Target/X86/X86InstrOperands.td
@@ -6,9 +6,15 @@
 //
 
//===--===//
 
+def x86_ptr_rc : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32, GR64, LOW32_ADDR_ACCESS]>;
+
 // A version of ptr_rc which excludes SP, ESP, and RSP. This is used for
 // the index operand of an address, to conform to x86 encoding restrictions.
-def ptr_rc_nosp : PointerLikeRegClass<1>;
+def ptr_rc_nosp : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32_NOSP, GR64_NOSP, GR32_NOSP]>;
 
 // *mem - Operand definitions for the funky X86 addressing mode operands.
 //
@@ -53,7 +59,7 @@ class X86MemOperand : Operand {
   let PrintMethod = printMethod;
-  let MIOperandInfo = (ops ptr_rc, i8imm, ptr_rc_nosp, i32imm, SEGMENT_REG);
+  let MIOperandInfo = (ops x86_ptr_rc, i8imm, ptr_rc_nosp, i32imm, 
SEGMENT_REG);
   let ParserMatchClass = parserMatchClass;
   let OperandType = "OPERAND_MEMORY";
   int Size = size;
@@ -63,7 +69,7 @@ class X86MemOperand
 : X86MemOperand {
-  let MIOperandInfo = (ops ptr_rc, i8imm, RC, i32imm, SEGMENT_REG);
+  let MIOperandInfo = (ops x86_ptr_rc, i8imm, RC, i32imm, SEGMENT_REG);
 }
 
 def anymem : X86MemOperand<"printMemReference">;
@@ -113,8 +119,14 @@ def sdmem : X86MemOperand<"printqwordmem", 
X86Mem64AsmOperand>;
 
 // A version of i8mem for use on x86-64 and x32 that uses a NOREX GPR instead
 // of a plain GPR, so that it doesn't potentially require a REX prefix.
-def ptr_rc_norex : PointerLikeRegClass<2>;
-def ptr_rc_norex_nosp : PointerLikeRegClass<3>;
+def ptr_rc_norex : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32_NOREX, GR64_NOREX, GR32_NOREX]>;
+
+def ptr_rc_norex_nosp : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32_NOREX_NOSP, GR64_NOREX_NOSP, GR32_NOREX_NOSP]>;
+
 
 def i8mem_NOREX : X86MemOperand<"printbytemem", X86Mem8AsmOperand, 8> {
   let MIOpe

[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP3 dpp support (PR #159654)

2025-09-18 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/159654

None

>From b83405b879b471da983f885bfdffb3d1f58130de Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 18 Sep 2025 14:30:20 -0700
Subject: [PATCH] [AMDGPU] gfx1251 VOP3 dpp support

---
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   1 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  64 ++--
 llvm/lib/Target/AMDGPU/VOPInstructions.td |  78 +
 llvm/test/CodeGen/AMDGPU/dpp64_combine.ll |   4 +
 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_dpp16.s  | 150 ++
 .../AMDGPU/gfx1251_asm_vop3_from_vop1_dpp16.s |  58 +++
 .../AMDGPU/gfx1251_asm_vop3_from_vop1_err.s   | 150 ++
 .../AMDGPU/gfx1251_asm_vop3_from_vop2_dpp16.s |  34 
 .../AMDGPU/gfx1251_asm_vop3_from_vop2_err.s   |  93 +++
 llvm/test/MC/AMDGPU/vop3-gfx9.s   |   4 +-
 .../AMDGPU/gfx1251_dasm_vop3_dpp16.txt|  94 +++
 .../gfx1251_dasm_vop3_from_vop1_dpp16.txt |  43 +
 .../gfx1251_dasm_vop3_from_vop2_dpp16.txt |  25 +++
 13 files changed, 745 insertions(+), 53 deletions(-)
 create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_dpp16.s
 create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop1_dpp16.s
 create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop1_err.s
 create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop2_dpp16.s
 create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop2_err.s
 create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop3_dpp16.txt
 create mode 100644 
llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop3_from_vop1_dpp16.txt
 create mode 100644 
llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop3_from_vop2_dpp16.txt

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index c49f1930705aa..18fae6cfc7ed9 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -1969,6 +1969,7 @@ class getVOP3DPPSrcForVT {
   RegisterOperand ret =
   !cond(!eq(VT, i1) : SSrc_i1,
 !eq(VT, i16): !if (IsFake16, VCSrc_b16, VCSrcT_b16),
+!eq(VT, i64): VCSrc_b64,
 !eq(VT, f16): !if (IsFake16, VCSrc_f16, VCSrcT_f16),
 !eq(VT, bf16)   : !if (IsFake16, VCSrc_bf16, VCSrcT_bf16),
 !eq(VT, v2i16)  : VCSrc_v2b16,
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 582a353632436..e6a7c35dce0be 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -24,6 +24,7 @@ def VOP_F32_F32_F32_F32_VCC : VOPProfile<[f32, f32, f32, 
f32]> {
 }
 def VOP_F64_F64_F64_F64_VCC : VOPProfile<[f64, f64, f64, f64]> {
   let Outs64 = (outs DstRC.RegClass:$vdst);
+  let HasExt64BitDPP = 1;
   let IsSingle = 1;
 }
 }
@@ -51,7 +52,24 @@ def VOP3b_I64_I1_I32_I32_I64 : VOPProfile<[i64, i32, i32, 
i64]> {
 
 let HasExt64BitDPP = 1 in {
 def VOP3b_F32_I1_F32_F32_F32 : VOP3b_Profile;
-def VOP3b_F64_I1_F64_F64_F64 : VOP3b_Profile;
+def VOP3b_F64_I1_F64_F64_F64 : VOP3b_Profile {
+  let OutsVOP3DPP = Outs64;
+  let AsmVOP3DPP = getAsmVOP3DPP.ret;
+  let AsmVOP3DPP16 = getAsmVOP3DPP16.ret;
+  let AsmVOP3DPP8 = getAsmVOP3DPP8.ret;
+}
+
+def VOP3b_I64_I1_I32_I32_I64_DPP : VOPProfile<[i64, i32, i32, i64]> {
+  let HasClamp = 1;
+
+  let IsSingle = 1;
+  let Outs64 = (outs DstRC:$vdst, VOPDstS64orS32:$sdst);
+  let OutsVOP3DPP = Outs64;
+  let Asm64 = "$vdst, $sdst, $src0, $src1, $src2$clamp";
+  let AsmVOP3DPP = getAsmVOP3DPP.ret;
+  let AsmVOP3DPP16 = getAsmVOP3DPP16.ret;
+  let AsmVOP3DPP8 = getAsmVOP3DPP8.ret;
+}
 
 class V_MUL_PROF : VOP3_Profile {
   let HasExtVOP3DPP = 0;
@@ -229,7 +247,7 @@ defm V_DIV_FMAS_F32 : VOP3Inst_Pseudo_Wrapper 
<"v_div_fmas_f32", VOP_F32_F32_F32
 // result *= 2^64
 //
 let SchedRW = [WriteDouble], FPDPRounding = 1 in
-defm V_DIV_FMAS_F64 : VOP3Inst_Pseudo_Wrapper  <"v_div_fmas_f64", 
VOP_F64_F64_F64_F64_VCC, []>;
+defm V_DIV_FMAS_F64 : VOP3Inst <"v_div_fmas_f64", VOP_F64_F64_F64_F64_VCC>;
 } // End Uses = [MODE, VCC, EXEC]
 
 } // End isCommutable = 1
@@ -294,7 +312,7 @@ defm V_CVT_PK_U8_F32 : VOP3Inst<"v_cvt_pk_u8_f32", 
VOP3_Profile;
 
 let SchedRW = [WriteDoubleAdd], FPDPRounding = 1 in {
-  defm V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", 
VOP3_Profile, AMDGPUdiv_fixup>;
+  defm V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", 
VOP_F64_F64_F64_F64_DPP_PROF, AMDGPUdiv_fixup>;
   defm V_LDEXP_F64 : VOP3Inst <"v_ldexp_f64", VOP3_Profile, 
any_fldexp>;
 } // End SchedRW = [WriteDoubleAdd], FPDPRounding = 1
 } // End isReMaterializable = 1
@@ -335,7 +353,7 @@ let mayRaiseFPException = 0 in { // Seems suspicious but 
manual doesn't say it d
 
   // Double precision division pre-scale.
   let SchedRW = [WriteDouble, WriteSALU], FPDPRounding = 1 in
-  defm V_DIV_SCALE_F64 : VOP3Inst_Pseudo_Wrapper <"v_div_scale_f64", 
VOP3b_F64_I1_F64_F64_F64>;
+  defm V_DIV_SCALE_F64 : VOP3Inst <

[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP3 dpp support (PR #159654)

2025-09-18 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/159654
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#159654** 👈 (this PR -- view in Graphite)
* **#159641**
* **#159637**
* `main`


This stack of pull requests is managed by Graphite (https://graphite.dev). Learn 
more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/159654
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc++] Annotate classes with _LIBCXX_PFP to enable pointer field protection (PR #151652)

2025-09-18 Thread Peter Collingbourne via llvm-branch-commits

pcc wrote:

> What is the reasoning behind this? Could we document something when to apply 
> the attribute?

I added this to types which are commonly used, as mentioned in the commit 
message. I will document that in the coding guidelines.

https://github.com/llvm/llvm-project/pull/151652
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE (PR #159632)

2025-09-18 Thread Krzysztof Parzyszek via llvm-branch-commits

https://github.com/kparzysz updated 
https://github.com/llvm/llvm-project/pull/159632

>From 7bb9fb5b3b9a2dfcd1d00f01c86fe26c5d14c30f Mon Sep 17 00:00:00 2001
From: Krzysztof Parzyszek 
Date: Thu, 18 Sep 2025 08:49:38 -0500
Subject: [PATCH] [flang][OpenMP] Use OmpDirectiveSpecification in
 THREADPRIVATE

Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList),
some semantics-checking functions needed to be updated to operate on a
single object at a time.
---
 flang/include/flang/Parser/openmp-utils.h|  4 +-
 flang/include/flang/Parser/parse-tree.h  |  3 +-
 flang/include/flang/Semantics/openmp-utils.h |  3 +-
 flang/lib/Parser/openmp-parsers.cpp  |  7 +-
 flang/lib/Parser/unparse.cpp |  7 +-
 flang/lib/Semantics/check-omp-structure.cpp  | 89 +++-
 flang/lib/Semantics/check-omp-structure.h|  3 +
 flang/lib/Semantics/openmp-utils.cpp | 22 +++--
 flang/lib/Semantics/resolve-directives.cpp   | 11 ++-
 9 files changed, 86 insertions(+), 63 deletions(-)

diff --git a/flang/include/flang/Parser/openmp-utils.h 
b/flang/include/flang/Parser/openmp-utils.h
index 032fb8996fe48..1372945427955 100644
--- a/flang/include/flang/Parser/openmp-utils.h
+++ b/flang/include/flang/Parser/openmp-utils.h
@@ -49,7 +49,6 @@ MAKE_CONSTR_ID(OpenMPDeclareSimdConstruct, 
D::OMPD_declare_simd);
 MAKE_CONSTR_ID(OpenMPDeclareTargetConstruct, D::OMPD_declare_target);
 MAKE_CONSTR_ID(OpenMPExecutableAllocate, D::OMPD_allocate);
 MAKE_CONSTR_ID(OpenMPRequiresConstruct, D::OMPD_requires);
-MAKE_CONSTR_ID(OpenMPThreadprivate, D::OMPD_threadprivate);
 
 #undef MAKE_CONSTR_ID
 
@@ -111,8 +110,7 @@ struct DirectiveNameScope {
   std::is_same_v ||
   std::is_same_v ||
   std::is_same_v ||
-  std::is_same_v ||
-  std::is_same_v) {
+  std::is_same_v) {
 return MakeName(std::get(x.t).source, ConstructId::id);
   } else {
 return GetFromTuple(
diff --git a/flang/include/flang/Parser/parse-tree.h 
b/flang/include/flang/Parser/parse-tree.h
index 09a45476420df..8cb6d2e744876 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -5001,9 +5001,8 @@ struct OpenMPRequiresConstruct {
 
 // 2.15.2 threadprivate -> THREADPRIVATE (variable-name-list)
 struct OpenMPThreadprivate {
-  TUPLE_CLASS_BOILERPLATE(OpenMPThreadprivate);
+  WRAPPER_CLASS_BOILERPLATE(OpenMPThreadprivate, OmpDirectiveSpecification);
   CharBlock source;
-  std::tuple t;
 };
 
 // 2.11.3 allocate -> ALLOCATE (variable-name-list) [clause]
diff --git a/flang/include/flang/Semantics/openmp-utils.h 
b/flang/include/flang/Semantics/openmp-utils.h
index 68318d6093a1e..65441728c5549 100644
--- a/flang/include/flang/Semantics/openmp-utils.h
+++ b/flang/include/flang/Semantics/openmp-utils.h
@@ -58,9 +58,10 @@ const parser::DataRef *GetDataRefFromObj(const 
parser::OmpObject &object);
 const parser::ArrayElement *GetArrayElementFromObj(
 const parser::OmpObject &object);
 const Symbol *GetObjectSymbol(const parser::OmpObject &object);
-const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument);
 std::optional GetObjectSource(
 const parser::OmpObject &object);
+const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument);
+const parser::OmpObject *GetArgumentObject(const parser::OmpArgument 
&argument);
 
 bool IsCommonBlock(const Symbol &sym);
 bool IsExtendedListItem(const Symbol &sym);
diff --git a/flang/lib/Parser/openmp-parsers.cpp 
b/flang/lib/Parser/openmp-parsers.cpp
index 66526ba00b5ed..60ce71cf983f6 100644
--- a/flang/lib/Parser/openmp-parsers.cpp
+++ b/flang/lib/Parser/openmp-parsers.cpp
@@ -1791,8 +1791,11 @@ TYPE_PARSER(sourced(construct(
 verbatim("REQUIRES"_tok), Parser{})))
 
 // 2.15.2 Threadprivate directive
-TYPE_PARSER(sourced(construct(
-verbatim("THREADPRIVATE"_tok), parenthesized(Parser{}
+TYPE_PARSER(sourced( //
+construct(
+predicated(OmpDirectiveNameParser{},
+IsDirective(llvm::omp::Directive::OMPD_threadprivate)) >=
+Parser{})))
 
 // 2.11.3 Declarative Allocate directive
 TYPE_PARSER(
diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp
index 189a34ee1dc56..db46525ac57b1 100644
--- a/flang/lib/Parser/unparse.cpp
+++ b/flang/lib/Parser/unparse.cpp
@@ -2611,12 +2611,11 @@ class UnparseVisitor {
   }
   void Unparse(const OpenMPThreadprivate &x) {
 BeginOpenMP();
-Word("!$OMP THREADPRIVATE (");
-Walk(std::get(x.t));
-Put(")\n");
+Word("!$OMP ");
+Walk(x.v);
+Put("\n");
 EndOpenMP();
   }
-
   bool Pre(const OmpMessageClause &x) {
 Walk(x.v);
 return false;
diff --git a/flang/lib/Semantics/check-omp-structure.cpp 
b/flang/lib/Semantics/check-omp-structure.cpp
index 1ee5385fb38a1..507957dfecb3d 100644
--- a/flang/lib/Semantics/check-omp-structure.cpp
+++ b/flang/lib/Semantics/check-omp-structure.cpp
@@ -669,11 +669,6 @@ template  struct 
DirectiveSpellingVisi

[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)

2025-09-18 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/159641

None

>From 344bfe15f023e965348da4d92738b48683768887 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 18 Sep 2025 12:58:41 -0700
Subject: [PATCH] [AMDGPU] gfx1251 VOP2 dpp support

---
 llvm/lib/Target/AMDGPU/VOP2Instructions.td|  79 +++--
 llvm/test/CodeGen/AMDGPU/dpp_combine.ll   |   6 +-
 llvm/test/MC/AMDGPU/gfx1251_asm_vop2_dpp16.s  |  74 
 llvm/test/MC/AMDGPU/gfx1251_asm_vop2_err.s| 106 ++
 .../AMDGPU/gfx1251_dasm_vop2_dpp16.txt|  37 ++
 5 files changed, 267 insertions(+), 35 deletions(-)
 create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop2_dpp16.s
 create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop2_err.s
 create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop2_dpp16.txt

diff --git a/llvm/lib/Target/AMDGPU/VOP2Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP2Instructions.td
index 46a1a4bf1ab4a..37d92bc5076de 100644
--- a/llvm/lib/Target/AMDGPU/VOP2Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP2Instructions.td
@@ -287,10 +287,14 @@ multiclass VOP2bInst ,
  Commutable_REV;
 
-  let SubtargetPredicate = isGFX11Plus in {
-if P.HasExtVOP3DPP then
-  def _e64_dpp  : VOP3_DPP_Pseudo ;
-  } // End SubtargetPredicate = isGFX11Plus
+  if P.HasExtVOP3DPP then
+def _e64_dpp  : VOP3_DPP_Pseudo  {
+  let SubtargetPredicate = isGFX11Plus;
+}
+  else if P.HasExt64BitDPP then
+def _e64_dpp  : VOP3_DPP_Pseudo  {
+  let OtherPredicates = [HasDPALU_DPP];
+  }
 }
 }
 
@@ -345,10 +349,14 @@ multiclass
  VOPD_Component;
 }
 
-let SubtargetPredicate = isGFX11Plus in {
-  if P.HasExtVOP3DPP then
-def _e64_dpp  : VOP3_DPP_Pseudo ;
-} // End SubtargetPredicate = isGFX11Plus
+if P.HasExtVOP3DPP then
+  def _e64_dpp  : VOP3_DPP_Pseudo  {
+let SubtargetPredicate = isGFX11Plus;
+  }
+else if P.HasExt64BitDPP then
+  def _e64_dpp  : VOP3_DPP_Pseudo  {
+let OtherPredicates = [HasDPALU_DPP];
+  }
   }
 }
 
@@ -1607,8 +1615,9 @@ multiclass VOP2_Real_dpp op> {
 }
 
 multiclass VOP2_Real_dpp8 op> {
-  if !cast(NAME#"_e32").Pfl.HasExtDPP then
-  def _dpp8#Gen.Suffix : VOP2_DPP8_Gen(NAME#"_e32"), 
Gen>;
+  defvar ps = !cast(NAME#"_e32");
+  if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then
+def _dpp8#Gen.Suffix : VOP2_DPP8_Gen;
 }
 
 //===- VOP2 (with name) -===//
@@ -1643,10 +1652,10 @@ multiclass VOP2_Real_dpp_with_name 
op, string opName,
 multiclass VOP2_Real_dpp8_with_name op, string opName,
 string asmName> {
   defvar ps = !cast(opName#"_e32");
-  if ps.Pfl.HasExtDPP then
-  def _dpp8#Gen.Suffix : VOP2_DPP8_Gen {
-let AsmString = asmName # ps.Pfl.AsmDPP8;
-  }
+  if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then
+def _dpp8#Gen.Suffix : VOP2_DPP8_Gen {
+  let AsmString = asmName # ps.Pfl.AsmDPP8;
+}
 }
 
 //===-- VOP2be --===//
@@ -1687,32 +1696,32 @@ multiclass VOP2be_Real_dpp op, 
string opName, string asmName
 }
 }
 multiclass VOP2be_Real_dpp8 op, string opName, string 
asmName> {
-  if !cast(opName#"_e32").Pfl.HasExtDPP then
+  defvar ps = !cast(opName#"_e32");
+  if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then {
   def _dpp8#Gen.Suffix :
-VOP2_DPP8_Gen(opName#"_e32"), Gen> {
-  string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8;
+VOP2_DPP8_Gen {
+  string AsmDPP8 = ps.Pfl.AsmDPP8;
   let AsmString = asmName # !subst(", vcc", "", AsmDPP8);
 }
-  if !cast(opName#"_e32").Pfl.HasExtDPP then
   def _dpp8_w32#Gen.Suffix :
-VOP2_DPP8(opName#"_e32")> {
-  string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8;
+VOP2_DPP8 {
+  string AsmDPP8 = ps.Pfl.AsmDPP8;
   let AsmString = asmName # !subst("vcc", "vcc_lo", AsmDPP8);
   let isAsmParserOnly = 1;
   let WaveSizePredicate = isWave32;
   let AssemblerPredicate = Gen.AssemblerPredicate;
   let DecoderNamespace = Gen.DecoderNamespace;
 }
-  if !cast(opName#"_e32").Pfl.HasExtDPP then
   def _dpp8_w64#Gen.Suffix :
-VOP2_DPP8(opName#"_e32")> {
-  string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8;
+VOP2_DPP8 {
+  string AsmDPP8 = ps.Pfl.AsmDPP8;
   let AsmString = asmName # AsmDPP8;
   let isAsmParserOnly = 1;
   let WaveSizePredicate = isWave64;
   let AssemblerPredicate = Gen.AssemblerPredicate;
   let DecoderNamespace = Gen.DecoderNamespace;
 }
+  }
 }
 
 // We don't want to override separate decoderNamespaces within these
@@ -1777,9 +1786,11 @@ multiclass VOP2_Real_NO_DPP_with_name op, string opName,
   }
 }
 
-multiclass VOP2_Real_NO_DPP_with_alias op, string alias> {
+multiclass VOP2_Real_with_DPP16_with_alias op, string 
alias> {
   defm NAME

[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)

2025-09-18 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi approved this pull request.

LGTM. IMO this is a much nicer way to test a property on `stdin`'s positioning. 
Let's get a bit more consensus from other maintainers before landing, though.

https://github.com/llvm/llvm-project/pull/157232
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)

2025-09-18 Thread S. VenkataKeerthy via llvm-branch-commits

https://github.com/svkeerthy edited 
https://github.com/llvm/llvm-project/pull/158376


[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)

2025-09-18 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/157232

>From d749f30964e57caa797b3df87ae88ffc3d4a2f54 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Sun, 7 Sep 2025 17:39:19 +
Subject: [PATCH 1/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 17 +
 llvm/test/MC/COFF/stdin.s  |  1 -
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/MC/COFF/stdin.py
 delete mode 100644 llvm/test/MC/COFF/stdin.s

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
new file mode 100644
index 0..8b7b6ae1fba13
--- /dev/null
+++ b/llvm/test/MC/COFF/stdin.py
@@ -0,0 +1,17 @@
+# RUN: echo "// comment" > %t.input
+# RUN: which llvm-mc | %python %s %t
+
+import subprocess
+import sys
+
+llvm_mc_binary = sys.stdin.readlines()[0].strip()
+temp_file = sys.argv[1]
+input_file = temp_file + ".input"
+
+with open(temp_file, "w") as mc_stdout:
+mc_stdout.seek(4)
+subprocess.run(
+[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", 
input_file],
+stdout=mc_stdout,
+check=True,
+)
diff --git a/llvm/test/MC/COFF/stdin.s b/llvm/test/MC/COFF/stdin.s
deleted file mode 100644
index 8ceae7fdef501..0
--- a/llvm/test/MC/COFF/stdin.s
+++ /dev/null
@@ -1 +0,0 @@
-// RUN: bash -c '(echo "test"; llvm-mc -filetype=obj -triple i686-pc-win32 %s 
) > %t'

>From 0bfe954d4cd5edf4312e924c278c59e57644d5f1 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Mon, 8 Sep 2025 17:28:59 +
Subject: [PATCH 2/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
index 8b7b6ae1fba13..1d9b50c022523 100644
--- a/llvm/test/MC/COFF/stdin.py
+++ b/llvm/test/MC/COFF/stdin.py
@@ -1,14 +1,22 @@
 # RUN: echo "// comment" > %t.input
 # RUN: which llvm-mc | %python %s %t
 
+import argparse
 import subprocess
 import sys
 
+parser = argparse.ArgumentParser()
+parser.add_argument("temp_file")
+arguments = parser.parse_args()
+
 llvm_mc_binary = sys.stdin.readlines()[0].strip()
-temp_file = sys.argv[1]
+temp_file = arguments.temp_file
 input_file = temp_file + ".input"
 
 with open(temp_file, "w") as mc_stdout:
+## We need to test that starting on an input stream with a non-zero offset
+## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek
+## past zero for STDOUT.
 mc_stdout.seek(4)
 subprocess.run(
 [llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", 
input_file],

>From 2ae17e4f18a95c52b53ad5ad45a19c4bf29e5025 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Mon, 8 Sep 2025 17:43:39 +
Subject: [PATCH 3/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
index 1d9b50c022523..0da1b4895142b 100644
--- a/llvm/test/MC/COFF/stdin.py
+++ b/llvm/test/MC/COFF/stdin.py
@@ -1,25 +1,30 @@
 # RUN: echo "// comment" > %t.input
-# RUN: which llvm-mc | %python %s %t
+# RUN: which llvm-mc | %python %s %t.input %t
 
 import argparse
 import subprocess
 import sys
 
 parser = argparse.ArgumentParser()
+parser.add_argument("input_file")
 parser.add_argument("temp_file")
 arguments = parser.parse_args()
 
 llvm_mc_binary = sys.stdin.readlines()[0].strip()
-temp_file = arguments.temp_file
-input_file = temp_file + ".input"
 
-with open(temp_file, "w") as mc_stdout:
+with open(arguments.temp_file, "w") as mc_stdout:
 ## We need to test that starting on an input stream with a non-zero offset
 ## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek
 ## past zero for STDOUT.
 mc_stdout.seek(4)
 subprocess.run(
-[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", 
input_file],
+[
+llvm_mc_binary,
+"-filetype=obj",
+"-triple",
+"i686-pc-win32",
+arguments.input_file,
+],
 stdout=mc_stdout,
 check=True,
 )
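For reference, the non-zero stdout offset this test depends on can be reproduced in a standalone sketch; the child command here is an illustrative stand-in for llvm-mc, not the real invocation:

```python
import subprocess
import sys
import tempfile

# Seek the output file past offset zero before handing it to a child
# process, mirroring the setup in the rewritten stdin.py test above.
with tempfile.NamedTemporaryFile(mode="w", suffix=".out", delete=False) as out:
    out.seek(4)  # the child's first write lands at offset 4, not 0
    subprocess.run(
        [sys.executable, "-c", "print('payload')"],
        stdout=out,
        check=True,
    )
    path = out.name

with open(path, "rb") as f:
    data = f.read()
# The skipped bytes read back as a zero-filled hole before the payload.
```

The child inherits a duplicated file descriptor that shares the parent's file offset, which is why seeking in the parent is enough to start the child's output at a non-zero position.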



[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)

2025-09-18 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/156610

>From 3b73016ad3984069441409516598caf1161c7448 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 2 Sep 2025 08:36:34 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device

Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to omp `reduction` clauses. The changes
attach two `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wsloop` part.
---
 .../OpenMP/DoConcurrentConversion.cpp | 117 ++
 .../DoConcurrent/reduce_device.mlir   |  53 
 2 files changed, 121 insertions(+), 49 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir

diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp 
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index d00a4fdd2cf2e..6e308499100fa 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
 
   for (mlir::Value local : loop.getLocalVars())
 liveIns.push_back(local);
+
+  for (mlir::Value reduce : loop.getReduceVars())
+liveIns.push_back(reduce);
 }
 
 /// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -319,7 +322,7 @@ class DoConcurrentConversion
   targetOp =
   genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns,
   targetClauseOps, loopNestClauseOps, liveInShapeInfoMap);
-  genTeamsOp(doLoop.getLoc(), rewriter);
+  genTeamsOp(rewriter, loop, mapper);
 }
 
 mlir::omp::ParallelOp parallelOp =
@@ -492,46 +495,7 @@ class DoConcurrentConversion
 if (!mapToDevice)
   genPrivatizers(rewriter, mapper, loop, wsloopClauseOps);
 
-if (!loop.getReduceVars().empty()) {
-  for (auto [op, byRef, sym, arg] : llvm::zip_equal(
-   loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(),
-   loop.getReduceSymsAttr().getAsRange(),
-   loop.getRegionReduceArgs())) {
-auto firReducer = moduleSymbolTable.lookup(
-sym.getLeafReference());
-
-mlir::OpBuilder::InsertionGuard guard(rewriter);
-rewriter.setInsertionPointAfter(firReducer);
-std::string ompReducerName = sym.getLeafReference().str() + ".omp";
-
-auto ompReducer =
-moduleSymbolTable.lookup(
-rewriter.getStringAttr(ompReducerName));
-
-if (!ompReducer) {
-  ompReducer = mlir::omp::DeclareReductionOp::create(
-  rewriter, firReducer.getLoc(), ompReducerName,
-  firReducer.getTypeAttr().getValue());
-
-  cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(),
-  ompReducer.getAllocRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(),
-  ompReducer.getInitializerRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(),
-  ompReducer.getReductionRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(),
-  ompReducer.getAtomicReductionRegion());
-  cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(),
-  ompReducer.getCleanupRegion());
-  moduleSymbolTable.insert(ompReducer);
-}
-
-wsloopClauseOps.reductionVars.push_back(op);
-wsloopClauseOps.reductionByref.push_back(byRef);
-wsloopClauseOps.reductionSyms.push_back(
-mlir::SymbolRefAttr::get(ompReducer));
-  }
-}
+genReductions(rewriter, mapper, loop, wsloopClauseOps);
 
 auto wsloopOp =
 mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps);
@@ -553,8 +517,6 @@ class DoConcurrentConversion
 
 rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back());
 mlir::omp::YieldOp::create(rewriter, loop->getLoc());
-loop->getParentOfType().print(
-llvm::errs(), mlir::OpPrintingFlags().assumeVerified());
 
 return {loopNestOp, wsloopOp};
   }
@@ -778,15 +740,26 @@ class DoConcurrentConversion
 liveInName, shape);
   }
 
-  mlir::omp::TeamsOp
-  genTeamsOp(mlir::Location loc,
- mlir::ConversionPatternRewriter &rewriter) const {
-auto teamsOp = rewriter.create(
-loc, /*clauses=*/mlir::omp::TeamsOperands{});
+  mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter,
+fir::DoConcurrentLoopOp loop,
+mlir::IRMapping &mapper) const {
+mlir::omp::TeamsOperands teamsOps;
+genReductions(rewriter, mapper, loop, teamsOps);
+
+mlir::Location loc = loop.getLoc();
+aut

[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)

2025-09-18 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)


Changes



---

Patch is 22.81 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/159641.diff


5 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/VOP2Instructions.td (+45-34) 
- (modified) llvm/test/CodeGen/AMDGPU/dpp_combine.ll (+5-1) 
- (added) llvm/test/MC/AMDGPU/gfx1251_asm_vop2_dpp16.s (+74) 
- (added) llvm/test/MC/AMDGPU/gfx1251_asm_vop2_err.s (+106) 
- (added) llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop2_dpp16.txt (+37) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/VOP2Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP2Instructions.td
index 46a1a4bf1ab4a..37d92bc5076de 100644
--- a/llvm/lib/Target/AMDGPU/VOP2Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP2Instructions.td
@@ -287,10 +287,14 @@ multiclass VOP2bInst ,
  Commutable_REV;
 
-  let SubtargetPredicate = isGFX11Plus in {
-if P.HasExtVOP3DPP then
-  def _e64_dpp  : VOP3_DPP_Pseudo ;
-  } // End SubtargetPredicate = isGFX11Plus
+  if P.HasExtVOP3DPP then
+def _e64_dpp  : VOP3_DPP_Pseudo  {
+  let SubtargetPredicate = isGFX11Plus;
+}
+  else if P.HasExt64BitDPP then
+def _e64_dpp  : VOP3_DPP_Pseudo  {
+  let OtherPredicates = [HasDPALU_DPP];
+  }
 }
 }
 
@@ -345,10 +349,14 @@ multiclass
  VOPD_Component;
 }
 
-let SubtargetPredicate = isGFX11Plus in {
-  if P.HasExtVOP3DPP then
-def _e64_dpp  : VOP3_DPP_Pseudo ;
-} // End SubtargetPredicate = isGFX11Plus
+if P.HasExtVOP3DPP then
+  def _e64_dpp  : VOP3_DPP_Pseudo  {
+let SubtargetPredicate = isGFX11Plus;
+  }
+else if P.HasExt64BitDPP then
+  def _e64_dpp  : VOP3_DPP_Pseudo  {
+let OtherPredicates = [HasDPALU_DPP];
+  }
   }
 }
 
@@ -1607,8 +1615,9 @@ multiclass VOP2_Real_dpp op> {
 }
 
 multiclass VOP2_Real_dpp8 op> {
-  if !cast(NAME#"_e32").Pfl.HasExtDPP then
-  def _dpp8#Gen.Suffix : VOP2_DPP8_Gen(NAME#"_e32"), 
Gen>;
+  defvar ps = !cast(NAME#"_e32");
+  if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then
+def _dpp8#Gen.Suffix : VOP2_DPP8_Gen;
 }
 
 //===- VOP2 (with name) -===//
@@ -1643,10 +1652,10 @@ multiclass VOP2_Real_dpp_with_name 
op, string opName,
 multiclass VOP2_Real_dpp8_with_name op, string opName,
 string asmName> {
   defvar ps = !cast(opName#"_e32");
-  if ps.Pfl.HasExtDPP then
-  def _dpp8#Gen.Suffix : VOP2_DPP8_Gen {
-let AsmString = asmName # ps.Pfl.AsmDPP8;
-  }
+  if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then
+def _dpp8#Gen.Suffix : VOP2_DPP8_Gen {
+  let AsmString = asmName # ps.Pfl.AsmDPP8;
+}
 }
 
 //===-- VOP2be --===//
@@ -1687,32 +1696,32 @@ multiclass VOP2be_Real_dpp op, 
string opName, string asmName
 }
 }
 multiclass VOP2be_Real_dpp8 op, string opName, string 
asmName> {
-  if !cast(opName#"_e32").Pfl.HasExtDPP then
+  defvar ps = !cast(opName#"_e32");
+  if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then {
   def _dpp8#Gen.Suffix :
-VOP2_DPP8_Gen(opName#"_e32"), Gen> {
-  string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8;
+VOP2_DPP8_Gen {
+  string AsmDPP8 = ps.Pfl.AsmDPP8;
   let AsmString = asmName # !subst(", vcc", "", AsmDPP8);
 }
-  if !cast(opName#"_e32").Pfl.HasExtDPP then
   def _dpp8_w32#Gen.Suffix :
-VOP2_DPP8(opName#"_e32")> {
-  string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8;
+VOP2_DPP8 {
+  string AsmDPP8 = ps.Pfl.AsmDPP8;
   let AsmString = asmName # !subst("vcc", "vcc_lo", AsmDPP8);
   let isAsmParserOnly = 1;
   let WaveSizePredicate = isWave32;
   let AssemblerPredicate = Gen.AssemblerPredicate;
   let DecoderNamespace = Gen.DecoderNamespace;
 }
-  if !cast(opName#"_e32").Pfl.HasExtDPP then
   def _dpp8_w64#Gen.Suffix :
-VOP2_DPP8(opName#"_e32")> {
-  string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8;
+VOP2_DPP8 {
+  string AsmDPP8 = ps.Pfl.AsmDPP8;
   let AsmString = asmName # AsmDPP8;
   let isAsmParserOnly = 1;
   let WaveSizePredicate = isWave64;
   let AssemblerPredicate = Gen.AssemblerPredicate;
   let DecoderNamespace = Gen.DecoderNamespace;
 }
+  }
 }
 
 // We don't want to override separate decoderNamespaces within these
@@ -1777,9 +1786,11 @@ multiclass VOP2_Real_NO_DPP_with_name op, string opName,
   }
 }
 
-multiclass VOP2_Real_NO_DPP_with_alias op, string alias> {
+multiclass VOP2_Real_with_DPP16_with_alias op, string 
alias> {
   defm NAME : VOP2_Real_e32,
-  VOP2_Real_e64;
+  VOP2_Real_dpp,
+  VOP2_Real_e64,
+  VOP3_Real_dpp_Base;
   def Gen.Suffix#"_alias" : AMDGPUMnemonicAlias {
 let AssemblerPredicate = Gen.AssemblerPredicate;
   }
@@ -1808,6 +1819,9 

[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)

2025-09-18 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Mircea Trofin (mtrofin)


Changes



---

Patch is 21.02 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/159645.diff


2 Files Affected:

- (modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+75-11) 
- (modified) llvm/test/Transforms/SimplifyCFG/switch-to-select-two-case.ll 
(+42-30) 


``diff
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index a1f759dd1df83..276ca89d715f1 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -84,6 +84,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -6318,9 +6319,12 @@ static bool initializeUniqueCases(SwitchInst *SI, 
PHINode *&PHI,
 // Helper function that checks if it is possible to transform a switch with 
only
 // two cases (or two cases + default) that produces a result into a select.
 // TODO: Handle switches with more than 2 cases that map to the same result.
+// The branch weights correspond to the provided Condition (i.e. if Condition 
is
+// modified from the original SwitchInst, the caller must adjust the weights)
 static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector,
  Constant *DefaultResult, Value *Condition,
- IRBuilder<> &Builder, const DataLayout &DL) {
+ IRBuilder<> &Builder, const DataLayout &DL,
+ ArrayRef BranchWeights) {
   // If we are selecting between only two cases transform into a simple
   // select or a two-way select if default is possible.
   // Example:
@@ -6329,6 +6333,10 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
   //   case 20: return 2;   >  %2 = icmp eq i32 %a, 20
   //   default: return 4;  %3 = select i1 %2, i32 2, i32 %1
   // }
+
+  const bool HasBranchWeights =
+  !BranchWeights.empty() && !ProfcheckDisableMetadataFixes;
+
   if (ResultVector.size() == 2 && ResultVector[0].second.size() == 1 &&
   ResultVector[1].second.size() == 1) {
 ConstantInt *FirstCase = ResultVector[0].second[0];
@@ -6337,13 +6345,37 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
 if (DefaultResult) {
   Value *ValueCompare =
   Builder.CreateICmpEQ(Condition, SecondCase, "switch.selectcmp");
-  SelectValue = Builder.CreateSelect(ValueCompare, ResultVector[1].first,
- DefaultResult, "switch.select");
+  SelectInst *SelectValueInst = cast(Builder.CreateSelect(
+  ValueCompare, ResultVector[1].first, DefaultResult, 
"switch.select"));
+  SelectValue = SelectValueInst;
+  if (HasBranchWeights) {
+// We start with 3 probabilities, where the numerator is the
+// corresponding BranchWeights[i], and the denominator is the sum over
+// BranchWeights. We want the probability and negative probability of
+// Condition == SecondCase.
+assert(BranchWeights.size() == 3);
+setBranchWeights(SelectValueInst, BranchWeights[2],
+ BranchWeights[0] + BranchWeights[1],
+ /*IsExpected=*/false);
+}
 }
 Value *ValueCompare =
 Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp");
-return Builder.CreateSelect(ValueCompare, ResultVector[0].first,
-SelectValue, "switch.select");
+SelectInst *Ret = cast(Builder.CreateSelect(
+ValueCompare, ResultVector[0].first, SelectValue, "switch.select"));
+if (HasBranchWeights) {
+  // We may have had a DefaultResult. Base the position of the first and
+  // second's branch weights accordingly. Also the proability that 
Condition
+  // != FirstCase needs to take that into account.
+  assert(BranchWeights.size() >= 2);
+  size_t FirstCasePos = (Condition != nullptr);
+  size_t SecondCasePos = FirstCasePos + 1;
+  uint32_t DefaultCase = (Condition != nullptr) ? BranchWeights[0] : 0;
+  setBranchWeights(Ret, BranchWeights[FirstCasePos],
+   DefaultCase + BranchWeights[SecondCasePos],
+   /*IsExpected=*/false);
+}
+return Ret;
   }
 
   // Handle the degenerate case where two cases have the same result value.
@@ -6379,8 +6411,16 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
   Value *And = Builder.CreateAnd(Condition, AndMask);
   Value *Cmp = Builder.CreateICmpEQ(
   And, Constant::getIntegerValue(And->getType(), AndMask));
-  return Builder.CreateSelect(Cmp, ResultVector[0].first,
-  DefaultResult);
+  SelectInst *Ret = cast(Builder.CreateSelect(Cmp, 
ResultVector[0].first,
+ 
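The weight redistribution performed by the two `setBranchWeights` calls in this patch can be modeled numerically. This toy sketch (plain Python, not the LLVM API) assumes a two-case switch with a default, with weights ordered [default, first case, second case]:

```python
# Model of how a two-case switch's branch weights map onto the nested
# selects: the inner select tests the second case against the default,
# and the outer select tests the first case against everything else.
def select_weights(branch_weights):
    default, first, second = branch_weights
    inner = (second, default + first)   # select for cond == SecondCase
    outer = (first, default + second)   # select for cond == FirstCase
    return inner, outer

print(select_weights([10, 30, 60]))  # ((60, 40), (30, 70))
```

Note that each emitted pair sums to the original total weight, so the overall execution-count mass is preserved across the transform.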

[llvm-branch-commits] [mlir] 301f09f - Revert "[mlir][SCF] Allow using a custom operation to generate loops with `ml…"

2025-09-18 Thread via llvm-branch-commits

Author: MaheshRavishankar
Date: 2025-09-18T13:49:24-07:00
New Revision: 301f09f236c1439c9313ebc2dda1193d210ab698

URL: 
https://github.com/llvm/llvm-project/commit/301f09f236c1439c9313ebc2dda1193d210ab698
DIFF: 
https://github.com/llvm/llvm-project/commit/301f09f236c1439c9313ebc2dda1193d210ab698.diff

LOG: Revert "[mlir][SCF] Allow using a custom operation to generate loops with 
`ml…"

This reverts commit b8649098a7fcf598406d8d8b7d68891d1444e9c8.

Added: 


Modified: 
mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.cpp
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.td

Removed: 
mlir/test/Interfaces/TilingInterface/tile-using-custom-op.mlir



diff  --git a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h 
b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
index 6b05ade37881c..3205da6e448fc 100644
--- a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
+++ b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
@@ -33,14 +33,6 @@ using SCFTileSizeComputationFunction =
 
 /// Options to use to control tiling.
 struct SCFTilingOptions {
-  /// Specify which loop construct to use for tile and fuse.
-  enum class LoopType { ForOp, ForallOp, CustomOp };
-  LoopType loopType = LoopType::ForOp;
-  SCFTilingOptions &setLoopType(LoopType type) {
-loopType = type;
-return *this;
-  }
-
   /// Computation function that returns the tile sizes to use for each loop.
   /// Returning a tile size of zero implies no tiling for that loop. If the
   /// size of the returned vector is smaller than the number of loops, the 
inner
@@ -58,17 +50,6 @@ struct SCFTilingOptions {
   /// proper interaction with folding.
   SCFTilingOptions &setTileSizes(ArrayRef tileSizes);
 
-  /// The interchange vector to reorder the tiled loops.
-  SmallVector interchangeVector = {};
-  SCFTilingOptions &setInterchange(ArrayRef interchange) {
-interchangeVector = llvm::to_vector(interchange);
-return *this;
-  }
-
-  //-//
-  // Options related to tiling using `scf.forall`.
-  //-//
-
   /// Computation function that returns the number of threads to use for
   /// each loop. Returning a num threads of zero implies no tiling for that
   /// loop. If the size of the returned vector is smaller than the number of
@@ -89,6 +70,21 @@ struct SCFTilingOptions {
   /// function that computes num threads at the point they are needed.
   SCFTilingOptions &setNumThreads(ArrayRef numThreads);
 
+  /// The interchange vector to reorder the tiled loops.
+  SmallVector interchangeVector = {};
+  SCFTilingOptions &setInterchange(ArrayRef interchange) {
+interchangeVector = llvm::to_vector(interchange);
+return *this;
+  }
+
+  /// Specify which loop construct to use for tile and fuse.
+  enum class LoopType { ForOp, ForallOp };
+  LoopType loopType = LoopType::ForOp;
+  SCFTilingOptions &setLoopType(LoopType type) {
+loopType = type;
+return *this;
+  }
+
   /// Specify mapping of loops to devices. This is only respected when the loop
   /// constructs support such a mapping (like `scf.forall`). Will be ignored
   /// when using loop constructs that dont support such a mapping (like
@@ -121,98 +117,6 @@ struct SCFTilingOptions {
 reductionDims.insert(dims.begin(), dims.end());
 return *this;
   }
-
-  //-//
-  // Options related to tiling using custom loop.
-  //-//
-
-  // For generating the inter-tile loops using a custom loop, two callback
-  // functions are needed
-  // 1. That generates the "loop header", i.e. the loop that iterates over the
-  // different tiles.
-  // 2. That generates the loop terminator
-  //
-  // For `scf.forall` case the call back to generate loop header would generate
-  //
-  // ```mlir
-  // scf.forall (...) = ... {
-  //   ..
-  // }
-  // ```
-  //
-  // and the call back to generate the loop terminator would generate the
-  // `scf.in_parallel` region
-  //
-  // ```mlir
-  // scf.forall (...) = ... {
-  //   scf.in_parallel {
-  //  tensor.parallel_insert_slice ...
-  //   }
-  // }
-  // ```
-  //
-
-  // Information that is to be returned by the callback to generate the loop
-  // header needed for the rest of the tiled codegeneration.
-  // - `loops`: The generated loops
-  // - `tileOffset`: The values that represent the offset of the iteration 
space
-  // tile
-  // - `tileSizes` : The values that represent the size of the iteration space
-  // tile.
-  // - `destinationTensor

[llvm-branch-commits] [mlir] 7af3f6e - Revert "[mlir][SCF] Allow using a custom operation to generate loops with `ml…"

2025-09-18 Thread via llvm-branch-commits

Author: MaheshRavishankar
Date: 2025-09-18T09:29:29-07:00
New Revision: 7af3f6e0317e84900e6683ac0ea3dc60b805904e

URL: 
https://github.com/llvm/llvm-project/commit/7af3f6e0317e84900e6683ac0ea3dc60b805904e
DIFF: 
https://github.com/llvm/llvm-project/commit/7af3f6e0317e84900e6683ac0ea3dc60b805904e.diff

LOG: Revert "[mlir][SCF] Allow using a custom operation to generate loops with 
`ml…"

This reverts commit b8649098a7fcf598406d8d8b7d68891d1444e9c8.

Added: 


Modified: 
mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.cpp
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.td

Removed: 
mlir/test/Interfaces/TilingInterface/tile-using-custom-op.mlir



diff  --git a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h 
b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
index 6b05ade37881c..3205da6e448fc 100644
--- a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
+++ b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
@@ -33,14 +33,6 @@ using SCFTileSizeComputationFunction =
 
 /// Options to use to control tiling.
 struct SCFTilingOptions {
-  /// Specify which loop construct to use for tile and fuse.
-  enum class LoopType { ForOp, ForallOp, CustomOp };
-  LoopType loopType = LoopType::ForOp;
-  SCFTilingOptions &setLoopType(LoopType type) {
-loopType = type;
-return *this;
-  }
-
   /// Computation function that returns the tile sizes to use for each loop.
   /// Returning a tile size of zero implies no tiling for that loop. If the
   /// size of the returned vector is smaller than the number of loops, the 
inner
@@ -58,17 +50,6 @@ struct SCFTilingOptions {
   /// proper interaction with folding.
   SCFTilingOptions &setTileSizes(ArrayRef tileSizes);
 
-  /// The interchange vector to reorder the tiled loops.
-  SmallVector interchangeVector = {};
-  SCFTilingOptions &setInterchange(ArrayRef interchange) {
-interchangeVector = llvm::to_vector(interchange);
-return *this;
-  }
-
-  //-//
-  // Options related to tiling using `scf.forall`.
-  //-//
-
   /// Computation function that returns the number of threads to use for
   /// each loop. Returning a num threads of zero implies no tiling for that
   /// loop. If the size of the returned vector is smaller than the number of
@@ -89,6 +70,21 @@ struct SCFTilingOptions {
   /// function that computes num threads at the point they are needed.
   SCFTilingOptions &setNumThreads(ArrayRef numThreads);
 
+  /// The interchange vector to reorder the tiled loops.
+  SmallVector interchangeVector = {};
+  SCFTilingOptions &setInterchange(ArrayRef interchange) {
+interchangeVector = llvm::to_vector(interchange);
+return *this;
+  }
+
+  /// Specify which loop construct to use for tile and fuse.
+  enum class LoopType { ForOp, ForallOp };
+  LoopType loopType = LoopType::ForOp;
+  SCFTilingOptions &setLoopType(LoopType type) {
+loopType = type;
+return *this;
+  }
+
   /// Specify mapping of loops to devices. This is only respected when the loop
   /// constructs support such a mapping (like `scf.forall`). Will be ignored
   /// when using loop constructs that dont support such a mapping (like
@@ -121,98 +117,6 @@ struct SCFTilingOptions {
 reductionDims.insert(dims.begin(), dims.end());
 return *this;
   }
-
-  //-//
-  // Options related to tiling using custom loop.
-  //-//
-
-  // For generating the inter-tile loops using a custom loop, two callback
-  // functions are needed
-  // 1. That generates the "loop header", i.e. the loop that iterates over the
-  // different tiles.
-  // 2. That generates the loop terminator
-  //
-  // For `scf.forall` case the call back to generate loop header would generate
-  //
-  // ```mlir
-  // scf.forall (...) = ... {
-  //   ..
-  // }
-  // ```
-  //
-  // and the call back to generate the loop terminator would generate the
-  // `scf.in_parallel` region
-  //
-  // ```mlir
-  // scf.forall (...) = ... {
-  //   scf.in_parallel {
-  //  tensor.parallel_insert_slice ...
-  //   }
-  // }
-  // ```
-  //
-
-  // Information that is to be returned by the callback to generate the loop
-  // header needed for the rest of the tiled codegeneration.
-  // - `loops`: The generated loops
-  // - `tileOffset`: The values that represent the offset of the iteration 
space
-  // tile
-  // - `tileSizes` : The values that represent the size of the iteration space
-  // tile.
-  // - `destinationTensor

[llvm-branch-commits] [llvm] [AMDGPU] Improve StructurizeCFG pass performance by using SSAUpdaterBulk. (PR #150937)

2025-09-18 Thread Valery Pykhtin via llvm-branch-commits

https://github.com/vpykhtin updated 
https://github.com/llvm/llvm-project/pull/150937

>From ae3589e2c93351349cd1bbb5586c2dfcb075ea68 Mon Sep 17 00:00:00 2001
From: Valery Pykhtin 
Date: Thu, 10 Apr 2025 11:58:13 +
Subject: [PATCH] amdgpu_use_ssaupdaterbulk_in_structurizecfg

---
 llvm/lib/Transforms/Scalar/StructurizeCFG.cpp | 25 +++
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp 
b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
index 2ee91a9b40026..0f3978f56045e 100644
--- a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
+++ b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
@@ -47,6 +47,7 @@
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include "llvm/Transforms/Utils/SSAUpdater.h"
+#include "llvm/Transforms/Utils/SSAUpdaterBulk.h"
 #include 
 #include 
 
@@ -321,7 +322,7 @@ class StructurizeCFG {
 
   void collectInfos();
 
-  void insertConditions(bool Loops);
+  void insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter);
 
   void simplifyConditions();
 
@@ -671,10 +672,9 @@ void StructurizeCFG::collectInfos() {
 }
 
 /// Insert the missing branch conditions
-void StructurizeCFG::insertConditions(bool Loops) {
+void StructurizeCFG::insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter) 
{
   BranchVector &Conds = Loops ? LoopConds : Conditions;
   Value *Default = Loops ? BoolTrue : BoolFalse;
-  SSAUpdater PhiInserter;
 
   for (BranchInst *Term : Conds) {
 assert(Term->isConditional());
@@ -683,8 +683,9 @@ void StructurizeCFG::insertConditions(bool Loops) {
 BasicBlock *SuccTrue = Term->getSuccessor(0);
 BasicBlock *SuccFalse = Term->getSuccessor(1);
 
-PhiInserter.Initialize(Boolean, "");
-PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default);
+unsigned Variable = PhiInserter.AddVariable("", Boolean);
+PhiInserter.AddAvailableValue(Variable, Loops ? SuccFalse : Parent,
+  Default);
 
 BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue];
 
@@ -697,7 +698,7 @@ void StructurizeCFG::insertConditions(bool Loops) {
 ParentInfo = PI;
 break;
   }
-  PhiInserter.AddAvailableValue(BB, PI.Pred);
+  PhiInserter.AddAvailableValue(Variable, BB, PI.Pred);
   Dominator.addAndRememberBlock(BB);
 }
 
@@ -706,9 +707,9 @@ void StructurizeCFG::insertConditions(bool Loops) {
   CondBranchWeights::setMetadata(*Term, ParentInfo.Weights);
 } else {
   if (!Dominator.resultIsRememberedBlock())
-PhiInserter.AddAvailableValue(Dominator.result(), Default);
+PhiInserter.AddAvailableValue(Variable, Dominator.result(), Default);
 
-  Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent));
+  PhiInserter.AddUse(Variable, &Term->getOperandUse(0));
 }
   }
 }
@@ -1414,8 +1415,12 @@ bool StructurizeCFG::run(Region *R, DominatorTree *DT,
   orderNodes();
   collectInfos();
   createFlow();
-  insertConditions(false);
-  insertConditions(true);
+
+  SSAUpdaterBulk PhiInserter;
+  insertConditions(false, PhiInserter);
+  insertConditions(true, PhiInserter);
+  PhiInserter.RewriteAndOptimizeAllUses(*DT);
+
   setPhiValues();
   simplifyHoistedPhis();
   simplifyConditions();

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Offload] `olGetMemInfo` (PR #157651)

2025-09-18 Thread Ross Brunton via llvm-branch-commits

https://github.com/RossBrunton converted_to_draft 
https://github.com/llvm/llvm-project/pull/157651



[llvm-branch-commits] [llvm] [DirectX] Validating Root flags are denying shader stage (PR #153287)

2025-09-18 Thread via llvm-branch-commits

https://github.com/joaosaffran updated 
https://github.com/llvm/llvm-project/pull/153287

>From b1e34ff07fffe96fec438b87027bd2c450b6b36f Mon Sep 17 00:00:00 2001
From: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Date: Tue, 12 Aug 2025 13:07:42 -0700
Subject: [PATCH 01/24] adding validaiton and tests

---
 .../DXILPostOptimizationValidation.cpp| 95 ++-
 .../rootsignature-validation-deny-shader.ll   | 16 
 ...re-validation-fail-deny-multiple-shader.ll | 17 
 ...ture-validation-fail-deny-single-shader.ll | 17 
 4 files changed, 122 insertions(+), 23 deletions(-)
 create mode 100644 
llvm/test/CodeGen/DirectX/rootsignature-validation-deny-shader.ll
 create mode 100644 
llvm/test/CodeGen/DirectX/rootsignature-validation-fail-deny-multiple-shader.ll
 create mode 100644 
llvm/test/CodeGen/DirectX/rootsignature-validation-fail-deny-single-shader.ll

diff --git a/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp 
b/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp
index 3721b5f539b8c..251f4a0daf43a 100644
--- a/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp
+++ b/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp
@@ -21,6 +21,7 @@
 #include "llvm/InitializePasses.h"
 #include "llvm/MC/DXContainerRootSignature.h"
 #include "llvm/Support/DXILABI.h"
+#include "llvm/TargetParser/Triple.h"
 #include 
 
 #define DEBUG_TYPE "dxil-post-optimization-validation"
@@ -169,15 +170,16 @@ reportDescriptorTableMixingTypes(Module &M, uint32_t 
Location,
   M.getContext().diagnose(DiagnosticInfoGeneric(Message));
 }
 
-static void reportOverlowingRange(Module &M, const 
dxbc::RTS0::v2::DescriptorRange &Range) {
+static void
+reportOverlowingRange(Module &M, const dxbc::RTS0::v2::DescriptorRange &Range) 
{
   SmallString<128> Message;
   raw_svector_ostream OS(Message);
-  OS << "Cannot append range with implicit lower " 
-  << "bound after an unbounded range "
-  << 
getResourceClassName(toResourceClass(static_cast(Range.RangeType)))
-  << "(register=" << Range.BaseShaderRegister << ", space=" << 
-  Range.RegisterSpace
-  << ") exceeds maximum allowed value.";
+  OS << "Cannot append range with implicit lower "
+ << "bound after an unbounded range "
+ << getResourceClassName(toResourceClass(
+static_cast(Range.RangeType)))
+ << "(register=" << Range.BaseShaderRegister
+ << ", space=" << Range.RegisterSpace << ") exceeds maximum allowed 
value.";
   M.getContext().diagnose(DiagnosticInfoGeneric(Message));
 }
 
@@ -262,12 +264,57 @@ getRootDescriptorsBindingInfo(const 
mcdxbc::RootSignatureDesc &RSD,
   return RDs;
 }
 
+static void reportIfDeniedShaderStageAccess(Module &M, dxbc::RootFlags Flags,
+dxbc::RootFlags Mask) {
+  if ((Flags & Mask) == Mask) {
+SmallString<128> Message;
+raw_svector_ostream OS(Message);
+OS << "Shader has root bindings but root signature uses a DENY flag to "
+  "disallow root binding access to the shader stage.";
+M.getContext().diagnose(DiagnosticInfoGeneric(Message));
+  }
+}
+
+static void validateRootFlags(Module &M, const mcdxbc::RootSignatureDesc &RSD,
+  const dxil::ModuleMetadataInfo &MMI) {
+  dxbc::RootFlags Flags = dxbc::RootFlags(RSD.Flags);
 
+  switch (MMI.ShaderProfile) {
+  case Triple::Pixel:
+reportIfDeniedShaderStageAccess(M, Flags,
+
dxbc::RootFlags::DenyPixelShaderRootAccess);
+break;
+  case Triple::Vertex:
+reportIfDeniedShaderStageAccess(
+M, Flags, dxbc::RootFlags::DenyVertexShaderRootAccess);
+break;
+  case Triple::Geometry:
+reportIfDeniedShaderStageAccess(
+M, Flags, dxbc::RootFlags::DenyGeometryShaderRootAccess);
+break;
+  case Triple::Hull:
+reportIfDeniedShaderStageAccess(M, Flags,
+dxbc::RootFlags::DenyHullShaderRootAccess);
+break;
+  case Triple::Domain:
+reportIfDeniedShaderStageAccess(
+M, Flags, dxbc::RootFlags::DenyDomainShaderRootAccess);
+break;
+  case Triple::Mesh:
+reportIfDeniedShaderStageAccess(M, Flags,
+dxbc::RootFlags::DenyMeshShaderRootAccess);
+break;
+  case Triple::Amplification:
+reportIfDeniedShaderStageAccess(
+M, Flags, dxbc::RootFlags::DenyAmplificationShaderRootAccess);
+break;
+  default:
+break;
+  }
+}
 
 static void validateDescriptorTables(Module &M,
- const mcdxbc::RootSignatureDesc &RSD,
- dxil::ModuleMetadataInfo &MMI,
- DXILResourceMap &DRM) {
+ const mcdxbc::RootSignatureDesc &RSD) {
   for (const mcdxbc::RootParameterInfo &ParamInfo : RSD.ParametersContainer) {
 if (static_cast(ParamInfo.Header.ParameterType) !=
 dxbc::RootParameterType::DescriptorTable)
@@ -2

[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)

2025-09-18 Thread Mehdi Amini via llvm-branch-commits


@@ -47,74 +47,61 @@ static func::FuncOp getOrDeclare(fir::FirOpBuilder 
&builder, Location loc,
   return func;
 }
 
-static bool isZero(Value v) {
-  if (auto cst = v.getDefiningOp())
-if (auto attr = dyn_cast(cst.getValue()))
-  return attr.getValue().isZero();
-  return false;
-}
-
 void ConvertComplexPowPass::runOnOperation() {
   ModuleOp mod = getOperation();
   fir::FirOpBuilder builder(mod, fir::getKindMapping(mod));
 
-  mod.walk([&](complex::PowOp op) {
+  mod.walk([&](complex::PowiOp op) {
 builder.setInsertionPoint(op);
 Location loc = op.getLoc();
 auto complexTy = cast(op.getType());
 auto elemTy = complexTy.getElementType();
-
 Value base = op.getLhs();
-Value rhs = op.getRhs();
-
-Value intExp;
-if (auto create = rhs.getDefiningOp()) {
-  if (isZero(create.getImaginary())) {
-if (auto conv = create.getReal().getDefiningOp()) {
-  if (auto intTy = dyn_cast(conv.getValue().getType()))
-intExp = conv.getValue();
-}
-  }
-}
-
+Value intExp = op.getRhs();
 func::FuncOp callee;
-SmallVector args;
-if (intExp) {
-  unsigned realBits = cast(elemTy).getWidth();
-  unsigned intBits = cast(intExp.getType()).getWidth();
-  auto funcTy = builder.getFunctionType(
-  {complexTy, builder.getIntegerType(intBits)}, {complexTy});
-  if (realBits == 32 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy);
-  else if (realBits == 32 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy);
-  else if (realBits == 64 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy);
-  else if (realBits == 64 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy);
-  else if (realBits == 128 && intBits == 32)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy);
-  else if (realBits == 128 && intBits == 64)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy);
-  else
-return;
-  args = {base, intExp};
-} else {
-  unsigned realBits = cast(elemTy).getWidth();
-  auto funcTy =
-  builder.getFunctionType({complexTy, complexTy}, {complexTy});
-  if (realBits == 32)
-callee = getOrDeclare(builder, loc, "cpowf", funcTy);
-  else if (realBits == 64)
-callee = getOrDeclare(builder, loc, "cpow", funcTy);
-  else if (realBits == 128)
-callee = getOrDeclare(builder, loc, RTNAME_STRING(CPowF128), funcTy);
-  else
-return;
-  args = {base, rhs};
-}
+unsigned realBits = cast(elemTy).getWidth();
+unsigned intBits = cast(intExp.getType()).getWidth();
+auto funcTy = builder.getFunctionType(
+{complexTy, builder.getIntegerType(intBits)}, {complexTy});
+if (realBits == 32 && intBits == 32)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy);
+else if (realBits == 32 && intBits == 64)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy);
+else if (realBits == 64 && intBits == 32)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy);
+else if (realBits == 64 && intBits == 64)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy);
+else if (realBits == 128 && intBits == 32)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy);
+else if (realBits == 128 && intBits == 64)
+  callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy);
+else
+  return;
+auto call = fir::CallOp::create(builder, loc, callee, {base, intExp});
+if (auto fmf = op.getFastmathAttr())
+  call.setFastmathAttr(fmf);
+op.replaceAllUsesWith(call.getResult(0));
+op.erase();
+  });
 
-auto call = fir::CallOp::create(builder, loc, callee, args);
+  mod.walk([&](complex::PowOp op) {

joker-eph wrote:

We should not walk multiple times if we can do it in a single traversal, can 
you replace this with a walk on Operation* and dispatch inside the walk?

https://github.com/llvm/llvm-project/pull/158722


[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)

2025-09-18 Thread Marco Elver via llvm-branch-commits

https://github.com/melver updated 
https://github.com/llvm/llvm-project/pull/156840

>From 14c75441e84aa32e4f5876598b9a2c59d4ecbe65 Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Mon, 8 Sep 2025 21:32:21 +0200
Subject: [PATCH 1/2] fixup! fix for incomplete types

Created using spr 1.3.8-beta.1
---
 clang/lib/CodeGen/CGExpr.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 288b41bc42203..455de644daf00 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1289,6 +1289,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
   // Check if QualType contains a pointer. Implements a simple DFS to
   // recursively check if a type contains a pointer type.
   llvm::SmallPtrSet VisitedRD;
+  bool IncompleteType = false;
   auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool {
 QualType CanonicalType = T.getCanonicalType();
 if (CanonicalType->isPointerType())
@@ -1312,6 +1313,10 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
   return self(self, AT->getElementType());
 // The type is a struct, class, or union.
 if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) {
+  if (!RD->isCompleteDefinition()) {
+IncompleteType = true;
+return false;
+  }
   if (!VisitedRD.insert(RD).second)
 return false; // already visited
   // Check all fields.
@@ -1333,6 +1338,8 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
 return false;
   };
   const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
+  if (!ContainsPtr && IncompleteType)
+return nullptr;
   auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
   auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
 

>From 7f706618ddc40375d4085bc2ebe03f02ec78823a Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Mon, 8 Sep 2025 21:58:01 +0200
Subject: [PATCH 2/2] fixup!

Created using spr 1.3.8-beta.1
---
 clang/lib/CodeGen/CGExpr.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 455de644daf00..e7a0e7696e204 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1339,7 +1339,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase 
*CB,
   };
   const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
   if (!ContainsPtr && IncompleteType)
-return nullptr;
+return;
   auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
   auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
 



[llvm-branch-commits] [llvm] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-18 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler updated 
https://github.com/llvm/llvm-project/pull/156715

>From d33b31f01aeeb9005581b0a2a1f21c898463aa02 Mon Sep 17 00:00:00 2001
From: Tobias Stadler 
Date: Thu, 18 Sep 2025 12:34:55 +0100
Subject: [PATCH 1/3] Replace bitstream blobs by yaml

Created using spr 1.3.7-wip
---
 llvm/lib/Remarks/BitstreamRemarkParser.cpp|   5 +-
 .../dsymutil/ARM/remarks-linking-bundle.test  |  13 +-
 .../basic1.macho.remarks.arm64.opt.bitstream  | Bin 824 -> 0 bytes
 .../basic1.macho.remarks.arm64.opt.yaml   |  47 +
 ...c1.macho.remarks.empty.arm64.opt.bitstream |   0
 .../basic2.macho.remarks.arm64.opt.bitstream  | Bin 1696 -> 0 bytes
 .../basic2.macho.remarks.arm64.opt.yaml   | 194 ++
 ...c2.macho.remarks.empty.arm64.opt.bitstream |   0
 .../basic3.macho.remarks.arm64.opt.bitstream  | Bin 1500 -> 0 bytes
 .../basic3.macho.remarks.arm64.opt.yaml   | 181 
 ...c3.macho.remarks.empty.arm64.opt.bitstream |   0
 .../fat.macho.remarks.x86_64.opt.bitstream| Bin 820 -> 0 bytes
 .../remarks/fat.macho.remarks.x86_64.opt.yaml |  53 +
 .../fat.macho.remarks.x86_64h.opt.bitstream   | Bin 820 -> 0 bytes
 .../fat.macho.remarks.x86_64h.opt.yaml|  53 +
 .../X86/remarks-linking-fat-bundle.test   |   8 +-
 16 files changed, 543 insertions(+), 11 deletions(-)
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.empty.arm64.opt.bitstream
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.yaml
 delete mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.bitstream
 create mode 100644 
llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.yaml

diff --git a/llvm/lib/Remarks/BitstreamRemarkParser.cpp 
b/llvm/lib/Remarks/BitstreamRemarkParser.cpp
index 63b16bd2df0ec..2b27a0f661d88 100644
--- a/llvm/lib/Remarks/BitstreamRemarkParser.cpp
+++ b/llvm/lib/Remarks/BitstreamRemarkParser.cpp
@@ -411,9 +411,8 @@ Error BitstreamRemarkParser::processExternalFilePath() {
 return E;
 
   if (ContainerType != BitstreamRemarkContainerType::RemarksFile)
-return error(
-"Error while parsing external file's BLOCK_META: wrong container "
-"type.");
+return ParserHelper->MetaHelper.error(
+"Wrong container type in external file.");
 
   return Error::success();
 }
diff --git a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test 
b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
index 09a60d7d044c6..e1b04455b0d9d 100644
--- a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
+++ b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test
@@ -1,22 +1,25 @@
 RUN: rm -rf %t
-RUN: mkdir -p %t
+RUN: mkdir -p %t/private/tmp/remarks
 RUN: cat %p/../Inputs/remarks/basic.macho.remarks.arm64> 
%t/basic.macho.remarks.arm64
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream
+RUN: llvm-remarkutil yaml2bitstream 
%p/../Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml -o 
%t/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream
 
-RUN: dsymutil -oso-prepend-path=%p/../Inputs 
-remarks-prepend-path=%p/../Inputs %t/basic.macho.remarks.arm64
+RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%t 
%t/basic.macho.remarks.arm64
 
 Check that the remark file in the bundle exists and is sane:
 RUN: llvm-bcanalyzer -dump 
%t/basic.macho.remarks.arm64.dSYM/Contents/Resources/Remarks/basic.macho.remarks.arm64
 | FileCheck %s
 
-RUN: dsymutil --linker parallel -oso-prepend-path=%p/../Inputs 
-remarks-prepend-path=%p/../Inputs %t/basic.macho.r

[llvm-branch-commits] [flang] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE (PR #159632)

2025-09-18 Thread Krzysztof Parzyszek via llvm-branch-commits

https://github.com/kparzysz created 
https://github.com/llvm/llvm-project/pull/159632

Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList), some 
semantics-checking functions needed to be updated to operate on a single object 
at a time.

>From 7bb9fb5b3b9a2dfcd1d00f01c86fe26c5d14c30f Mon Sep 17 00:00:00 2001
From: Krzysztof Parzyszek 
Date: Thu, 18 Sep 2025 08:49:38 -0500
Subject: [PATCH] [flang][OpenMP] Use OmpDirectiveSpecification in
 THREADPRIVATE

Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList),
some semantics-checking functions needed to be updated to operate on a
single object at a time.
---
 flang/include/flang/Parser/openmp-utils.h|  4 +-
 flang/include/flang/Parser/parse-tree.h  |  3 +-
 flang/include/flang/Semantics/openmp-utils.h |  3 +-
 flang/lib/Parser/openmp-parsers.cpp  |  7 +-
 flang/lib/Parser/unparse.cpp |  7 +-
 flang/lib/Semantics/check-omp-structure.cpp  | 89 +++-
 flang/lib/Semantics/check-omp-structure.h|  3 +
 flang/lib/Semantics/openmp-utils.cpp | 22 +++--
 flang/lib/Semantics/resolve-directives.cpp   | 11 ++-
 9 files changed, 86 insertions(+), 63 deletions(-)

diff --git a/flang/include/flang/Parser/openmp-utils.h 
b/flang/include/flang/Parser/openmp-utils.h
index 032fb8996fe48..1372945427955 100644
--- a/flang/include/flang/Parser/openmp-utils.h
+++ b/flang/include/flang/Parser/openmp-utils.h
@@ -49,7 +49,6 @@ MAKE_CONSTR_ID(OpenMPDeclareSimdConstruct, 
D::OMPD_declare_simd);
 MAKE_CONSTR_ID(OpenMPDeclareTargetConstruct, D::OMPD_declare_target);
 MAKE_CONSTR_ID(OpenMPExecutableAllocate, D::OMPD_allocate);
 MAKE_CONSTR_ID(OpenMPRequiresConstruct, D::OMPD_requires);
-MAKE_CONSTR_ID(OpenMPThreadprivate, D::OMPD_threadprivate);
 
 #undef MAKE_CONSTR_ID
 
@@ -111,8 +110,7 @@ struct DirectiveNameScope {
   std::is_same_v ||
   std::is_same_v ||
   std::is_same_v ||
-  std::is_same_v ||
-  std::is_same_v) {
+  std::is_same_v) {
 return MakeName(std::get(x.t).source, ConstructId::id);
   } else {
 return GetFromTuple(
diff --git a/flang/include/flang/Parser/parse-tree.h 
b/flang/include/flang/Parser/parse-tree.h
index 09a45476420df..8cb6d2e744876 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -5001,9 +5001,8 @@ struct OpenMPRequiresConstruct {
 
 // 2.15.2 threadprivate -> THREADPRIVATE (variable-name-list)
 struct OpenMPThreadprivate {
-  TUPLE_CLASS_BOILERPLATE(OpenMPThreadprivate);
+  WRAPPER_CLASS_BOILERPLATE(OpenMPThreadprivate, OmpDirectiveSpecification);
   CharBlock source;
-  std::tuple t;
 };
 
 // 2.11.3 allocate -> ALLOCATE (variable-name-list) [clause]
diff --git a/flang/include/flang/Semantics/openmp-utils.h 
b/flang/include/flang/Semantics/openmp-utils.h
index 68318d6093a1e..65441728c5549 100644
--- a/flang/include/flang/Semantics/openmp-utils.h
+++ b/flang/include/flang/Semantics/openmp-utils.h
@@ -58,9 +58,10 @@ const parser::DataRef *GetDataRefFromObj(const 
parser::OmpObject &object);
 const parser::ArrayElement *GetArrayElementFromObj(
 const parser::OmpObject &object);
 const Symbol *GetObjectSymbol(const parser::OmpObject &object);
-const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument);
 std::optional GetObjectSource(
 const parser::OmpObject &object);
+const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument);
+const parser::OmpObject *GetArgumentObject(const parser::OmpArgument 
&argument);
 
 bool IsCommonBlock(const Symbol &sym);
 bool IsExtendedListItem(const Symbol &sym);
diff --git a/flang/lib/Parser/openmp-parsers.cpp 
b/flang/lib/Parser/openmp-parsers.cpp
index 66526ba00b5ed..60ce71cf983f6 100644
--- a/flang/lib/Parser/openmp-parsers.cpp
+++ b/flang/lib/Parser/openmp-parsers.cpp
@@ -1791,8 +1791,11 @@ TYPE_PARSER(sourced(construct(
 verbatim("REQUIRES"_tok), Parser{})))
 
 // 2.15.2 Threadprivate directive
-TYPE_PARSER(sourced(construct(
-verbatim("THREADPRIVATE"_tok), parenthesized(Parser{}
+TYPE_PARSER(sourced( //
+construct(
+predicated(OmpDirectiveNameParser{},
+IsDirective(llvm::omp::Directive::OMPD_threadprivate)) >=
+Parser{})))
 
 // 2.11.3 Declarative Allocate directive
 TYPE_PARSER(
diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp
index 189a34ee1dc56..db46525ac57b1 100644
--- a/flang/lib/Parser/unparse.cpp
+++ b/flang/lib/Parser/unparse.cpp
@@ -2611,12 +2611,11 @@ class UnparseVisitor {
   }
   void Unparse(const OpenMPThreadprivate &x) {
 BeginOpenMP();
-Word("!$OMP THREADPRIVATE (");
-Walk(std::get(x.t));
-Put(")\n");
+Word("!$OMP ");
+Walk(x.v);
+Put("\n");
 EndOpenMP();
   }
-
   bool Pre(const OmpMessageClause &x) {
 Walk(x.v);
 return false;
diff --git a/flang/lib/Semantics/check-omp-structure.cpp 
b/flang/lib/Semantics/check-omp-structure.cpp
index 1ee5385fb38a1..507957df

[llvm-branch-commits] [flang] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE (PR #159632)

2025-09-18 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-semantics

Author: Krzysztof Parzyszek (kparzysz)


Changes

Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList), some 
semantics-checking functions needed to be updated to operate on a single object 
at a time.

---
Full diff: https://github.com/llvm/llvm-project/pull/159632.diff


9 Files Affected:

- (modified) flang/include/flang/Parser/openmp-utils.h (+1-3) 
- (modified) flang/include/flang/Parser/parse-tree.h (+1-2) 
- (modified) flang/include/flang/Semantics/openmp-utils.h (+2-1) 
- (modified) flang/lib/Parser/openmp-parsers.cpp (+5-2) 
- (modified) flang/lib/Parser/unparse.cpp (+3-4) 
- (modified) flang/lib/Semantics/check-omp-structure.cpp (+48-41) 
- (modified) flang/lib/Semantics/check-omp-structure.h (+3) 
- (modified) flang/lib/Semantics/openmp-utils.cpp (+15-7) 
- (modified) flang/lib/Semantics/resolve-directives.cpp (+8-3) 


(The full diff quoted here is identical to the patch in the preceding message from the author; see the "Full diff" link above.)

[llvm-branch-commits] [llvm] CodeGen: Keep reference to TargetRegisterInfo in TargetInstrInfo (PR #158224)

2025-09-18 Thread Matt Arsenault via llvm-branch-commits


@@ -1070,8 +1070,8 @@ void InstrInfoEmitter::run(raw_ostream &OS) {
   OS << "namespace llvm {\n";
   OS << "struct " << ClassName << " : public TargetInstrInfo {\n"
  << "  explicit " << ClassName
- << "(const TargetSubtargetInfo &STI, unsigned CFSetupOpcode = ~0u, "
-"unsigned CFDestroyOpcode = ~0u, "
+ << "(const TargetSubtargetInfo &STI, const TargetRegisterInfo &TRI, "

arsenm wrote:

The other option I considered was having `unique_ptr<TargetRegisterInfo>` in the
generic base class.

https://github.com/llvm/llvm-project/pull/158224


[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)

2025-09-18 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/159641
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#159641** 👈 (this PR; view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/159641)
* **#159637** (https://app.graphite.dev/github/pr/llvm/llvm-project/159637)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/159641


[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)

2025-09-18 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin created 
https://github.com/llvm/llvm-project/pull/159645

None

>From 92728fa5d41bd5f6ef63837bcb3ea8e85b7a8764 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 15 Sep 2025 17:49:18 +
Subject: [PATCH] [profcheck][SimplifyCFG] Propagate !prof from `switch` to
 `select`

---
 llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 86 ---
 .../SimplifyCFG/switch-to-select-two-case.ll  | 72 +---
 2 files changed, 117 insertions(+), 41 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index a1f759dd1df83..276ca89d715f1 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -84,6 +84,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -6318,9 +6319,12 @@ static bool initializeUniqueCases(SwitchInst *SI, 
PHINode *&PHI,
 // Helper function that checks if it is possible to transform a switch with 
only
 // two cases (or two cases + default) that produces a result into a select.
 // TODO: Handle switches with more than 2 cases that map to the same result.
+// The branch weights correspond to the provided Condition (i.e. if Condition 
is
+// modified from the original SwitchInst, the caller must adjust the weights)
 static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector,
  Constant *DefaultResult, Value *Condition,
- IRBuilder<> &Builder, const DataLayout &DL) {
+ IRBuilder<> &Builder, const DataLayout &DL,
+ ArrayRef BranchWeights) {
   // If we are selecting between only two cases transform into a simple
   // select or a two-way select if default is possible.
   // Example:
@@ -6329,6 +6333,10 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
   //   case 20: return 2;   >  %2 = icmp eq i32 %a, 20
   //   default: return 4;  %3 = select i1 %2, i32 2, i32 %1
   // }
+
+  const bool HasBranchWeights =
+  !BranchWeights.empty() && !ProfcheckDisableMetadataFixes;
+
   if (ResultVector.size() == 2 && ResultVector[0].second.size() == 1 &&
   ResultVector[1].second.size() == 1) {
 ConstantInt *FirstCase = ResultVector[0].second[0];
@@ -6337,13 +6345,37 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
 if (DefaultResult) {
   Value *ValueCompare =
   Builder.CreateICmpEQ(Condition, SecondCase, "switch.selectcmp");
-  SelectValue = Builder.CreateSelect(ValueCompare, ResultVector[1].first,
- DefaultResult, "switch.select");
+  SelectInst *SelectValueInst = cast<SelectInst>(Builder.CreateSelect(
+  ValueCompare, ResultVector[1].first, DefaultResult, "switch.select"));
+  SelectValue = SelectValueInst;
+  if (HasBranchWeights) {
+// We start with 3 probabilities, where the numerator is the
+// corresponding BranchWeights[i], and the denominator is the sum over
+// BranchWeights. We want the probability and negative probability of
+// Condition == SecondCase.
+assert(BranchWeights.size() == 3);
+setBranchWeights(SelectValueInst, BranchWeights[2],
+ BranchWeights[0] + BranchWeights[1],
+ /*IsExpected=*/false);
+}
 }
 Value *ValueCompare =
 Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp");
-return Builder.CreateSelect(ValueCompare, ResultVector[0].first,
-SelectValue, "switch.select");
+SelectInst *Ret = cast<SelectInst>(Builder.CreateSelect(
+ValueCompare, ResultVector[0].first, SelectValue, "switch.select"));
+if (HasBranchWeights) {
+  // We may have had a DefaultResult. Base the position of the first and
+  // second's branch weights accordingly. Also the probability that Condition
+  // != FirstCase needs to take that into account.
+  assert(BranchWeights.size() >= 2);
+  size_t FirstCasePos = (Condition != nullptr);
+  size_t SecondCasePos = FirstCasePos + 1;
+  uint32_t DefaultCase = (Condition != nullptr) ? BranchWeights[0] : 0;
+  setBranchWeights(Ret, BranchWeights[FirstCasePos],
+   DefaultCase + BranchWeights[SecondCasePos],
+   /*IsExpected=*/false);
+}
+return Ret;
   }
 
   // Handle the degenerate case where two cases have the same result value.
@@ -6379,8 +6411,16 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
   Value *And = Builder.CreateAnd(Condition, AndMask);
   Value *Cmp = Builder.CreateICmpEQ(
   And, Constant::getIntegerValue(And->getType(), AndMask));
-  return Builder.CreateSelect(Cmp, ResultVector[0].first,
-  DefaultResult);
+

[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)

2025-09-18 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin edited 
https://github.com/llvm/llvm-project/pull/159645
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)

2025-09-18 Thread Mircea Trofin via llvm-branch-commits


@@ -1,5 +1,5 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -passes=simplifycfg 
-simplifycfg-require-and-preserve-domtree=1 -S | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --check-globals
+; RUN: opt < %s -passes=prof-inject,simplifycfg -profcheck-weights-for-test 
-simplifycfg-require-and-preserve-domtree=1 -S | FileCheck %s

mtrofin wrote:

Note: this test is perfect in that it covers all the cases in the change (verified with some appropriately placed `dbgs()`). To avoid cumbersomely adding `!prof` everywhere, we're using the feature introduced in the previous patch.

https://github.com/llvm/llvm-project/pull/159645


[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)

2025-09-18 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Transforms/Utils/SimplifyCFG.cpp
```

:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 276ca89d7..f775991b5 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -6357,7 +6357,7 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
 setBranchWeights(SelectValueInst, BranchWeights[2],
  BranchWeights[0] + BranchWeights[1],
  /*IsExpected=*/false);
-}
+  }
 }
 Value *ValueCompare =
 Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp");
@@ -6411,8 +6411,8 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
   Value *And = Builder.CreateAnd(Condition, AndMask);
   Value *Cmp = Builder.CreateICmpEQ(
   And, Constant::getIntegerValue(And->getType(), AndMask));
-  SelectInst *Ret = cast<SelectInst>(Builder.CreateSelect(Cmp, ResultVector[0].first,
-  DefaultResult));
+  SelectInst *Ret = cast<SelectInst>(
+  Builder.CreateSelect(Cmp, ResultVector[0].first, DefaultResult));
   if (HasBranchWeights) {
 // We know there's a Default case. We base the resulting branch
 // weights off its probability.

```




https://github.com/llvm/llvm-project/pull/159645


[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)

2025-09-18 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin ready_for_review 
https://github.com/llvm/llvm-project/pull/159645


[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)

2025-09-18 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/159645

>From 6d3342f397d39e366a06eb6bcabddec0b3d5a963 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 15 Sep 2025 17:49:18 +
Subject: [PATCH] [profcheck][SimplifyCFG] Propagate !prof from `switch` to
 `select`

---
 llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 86 ---
 .../SimplifyCFG/switch-to-select-two-case.ll  | 72 +---
 2 files changed, 117 insertions(+), 41 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index a1f759dd1df83..f775991b5ba41 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -84,6 +84,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -6318,9 +6319,12 @@ static bool initializeUniqueCases(SwitchInst *SI, 
PHINode *&PHI,
 // Helper function that checks if it is possible to transform a switch with 
only
 // two cases (or two cases + default) that produces a result into a select.
 // TODO: Handle switches with more than 2 cases that map to the same result.
+// The branch weights correspond to the provided Condition (i.e. if Condition 
is
+// modified from the original SwitchInst, the caller must adjust the weights)
 static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector,
  Constant *DefaultResult, Value *Condition,
- IRBuilder<> &Builder, const DataLayout &DL) {
+ IRBuilder<> &Builder, const DataLayout &DL,
+ ArrayRef<uint32_t> BranchWeights) {
   // If we are selecting between only two cases transform into a simple
   // select or a two-way select if default is possible.
   // Example:
@@ -6329,6 +6333,10 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
   //   case 20: return 2;   >  %2 = icmp eq i32 %a, 20
   //   default: return 4;  %3 = select i1 %2, i32 2, i32 %1
   // }
+
+  const bool HasBranchWeights =
+  !BranchWeights.empty() && !ProfcheckDisableMetadataFixes;
+
   if (ResultVector.size() == 2 && ResultVector[0].second.size() == 1 &&
   ResultVector[1].second.size() == 1) {
 ConstantInt *FirstCase = ResultVector[0].second[0];
@@ -6337,13 +6345,37 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
 if (DefaultResult) {
   Value *ValueCompare =
   Builder.CreateICmpEQ(Condition, SecondCase, "switch.selectcmp");
-  SelectValue = Builder.CreateSelect(ValueCompare, ResultVector[1].first,
- DefaultResult, "switch.select");
+  SelectInst *SelectValueInst = cast<SelectInst>(Builder.CreateSelect(
+  ValueCompare, ResultVector[1].first, DefaultResult, "switch.select"));
+  SelectValue = SelectValueInst;
+  if (HasBranchWeights) {
+// We start with 3 probabilities, where the numerator is the
+// corresponding BranchWeights[i], and the denominator is the sum over
+// BranchWeights. We want the probability and negative probability of
+// Condition == SecondCase.
+assert(BranchWeights.size() == 3);
+setBranchWeights(SelectValueInst, BranchWeights[2],
+ BranchWeights[0] + BranchWeights[1],
+ /*IsExpected=*/false);
+  }
 }
 Value *ValueCompare =
 Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp");
-return Builder.CreateSelect(ValueCompare, ResultVector[0].first,
-SelectValue, "switch.select");
+SelectInst *Ret = cast<SelectInst>(Builder.CreateSelect(
+ValueCompare, ResultVector[0].first, SelectValue, "switch.select"));
+if (HasBranchWeights) {
+  // We may have had a DefaultResult. Base the position of the first and
+  // second's branch weights accordingly. Also the proability that 
Condition
+  // != FirstCase needs to take that into account.
+  assert(BranchWeights.size() >= 2);
+  size_t FirstCasePos = (Condition != nullptr);
+  size_t SecondCasePos = FirstCasePos + 1;
+  uint32_t DefaultCase = (Condition != nullptr) ? BranchWeights[0] : 0;
+  setBranchWeights(Ret, BranchWeights[FirstCasePos],
+   DefaultCase + BranchWeights[SecondCasePos],
+   /*IsExpected=*/false);
+}
+return Ret;
   }
 
   // Handle the degenerate case where two cases have the same result value.
@@ -6379,8 +6411,16 @@ static Value *foldSwitchToSelect(const 
SwitchCaseResultVectorTy &ResultVector,
   Value *And = Builder.CreateAnd(Condition, AndMask);
   Value *Cmp = Builder.CreateICmpEQ(
   And, Constant::getIntegerValue(And->getType(), AndMask));
-  return Builder.CreateSelect(Cmp, ResultVector[0].first,
-  DefaultResult);
+  Select

[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)

2025-09-18 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/159641


[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)

2025-09-18 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota updated 
https://github.com/llvm/llvm-project/pull/159655

>From 108bf356e743d36b4eb5d0217720cf47ab85f33f Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Thu, 18 Sep 2025 14:31:38 -0700
Subject: [PATCH 1/2] [HLSL] NonUniformResourceIndex implementation

Adds the HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function calls
the builtin `__builtin_hlsl_resource_nonuniformindex`, which gets translated to the
LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`.

Depends on #159608

Closes #157923
---
 clang/include/clang/Basic/Builtins.td |  6 +++
 clang/lib/CodeGen/CGHLSLBuiltins.cpp  |  7 
 clang/lib/CodeGen/CGHLSLRuntime.h |  2 +
 clang/lib/Headers/hlsl/hlsl_intrinsics.h  | 25 
 .../resources/NonUniformResourceIndex.hlsl| 38 +++
 5 files changed, 78 insertions(+)
 create mode 100644 
clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl

diff --git a/clang/include/clang/Basic/Builtins.td 
b/clang/include/clang/Basic/Builtins.td
index 27639f06529cb..96676bd810631 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4933,6 +4933,12 @@ def HLSLResourceHandleFromImplicitBinding : 
LangBuiltin<"HLSL_LANG"> {
   let Prototype = "void(...)";
 }
 
+def HLSLResourceNonUniformIndex : LangBuiltin<"HLSL_LANG"> {
+  let Spellings = ["__builtin_hlsl_resource_nonuniformindex"];
+  let Attributes = [NoThrow];
+  let Prototype = "uint32_t(uint32_t)";
+}
+
 def HLSLAll : LangBuiltin<"HLSL_LANG"> {
   let Spellings = ["__builtin_hlsl_all"];
   let Attributes = [NoThrow, Const];
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp 
b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 7b5b924b1fe82..9f87afa5a8a3d 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -352,6 +352,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned 
BuiltinID,
 SmallVector Args{OrderID, SpaceOp, RangeOp, IndexOp, Name};
 return Builder.CreateIntrinsic(HandleTy, IntrinsicID, Args);
   }
+  case Builtin::BI__builtin_hlsl_resource_nonuniformindex: {
+Value *IndexOp = EmitScalarExpr(E->getArg(0));
+llvm::Type *RetTy = ConvertType(E->getType());
+return Builder.CreateIntrinsic(
+RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(),
+ArrayRef{IndexOp});
+  }
   case Builtin::BI__builtin_hlsl_all: {
 Value *Op0 = EmitScalarExpr(E->getArg(0));
 return Builder.CreateIntrinsic(
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h 
b/clang/lib/CodeGen/CGHLSLRuntime.h
index 370f3d5c5d30d..f4b410664d60c 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.h
+++ b/clang/lib/CodeGen/CGHLSLRuntime.h
@@ -129,6 +129,8 @@ class CGHLSLRuntime {
resource_handlefrombinding)
   GENERATE_HLSL_INTRINSIC_FUNCTION(CreateHandleFromImplicitBinding,
resource_handlefromimplicitbinding)
+  GENERATE_HLSL_INTRINSIC_FUNCTION(NonUniformResourceIndex,
+   resource_nonuniformindex)
   GENERATE_HLSL_INTRINSIC_FUNCTION(BufferUpdateCounter, resource_updatecounter)
   GENERATE_HLSL_INTRINSIC_FUNCTION(GroupMemoryBarrierWithGroupSync,
group_memory_barrier_with_group_sync)
diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h 
b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
index d9d87c827e6a4..0eab2ff56c519 100644
--- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h
+++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h
@@ -422,6 +422,31 @@ constexpr int4 D3DCOLORtoUBYTE4(float4 V) {
   return __detail::d3d_color_to_ubyte4_impl(V);
 }
 
+//===--===//
+// NonUniformResourceIndex builtin
+//===--===//
+
+/// \fn uint NonUniformResourceIndex(uint I)
+/// \brief A compiler hint to indicate that a resource index varies across
+/// threads
+/// within a wave (i.e., it is non-uniform).
+/// \param I [in] Resource array index
+///
+/// The return value is the \p Index parameter.
+///
+/// When indexing into an array of shader resources (e.g., textures, buffers),
+/// some GPU hardware and drivers require the compiler to know whether the 
index
+/// is uniform (same for all threads) or non-uniform (varies per thread).
+///
+/// Using NonUniformResourceIndex explicitly marks an index as non-uniform,
+/// disabling certain assumptions or optimizations that could lead to incorrect
+/// behavior when dynamically accessing resource arrays with non-uniform
+/// indices.
+
+constexpr uint32_t NonUniformResourceIndex(uint32_t Index) {
+  return __builtin_hlsl_resource_nonuniformindex(Index);
+}
+
 
//===--===//
 // reflect builtin
 
//===--===//
diff --git a/clang/test/CodeGenHLSL/
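To make the doc comment above concrete, here is a hypothetical HLSL usage sketch (the resource names, bindings, and array size are illustrative, not taken from this patch):

```hlsl
// An array of textures indexed by a value that can differ per thread
// within a wave (e.g. a per-pixel material ID).
Texture2D Textures[16] : register(t0);
SamplerState Samp : register(s0);

float4 SampleDynamic(uint slot, float2 uv) {
  // Wrapping the index tells the compiler it may be non-uniform, so it
  // must not assume a single descriptor is used by the whole wave.
  return Textures[NonUniformResourceIndex(slot)].SampleLevel(Samp, uv, 0);
}
```

Per the patch, the wrapper returns the index unchanged and lowers through `__builtin_hlsl_resource_nonuniformindex` to `llvm.{dx|spv}.resource_nonuniformindex`.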

[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)

2025-09-18 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin approved this pull request.


https://github.com/llvm/llvm-project/pull/158376


[llvm-branch-commits] [llvm] X86: Switch to RegClassByHwMode (PR #158274)

2025-09-18 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/158274

>From 7d3e2fa03f76098b2f4f90a2c4407e18d59423c5 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 9 Sep 2025 11:15:47 +0900
Subject: [PATCH] X86: Switch to RegClassByHwMode

Replace the target uses of PointerLikeRegClass with RegClassByHwMode
---
 .../X86/MCTargetDesc/X86MCTargetDesc.cpp  |  3 ++
 llvm/lib/Target/X86/X86.td|  2 ++
 llvm/lib/Target/X86/X86InstrInfo.td   |  8 ++---
 llvm/lib/Target/X86/X86InstrOperands.td   | 30 +++-
 llvm/lib/Target/X86/X86InstrPredicates.td | 14 
 llvm/lib/Target/X86/X86RegisterInfo.cpp   | 35 +--
 llvm/lib/Target/X86/X86Subtarget.h|  4 +--
 llvm/utils/TableGen/X86FoldTablesEmitter.cpp  |  4 +--
 8 files changed, 57 insertions(+), 43 deletions(-)

diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp 
b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
index bb1e716c33ed5..1d5ef8b0996dc 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
@@ -55,6 +55,9 @@ std::string X86_MC::ParseX86Triple(const Triple &TT) {
   else
 FS = "-64bit-mode,-32bit-mode,+16bit-mode";
 
+  if (TT.isX32())
+FS += ",+x32";
+
   return FS;
 }
 
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 7c9e821c02fda..3af8b3e060a16 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -25,6 +25,8 @@ def Is32Bit : SubtargetFeature<"32bit-mode", "Is32Bit", 
"true",
"32-bit mode (80386)">;
 def Is16Bit : SubtargetFeature<"16bit-mode", "Is16Bit", "true",
"16-bit mode (i8086)">;
+def IsX32 : SubtargetFeature<"x32", "IsX32", "true",
+ "64-bit with ILP32 programming model (e.g. x32 
ABI)">;
 
 
//===--===//
 // X86 Subtarget ISA features
diff --git a/llvm/lib/Target/X86/X86InstrInfo.td 
b/llvm/lib/Target/X86/X86InstrInfo.td
index 7f6c5614847e3..0c4abc2c400f6 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.td
+++ b/llvm/lib/Target/X86/X86InstrInfo.td
@@ -18,14 +18,14 @@ include "X86InstrFragments.td"
 include "X86InstrFragmentsSIMD.td"
 
 
//===--===//
-// X86 Operand Definitions.
+// X86 Predicate Definitions.
 //
-include "X86InstrOperands.td"
+include "X86InstrPredicates.td"
 
 
//===--===//
-// X86 Predicate Definitions.
+// X86 Operand Definitions.
 //
-include "X86InstrPredicates.td"
+include "X86InstrOperands.td"
 
 
//===--===//
 // X86 Instruction Format Definitions.
diff --git a/llvm/lib/Target/X86/X86InstrOperands.td 
b/llvm/lib/Target/X86/X86InstrOperands.td
index 80843f6bb80e6..5207ecad127a2 100644
--- a/llvm/lib/Target/X86/X86InstrOperands.td
+++ b/llvm/lib/Target/X86/X86InstrOperands.td
@@ -6,9 +6,15 @@
 //
 
//===--===//
 
+def x86_ptr_rc : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32, GR64, LOW32_ADDR_ACCESS]>;
+
 // A version of ptr_rc which excludes SP, ESP, and RSP. This is used for
 // the index operand of an address, to conform to x86 encoding restrictions.
-def ptr_rc_nosp : PointerLikeRegClass<1>;
+def ptr_rc_nosp : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32_NOSP, GR64_NOSP, GR32_NOSP]>;
 
 // *mem - Operand definitions for the funky X86 addressing mode operands.
 //
@@ -53,7 +59,7 @@ class X86MemOperand : Operand {
   let PrintMethod = printMethod;
-  let MIOperandInfo = (ops ptr_rc, i8imm, ptr_rc_nosp, i32imm, SEGMENT_REG);
+  let MIOperandInfo = (ops x86_ptr_rc, i8imm, ptr_rc_nosp, i32imm, 
SEGMENT_REG);
   let ParserMatchClass = parserMatchClass;
   let OperandType = "OPERAND_MEMORY";
   int Size = size;
@@ -63,7 +69,7 @@ class X86MemOperand
 : X86MemOperand {
-  let MIOperandInfo = (ops ptr_rc, i8imm, RC, i32imm, SEGMENT_REG);
+  let MIOperandInfo = (ops x86_ptr_rc, i8imm, RC, i32imm, SEGMENT_REG);
 }
 
 def anymem : X86MemOperand<"printMemReference">;
@@ -113,8 +119,14 @@ def sdmem : X86MemOperand<"printqwordmem", 
X86Mem64AsmOperand>;
 
 // A version of i8mem for use on x86-64 and x32 that uses a NOREX GPR instead
 // of a plain GPR, so that it doesn't potentially require a REX prefix.
-def ptr_rc_norex : PointerLikeRegClass<2>;
-def ptr_rc_norex_nosp : PointerLikeRegClass<3>;
+def ptr_rc_norex : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32_NOREX, GR64_NOREX, GR32_NOREX]>;
+
+def ptr_rc_norex_nosp : RegClassByHwMode<
+  [X86_32, X86_64, X86_64_X32],
+  [GR32_NOREX_NOSP, GR64_NOREX_NOSP, GR32_NOREX_NOSP]>;
+
 
 def i8mem_NOREX : X86MemOperand<"printbytemem", X86Mem8AsmOperand, 8> {
   let MIOpe

[llvm-branch-commits] [llvm] SPARC: Use RegClassByHwMode instead of PointerLikeRegClass (PR #158271)

2025-09-18 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/158271

>From e7ef891fb2c4e21bec4d23af954ad9204f3eb48f Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 8 Sep 2025 14:04:59 +0900
Subject: [PATCH] SPARC: Use RegClassByHwMode instead of PointerLikeRegClass

---
 .../Sparc/Disassembler/SparcDisassembler.cpp  |  8 ---
 llvm/lib/Target/Sparc/SparcInstrInfo.td   | 21 +--
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp 
b/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp
index c3d60f3689e1f..e585e5af42d32 100644
--- a/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp
+++ b/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp
@@ -159,14 +159,6 @@ static DecodeStatus DecodeI64RegsRegisterClass(MCInst 
&Inst, unsigned RegNo,
   return DecodeIntRegsRegisterClass(Inst, RegNo, Address, Decoder);
 }
 
-// This is used for the type "ptr_rc", which is either IntRegs or I64Regs
-// depending on SparcRegisterInfo::getPointerRegClass.
-static DecodeStatus DecodePointerLikeRegClass0(MCInst &Inst, unsigned RegNo,
-   uint64_t Address,
-   const MCDisassembler *Decoder) {
-  return DecodeIntRegsRegisterClass(Inst, RegNo, Address, Decoder);
-}
-
 static DecodeStatus DecodeFPRegsRegisterClass(MCInst &Inst, unsigned RegNo,
   uint64_t Address,
   const MCDisassembler *Decoder) {
diff --git a/llvm/lib/Target/Sparc/SparcInstrInfo.td 
b/llvm/lib/Target/Sparc/SparcInstrInfo.td
index 53972d6c105a4..97e7fd7769edb 100644
--- a/llvm/lib/Target/Sparc/SparcInstrInfo.td
+++ b/llvm/lib/Target/Sparc/SparcInstrInfo.td
@@ -95,10 +95,27 @@ def HasFSMULD : Predicate<"!Subtarget->hasNoFSMULD()">;
 // will pick deprecated instructions.
 def UseDeprecatedInsts : Predicate<"Subtarget->useV8DeprecatedInsts()">;
 
+//===--===//
+// HwModes Pattern Stuff
+//===--===//
+
+defvar SPARC32 = DefaultMode;
+def SPARC64 : HwMode<[Is64Bit]>;
+
 
//===--===//
 // Instruction Pattern Stuff
 
//===--===//
 
+def sparc_ptr_rc : RegClassByHwMode<
+  [SPARC32, SPARC64],
+  [IntRegs, I64Regs]>;
+
+// Both cases can use the same decoder method, so avoid the dispatch
+// by hwmode by setting an explicit DecoderMethod
+def ptr_op : RegisterOperand {
+  let DecoderMethod = "DecodeIntRegsRegisterClass";
+}
+
 // FIXME these should have AsmOperandClass.
 def uimm3 : PatLeaf<(imm), [{ return isUInt<3>(N->getZExtValue()); }]>;
 
@@ -178,12 +195,12 @@ def simm13Op : Operand {
 
 def MEMrr : Operand {
   let PrintMethod = "printMemOperand";
-  let MIOperandInfo = (ops ptr_rc, ptr_rc);
+  let MIOperandInfo = (ops ptr_op, ptr_op);
   let ParserMatchClass = SparcMEMrrAsmOperand;
 }
 def MEMri : Operand {
   let PrintMethod = "printMemOperand";
-  let MIOperandInfo = (ops ptr_rc, simm13Op);
+  let MIOperandInfo = (ops ptr_op, simm13Op);
   let ParserMatchClass = SparcMEMriAsmOperand;
 }
 




[llvm-branch-commits] [llvm] Mips: Switch to RegClassByHwMode (PR #158273)

2025-09-18 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/158273

>From 5b8f38bb56b46b9e63fe2031f9b43e4bbba333fb Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 6 Sep 2025 21:14:45 +0900
Subject: [PATCH 1/3] Mips: Switch to RegClassByHwMode

---
 .../Target/Mips/AsmParser/MipsAsmParser.cpp   |  9 +--
 .../Mips/Disassembler/MipsDisassembler.cpp| 24 +++
 llvm/lib/Target/Mips/MicroMipsInstrInfo.td| 12 +++---
 llvm/lib/Target/Mips/Mips.td  | 15 
 llvm/lib/Target/Mips/MipsInstrInfo.td | 20 +++-
 llvm/lib/Target/Mips/MipsRegisterInfo.cpp | 16 ++---
 llvm/lib/Target/Mips/MipsRegisterInfo.td  | 16 +
 7 files changed, 76 insertions(+), 36 deletions(-)

diff --git a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp 
b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
index 8a5cb517c94c5..ba70c9e6cb9e8 100644
--- a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
+++ b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
@@ -3706,7 +3706,9 @@ void MipsAsmParser::expandMem16Inst(MCInst &Inst, SMLoc 
IDLoc, MCStreamer &Out,
   MCRegister TmpReg = DstReg;
 
   const MCInstrDesc &Desc = MII.get(OpCode);
-  int16_t DstRegClass = Desc.operands()[StartOp].RegClass;
+  int16_t DstRegClass =
+  MII.getOpRegClassID(Desc.operands()[StartOp],
+  STI->getHwMode(MCSubtargetInfo::HwMode_RegInfo));
   unsigned DstRegClassID =
   getContext().getRegisterInfo()->getRegClass(DstRegClass).getID();
   bool IsGPR = (DstRegClassID == Mips::GPR32RegClassID) ||
@@ -3834,7 +3836,10 @@ void MipsAsmParser::expandMem9Inst(MCInst &Inst, SMLoc 
IDLoc, MCStreamer &Out,
   MCRegister TmpReg = DstReg;
 
   const MCInstrDesc &Desc = MII.get(OpCode);
-  int16_t DstRegClass = Desc.operands()[StartOp].RegClass;
+  int16_t DstRegClass =
+  MII.getOpRegClassID(Desc.operands()[StartOp],
+  STI->getHwMode(MCSubtargetInfo::HwMode_RegInfo));
+
   unsigned DstRegClassID =
   getContext().getRegisterInfo()->getRegClass(DstRegClass).getID();
   bool IsGPR = (DstRegClassID == Mips::GPR32RegClassID) ||
diff --git a/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp 
b/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp
index c22b8f61b12dc..705695c74803f 100644
--- a/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp
+++ b/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp
@@ -916,6 +916,30 @@ DecodeGPRMM16MovePRegisterClass(MCInst &Inst, unsigned 
RegNo, uint64_t Address,
   return MCDisassembler::Success;
 }
 
+static DecodeStatus DecodeGP32RegisterClass(MCInst &Inst, unsigned RegNo,
+uint64_t Address,
+const MCDisassembler *Decoder) {
+  llvm_unreachable("this is unused");
+}
+
+static DecodeStatus DecodeGP64RegisterClass(MCInst &Inst, unsigned RegNo,
+uint64_t Address,
+const MCDisassembler *Decoder) {
+  llvm_unreachable("this is unused");
+}
+
+static DecodeStatus DecodeSP32RegisterClass(MCInst &Inst, unsigned RegNo,
+uint64_t Address,
+const MCDisassembler *Decoder) {
+  llvm_unreachable("this is unused");
+}
+
+static DecodeStatus DecodeSP64RegisterClass(MCInst &Inst, unsigned RegNo,
+uint64_t Address,
+const MCDisassembler *Decoder) {
+  llvm_unreachable("this is unused");
+}
+
 static DecodeStatus DecodeGPR32RegisterClass(MCInst &Inst, unsigned RegNo,
  uint64_t Address,
  const MCDisassembler *Decoder) {
diff --git a/llvm/lib/Target/Mips/MicroMipsInstrInfo.td 
b/llvm/lib/Target/Mips/MicroMipsInstrInfo.td
index b3fd8f422f429..b44bf1391b73e 100644
--- a/llvm/lib/Target/Mips/MicroMipsInstrInfo.td
+++ b/llvm/lib/Target/Mips/MicroMipsInstrInfo.td
@@ -57,12 +57,6 @@ def MicroMipsMemGPRMM16AsmOperand : AsmOperandClass {
   let PredicateMethod = "isMemWithGRPMM16Base";
 }
 
-// Define the classes of pointers used by microMIPS.
-// The numbers must match those in MipsRegisterInfo::MipsPtrClass.
-def ptr_gpr16mm_rc : PointerLikeRegClass<1>;
-def ptr_sp_rc : PointerLikeRegClass<2>;
-def ptr_gp_rc : PointerLikeRegClass<3>;
-
 class mem_mm_4_generic : Operand {
   let PrintMethod = "printMemOperand";
   let MIOperandInfo = (ops ptr_gpr16mm_rc, simm4);
@@ -114,7 +108,7 @@ def mem_mm_gp_simm7_lsl2 : Operand {
 
 def mem_mm_9 : Operand {
   let PrintMethod = "printMemOperand";
-  let MIOperandInfo = (ops ptr_rc, simm9);
+  let MIOperandInfo = (ops mips_ptr_rc, simm9);
   let EncoderMethod = "getMemEncodingMMImm9";
   let ParserMatchClass = MipsMemSimmAsmOperand<9>;
   let OperandType = "OPERAND_MEMORY";
@@ -130,7 +124,7 @@ def mem_m
