[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
https://github.com/melver edited https://github.com/llvm/llvm-project/pull/156842 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)
@@ -3352,10 +3352,15 @@ class CodeGenFunction : public CodeGenTypeCache { SanitizerAnnotateDebugInfo(ArrayRef Ordinals, SanitizerHandler Handler); - /// Emit additional metadata used by the AllocToken instrumentation. + /// Emit metadata used by the AllocToken instrumentation. + llvm::MDNode *EmitAllocTokenHint(QualType AllocType); melver wrote: Yes, LLVM permits sharing MD nodes - MDNode::get should intern nodes, although if you want to skip the whole calculation involved, that would require introducing a separate lookup table. Changing it to buildAllocToken. https://github.com/llvm/llvm-project/pull/156842 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
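For context on the interning claim: `MDNode::get` uniques nodes within an `LLVMContext`, so building a hint with identical operands twice yields the same pointer, and only the work of computing the operands is repeated (which is what a separate lookup table would skip). A minimal sketch, assuming the two-operand {type-name, contains-pointer} shape described in this series and a made-up type name:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Metadata.h"
#include <cassert>

int main() {
  llvm::LLVMContext Ctx;
  llvm::MDBuilder MDB(Ctx);
  // Same {type-name, contains-pointer} operand list built twice.
  llvm::Metadata *Ops[] = {
      MDB.createString("Foo"),
      MDB.createConstant(llvm::ConstantInt::getTrue(Ctx))};
  llvm::MDNode *A = llvm::MDNode::get(Ctx, Ops);
  llvm::MDNode *B = llvm::MDNode::get(Ctx, Ops);
  assert(A == B && "identical operands are interned to one node");
  return 0;
}
```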
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)
@@ -5760,6 +5764,24 @@ bool Sema::BuiltinAllocaWithAlign(CallExpr *TheCall) { return false; } +bool Sema::BuiltinAllocTokenInfer(CallExpr *TheCall) { melver wrote: I'm indifferent here. Switching to a static function. https://github.com/llvm/llvm-project/pull/156842 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
joker-eph wrote: > > That isn't in MLIR right now, so that's not generally usable. > > I've added `complex.powi -> complex.pow` conversion to the > `ComplexToStandard` MLIR pass. Thanks, LG! https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
https://github.com/melver edited https://github.com/llvm/llvm-project/pull/156841 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)
deaklajos wrote: @vitalybuka https://github.com/llvm/llvm-project/pull/159551 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: MC: Better handle backslash-escaped symbols (PR #159420)
nikic wrote: The diff here is fairly large, but also very mechanical. This fixes a regression for the Rust defmt crate with LLVM 21. https://github.com/llvm/llvm-project/pull/159420 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] CodeGen: Emit .prefalign directives based on the prefalign attribute. (PR #155529)
https://github.com/efriedma-quic commented: Can you split "implement basic codegen support for prefalign" (the bits which don't depend on the .prefalign directive) into a separate patch? It's not clear what's causing the test changes here. https://github.com/llvm/llvm-project/pull/155529 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang][docs] Fix implicit-int-conversion-on-negation typos (PR #156815)
github-actions[bot] wrote: @correctmost (or anyone else): if you would like to add a note about this fix to the release notes (completely optional), please reply to this comment with a one- or two-sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/156815 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
https://github.com/tobias-stadler updated https://github.com/llvm/llvm-project/pull/156715 >From d33b31f01aeeb9005581b0a2a1f21c898463aa02 Mon Sep 17 00:00:00 2001 From: Tobias Stadler Date: Thu, 18 Sep 2025 12:34:55 +0100 Subject: [PATCH 1/2] Replace bitstream blobs by yaml Created using spr 1.3.7-wip --- llvm/lib/Remarks/BitstreamRemarkParser.cpp| 5 +- .../dsymutil/ARM/remarks-linking-bundle.test | 13 +- .../basic1.macho.remarks.arm64.opt.bitstream | Bin 824 -> 0 bytes .../basic1.macho.remarks.arm64.opt.yaml | 47 + ...c1.macho.remarks.empty.arm64.opt.bitstream | 0 .../basic2.macho.remarks.arm64.opt.bitstream | Bin 1696 -> 0 bytes .../basic2.macho.remarks.arm64.opt.yaml | 194 ++ ...c2.macho.remarks.empty.arm64.opt.bitstream | 0 .../basic3.macho.remarks.arm64.opt.bitstream | Bin 1500 -> 0 bytes .../basic3.macho.remarks.arm64.opt.yaml | 181 ...c3.macho.remarks.empty.arm64.opt.bitstream | 0 .../fat.macho.remarks.x86_64.opt.bitstream| Bin 820 -> 0 bytes .../remarks/fat.macho.remarks.x86_64.opt.yaml | 53 + .../fat.macho.remarks.x86_64h.opt.bitstream | Bin 820 -> 0 bytes .../fat.macho.remarks.x86_64h.opt.yaml| 53 + .../X86/remarks-linking-fat-bundle.test | 8 +- 16 files changed, 543 insertions(+), 11 deletions(-) delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.yaml diff --git a/llvm/lib/Remarks/BitstreamRemarkParser.cpp b/llvm/lib/Remarks/BitstreamRemarkParser.cpp index 63b16bd2df0ec..2b27a0f661d88 100644 --- a/llvm/lib/Remarks/BitstreamRemarkParser.cpp +++ b/llvm/lib/Remarks/BitstreamRemarkParser.cpp @@ -411,9 +411,8 @@ Error BitstreamRemarkParser::processExternalFilePath() { return E; if (ContainerType != BitstreamRemarkContainerType::RemarksFile) -return error( -"Error while parsing external file's BLOCK_META: wrong container " -"type."); +return ParserHelper->MetaHelper.error( +"Wrong container type in external file."); return Error::success(); } diff --git a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test index 09a60d7d044c6..e1b04455b0d9d 100644 --- a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test +++ 
b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test @@ -1,22 +1,25 @@ RUN: rm -rf %t -RUN: mkdir -p %t +RUN: mkdir -p %t/private/tmp/remarks RUN: cat %p/../Inputs/remarks/basic.macho.remarks.arm64> %t/basic.macho.remarks.arm64 +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream -RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%p/../Inputs %t/basic.macho.remarks.arm64 +RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%t %t/basic.macho.remarks.arm64 Check that the remark file in the bundle exists and is sane: RUN: llvm-bcanalyzer -dump %t/basic.macho.remarks.arm64.dSYM/Contents/Resources/Remarks/basic.macho.remarks.arm64 | FileCheck %s -RUN: dsymutil --linker parallel -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%p/../Inputs %t/basic.macho.r
[llvm-branch-commits] [llvm] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) (PR #142391)
https://github.com/MacDue updated https://github.com/llvm/llvm-project/pull/142391 >From 0dfb0725e2a4f82af47821946bfbbfcd7ed08e10 Mon Sep 17 00:00:00 2001 From: Benjamin Maxwell Date: Thu, 8 May 2025 17:38:27 + Subject: [PATCH] [AArch64] Prepare for split ZPR and PPR area allocation (NFCI) This patch attempts to refactor AArch64FrameLowering to allow the size of the ZPR and PPR areas to be calculated separately. This will be used by a subsequent patch to support allocating ZPRs and PPRs to separate areas. This patch should be an NFC and is split out to make later functional changes easier to spot. --- .../Target/AArch64/AArch64FrameLowering.cpp | 220 ++ .../lib/Target/AArch64/AArch64FrameLowering.h | 20 +- .../AArch64/AArch64MachineFunctionInfo.cpp| 20 +- .../AArch64/AArch64MachineFunctionInfo.h | 63 ++--- .../AArch64/AArch64PrologueEpilogue.cpp | 128 ++ .../Target/AArch64/AArch64RegisterInfo.cpp| 4 +- .../DebugInfo/AArch64/asan-stack-vars.mir | 3 +- .../compiler-gen-bbs-livedebugvalues.mir | 3 +- 8 files changed, 288 insertions(+), 173 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index 20b0d697827c5..f5f7b6522ddec 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -324,6 +324,36 @@ AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF, static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF); +enum class AssignObjectOffsets { No, Yes }; +/// Process all the SVE stack objects and the SVE stack size and offsets for +/// each object. If AssignOffsets is "Yes", the offsets get assigned (and SVE +/// stack sizes set). Returns the size of the SVE stack. +static SVEStackSizes determineSVEStackSizes(MachineFunction &MF, +AssignObjectOffsets AssignOffsets, +bool SplitSVEObjects = false); + +static unsigned getStackHazardSize(const MachineFunction &MF) { + return MF.getSubtarget().getStreamingHazardSize(); +} + +/// Returns true if PPRs are spilled as ZPRs. +static bool arePPRsSpilledAsZPR(const MachineFunction &MF) { + return MF.getSubtarget().getRegisterInfo()->getSpillSize( + AArch64::PPRRegClass) == 16; +} + +StackOffset +AArch64FrameLowering::getZPRStackSize(const MachineFunction &MF) const { + const AArch64FunctionInfo *AFI = MF.getInfo(); + return StackOffset::getScalable(AFI->getStackSizeZPR()); +} + +StackOffset +AArch64FrameLowering::getPPRStackSize(const MachineFunction &MF) const { + const AArch64FunctionInfo *AFI = MF.getInfo(); + return StackOffset::getScalable(AFI->getStackSizePPR()); +} + // Conservatively, returns true if the function is likely to have SVE vectors // on the stack. This function is safe to be called before callee-saves or // object offsets have been determined. @@ -482,13 +512,6 @@ AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF, } } -/// Returns the size of the entire SVE stackframe (calleesaves + spills). 
-StackOffset -AArch64FrameLowering::getSVEStackSize(const MachineFunction &MF) const { - const AArch64FunctionInfo *AFI = MF.getInfo(); - return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE()); -} - bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const { if (!EnableRedZone) return false; @@ -514,7 +537,7 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const { !Subtarget.hasSVE(); return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize || - getSVEStackSize(MF) || LowerQRegCopyThroughMem); + AFI->hasSVEStackSize() || LowerQRegCopyThroughMem); } /// hasFPImpl - Return true if the specified function should have a dedicated @@ -557,7 +580,7 @@ bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const { // CFA in either of these cases. if (AFI.needsDwarfUnwindInfo(MF) && ((requiresSaveVG(MF) || AFI.getSMEFnAttrs().hasStreamingBody()) && - (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0))) + (!AFI.hasCalculatedStackSizeSVE() || AFI.hasSVEStackSize( return true; // With large callframes around we may need to use FP to access the scavenging // emergency spillslot. @@ -1126,10 +1149,6 @@ static bool isTargetWindows(const MachineFunction &MF) { return MF.getSubtarget().isTargetWindows(); } -static unsigned getStackHazardSize(const MachineFunction &MF) { - return MF.getSubtarget().getStreamingHazardSize(); -} - void AArch64FrameLowering::emitPacRetPlusLeafHardening( MachineFunction &MF) const { const AArch64Subtarget &Subtarget = MF.getSubtarget(); @@ -1212,7 +1231,9 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
https://github.com/vzakhari commented: LGTM with some final comments. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
@@ -1272,7 +1272,18 @@ mlir::Value genMathOp(fir::FirOpBuilder &builder, mlir::Location loc, LLVM_DEBUG(llvm::dbgs() << "Generating '" << mathLibFuncName << "' operation with type "; mathLibFuncType.dump(); llvm::dbgs() << "\n"); -result = T::create(builder, loc, args); +if constexpr (std::is_same_v) { + auto resultType = mathLibFuncType.getResult(0); + result = T::create(builder, loc, resultType, args); +} else if constexpr (std::is_same_v) { + auto resultType = mathLibFuncType.getResult(0); + auto fmfAttr = mlir::arith::FastMathFlagsAttr::get( + builder.getContext(), builder.getFastMathFlags()); + result = builder.create(loc, resultType, args[0], + args[1], fmfAttr); +} else { vzakhari wrote: Do we really need all this code? I believe just a simple `T::create(builder, loc, args)` should work, because of the type constraints in the operations' definitions. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
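The suggestion relies on ODS result-type inference: when an op's TableGen definition ties its result type to its operands, the generated builders include an overload that takes no explicit result type, so one generic call compiles for every `T` instantiated here. A minimal sketch of the simplified template, assuming that constraint holds for all the ops passed as `T` (names follow the surrounding diff):

```cpp
// Sketch: a single generic path; the result type is inferred from args
// by the ODS-generated build() when the op's definition constrains it.
template <typename T>
mlir::Value genOp(fir::FirOpBuilder &builder, mlir::Location loc,
                  llvm::ArrayRef<mlir::Value> args) {
  return T::create(builder, loc, args);
}
```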
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
@@ -175,12 +176,20 @@ PowIStrengthReduction::matchAndRewrite( Value one; Type opType = getElementTypeOrSelf(op.getType()); - if constexpr (std::is_same_v) + if constexpr (std::is_same_v) { one = arith::ConstantOp::create(rewriter, loc, rewriter.getFloatAttr(opType, 1.0)); - else + } else if constexpr (std::is_same_v) { +auto complexTy = cast(opType); +Type elementType = complexTy.getElementType(); +auto realPart = rewriter.getFloatAttr(elementType, 1.0); +auto imagPart = rewriter.getFloatAttr(elementType, 0.0); +one = rewriter.create( vzakhari wrote: I believe all the `create` methods of the rewriter will become deprecated soon, so `complex::ConstantOp::create` is a better alternative. There are other cases below. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
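For reference, a sketch of the suggested form, by analogy with the `arith::ConstantOp::create` call already in this hunk; the ArrayAttr-of-two-floats value format for `complex.constant` is an assumption about the dialect's attribute API:

```cpp
// Sketch: static OpTy::create instead of rewriter.create<OpTy>.
auto complexTy = mlir::cast<mlir::ComplexType>(opType);
mlir::Type elemTy = complexTy.getElementType();
auto oneAttr = rewriter.getArrayAttr({rewriter.getFloatAttr(elemTy, 1.0),
                                      rewriter.getFloatAttr(elemTy, 0.0)});
mlir::Value one =
    mlir::complex::ConstantOp::create(rewriter, loc, complexTy, oneAttr);
```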
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
https://github.com/vzakhari edited https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass (PR #158246)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/158246 This is special for the same reason av_mov_b64_imm_pseudo is special. >From e5032294b4979c4b7f2367cee30c24d42901714b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 5 Sep 2025 17:27:37 +0900 Subject: [PATCH] AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass This is special for the same reason av_mov_b64_imm_pseudo is special. --- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | 8 +++- llvm/lib/Target/AMDGPU/SIInstrInfo.h | 6 -- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index 5c3340703ba3b..b1a61886802f4 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -5976,8 +5976,7 @@ SIInstrInfo::getWholeWaveFunctionSetup(MachineFunction &MF) const { static const TargetRegisterClass * adjustAllocatableRegClass(const GCNSubtarget &ST, const SIRegisterInfo &RI, const MCInstrDesc &TID, unsigned RCID) { - if (!ST.hasGFX90AInsts() && (((TID.mayLoad() || TID.mayStore()) && !(TID.TSFlags & SIInstrFlags::Spill)))) { + if (!ST.hasGFX90AInsts() && (((TID.mayLoad() || TID.mayStore())))) { switch (RCID) { case AMDGPU::AV_32RegClassID: RCID = AMDGPU::VGPR_32RegClassID; @@ -6012,10 +6011,9 @@ const TargetRegisterClass *SIInstrInfo::getRegClass(const MCInstrDesc &TID, if (OpNum >= TID.getNumOperands()) return nullptr; auto RegClass = TID.operands()[OpNum].RegClass; - if (TID.getOpcode() == AMDGPU::AV_MOV_B64_IMM_PSEUDO) { -// Special pseudos have no alignment requirement + // Special pseudos have no alignment requirement + if (TID.getOpcode() == AMDGPU::AV_MOV_B64_IMM_PSEUDO || isSpill(TID)) return RI.getRegClass(RegClass); - } return adjustAllocatableRegClass(ST, RI, TID, RegClass); } diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h index f7dde2b90b68e..e0373e7768435 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h @@ -797,10 +797,12 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo { return get(Opcode).TSFlags & SIInstrFlags::Spill; } - static bool isSpill(const MachineInstr &MI) { -return MI.getDesc().TSFlags & SIInstrFlags::Spill; + static bool isSpill(const MCInstrDesc &Desc) { +return Desc.TSFlags & SIInstrFlags::Spill; } + static bool isSpill(const MachineInstr &MI) { return isSpill(MI.getDesc()); } + static bool isWWMRegSpillOpcode(uint16_t Opcode) { return Opcode == AMDGPU::SI_SPILL_WWM_V32_SAVE || Opcode == AMDGPU::SI_SPILL_WWM_AV32_SAVE || ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Keep reference to TargetRegisterInfo in TargetInstrInfo (PR #158224)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/158224 Both conceptually belong to the same subtarget, so it should not be necessary to pass in the context TargetRegisterInfo to any TargetInstrInfo member. Add this reference so those superfluous arguments can be removed. Most targets placed their TargetRegisterInfo as a member in TargetInstrInfo. A few had this owned by the TargetSubtargetInfo, so unify all targets to look the same. >From 532af14dba99fbaf1ccfbd4ac63e22fce9aa371b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 12 Sep 2025 14:11:48 +0900 Subject: [PATCH] CodeGen: Keep reference to TargetRegisterInfo in TargetInstrInfo Both conceptually belong to the same subtarget, so it should not be necessary to pass in the context TargetRegisterInfo to any TargetInstrInfo member. Add this reference so those superfluous arguments can be removed. Most targets placed their TargetRegisterInfo as a member in TargetInstrInfo. A few had this owned by the TargetSubtargetInfo, so unify all targets to look the same. --- llvm/include/llvm/CodeGen/TargetInstrInfo.h | 11 ++- llvm/lib/CodeGen/TargetInstrInfo.cpp | 68 --- llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | 2 +- llvm/lib/Target/AMDGPU/R600InstrInfo.cpp | 2 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp| 3 +- llvm/lib/Target/ARC/ARCInstrInfo.cpp | 3 +- llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp | 5 +- llvm/lib/Target/ARM/ARMBaseInstrInfo.h| 9 ++- llvm/lib/Target/ARM/ARMInstrInfo.cpp | 3 +- llvm/lib/Target/ARM/ARMInstrInfo.h| 2 +- llvm/lib/Target/ARM/Thumb1InstrInfo.cpp | 2 +- llvm/lib/Target/ARM/Thumb1InstrInfo.h | 2 +- llvm/lib/Target/ARM/Thumb2InstrInfo.cpp | 2 +- llvm/lib/Target/ARM/Thumb2InstrInfo.h | 2 +- llvm/lib/Target/AVR/AVRInstrInfo.cpp | 4 +- llvm/lib/Target/BPF/BPFInstrInfo.cpp | 2 +- llvm/lib/Target/CSKY/CSKYInstrInfo.cpp| 2 +- llvm/lib/Target/DirectX/DirectXInstrInfo.cpp | 2 +- llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp | 4 +- llvm/lib/Target/Hexagon/HexagonInstrInfo.h| 5 ++ llvm/lib/Target/Hexagon/HexagonSubtarget.cpp | 3 +- llvm/lib/Target/Hexagon/HexagonSubtarget.h| 3 +- llvm/lib/Target/Lanai/LanaiInstrInfo.cpp | 3 +- .../Target/LoongArch/LoongArchInstrInfo.cpp | 4 +- .../lib/Target/LoongArch/LoongArchInstrInfo.h | 4 ++ .../Target/LoongArch/LoongArchSubtarget.cpp | 2 +- .../lib/Target/LoongArch/LoongArchSubtarget.h | 3 +- llvm/lib/Target/MSP430/MSP430InstrInfo.cpp| 3 +- llvm/lib/Target/Mips/Mips16InstrInfo.cpp | 6 +- llvm/lib/Target/Mips/Mips16InstrInfo.h| 2 +- llvm/lib/Target/Mips/MipsInstrInfo.cpp| 5 +- llvm/lib/Target/Mips/MipsInstrInfo.h | 8 ++- llvm/lib/Target/Mips/MipsSEInstrInfo.cpp | 6 +- llvm/lib/Target/Mips/MipsSEInstrInfo.h| 2 +- llvm/lib/Target/NVPTX/NVPTXInstrInfo.cpp | 2 +- llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | 2 +- llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | 5 +- llvm/lib/Target/RISCV/RISCVInstrInfo.h| 3 + llvm/lib/Target/RISCV/RISCVSubtarget.cpp | 2 +- llvm/lib/Target/RISCV/RISCVSubtarget.h| 3 +- llvm/lib/Target/SPIRV/SPIRVInstrInfo.cpp | 2 +- llvm/lib/Target/Sparc/SparcInstrInfo.cpp | 4 +- llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp | 2 +- llvm/lib/Target/VE/VEInstrInfo.cpp| 2 +- .../WebAssembly/WebAssemblyInstrInfo.cpp | 2 +- llvm/lib/Target/X86/X86InstrInfo.cpp | 2 +- llvm/lib/Target/XCore/XCoreInstrInfo.cpp | 2 +- llvm/lib/Target/Xtensa/XtensaInstrInfo.cpp| 3 +- llvm/unittests/CodeGen/MFCommon.inc | 4 +- llvm/utils/TableGen/InstrInfoEmitter.cpp | 12 ++-- 50 files changed, 127 insertions(+), 114 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h 
b/llvm/include/llvm/CodeGen/TargetInstrInfo.h index 6a624a7052cdd..802cca6022074 100644 --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h @@ -113,9 +113,12 @@ struct ExtAddrMode { /// class LLVM_ABI TargetInstrInfo : public MCInstrInfo { protected: - TargetInstrInfo(unsigned CFSetupOpcode = ~0u, unsigned CFDestroyOpcode = ~0u, - unsigned CatchRetOpcode = ~0u, unsigned ReturnOpcode = ~0u) - : CallFrameSetupOpcode(CFSetupOpcode), + const TargetRegisterInfo &TRI; + + TargetInstrInfo(const TargetRegisterInfo &TRI, unsigned CFSetupOpcode = ~0u, + unsigned CFDestroyOpcode = ~0u, unsigned CatchRetOpcode = ~0u, + unsigned ReturnOpcode = ~0u) + : TRI(TRI), CallFrameSetupOpcode(CFSetupOpcode), CallFrameDestroyOpcode(CFDestroyOpcode), CatchRetOpcode(CatchRetOpcode), ReturnOpcode(ReturnOpcode) {} @@ -124,6 +127,8 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo { TargetInstrInfo &operator=(co
[llvm-branch-commits] [compiler-rt] Backport AArch64 sanitizer fixes to 21.x. (PR #157848)
https://github.com/mgorny milestoned https://github.com/llvm/llvm-project/pull/157848 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
https://github.com/TIFitis updated https://github.com/llvm/llvm-project/pull/158722 >From 6976910364aa2fe18603aefcb27b10bd0120513d Mon Sep 17 00:00:00 2001 From: Akash Banerjee Date: Mon, 15 Sep 2025 20:35:29 +0100 Subject: [PATCH 1/7] Add complex.powi op. --- flang/lib/Optimizer/Builder/IntrinsicCall.cpp | 20 ++-- .../Transforms/ConvertComplexPow.cpp | 94 +-- flang/test/Lower/HLFIR/binary-ops.f90 | 2 +- .../test/Lower/Intrinsics/pow_complex16i.f90 | 2 +- .../test/Lower/Intrinsics/pow_complex16k.f90 | 2 +- flang/test/Lower/amdgcn-complex.f90 | 9 ++ flang/test/Lower/power-operator.f90 | 9 +- .../mlir/Dialect/Complex/IR/ComplexOps.td | 26 + .../ComplexToROCDLLibraryCalls.cpp| 41 +++- .../Transforms/AlgebraicSimplification.cpp| 24 +++-- .../Dialect/Math/Transforms/CMakeLists.txt| 1 + .../complex-to-rocdl-library-calls.mlir | 14 +++ mlir/test/Dialect/Complex/powi-simplify.mlir | 20 13 files changed, 188 insertions(+), 76 deletions(-) create mode 100644 mlir/test/Dialect/Complex/powi-simplify.mlir diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp index 466458c05dba7..74a4e8f85c8ff 100644 --- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp +++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp @@ -1331,14 +1331,20 @@ mlir::Value genComplexPow(fir::FirOpBuilder &builder, mlir::Location loc, return genLibCall(builder, loc, mathOp, mathLibFuncType, args); auto complexTy = mlir::cast(mathLibFuncType.getInput(0)); mlir::Value exp = args[1]; - if (!mlir::isa(exp.getType())) { -auto realTy = complexTy.getElementType(); -mlir::Value realExp = builder.createConvert(loc, realTy, exp); -mlir::Value zero = builder.createRealConstant(loc, realTy, 0); -exp = -builder.create(loc, complexTy, realExp, zero); + mlir::Value result; + if (mlir::isa(exp.getType()) || + mlir::isa(exp.getType())) { +result = builder.create(loc, args[0], exp); + } else { +if (!mlir::isa(exp.getType())) { + auto realTy = complexTy.getElementType(); + mlir::Value realExp = builder.createConvert(loc, realTy, exp); + mlir::Value zero = builder.createRealConstant(loc, realTy, 0); + exp = builder.create(loc, complexTy, realExp, +zero); +} +result = builder.create(loc, args[0], exp); } - mlir::Value result = builder.create(loc, args[0], exp); result = builder.createConvert(loc, mathLibFuncType.getResult(0), result); return result; } diff --git a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp index 78f9d9e4f639a..d76451459def9 100644 --- a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp +++ b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp @@ -58,63 +58,57 @@ void ConvertComplexPowPass::runOnOperation() { ModuleOp mod = getOperation(); fir::FirOpBuilder builder(mod, fir::getKindMapping(mod)); - mod.walk([&](complex::PowOp op) { + mod.walk([&](complex::PowiOp op) { builder.setInsertionPoint(op); Location loc = op.getLoc(); auto complexTy = cast(op.getType()); auto elemTy = complexTy.getElementType(); - Value base = op.getLhs(); -Value rhs = op.getRhs(); - -Value intExp; -if (auto create = rhs.getDefiningOp()) { - if (isZero(create.getImaginary())) { -if (auto conv = create.getReal().getDefiningOp()) { - if (auto intTy = dyn_cast(conv.getValue().getType())) -intExp = conv.getValue(); -} - } -} - +Value intExp = op.getRhs(); func::FuncOp callee; -SmallVector args; -if (intExp) { - unsigned realBits = cast(elemTy).getWidth(); - unsigned intBits = cast(intExp.getType()).getWidth(); - auto funcTy = builder.getFunctionType( - 
{complexTy, builder.getIntegerType(intBits)}, {complexTy}); - if (realBits == 32 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy); - else if (realBits == 32 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy); - else if (realBits == 64 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy); - else if (realBits == 64 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy); - else if (realBits == 128 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy); - else if (realBits == 128 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy); - else -return; - args = {base, intExp}; -} else { - unsigned realBits = cast(elemTy).getWidth(); - auto funcTy = - builder.getFunctionType({complexTy, complexTy}, {complexTy}); -
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
@@ -1272,7 +1272,18 @@ mlir::Value genMathOp(fir::FirOpBuilder &builder, mlir::Location loc, LLVM_DEBUG(llvm::dbgs() << "Generating '" << mathLibFuncName << "' operation with type "; mathLibFuncType.dump(); llvm::dbgs() << "\n"); -result = T::create(builder, loc, args); +if constexpr (std::is_same_v) { + auto resultType = mathLibFuncType.getResult(0); + result = T::create(builder, loc, resultType, args); +} else if constexpr (std::is_same_v) { + auto resultType = mathLibFuncType.getResult(0); + auto fmfAttr = mlir::arith::FastMathFlagsAttr::get( + builder.getContext(), builder.getFastMathFlags()); + result = builder.create(loc, resultType, args[0], + args[1], fmfAttr); +} else { TIFitis wrote: You're right, I've simplified it. Thanks for catching. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
@@ -175,12 +176,20 @@ PowIStrengthReduction::matchAndRewrite( Value one; Type opType = getElementTypeOrSelf(op.getType()); - if constexpr (std::is_same_v) + if constexpr (std::is_same_v) { one = arith::ConstantOp::create(rewriter, loc, rewriter.getFloatAttr(opType, 1.0)); - else + } else if constexpr (std::is_same_v) { +auto complexTy = cast(opType); +Type elementType = complexTy.getElementType(); +auto realPart = rewriter.getFloatAttr(elementType, 1.0); +auto imagPart = rewriter.getFloatAttr(elementType, 0.0); +one = rewriter.create( TIFitis wrote: Done. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies when no runtime (PR #157754)
https://github.com/jdenny-ornl edited https://github.com/llvm/llvm-project/pull/157754 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)
github-actions[bot] wrote: ⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo. Please turn off the [Keep my email addresses private](https://github.com/settings/emails) setting in your account. See [LLVM Developer Policy](https://llvm.org/docs/DeveloperPolicy.html#email-addresses) and [LLVM Discourse](https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it) for more information. https://github.com/llvm/llvm-project/pull/159551 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
@@ -1274,6 +1274,12 @@ def AllocaWithAlignUninitialized : Builtin { let Prototype = "void*(size_t, _Constant size_t)"; } +def AllocTokenInfer : Builtin { + let Spellings = ["__builtin_alloc_token_infer"]; melver wrote: Renaming to __builtin_infer_alloc_token https://github.com/llvm/llvm-project/pull/156842 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)
https://github.com/melver updated https://github.com/llvm/llvm-project/pull/156840 >From 14c75441e84aa32e4f5876598b9a2c59d4ecbe65 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 8 Sep 2025 21:32:21 +0200 Subject: [PATCH 1/2] fixup! fix for incomplete types Created using spr 1.3.8-beta.1 --- clang/lib/CodeGen/CGExpr.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp index 288b41bc42203..455de644daf00 100644 --- a/clang/lib/CodeGen/CGExpr.cpp +++ b/clang/lib/CodeGen/CGExpr.cpp @@ -1289,6 +1289,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, // Check if QualType contains a pointer. Implements a simple DFS to // recursively check if a type contains a pointer type. llvm::SmallPtrSet VisitedRD; + bool IncompleteType = false; auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool { QualType CanonicalType = T.getCanonicalType(); if (CanonicalType->isPointerType()) @@ -1312,6 +1313,10 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, return self(self, AT->getElementType()); // The type is a struct, class, or union. if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) { + if (!RD->isCompleteDefinition()) { +IncompleteType = true; +return false; + } if (!VisitedRD.insert(RD).second) return false; // already visited // Check all fields. @@ -1333,6 +1338,8 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, return false; }; const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType); + if (!ContainsPtr && IncompleteType) +return nullptr; auto *ContainsPtrC = Builder.getInt1(ContainsPtr); auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC); >From 7f706618ddc40375d4085bc2ebe03f02ec78823a Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 8 Sep 2025 21:58:01 +0200 Subject: [PATCH 2/2] fixup! Created using spr 1.3.8-beta.1 --- clang/lib/CodeGen/CGExpr.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp index 455de644daf00..e7a0e7696e204 100644 --- a/clang/lib/CodeGen/CGExpr.cpp +++ b/clang/lib/CodeGen/CGExpr.cpp @@ -1339,7 +1339,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, }; const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType); if (!ContainsPtr && IncompleteType) -return nullptr; +return; auto *ContainsPtrC = Builder.getInt1(ContainsPtr); auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)
https://github.com/melver updated https://github.com/llvm/llvm-project/pull/156842 >From 48227c8f7712b2dc807b252d18353c91905b1fb5 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 8 Sep 2025 17:19:04 +0200 Subject: [PATCH] fixup! Created using spr 1.3.8-beta.1 --- llvm/lib/Transforms/Instrumentation/AllocToken.cpp | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp index d5ac3035df71b..3a28705d87523 100644 --- a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp +++ b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp @@ -151,7 +151,8 @@ STATISTIC(NumAllocations, "Allocations found"); /// Expected format is: !{, } MDNode *getAllocTokenHintMetadata(const CallBase &CB) { MDNode *Ret = nullptr; - if (auto *II = dyn_cast(&CB)) { + if (auto *II = dyn_cast(&CB); + II && II->getIntrinsicID() == Intrinsic::alloc_token_id) { auto *MDV = cast(II->getArgOperand(0)); Ret = cast(MDV->getMetadata()); // If the intrinsic has an empty MDNode, type inference failed. @@ -358,7 +359,7 @@ bool AllocToken::instrumentFunction(Function &F) { // Collect all allocation calls to avoid iterator invalidation. for (Instruction &I : instructions(F)) { // Collect all alloc_token_* intrinsics. -if (IntrinsicInst *II = dyn_cast(&I); +if (auto *II = dyn_cast(&I); II && II->getIntrinsicID() == Intrinsic::alloc_token_id) { IntrinsicInsts.emplace_back(II); continue; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
https://github.com/melver updated https://github.com/llvm/llvm-project/pull/156841 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)
https://github.com/melver updated https://github.com/llvm/llvm-project/pull/156839 >From b3653330c2c39ebaa094670f11afb0f9d36b9de2 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Thu, 4 Sep 2025 12:07:26 +0200 Subject: [PATCH] fixup! Insert AllocToken into index.rst Created using spr 1.3.8-beta.1 --- clang/docs/index.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/clang/docs/index.rst b/clang/docs/index.rst index be654af57f890..aa2b3a73dc11b 100644 --- a/clang/docs/index.rst +++ b/clang/docs/index.rst @@ -40,6 +40,7 @@ Using Clang as a Compiler SanitizerCoverage SanitizerStats SanitizerSpecialCaseList + AllocToken BoundsSafety BoundsSafetyAdoptionGuide BoundsSafetyImplPlans ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoongArch] Generate [x]vldi instructions with special constant splats (PR #159258)
https://github.com/ylzsx updated https://github.com/llvm/llvm-project/pull/159258 >From e1a23dd6e31734b05af239bb827a280d403564ee Mon Sep 17 00:00:00 2001 From: yangzhaoxin Date: Wed, 17 Sep 2025 10:20:46 +0800 Subject: [PATCH 1/3] [LoongArch] Generate [x]vldi instructions with special constant splats --- .../LoongArch/LoongArchISelDAGToDAG.cpp | 52 +++ .../LoongArch/LoongArchISelLowering.cpp | 87 ++- .../Target/LoongArch/LoongArchISelLowering.h | 5 ++ .../CodeGen/LoongArch/lasx/build-vector.ll| 80 + .../lasx/fdiv-reciprocal-estimate.ll | 87 +++ .../lasx/fsqrt-reciprocal-estimate.ll | 39 +++-- llvm/test/CodeGen/LoongArch/lasx/fsqrt.ll | 3 +- .../LoongArch/lasx/ir-instruction/fdiv.ll | 3 +- llvm/test/CodeGen/LoongArch/lasx/vselect.ll | 31 +++ .../CodeGen/LoongArch/lsx/build-vector.ll | 77 +--- .../LoongArch/lsx/fdiv-reciprocal-estimate.ll | 87 +++ .../lsx/fsqrt-reciprocal-estimate.ll | 70 +-- llvm/test/CodeGen/LoongArch/lsx/fsqrt.ll | 3 +- .../LoongArch/lsx/ir-instruction/fdiv.ll | 3 +- llvm/test/CodeGen/LoongArch/lsx/vselect.ll| 31 +++ 15 files changed, 289 insertions(+), 369 deletions(-) diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp index 07e722b9a6591..fda313e693760 100644 --- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp +++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp @@ -113,10 +113,11 @@ void LoongArchDAGToDAGISel::Select(SDNode *Node) { APInt SplatValue, SplatUndef; unsigned SplatBitSize; bool HasAnyUndefs; -unsigned Op; +unsigned Op = 0; EVT ResTy = BVN->getValueType(0); bool Is128Vec = BVN->getValueType(0).is128BitVector(); bool Is256Vec = BVN->getValueType(0).is256BitVector(); +SDNode *Res; if (!Subtarget->hasExtLSX() || (!Is128Vec && !Is256Vec)) break; @@ -124,26 +125,25 @@ void LoongArchDAGToDAGISel::Select(SDNode *Node) { HasAnyUndefs, 8)) break; -switch (SplatBitSize) { -default: - break; -case 8: - Op = Is256Vec ? LoongArch::PseudoXVREPLI_B : LoongArch::PseudoVREPLI_B; - break; -case 16: - Op = Is256Vec ? LoongArch::PseudoXVREPLI_H : LoongArch::PseudoVREPLI_H; - break; -case 32: - Op = Is256Vec ? LoongArch::PseudoXVREPLI_W : LoongArch::PseudoVREPLI_W; - break; -case 64: - Op = Is256Vec ? LoongArch::PseudoXVREPLI_D : LoongArch::PseudoVREPLI_D; - break; -} - -SDNode *Res; // If we have a signed 10 bit integer, we can splat it directly. if (SplatValue.isSignedIntN(10)) { + switch (SplatBitSize) { + default: +break; + case 8: +Op = Is256Vec ? LoongArch::PseudoXVREPLI_B : LoongArch::PseudoVREPLI_B; +break; + case 16: +Op = Is256Vec ? LoongArch::PseudoXVREPLI_H : LoongArch::PseudoVREPLI_H; +break; + case 32: +Op = Is256Vec ? LoongArch::PseudoXVREPLI_W : LoongArch::PseudoVREPLI_W; +break; + case 64: +Op = Is256Vec ? LoongArch::PseudoXVREPLI_D : LoongArch::PseudoVREPLI_D; +break; + } + EVT EleType = ResTy.getVectorElementType(); APInt Val = SplatValue.sextOrTrunc(EleType.getSizeInBits()); SDValue Imm = CurDAG->getTargetConstant(Val, DL, EleType); @@ -151,6 +151,20 @@ void LoongArchDAGToDAGISel::Select(SDNode *Node) { ReplaceNode(Node, Res); return; } + +// Select appropriate [x]vldi instructions for some special constant splats, +// where the immediate value `imm[12] == 1` for used [x]vldi instructions. +std::pair ConvertVLDI = +LoongArchTargetLowering::isImmVLDILegalForMode1(SplatValue, +SplatBitSize); +if (ConvertVLDI.first) { + Op = Is256Vec ? 
LoongArch::XVLDI : LoongArch::VLDI; + SDValue Imm = CurDAG->getSignedTargetConstant( + SignExtend32<13>(ConvertVLDI.second), DL, MVT::i32); + Res = CurDAG->getMachineNode(Op, DL, ResTy, Imm); + ReplaceNode(Node, Res); + return; +} break; } } diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp index e8668860c2b38..460e2d7c87af7 100644 --- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp +++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp @@ -2679,9 +2679,10 @@ SDValue LoongArchTargetLowering::lowerBUILD_VECTOR(SDValue Op, if (SplatBitSize == 64 && !Subtarget.is64Bit()) { // We can only handle 64-bit elements that are within - // the signed 10-bit range on 32-bit targets. + // the signed 10-bit range or match vldi patterns on 32-bit targets. // See the BUILD_VECTOR case in LoongArchDAGToDAGISel::Select(). -
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
@@ -1349,6 +1350,98 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN); } +/// Infer type from a simple sizeof expression. +static QualType inferTypeFromSizeofExpr(const Expr *E) { + const Expr *Arg = E->IgnoreParenImpCasts(); + if (const auto *UET = dyn_cast(Arg)) { +if (UET->getKind() == UETT_SizeOf) { + if (UET->isArgumentType()) { +return UET->getArgumentTypeInfo()->getType(); + } else { +return UET->getArgumentExpr()->getType(); + } +} + } + return QualType(); +} + +/// Infer type from an arithmetic expression involving a sizeof. +static QualType inferTypeFromArithSizeofExpr(const Expr *E) { + const Expr *Arg = E->IgnoreParenImpCasts(); + // The argument is a lone sizeof expression. + QualType QT = inferTypeFromSizeofExpr(Arg); + if (!QT.isNull()) +return QT; + if (const auto *BO = dyn_cast(Arg)) { +// Argument is an arithmetic expression. Cover common arithmetic patterns +// involving sizeof. +switch (BO->getOpcode()) { +case BO_Add: +case BO_Div: +case BO_Mul: +case BO_Shl: +case BO_Shr: +case BO_Sub: + QT = inferTypeFromArithSizeofExpr(BO->getLHS()); melver wrote: The Linux kernel has structs with flexible array members, and it's not uncommon to see this: ``` struct A { int len; struct Foo *foo; int array[]; }; ... = kmalloc(sizeof(struct A) + sizeof(int) * N, ...); ``` I'm willing to accept some degree of unsoundness in complex cases to get completeness here, but am assuming that in the majority of cases the first type is the one we want to pick. https://github.com/llvm/llvm-project/pull/156841 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
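To make the traversal order concrete: the helper recurses into `BO->getLHS()` first and returns the first type found, so the leftmost `sizeof` supplies the hint. A sketch with hypothetical types and a stubbed allocator (the flexible array member is elided; this is not the kernel's real `kmalloc` declaration):

```cpp
// Hypothetical types; the stub only exists to make the sketch compile.
struct Foo;
struct A { int len; struct Foo *foo; };
extern void *kmalloc(unsigned long size, unsigned flags);

void *lhs_first(unsigned long n) {
  // Leftmost sizeof names the struct: hint inferred as 'struct A'.
  return kmalloc(sizeof(struct A) + sizeof(int) * n, 0);
}

void *swapped(unsigned long n) {
  // Leftmost sizeof is 'int', so 'int' would be picked instead: the
  // accepted unsoundness in exchange for completeness.
  return kmalloc(sizeof(int) * n + sizeof(struct A), 0);
}
```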
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
https://github.com/TIFitis updated https://github.com/llvm/llvm-project/pull/158722 >From 6976910364aa2fe18603aefcb27b10bd0120513d Mon Sep 17 00:00:00 2001 From: Akash Banerjee Date: Mon, 15 Sep 2025 20:35:29 +0100 Subject: [PATCH 1/6] Add complex.powi op. --- flang/lib/Optimizer/Builder/IntrinsicCall.cpp | 20 ++-- .../Transforms/ConvertComplexPow.cpp | 94 +-- flang/test/Lower/HLFIR/binary-ops.f90 | 2 +- .../test/Lower/Intrinsics/pow_complex16i.f90 | 2 +- .../test/Lower/Intrinsics/pow_complex16k.f90 | 2 +- flang/test/Lower/amdgcn-complex.f90 | 9 ++ flang/test/Lower/power-operator.f90 | 9 +- .../mlir/Dialect/Complex/IR/ComplexOps.td | 26 + .../ComplexToROCDLLibraryCalls.cpp| 41 +++- .../Transforms/AlgebraicSimplification.cpp| 24 +++-- .../Dialect/Math/Transforms/CMakeLists.txt| 1 + .../complex-to-rocdl-library-calls.mlir | 14 +++ mlir/test/Dialect/Complex/powi-simplify.mlir | 20 13 files changed, 188 insertions(+), 76 deletions(-) create mode 100644 mlir/test/Dialect/Complex/powi-simplify.mlir diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp index 466458c05dba7..74a4e8f85c8ff 100644 --- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp +++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp @@ -1331,14 +1331,20 @@ mlir::Value genComplexPow(fir::FirOpBuilder &builder, mlir::Location loc, return genLibCall(builder, loc, mathOp, mathLibFuncType, args); auto complexTy = mlir::cast(mathLibFuncType.getInput(0)); mlir::Value exp = args[1]; - if (!mlir::isa(exp.getType())) { -auto realTy = complexTy.getElementType(); -mlir::Value realExp = builder.createConvert(loc, realTy, exp); -mlir::Value zero = builder.createRealConstant(loc, realTy, 0); -exp = -builder.create(loc, complexTy, realExp, zero); + mlir::Value result; + if (mlir::isa(exp.getType()) || + mlir::isa(exp.getType())) { +result = builder.create(loc, args[0], exp); + } else { +if (!mlir::isa(exp.getType())) { + auto realTy = complexTy.getElementType(); + mlir::Value realExp = builder.createConvert(loc, realTy, exp); + mlir::Value zero = builder.createRealConstant(loc, realTy, 0); + exp = builder.create(loc, complexTy, realExp, +zero); +} +result = builder.create(loc, args[0], exp); } - mlir::Value result = builder.create(loc, args[0], exp); result = builder.createConvert(loc, mathLibFuncType.getResult(0), result); return result; } diff --git a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp index 78f9d9e4f639a..d76451459def9 100644 --- a/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp +++ b/flang/lib/Optimizer/Transforms/ConvertComplexPow.cpp @@ -58,63 +58,57 @@ void ConvertComplexPowPass::runOnOperation() { ModuleOp mod = getOperation(); fir::FirOpBuilder builder(mod, fir::getKindMapping(mod)); - mod.walk([&](complex::PowOp op) { + mod.walk([&](complex::PowiOp op) { builder.setInsertionPoint(op); Location loc = op.getLoc(); auto complexTy = cast(op.getType()); auto elemTy = complexTy.getElementType(); - Value base = op.getLhs(); -Value rhs = op.getRhs(); - -Value intExp; -if (auto create = rhs.getDefiningOp()) { - if (isZero(create.getImaginary())) { -if (auto conv = create.getReal().getDefiningOp()) { - if (auto intTy = dyn_cast(conv.getValue().getType())) -intExp = conv.getValue(); -} - } -} - +Value intExp = op.getRhs(); func::FuncOp callee; -SmallVector args; -if (intExp) { - unsigned realBits = cast(elemTy).getWidth(); - unsigned intBits = cast(intExp.getType()).getWidth(); - auto funcTy = builder.getFunctionType( - 
{complexTy, builder.getIntegerType(intBits)}, {complexTy}); - if (realBits == 32 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy); - else if (realBits == 32 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy); - else if (realBits == 64 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy); - else if (realBits == 64 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy); - else if (realBits == 128 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy); - else if (realBits == 128 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy); - else -return; - args = {base, intExp}; -} else { - unsigned realBits = cast(elemTy).getWidth(); - auto funcTy = - builder.getFunctionType({complexTy, complexTy}, {complexTy}); -
[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies for epilogue (PR #159163)
https://github.com/jdenny-ornl edited https://github.com/llvm/llvm-project/pull/159163 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id (PR #156842)
https://github.com/melver edited https://github.com/llvm/llvm-project/pull/156842 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Offload] Add GenericPluginTy::get_mem_info (PR #157484)
https://github.com/RossBrunton converted_to_draft https://github.com/llvm/llvm-project/pull/157484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146076 >From 3b0c210862015dc304004641990fea429f8e31c7 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Fri, 27 Jun 2025 05:38:52 -0400 Subject: [PATCH 1/3] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default Also removes the command line option to control this feature. There seem to be mainly two kinds of test changes: - Some operands of addition instructions are swapped; that is to be expected since PTRADD is not commutative. - Improvements in code generation, probably because the legacy lowering enabled some transformations that were sometimes harmful. For SWDEV-516125. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 10 +- .../identical-subrange-spill-infloop.ll | 352 +++--- .../AMDGPU/infer-addrspace-flat-atomic.ll | 14 +- llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll | 8 +- .../AMDGPU/lower-module-lds-via-hybrid.ll | 4 +- .../AMDGPU/lower-module-lds-via-table.ll | 16 +- .../match-perm-extract-vector-elt-bug.ll | 22 +- llvm/test/CodeGen/AMDGPU/memmove-var-size.ll | 16 +- .../AMDGPU/preload-implicit-kernargs.ll | 6 +- .../AMDGPU/promote-constOffset-to-imm.ll | 8 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 7 +- .../AMDGPU/ptradd-sdag-optimizations.ll | 94 ++--- .../AMDGPU/ptradd-sdag-undef-poison.ll| 6 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 27 +- llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll | 29 +- 15 files changed, 310 insertions(+), 309 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 78d608556f056..ac3d322ad65c3 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -64,14 +64,6 @@ static cl::opt UseDivergentRegisterIndexing( cl::desc("Use indirect register addressing for divergent indexes"), cl::init(false)); -// TODO: This option should be removed once we switch to always using PTRADD in -// the SelectionDAG. 
-static cl::opt UseSelectionDAGPTRADD( -"amdgpu-use-sdag-ptradd", cl::Hidden, -cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the " - "SelectionDAG ISel"), -cl::init(false)); - static bool denormalModeIsFlushAllF32(const MachineFunction &MF) { const SIMachineFunctionInfo *Info = MF.getInfo(); return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign(); @@ -11473,7 +11465,7 @@ static bool isNoUnsignedWrap(SDValue Addr) { bool SITargetLowering::shouldPreservePtrArith(const Function &F, EVT PtrVT) const { - return UseSelectionDAGPTRADD && PtrVT == MVT::i64; + return PtrVT == MVT::i64; } bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F, diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll index 2c03113e8af47..805cdd37d6e70 100644 --- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll +++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll @@ -6,96 +6,150 @@ define void @main(i1 %arg) #0 { ; CHECK: ; %bb.0: ; %bb ; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1 -; CHECK-NEXT:buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill -; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill +; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill +; CHECK-NEXT:buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill ; CHECK-NEXT:s_mov_b64 exec, s[4:5] -; CHECK-NEXT:v_writelane_b32 v5, s30, 0 -; CHECK-NEXT:v_writelane_b32 v5, s31, 1 -; CHECK-NEXT:v_writelane_b32 v5, s36, 2 -; CHECK-NEXT:v_writelane_b32 v5, s37, 3 -; CHECK-NEXT:v_writelane_b32 v5, s38, 4 -; CHECK-NEXT:v_writelane_b32 v5, s39, 5 -; CHECK-NEXT:v_writelane_b32 v5, s48, 6 -; CHECK-NEXT:v_writelane_b32 v5, s49, 7 -; CHECK-NEXT:v_writelane_b32 v5, s50, 8 -; CHECK-NEXT:v_writelane_b32 v5, s51, 9 -; CHECK-NEXT:v_writelane_b32 v5, s52, 10 -; CHECK-NEXT:v_writelane_b32 v5, s53, 11 -; CHECK-NEXT:v_writelane_b32 v5, s54, 12 -; CHECK-NEXT:v_writelane_b32 v5, s55, 13 -; CHECK-NEXT:s_getpc_b64 s[24:25] -; CHECK-NEXT:v_writelane_b32 v5, s64, 14 -; CHECK-NEXT:s_movk_i32 s4, 0xf0 -; CHECK-NEXT:s_mov_b32 s5, s24 -; CHECK-NEXT:v_writelane_b32 v5, s65, 15 -; CHECK-NEXT:s_load_dwordx16 s[8:23], s[4:5], 0x0 -; CHECK-NEXT:s_mov_b64 s[4:5], 0 -; CHECK-NEXT:v_writelane_b32 v5, s66, 16 -; CHECK-NEXT:s_load_dwordx4 s[4:7], s[4:5], 0x0 -; CHECK-NEXT:v_writelane_b32 v5, s67, 17 -; CHECK-NEXT:s_waitcnt lgkmcnt(0) -; CHECK-NEXT:s_movk_i32 s6, 0x130 -; CHECK-NEXT:s_mov_b32 s7, s24 -; CHECK-NEXT:v_writelane_b32 v5
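A note on the first category of test changes named in the commit message: unlike ISD::ADD, ISD::PTRADD assigns fixed roles to its operands, which is why operand order now shows up in the checks. A standalone illustration in plain C++, not LLVM code:

#include <cstdint>

// Operand 0 is the pointer, operand 1 the byte offset; swapping them would
// change which value carries pointer provenance, so the canonical reordering
// applied to commutative nodes does not apply here.
struct PtrAddNode {
  uint64_t PtrOperand; // operand 0: pointer
  int64_t OffOperand;  // operand 1: integer offset
};

uint64_t evaluate(const PtrAddNode &N) {
  return N.PtrOperand + static_cast<uint64_t>(N.OffOperand);
}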
[llvm-branch-commits] [llvm] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (PR #146074)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146074 >From b484d75cff9bd4703dd2c90d041d4df0aefd0e3c Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Thu, 26 Jun 2025 06:10:35 -0400 Subject: [PATCH 1/2] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds, that targets can use to allow transformations to introduce out-of-bounds pointer arithmetic. It also moves two such transformations from the AMDGPU-specific DAG combines to the generic DAGCombiner. This is motivated by target features like AArch64's checked pointer arithmetic, CPA, which does not tolerate the introduction of out-of-bounds pointer arithmetic. --- llvm/include/llvm/CodeGen/TargetLowering.h| 7 + llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 125 +++--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 59 ++--- llvm/lib/Target/AMDGPU/SIISelLowering.h | 3 + 4 files changed, 94 insertions(+), 100 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h index 46be271320fdd..4c2d991308d30 100644 --- a/llvm/include/llvm/CodeGen/TargetLowering.h +++ b/llvm/include/llvm/CodeGen/TargetLowering.h @@ -3518,6 +3518,13 @@ class LLVM_ABI TargetLoweringBase { return false; } + /// True if the target allows transformations of in-bounds pointer + /// arithmetic that cause out-of-bounds intermediate results. + virtual bool canTransformPtrArithOutOfBounds(const Function &F, + EVT PtrVT) const { +return false; + } + /// Does this target support complex deinterleaving virtual bool isComplexDeinterleavingSupported() const { return false; } diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 77bc47f28fc80..67db08c3f9bac 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -2696,59 +2696,82 @@ SDValue DAGCombiner::visitPTRADD(SDNode *N) { if (PtrVT == IntVT && isNullConstant(N0)) return N1; - if (N0.getOpcode() != ISD::PTRADD || - reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) -return SDValue(); - - SDValue X = N0.getOperand(0); - SDValue Y = N0.getOperand(1); - SDValue Z = N1; - bool N0OneUse = N0.hasOneUse(); - bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y); - bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z); - - // (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if: - // * y is a constant and (ptradd x, y) has one use; or - // * y and z are both constants. - if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) { -// If both additions in the original were NUW, the new ones are as well. -SDNodeFlags Flags = -(N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap; -SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags); -AddToWorklist(Add.getNode()); -return DAG.getMemBasePlusOffset(X, Add, DL, Flags); + if (N0.getOpcode() == ISD::PTRADD && + !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) { +SDValue X = N0.getOperand(0); +SDValue Y = N0.getOperand(1); +SDValue Z = N1; +bool N0OneUse = N0.hasOneUse(); +bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y); +bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z); + +// (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if: +// * y is a constant and (ptradd x, y) has one use; or +// * y and z are both constants. 
+if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) { + // If both additions in the original were NUW, the new ones are as well. + SDNodeFlags Flags = + (N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap; + SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags); + AddToWorklist(Add.getNode()); + return DAG.getMemBasePlusOffset(X, Add, DL, Flags); +} + } + + // The following combines can turn in-bounds pointer arithmetic out of bounds. + // That is problematic for settings like AArch64's CPA, which checks that + // intermediate results of pointer arithmetic remain in bounds. The target + // therefore needs to opt-in to enable them. + if (!TLI.canTransformPtrArithOutOfBounds( + DAG.getMachineFunction().getFunction(), PtrVT)) +return SDValue(); + + if (N0.getOpcode() == ISD::PTRADD && N1.getOpcode() == ISD::Constant) { +// Fold (ptradd (ptradd GA, v), c) -> (ptradd (ptradd GA, c) v) with +// global address GA and constant c, such that c can be folded into GA. +SDValue GAValue = N0.getOperand(0); +if (const GlobalAddressSDNode *GA = +dyn_cast(GAValue)) { + const TargetLowering &TLI = DAG.getTargetLoweringInfo(); + if (!LegalOperations && TLI.
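To make the opt-in concrete, a sketch of how a target would enable these combines. This is assumed usage, not the patch's code; the AMDGPU override is part of this PR, and the base-class default added above returns false:

#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

class ExampleTargetLowering : public TargetLowering {
public:
  using TargetLowering::TargetLowering;
  // Return true only if out-of-bounds intermediate pointer values are
  // harmless on this target (AArch64 CPA, cited above, must return false).
  bool canTransformPtrArithOutOfBounds(const Function &F,
                                       EVT PtrVT) const override {
    return PtrVT == MVT::i64;
  }
};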
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (PR #145330)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/145330 >From da5b337fef36cdee209845b51bba323e84272334 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Tue, 17 Jun 2025 04:03:53 -0400 Subject: [PATCH 1/2] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125. --- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 2 +- .../CodeGen/SelectionDAG/TargetLowering.cpp | 21 +- llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 6 +- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 7 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 67 ++ .../AMDGPU/ptradd-sdag-optimizations.ll | 196 ++ 6 files changed, 105 insertions(+), 194 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 93ddba93b8034..42d3b36f222d7 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -8600,7 +8600,7 @@ static bool isMemSrcFromConstant(SDValue Src, ConstantDataArraySlice &Slice) { GlobalAddressSDNode *G = nullptr; if (Src.getOpcode() == ISD::GlobalAddress) G = cast(Src); - else if (Src.getOpcode() == ISD::ADD && + else if (Src->isAnyAdd() && Src.getOperand(0).getOpcode() == ISD::GlobalAddress && Src.getOperand(1).getOpcode() == ISD::Constant) { G = cast(Src.getOperand(0)); diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index 177aa0d11ff90..7465c9b310cb9 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -638,8 +638,14 @@ bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned BitWidth, // operands on the new node are also disjoint. SDNodeFlags Flags(Op->getFlags().hasDisjoint() ? SDNodeFlags::Disjoint : SDNodeFlags::None); + unsigned Opcode = Op.getOpcode(); + if (Opcode == ISD::PTRADD) { +// It isn't a ptradd anymore if it doesn't operate on the entire +// pointer. +Opcode = ISD::ADD; + } SDValue X = DAG.getNode( - Op.getOpcode(), dl, SmallVT, + Opcode, dl, SmallVT, DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(0)), DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(1)), Flags); assert(DemandedSize <= SmallVTBits && "Narrowed below demanded bits?"); @@ -2860,6 +2866,11 @@ bool TargetLowering::SimplifyDemandedBits( return TLO.CombineTo(Op, And1); } [[fallthrough]]; + case ISD::PTRADD: +if (Op.getOperand(0).getValueType() != Op.getOperand(1).getValueType()) + break; +// PTRADD behaves like ADD if pointers are represented as integers. +[[fallthrough]]; case ISD::ADD: case ISD::SUB: { // Add, Sub, and Mul don't demand any bits in positions beyond that @@ -2969,10 +2980,10 @@ bool TargetLowering::SimplifyDemandedBits( if (Op.getOpcode() == ISD::MUL) { Known = KnownBits::mul(KnownOp0, KnownOp1); -} else { // Op.getOpcode() is either ISD::ADD or ISD::SUB. +} else { // Op.getOpcode() is either ISD::ADD, ISD::PTRADD, or ISD::SUB. 
Known = KnownBits::computeForAddSub( - Op.getOpcode() == ISD::ADD, Flags.hasNoSignedWrap(), - Flags.hasNoUnsignedWrap(), KnownOp0, KnownOp1); + Op->isAnyAdd(), Flags.hasNoSignedWrap(), Flags.hasNoUnsignedWrap(), + KnownOp0, KnownOp1); } break; } @@ -5679,7 +5690,7 @@ bool TargetLowering::isGAPlusOffset(SDNode *WN, const GlobalValue *&GA, return true; } - if (N->getOpcode() == ISD::ADD) { + if (N->isAnyAdd()) { SDValue N1 = N->getOperand(0); SDValue N2 = N->getOperand(1); if (isGAPlusOffset(N1.getNode(), GA, Offset)) { diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp index c2fca79979e1b..312de262490f4 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp @@ -1531,7 +1531,7 @@ bool AMDGPUDAGToDAGISel::SelectMUBUF(SDValue Addr, SDValue &Ptr, SDValue &VAddr, C1 = nullptr; } - if (N0.getOpcode() == ISD::ADD) { + if (N0->isAnyAdd()) { // (add N2, N3) -> addr64, or // (add (add N2, N3), C1) -> addr64 SDValue N2 = N0.getOperand(0); @@ -1993,7 +1993,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, SDValue Addr, } // Match the variable offset. - if (Addr.getOpcode() == ISD::ADD) { + if (Addr->isAnyAdd()) { LHS = Addr.getOperand(0); if (!LHS
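The recurring Src->isAnyAdd() and N->isAnyAdd() calls replace explicit ISD::ADD opcode checks; a simplified sketch of the presumed semantics, not the LLVM source:

#include "llvm/CodeGen/ISDOpcodes.h"

// True for both integer adds and pointer adds, which these combines treat
// uniformly; ISD::SUB still takes the other arm of computeForAddSub above.
bool isAnyAddSketch(unsigned Opcode) {
  return Opcode == llvm::ISD::ADD || Opcode == llvm::ISD::PTRADD;
}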
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146075 >From 7c417c4c1413a3807d476b7fc490256084a0ac62 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Fri, 27 Jun 2025 04:23:50 -0400 Subject: [PATCH 1/5] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry. This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which we will soon need to fold offsets into FLAT instructions. Currently, disjoint ORs can still be used for offset folding, so that part of the logic can't be tested. The PR contains a hacky workaround for a situation where an AssertAlign operand of a PTRADD is not DAGCombined before the PTRADD, causing the PTRADD to be turned into a disjoint OR although reassociating it with the operand of the AssertAlign would be better. This wouldn't be a problem if the DAGCombiner ensured that a node is only processed after all its operands have been processed. For SWDEV-516125. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 .../AMDGPU/ptradd-sdag-optimizations.ll | 56 ++- 2 files changed, 90 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 78d608556f056..ffaaef65569ae 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -16145,6 +16145,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N, return Folded; } + // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if + // that transformation can't block an offset folding at any use of the ptradd. + // This should be done late, after legalization, so that it doesn't block + // other ptradd combines that could enable more offset folding. + bool HasIntermediateAssertAlign = + N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd(); + // This is a hack to work around an ordering problem for DAGs like this: + // (ptradd (AssertAlign (ptradd p, c1), k), c2) + // If the outer ptradd is handled first by the DAGCombiner, it can be + // transformed into a disjoint or. Then, when the generic AssertAlign combine + // pushes the AssertAlign through the inner ptradd, it's too late for the + // ptradd reassociation to trigger. + if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign && + DAG.haveNoCommonBitsSet(N0, N1)) { +bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) { + if (auto *LoadStore = dyn_cast(User); + LoadStore && LoadStore->getBasePtr().getNode() == N) { +unsigned AS = LoadStore->getAddressSpace(); +// Currently, we only really need ptradds to fold offsets into flat +// memory instructions. 
+if (AS != AMDGPUAS::FLAT_ADDRESS) + return false; +TargetLoweringBase::AddrMode AM; +AM.HasBaseReg = true; +EVT VT = LoadStore->getMemoryVT(); +Type *AccessTy = VT.getTypeForEVT(*DAG.getContext()); +return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS); + } + return false; +}); + +if (!TransformCanBreakAddrMode) + return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint); + } + if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse()) return SDValue(); diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll index 199c1f61d2522..7d7fe141e5440 100644 --- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll +++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll @@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) { ; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in the ; assertalign DAG combine. -define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) #0 { +define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) { ; GFX942-LABEL: llvm_amdgcn_queue_ptr: ; GFX942: ; %bb.0: ; GFX942-NEXT:v_mov_b32_e32 v0, 0 @@ -415,6 +415,60 @@ entry: ret void } +; Check that ptradds can be lowered to disjoint ORs. +define ptr @gep_disjoint_or(ptr %base) { +; GFX942-LABEL: gep_disjoint_or: +; GFX942: ; %bb.0: +; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4 +; GFX942-NEXT:s_setpc_b64 s[30:31] + %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0) + %gep = getelementptr nuw inbounds i8, ptr %p, i64 4 + ret ptr %gep +} + +; Check that AssertAlign no
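The equivalence behind the combine is the usual carry-free-addition fact, checked above via DAG.haveNoCommonBitsSet(N0, N1). A standalone illustration matching the gep_disjoint_or test (16-byte-aligned base, offset 4):

#include <cassert>
#include <cstdint>

int main() {
  uint64_t P = 0x1000; // base with low bits known zero (the ptrmask above)
  uint64_t Off = 0x4;  // offset fits entirely in those zero bits
  assert((P & Off) == 0);       // no common set bits => no carries
  assert(P + Off == (P | Off)); // so the ptradd equals a disjoint OR
  return 0;
}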
[llvm-branch-commits] [llvm] [Offload] Add olGetMemInfo with platform-less API (PR #159581)
https://github.com/RossBrunton created https://github.com/llvm/llvm-project/pull/159581 None >From 149a8e88c447d10e9181ba0940c5d05ace6f0d5a Mon Sep 17 00:00:00 2001 From: Ross Brunton Date: Thu, 18 Sep 2025 15:23:45 +0100 Subject: [PATCH] [Offload] Add olGetMemInfo with platform-less API --- offload/liboffload/API/Memory.td | 50 +++ offload/liboffload/src/OffloadImpl.cpp| 54 offload/unittests/OffloadAPI/CMakeLists.txt | 4 +- .../OffloadAPI/memory/olGetMemInfo.cpp| 130 ++ .../OffloadAPI/memory/olGetMemInfoSize.cpp| 63 + 5 files changed, 300 insertions(+), 1 deletion(-) create mode 100644 offload/unittests/OffloadAPI/memory/olGetMemInfo.cpp create mode 100644 offload/unittests/OffloadAPI/memory/olGetMemInfoSize.cpp diff --git a/offload/liboffload/API/Memory.td b/offload/liboffload/API/Memory.td index debda165d2b23..3e47b586edd23 100644 --- a/offload/liboffload/API/Memory.td +++ b/offload/liboffload/API/Memory.td @@ -45,6 +45,56 @@ def olMemFree : Function { let returns = []; } +def ol_mem_info_t : Enum { + let desc = "Supported memory info."; + let is_typed = 1; + let etors = [ +TaggedEtor<"DEVICE", "ol_device_handle_t", "The handle of the device associated with the allocation.">, +TaggedEtor<"BASE", "void *", "Base address of this allocation.">, +TaggedEtor<"SIZE", "size_t", "Size of this allocation in bytes.">, +TaggedEtor<"TYPE", "ol_alloc_type_t", "Type of this allocation.">, + ]; +} + +def olGetMemInfo : Function { + let desc = "Queries the given property of a memory allocation allocated with olMemAlloc."; + let details = [ +"`olGetMemInfoSize` can be used to query the storage size required for the given query.", +"The provided pointer can point to any location inside the allocation.", + ]; + let params = [ +Param<"const void *", "Ptr", "pointer to the allocated memory", PARAM_IN>, +Param<"ol_mem_info_t", "PropName", "type of the info to retrieve", PARAM_IN>, +Param<"size_t", "PropSize", "the number of bytes pointed to by PropValue.", PARAM_IN>, +TypeTaggedParam<"void*", "PropValue", "array of bytes holding the info. " + "If Size is not equal to or greater to the real number of bytes needed to return the info " + "then the OL_ERRC_INVALID_SIZE error is returned and pPlatformInfo is not used.", PARAM_OUT, + TypeInfo<"PropName" , "PropSize">> + ]; + let returns = [ +Return<"OL_ERRC_INVALID_SIZE", [ + "`PropSize == 0`", + "If `PropSize` is less than the real number of bytes needed to return the info." 
+]>, +Return<"OL_ERRC_NOT_FOUND", ["memory was not allocated by this platform"]> + ]; +} + +def olGetMemInfoSize : Function { + let desc = "Returns the storage size of the given queue query."; + let details = [ +"The provided pointer can point to any location inside the allocation.", + ]; + let params = [ +Param<"const void *", "Ptr", "pointer to the allocated memory", PARAM_IN>, +Param<"ol_mem_info_t", "PropName", "type of the info to query", PARAM_IN>, +Param<"size_t*", "PropSizeRet", "pointer to the number of bytes required to store the query", PARAM_OUT> + ]; + let returns = [ +Return<"OL_ERRC_NOT_FOUND", ["memory was not allocated by this platform"]> + ]; +} + def olMemcpy : Function { let desc = "Enqueue a memcpy operation."; let details = [ diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp index 4a253c61a657b..2a0e238125dd7 100644 --- a/offload/liboffload/src/OffloadImpl.cpp +++ b/offload/liboffload/src/OffloadImpl.cpp @@ -700,6 +700,60 @@ Error olMemFree_impl(void *Address) { return Error::success(); } +Error olGetMemInfoImplDetail(const void *Ptr, ol_mem_info_t PropName, + size_t PropSize, void *PropValue, + size_t *PropSizeRet) { + InfoWriter Info(PropSize, PropValue, PropSizeRet); + std::lock_guard Lock(OffloadContext::get().AllocInfoMapMutex); + + auto &AllocBases = OffloadContext::get().AllocBases; + auto &AllocInfoMap = OffloadContext::get().AllocInfoMap; + const AllocInfo *Alloc = nullptr; + if (AllocInfoMap.contains(Ptr)) { +// Fast case, we have been given the base pointer directly +Alloc = &AllocInfoMap.at(Ptr); + } else { +// Slower case, we need to look up the base pointer first +// Find the first memory allocation whose end is after the target pointer, +// and then check to see if it is in range +auto Loc = std::lower_bound(AllocBases.begin(), AllocBases.end(), Ptr, +[&](const void *Iter, const void *Val) { + return AllocInfoMap.at(Iter).End <= Val; +}); +if (Loc == AllocBases.end() || Ptr < AllocInfoMap.at(*Loc).Start) + return Plugin::error(ErrorCode::NOT_FOUND, + "allocated memory information
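The slow path above locates the allocation containing an interior pointer via std::lower_bound over bases sorted by allocation end. A simplified standalone sketch with illustrative names; ranges are assumed non-overlapping and sorted:

#include <algorithm>
#include <vector>

struct AllocRange {
  const char *Start;
  const char *End; // one past the last byte
};

// Returns nullptr when the pointer lies in no tracked allocation, which
// maps to OL_ERRC_NOT_FOUND in the real API.
const AllocRange *findAlloc(const std::vector<AllocRange> &Sorted,
                            const char *Ptr) {
  auto It = std::lower_bound(
      Sorted.begin(), Sorted.end(), Ptr,
      [](const AllocRange &A, const char *P) { return A.End <= P; });
  if (It == Sorted.end() || Ptr < It->Start)
    return nullptr;
  return &*It;
}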
[llvm-branch-commits] [llvm] [Offload] Add GenericPluginTy::get_mem_info (PR #157484)
https://github.com/RossBrunton closed https://github.com/llvm/llvm-project/pull/157484 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Offload] `olGetMemInfo` (PR #157651)
https://github.com/RossBrunton closed https://github.com/llvm/llvm-project/pull/157651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
https://github.com/tobias-stadler updated https://github.com/llvm/llvm-project/pull/156715 >From d33b31f01aeeb9005581b0a2a1f21c898463aa02 Mon Sep 17 00:00:00 2001 From: Tobias Stadler Date: Thu, 18 Sep 2025 12:34:55 +0100 Subject: [PATCH] Replace bitstream blobs by yaml Created using spr 1.3.7-wip --- llvm/lib/Remarks/BitstreamRemarkParser.cpp| 5 +- .../dsymutil/ARM/remarks-linking-bundle.test | 13 +- .../basic1.macho.remarks.arm64.opt.bitstream | Bin 824 -> 0 bytes .../basic1.macho.remarks.arm64.opt.yaml | 47 + ...c1.macho.remarks.empty.arm64.opt.bitstream | 0 .../basic2.macho.remarks.arm64.opt.bitstream | Bin 1696 -> 0 bytes .../basic2.macho.remarks.arm64.opt.yaml | 194 ++ ...c2.macho.remarks.empty.arm64.opt.bitstream | 0 .../basic3.macho.remarks.arm64.opt.bitstream | Bin 1500 -> 0 bytes .../basic3.macho.remarks.arm64.opt.yaml | 181 ...c3.macho.remarks.empty.arm64.opt.bitstream | 0 .../fat.macho.remarks.x86_64.opt.bitstream| Bin 820 -> 0 bytes .../remarks/fat.macho.remarks.x86_64.opt.yaml | 53 + .../fat.macho.remarks.x86_64h.opt.bitstream | Bin 820 -> 0 bytes .../fat.macho.remarks.x86_64h.opt.yaml| 53 + .../X86/remarks-linking-fat-bundle.test | 8 +- 16 files changed, 543 insertions(+), 11 deletions(-) delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.yaml diff --git a/llvm/lib/Remarks/BitstreamRemarkParser.cpp b/llvm/lib/Remarks/BitstreamRemarkParser.cpp index 63b16bd2df0ec..2b27a0f661d88 100644 --- a/llvm/lib/Remarks/BitstreamRemarkParser.cpp +++ b/llvm/lib/Remarks/BitstreamRemarkParser.cpp @@ -411,9 +411,8 @@ Error BitstreamRemarkParser::processExternalFilePath() { return E; if (ContainerType != BitstreamRemarkContainerType::RemarksFile) -return error( -"Error while parsing external file's BLOCK_META: wrong container " -"type."); +return ParserHelper->MetaHelper.error( +"Wrong container type in external file."); return Error::success(); } diff --git a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test index 09a60d7d044c6..e1b04455b0d9d 100644 --- a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test +++ 
b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test @@ -1,22 +1,25 @@ RUN: rm -rf %t -RUN: mkdir -p %t +RUN: mkdir -p %t/private/tmp/remarks RUN: cat %p/../Inputs/remarks/basic.macho.remarks.arm64> %t/basic.macho.remarks.arm64 +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream -RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%p/../Inputs %t/basic.macho.remarks.arm64 +RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%t %t/basic.macho.remarks.arm64 Check that the remark file in the bundle exists and is sane: RUN: llvm-bcanalyzer -dump %t/basic.macho.remarks.arm64.dSYM/Contents/Resources/Remarks/basic.macho.remarks.arm64 | FileCheck %s -RUN: dsymutil --linker parallel -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%p/../Inputs %t/basic.macho.remar
[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)
https://github.com/deaklajos created https://github.com/llvm/llvm-project/pull/159551 Backports: 3fc723ec2cf1965aa4eec8883957fbbe1b2e7027 (#136195) Ran into the issue on Alpine when building with TSAN that `__sanitizer_msghdr` and the `msghdr` provided by musl did not match. This caused lots of tsan reports and an eventual termination of the application by the oom during a `sendmsg`. From 60b10f56319e62415c61e69c67f9c713ed81172e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?De=C3=A1k=20Lajos?= <36414743+deakla...@users.noreply.github.com> Date: Tue, 22 Jul 2025 20:31:28 +0200 Subject: [PATCH] [compiler-rt][sanitizer] fix msghdr for musl (#136195) Ran into the issue on Alpine when building with TSAN that `__sanitizer_msghdr` and the `msghdr` provided by musl did not match. This caused lots of tsan reports and an eventual termination of the application by the oom during a `sendmsg`. --- .../sanitizer_platform_limits_posix.h | 24 +++ 1 file changed, 24 insertions(+) diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h index f118d53f0df80..24966523f3a02 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h @@ -478,6 +478,30 @@ struct __sanitizer_cmsghdr { int cmsg_level; int cmsg_type; }; +# elif SANITIZER_MUSL +struct __sanitizer_msghdr { + void *msg_name; + unsigned msg_namelen; + struct __sanitizer_iovec *msg_iov; + int msg_iovlen; +#if SANITIZER_WORDSIZE == 64 + int __pad1; +#endif + void *msg_control; + unsigned msg_controllen; +#if SANITIZER_WORDSIZE == 64 + int __pad2; +#endif + int msg_flags; +}; +struct __sanitizer_cmsghdr { + unsigned cmsg_len; +#if SANITIZER_WORDSIZE == 64 + int __pad1; +#endif + int cmsg_level; + int cmsg_type; +}; # else // In POSIX, int msg_iovlen; socklen_t msg_controllen; socklen_t cmsg_len; but // many implementations don't conform to the standard. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
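Layout mismatches of this kind are usually caught by compile-time assertions; compiler-rt carries similar checks (CHECK_SIZE_AND_OFFSET) in its platform files. An illustrative sketch, assuming the sanitizer header defining __sanitizer_msghdr is on the include path:

#include <cstddef>
#include <sys/socket.h>

static_assert(sizeof(__sanitizer_msghdr) == sizeof(msghdr),
              "mirror struct must match libc msghdr size");
static_assert(offsetof(__sanitizer_msghdr, msg_control) ==
                  offsetof(msghdr, msg_control),
              "msg_control must line up or interceptors misread lengths");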
[llvm-branch-commits] [compiler-rt] release/21.x: [compiler-rt][sanitizer] fix msghdr for musl (PR #159551)
llvmbot wrote: @llvm/pr-subscribers-compiler-rt-sanitizer Author: Deák Lajos (deaklajos) Changes Backports: 3fc723ec2cf1965aa4eec8883957fbbe1b2e7027 (#136195) Ran into the issue on Alpine when building with TSAN that `__sanitizer_msghdr` and the `msghdr` provided by musl did not match. This caused lots of tsan reports and an eventual termination of the application by the oom during a `sendmsg`. --- Full diff: https://github.com/llvm/llvm-project/pull/159551.diff 1 Files Affected: - (modified) compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h (+24) ``diff diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h index f118d53f0df80..24966523f3a02 100644 --- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h +++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h @@ -478,6 +478,30 @@ struct __sanitizer_cmsghdr { int cmsg_level; int cmsg_type; }; +# elif SANITIZER_MUSL +struct __sanitizer_msghdr { + void *msg_name; + unsigned msg_namelen; + struct __sanitizer_iovec *msg_iov; + int msg_iovlen; +#if SANITIZER_WORDSIZE == 64 + int __pad1; +#endif + void *msg_control; + unsigned msg_controllen; +#if SANITIZER_WORDSIZE == 64 + int __pad2; +#endif + int msg_flags; +}; +struct __sanitizer_cmsghdr { + unsigned cmsg_len; +#if SANITIZER_WORDSIZE == 64 + int __pad1; +#endif + int cmsg_level; + int cmsg_type; +}; # else // In POSIX, int msg_iovlen; socklen_t msg_controllen; socklen_t cmsg_len; but // many implementations don't conform to the standard. `` https://github.com/llvm/llvm-project/pull/159551 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
https://github.com/vzakhari approved this pull request. The `powi` part looks good to me. Are you planning to merge it, and then rebase the other PR for the Flang changes for the final review? https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
https://github.com/TIFitis edited https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
@@ -47,74 +47,61 @@ static func::FuncOp getOrDeclare(fir::FirOpBuilder &builder, Location loc, return func; } -static bool isZero(Value v) { - if (auto cst = v.getDefiningOp()) -if (auto attr = dyn_cast(cst.getValue())) - return attr.getValue().isZero(); - return false; -} - void ConvertComplexPowPass::runOnOperation() { ModuleOp mod = getOperation(); fir::FirOpBuilder builder(mod, fir::getKindMapping(mod)); - mod.walk([&](complex::PowOp op) { + mod.walk([&](complex::PowiOp op) { builder.setInsertionPoint(op); Location loc = op.getLoc(); auto complexTy = cast(op.getType()); auto elemTy = complexTy.getElementType(); - Value base = op.getLhs(); -Value rhs = op.getRhs(); - -Value intExp; -if (auto create = rhs.getDefiningOp()) { - if (isZero(create.getImaginary())) { -if (auto conv = create.getReal().getDefiningOp()) { - if (auto intTy = dyn_cast(conv.getValue().getType())) -intExp = conv.getValue(); -} - } -} - +Value intExp = op.getRhs(); func::FuncOp callee; -SmallVector args; -if (intExp) { - unsigned realBits = cast(elemTy).getWidth(); - unsigned intBits = cast(intExp.getType()).getWidth(); - auto funcTy = builder.getFunctionType( - {complexTy, builder.getIntegerType(intBits)}, {complexTy}); - if (realBits == 32 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy); - else if (realBits == 32 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy); - else if (realBits == 64 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy); - else if (realBits == 64 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy); - else if (realBits == 128 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy); - else if (realBits == 128 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy); - else -return; - args = {base, intExp}; -} else { - unsigned realBits = cast(elemTy).getWidth(); - auto funcTy = - builder.getFunctionType({complexTy, complexTy}, {complexTy}); - if (realBits == 32) -callee = getOrDeclare(builder, loc, "cpowf", funcTy); - else if (realBits == 64) -callee = getOrDeclare(builder, loc, "cpow", funcTy); - else if (realBits == 128) -callee = getOrDeclare(builder, loc, RTNAME_STRING(CPowF128), funcTy); - else -return; - args = {base, rhs}; -} +unsigned realBits = cast(elemTy).getWidth(); +unsigned intBits = cast(intExp.getType()).getWidth(); +auto funcTy = builder.getFunctionType( +{complexTy, builder.getIntegerType(intBits)}, {complexTy}); +if (realBits == 32 && intBits == 32) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy); +else if (realBits == 32 && intBits == 64) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy); +else if (realBits == 64 && intBits == 32) + callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy); +else if (realBits == 64 && intBits == 64) + callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy); +else if (realBits == 128 && intBits == 32) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy); +else if (realBits == 128 && intBits == 64) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy); +else + return; +auto call = fir::CallOp::create(builder, loc, callee, {base, intExp}); +if (auto fmf = op.getFastmathAttr()) + call.setFastmathAttr(fmf); +op.replaceAllUsesWith(call.getResult(0)); +op.erase(); + }); -auto call = fir::CallOp::create(builder, loc, callee, args); + mod.walk([&](complex::PowOp op) { 
TIFitis wrote: I've updated this. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Improve StructurizeCFG pass performance by using SSAUpdaterBulk. (PR #150937)
https://github.com/vpykhtin updated https://github.com/llvm/llvm-project/pull/150937 >From ae3589e2c93351349cd1bbb5586c2dfcb075ea68 Mon Sep 17 00:00:00 2001 From: Valery Pykhtin Date: Thu, 10 Apr 2025 11:58:13 + Subject: [PATCH] amdgpu_use_ssaupdaterbulk_in_structurizecfg --- llvm/lib/Transforms/Scalar/StructurizeCFG.cpp | 25 +++ 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp index 2ee91a9b40026..0f3978f56045e 100644 --- a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp +++ b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp @@ -47,6 +47,7 @@ #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/Local.h" #include "llvm/Transforms/Utils/SSAUpdater.h" +#include "llvm/Transforms/Utils/SSAUpdaterBulk.h" #include #include @@ -321,7 +322,7 @@ class StructurizeCFG { void collectInfos(); - void insertConditions(bool Loops); + void insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter); void simplifyConditions(); @@ -671,10 +672,9 @@ void StructurizeCFG::collectInfos() { } /// Insert the missing branch conditions -void StructurizeCFG::insertConditions(bool Loops) { +void StructurizeCFG::insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter) { BranchVector &Conds = Loops ? LoopConds : Conditions; Value *Default = Loops ? BoolTrue : BoolFalse; - SSAUpdater PhiInserter; for (BranchInst *Term : Conds) { assert(Term->isConditional()); @@ -683,8 +683,9 @@ void StructurizeCFG::insertConditions(bool Loops) { BasicBlock *SuccTrue = Term->getSuccessor(0); BasicBlock *SuccFalse = Term->getSuccessor(1); -PhiInserter.Initialize(Boolean, ""); -PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default); +unsigned Variable = PhiInserter.AddVariable("", Boolean); +PhiInserter.AddAvailableValue(Variable, Loops ? SuccFalse : Parent, + Default); BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue]; @@ -697,7 +698,7 @@ void StructurizeCFG::insertConditions(bool Loops) { ParentInfo = PI; break; } - PhiInserter.AddAvailableValue(BB, PI.Pred); + PhiInserter.AddAvailableValue(Variable, BB, PI.Pred); Dominator.addAndRememberBlock(BB); } @@ -706,9 +707,9 @@ void StructurizeCFG::insertConditions(bool Loops) { CondBranchWeights::setMetadata(*Term, ParentInfo.Weights); } else { if (!Dominator.resultIsRememberedBlock()) -PhiInserter.AddAvailableValue(Dominator.result(), Default); +PhiInserter.AddAvailableValue(Variable, Dominator.result(), Default); - Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent)); + PhiInserter.AddUse(Variable, &Term->getOperandUse(0)); } } } @@ -1414,8 +1415,12 @@ bool StructurizeCFG::run(Region *R, DominatorTree *DT, orderNodes(); collectInfos(); createFlow(); - insertConditions(false); - insertConditions(true); + + SSAUpdaterBulk PhiInserter; + insertConditions(false, PhiInserter); + insertConditions(true, PhiInserter); + PhiInserter.RewriteAndOptimizeAllUses(*DT); + setPhiValues(); simplifyHoistedPhis(); simplifyConditions(); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
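The API migration follows SSAUpdaterBulk's register-then-rewrite pattern: one batched PHI placement over the dominator tree instead of one SSAUpdater run per condition. A hypothetical free-standing helper showing the shape of the calls used in the patch:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils/SSAUpdaterBulk.h"
using namespace llvm;

void rewriteAllConditions(Type *BoolTy, ArrayRef<BranchInst *> Conds,
                          Value *DefaultVal, DominatorTree &DT) {
  SSAUpdaterBulk Bulk;
  for (BranchInst *Term : Conds) {
    unsigned Var = Bulk.AddVariable("", BoolTy);
    // The real pass registers one value per predicated predecessor; a
    // single default is enough for this sketch.
    Bulk.AddAvailableValue(Var, Term->getParent(), DefaultVal);
    Bulk.AddUse(Var, &Term->getOperandUse(0)); // branch condition operand
  }
  Bulk.RewriteAndOptimizeAllUses(DT); // single batched SSA construction
}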
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
TIFitis wrote: > The `powi` part looks good to me. Are you planning to merge it, and then > rebase the other PR for the Flang changes for the final review? I plan on landing both PRs at once. This PR depends on #158642, which should land first. All the work should have been in a single PR but I split it up to make it easier to review. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)
llvmbot wrote: @llvm/pr-subscribers-backend-x86 Author: Helena Kotas (hekota) Changes Adds the HLSL function `NonUniformResourceIndex` to hlsl_intrinsics.h. The function calls a builtin `__builtin_hlsl_resource_nonuniformindex`, which gets translated to the LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`. Depends on #159608 Closes #157923 --- Full diff: https://github.com/llvm/llvm-project/pull/159655.diff 5 Files Affected: - (modified) clang/include/clang/Basic/Builtins.td (+6) - (modified) clang/lib/CodeGen/CGHLSLBuiltins.cpp (+7) - (modified) clang/lib/CodeGen/CGHLSLRuntime.h (+2) - (modified) clang/lib/Headers/hlsl/hlsl_intrinsics.h (+25) - (added) clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl (+38)
[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)
llvmbot wrote: @llvm/pr-subscribers-hlsl @llvm/pr-subscribers-clang-codegen Author: Helena Kotas (hekota) Changes Adds the HLSL function `NonUniformResourceIndex` to hlsl_intrinsics.h. The function calls a builtin `__builtin_hlsl_resource_nonuniformindex`, which gets translated to the LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`. Depends on #159608 Closes #157923 --- Full diff: https://github.com/llvm/llvm-project/pull/159655.diff
[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)
https://github.com/hekota created https://github.com/llvm/llvm-project/pull/159655 Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function calls a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated to LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex. Depends on #159608 Closes #157923 >From 108bf356e743d36b4eb5d0217720cf47ab85f33f Mon Sep 17 00:00:00 2001 From: Helena Kotas Date: Thu, 18 Sep 2025 14:31:38 -0700 Subject: [PATCH] [HLSL] NonUniformResourceIndex implementation Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function calls a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated to LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex. Depends on #159608 Closes #157923 --- clang/include/clang/Basic/Builtins.td | 6 +++ clang/lib/CodeGen/CGHLSLBuiltins.cpp | 7 clang/lib/CodeGen/CGHLSLRuntime.h | 2 + clang/lib/Headers/hlsl/hlsl_intrinsics.h | 25 .../resources/NonUniformResourceIndex.hlsl| 38 +++ 5 files changed, 78 insertions(+) create mode 100644 clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td index 27639f06529cb..96676bd810631 100644 --- a/clang/include/clang/Basic/Builtins.td +++ b/clang/include/clang/Basic/Builtins.td @@ -4933,6 +4933,12 @@ def HLSLResourceHandleFromImplicitBinding : LangBuiltin<"HLSL_LANG"> { let Prototype = "void(...)"; } +def HLSLResourceNonUniformIndex : LangBuiltin<"HLSL_LANG"> { + let Spellings = ["__builtin_hlsl_resource_nonuniformindex"]; + let Attributes = [NoThrow]; + let Prototype = "uint32_t(uint32_t)"; +} + def HLSLAll : LangBuiltin<"HLSL_LANG"> { let Spellings = ["__builtin_hlsl_all"]; let Attributes = [NoThrow, Const]; diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp b/clang/lib/CodeGen/CGHLSLBuiltins.cpp index 7b5b924b1fe82..9f87afa5a8a3d 100644 --- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp +++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp @@ -352,6 +352,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID, SmallVector Args{OrderID, SpaceOp, RangeOp, IndexOp, Name}; return Builder.CreateIntrinsic(HandleTy, IntrinsicID, Args); } + case Builtin::BI__builtin_hlsl_resource_nonuniformindex: { +Value *IndexOp = EmitScalarExpr(E->getArg(0)); +llvm::Type *RetTy = ConvertType(E->getType()); +return Builder.CreateIntrinsic( +RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(), +ArrayRef{IndexOp}); + } case Builtin::BI__builtin_hlsl_all: { Value *Op0 = EmitScalarExpr(E->getArg(0)); return Builder.CreateIntrinsic( diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h b/clang/lib/CodeGen/CGHLSLRuntime.h index 370f3d5c5d30d..f4b410664d60c 100644 --- a/clang/lib/CodeGen/CGHLSLRuntime.h +++ b/clang/lib/CodeGen/CGHLSLRuntime.h @@ -129,6 +129,8 @@ class CGHLSLRuntime { resource_handlefrombinding) GENERATE_HLSL_INTRINSIC_FUNCTION(CreateHandleFromImplicitBinding, resource_handlefromimplicitbinding) + GENERATE_HLSL_INTRINSIC_FUNCTION(NonUniformResourceIndex, + resource_nonuniformindex) GENERATE_HLSL_INTRINSIC_FUNCTION(BufferUpdateCounter, resource_updatecounter) GENERATE_HLSL_INTRINSIC_FUNCTION(GroupMemoryBarrierWithGroupSync, group_memory_barrier_with_group_sync) diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h b/clang/lib/Headers/hlsl/hlsl_intrinsics.h index d9d87c827e6a4..0eab2ff56c519 100644 --- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h +++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h @@ -422,6 +422,31 @@ constexpr int4 D3DCOLORtoUBYTE4(float4 V) { return 
__detail::d3d_color_to_ubyte4_impl(V); } +//===--===// +// NonUniformResourceIndex builtin +//===--===// + +/// \fn uint NonUniformResourceIndex(uint I) +/// \brief A compiler hint to indicate that a resource index varies across +/// threads within a wave (i.e., it is non-uniform). +/// \param I [in] Resource array index +/// +/// The return value is the \p I parameter. +/// +/// When indexing into an array of shader resources (e.g., textures, buffers), +/// some GPU hardware and drivers require the compiler to know whether the index +/// is uniform (same for all threads) or non-uniform (varies per thread). +/// +/// Using NonUniformResourceIndex explicitly marks an index as non-uniform, +/// disabling certain assumptions or optimizations that could lead to incorrect +/// behavior when dynamically accessing resource arrays with non-uniform +/// indices. + +constexpr uint32_t NonUniformResourceIndex(uint32_t Index) { + return __builtin_hlsl_resource_nonuniformindex(Index); +} + //===--===// // reflect builtin //===--===//
[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Helena Kotas (hekota) Changes Adds the HLSL function `NonUniformResourceIndex` to hlsl_intrinsics.h. The function calls a builtin `__builtin_hlsl_resource_nonuniformindex`, which gets translated to the LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex`. Depends on #159608 Closes #157923 --- Full diff: https://github.com/llvm/llvm-project/pull/159655.diff
[llvm-branch-commits] [llvm] X86: Switch to RegClassByHwMode (PR #158274)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/158274 >From 1a85c9cf7cdf944be302c00efd231eba5d46bdc6 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 9 Sep 2025 11:15:47 +0900 Subject: [PATCH] X86: Switch to RegClassByHwMode Replace the target uses of PointerLikeRegClass with RegClassByHwMode --- .../X86/MCTargetDesc/X86MCTargetDesc.cpp | 3 ++ llvm/lib/Target/X86/X86.td| 2 ++ llvm/lib/Target/X86/X86InstrInfo.td | 8 ++--- llvm/lib/Target/X86/X86InstrOperands.td | 30 +++- llvm/lib/Target/X86/X86InstrPredicates.td | 14 llvm/lib/Target/X86/X86RegisterInfo.cpp | 35 +-- llvm/lib/Target/X86/X86Subtarget.h| 4 +-- llvm/utils/TableGen/X86FoldTablesEmitter.cpp | 4 +-- 8 files changed, 57 insertions(+), 43 deletions(-) diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp index bb1e716c33ed5..1d5ef8b0996dc 100644 --- a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp +++ b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp @@ -55,6 +55,9 @@ std::string X86_MC::ParseX86Triple(const Triple &TT) { else FS = "-64bit-mode,-32bit-mode,+16bit-mode"; + if (TT.isX32()) +FS += ",+x32"; + return FS; } diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td index 7c9e821c02fda..3af8b3e060a16 100644 --- a/llvm/lib/Target/X86/X86.td +++ b/llvm/lib/Target/X86/X86.td @@ -25,6 +25,8 @@ def Is32Bit : SubtargetFeature<"32bit-mode", "Is32Bit", "true", "32-bit mode (80386)">; def Is16Bit : SubtargetFeature<"16bit-mode", "Is16Bit", "true", "16-bit mode (i8086)">; +def IsX32 : SubtargetFeature<"x32", "IsX32", "true", + "64-bit with ILP32 programming model (e.g. x32 ABI)">; //===--===// // X86 Subtarget ISA features diff --git a/llvm/lib/Target/X86/X86InstrInfo.td b/llvm/lib/Target/X86/X86InstrInfo.td index 7f6c5614847e3..0c4abc2c400f6 100644 --- a/llvm/lib/Target/X86/X86InstrInfo.td +++ b/llvm/lib/Target/X86/X86InstrInfo.td @@ -18,14 +18,14 @@ include "X86InstrFragments.td" include "X86InstrFragmentsSIMD.td" //===--===// -// X86 Operand Definitions. +// X86 Predicate Definitions. // -include "X86InstrOperands.td" +include "X86InstrPredicates.td" //===--===// -// X86 Predicate Definitions. +// X86 Operand Definitions. // -include "X86InstrPredicates.td" +include "X86InstrOperands.td" //===--===// // X86 Instruction Format Definitions. diff --git a/llvm/lib/Target/X86/X86InstrOperands.td b/llvm/lib/Target/X86/X86InstrOperands.td index 80843f6bb80e6..5207ecad127a2 100644 --- a/llvm/lib/Target/X86/X86InstrOperands.td +++ b/llvm/lib/Target/X86/X86InstrOperands.td @@ -6,9 +6,15 @@ // //===--===// +def x86_ptr_rc : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32, GR64, LOW32_ADDR_ACCESS]>; + // A version of ptr_rc which excludes SP, ESP, and RSP. This is used for // the index operand of an address, to conform to x86 encoding restrictions. -def ptr_rc_nosp : PointerLikeRegClass<1>; +def ptr_rc_nosp : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32_NOSP, GR64_NOSP, GR32_NOSP]>; // *mem - Operand definitions for the funky X86 addressing mode operands. 
// @@ -53,7 +59,7 @@ class X86MemOperand : Operand { let PrintMethod = printMethod; - let MIOperandInfo = (ops ptr_rc, i8imm, ptr_rc_nosp, i32imm, SEGMENT_REG); + let MIOperandInfo = (ops x86_ptr_rc, i8imm, ptr_rc_nosp, i32imm, SEGMENT_REG); let ParserMatchClass = parserMatchClass; let OperandType = "OPERAND_MEMORY"; int Size = size; @@ -63,7 +69,7 @@ class X86MemOperand : X86MemOperand { - let MIOperandInfo = (ops ptr_rc, i8imm, RC, i32imm, SEGMENT_REG); + let MIOperandInfo = (ops x86_ptr_rc, i8imm, RC, i32imm, SEGMENT_REG); } def anymem : X86MemOperand<"printMemReference">; @@ -113,8 +119,14 @@ def sdmem : X86MemOperand<"printqwordmem", X86Mem64AsmOperand>; // A version of i8mem for use on x86-64 and x32 that uses a NOREX GPR instead // of a plain GPR, so that it doesn't potentially require a REX prefix. -def ptr_rc_norex : PointerLikeRegClass<2>; -def ptr_rc_norex_nosp : PointerLikeRegClass<3>; +def ptr_rc_norex : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32_NOREX, GR64_NOREX, GR32_NOREX]>; + +def ptr_rc_norex_nosp : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32_NOREX_NOSP, GR64_NOREX_NOSP, GR32_NOREX_NOSP]>; + def i8mem_NOREX : X86MemOperand<"printbytemem", X86Mem8AsmOperand, 8> { let MIOpe
[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP3 dpp support (PR #159654)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/159654 None >From b83405b879b471da983f885bfdffb3d1f58130de Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 18 Sep 2025 14:30:20 -0700 Subject: [PATCH] [AMDGPU] gfx1251 VOP3 dpp support --- llvm/lib/Target/AMDGPU/SIInstrInfo.td | 1 + llvm/lib/Target/AMDGPU/VOP3Instructions.td| 64 ++-- llvm/lib/Target/AMDGPU/VOPInstructions.td | 78 + llvm/test/CodeGen/AMDGPU/dpp64_combine.ll | 4 + llvm/test/MC/AMDGPU/gfx1251_asm_vop3_dpp16.s | 150 ++ .../AMDGPU/gfx1251_asm_vop3_from_vop1_dpp16.s | 58 +++ .../AMDGPU/gfx1251_asm_vop3_from_vop1_err.s | 150 ++ .../AMDGPU/gfx1251_asm_vop3_from_vop2_dpp16.s | 34 .../AMDGPU/gfx1251_asm_vop3_from_vop2_err.s | 93 +++ llvm/test/MC/AMDGPU/vop3-gfx9.s | 4 +- .../AMDGPU/gfx1251_dasm_vop3_dpp16.txt| 94 +++ .../gfx1251_dasm_vop3_from_vop1_dpp16.txt | 43 + .../gfx1251_dasm_vop3_from_vop2_dpp16.txt | 25 +++ 13 files changed, 745 insertions(+), 53 deletions(-) create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_dpp16.s create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop1_dpp16.s create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop1_err.s create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop2_dpp16.s create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop3_from_vop2_err.s create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop3_dpp16.txt create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop3_from_vop1_dpp16.txt create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop3_from_vop2_dpp16.txt diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td index c49f1930705aa..18fae6cfc7ed9 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td @@ -1969,6 +1969,7 @@ class getVOP3DPPSrcForVT { RegisterOperand ret = !cond(!eq(VT, i1) : SSrc_i1, !eq(VT, i16): !if (IsFake16, VCSrc_b16, VCSrcT_b16), +!eq(VT, i64): VCSrc_b64, !eq(VT, f16): !if (IsFake16, VCSrc_f16, VCSrcT_f16), !eq(VT, bf16) : !if (IsFake16, VCSrc_bf16, VCSrcT_bf16), !eq(VT, v2i16) : VCSrc_v2b16, diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td index 582a353632436..e6a7c35dce0be 100644 --- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td +++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td @@ -24,6 +24,7 @@ def VOP_F32_F32_F32_F32_VCC : VOPProfile<[f32, f32, f32, f32]> { } def VOP_F64_F64_F64_F64_VCC : VOPProfile<[f64, f64, f64, f64]> { let Outs64 = (outs DstRC.RegClass:$vdst); + let HasExt64BitDPP = 1; let IsSingle = 1; } } @@ -51,7 +52,24 @@ def VOP3b_I64_I1_I32_I32_I64 : VOPProfile<[i64, i32, i32, i64]> { let HasExt64BitDPP = 1 in { def VOP3b_F32_I1_F32_F32_F32 : VOP3b_Profile; -def VOP3b_F64_I1_F64_F64_F64 : VOP3b_Profile; +def VOP3b_F64_I1_F64_F64_F64 : VOP3b_Profile { + let OutsVOP3DPP = Outs64; + let AsmVOP3DPP = getAsmVOP3DPP.ret; + let AsmVOP3DPP16 = getAsmVOP3DPP16.ret; + let AsmVOP3DPP8 = getAsmVOP3DPP8.ret; +} + +def VOP3b_I64_I1_I32_I32_I64_DPP : VOPProfile<[i64, i32, i32, i64]> { + let HasClamp = 1; + + let IsSingle = 1; + let Outs64 = (outs DstRC:$vdst, VOPDstS64orS32:$sdst); + let OutsVOP3DPP = Outs64; + let Asm64 = "$vdst, $sdst, $src0, $src1, $src2$clamp"; + let AsmVOP3DPP = getAsmVOP3DPP.ret; + let AsmVOP3DPP16 = getAsmVOP3DPP16.ret; + let AsmVOP3DPP8 = getAsmVOP3DPP8.ret; +} class V_MUL_PROF : VOP3_Profile { let HasExtVOP3DPP = 0; @@ -229,7 +247,7 @@ defm V_DIV_FMAS_F32 : VOP3Inst_Pseudo_Wrapper <"v_div_fmas_f32", 
VOP_F32_F32_F32 // result *= 2^64 // let SchedRW = [WriteDouble], FPDPRounding = 1 in -defm V_DIV_FMAS_F64 : VOP3Inst_Pseudo_Wrapper <"v_div_fmas_f64", VOP_F64_F64_F64_F64_VCC, []>; +defm V_DIV_FMAS_F64 : VOP3Inst <"v_div_fmas_f64", VOP_F64_F64_F64_F64_VCC>; } // End Uses = [MODE, VCC, EXEC] } // End isCommutable = 1 @@ -294,7 +312,7 @@ defm V_CVT_PK_U8_F32 : VOP3Inst<"v_cvt_pk_u8_f32", VOP3_Profile; let SchedRW = [WriteDoubleAdd], FPDPRounding = 1 in { - defm V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", VOP3_Profile, AMDGPUdiv_fixup>; + defm V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", VOP_F64_F64_F64_F64_DPP_PROF, AMDGPUdiv_fixup>; defm V_LDEXP_F64 : VOP3Inst <"v_ldexp_f64", VOP3_Profile, any_fldexp>; } // End SchedRW = [WriteDoubleAdd], FPDPRounding = 1 } // End isReMaterializable = 1 @@ -335,7 +353,7 @@ let mayRaiseFPException = 0 in { // Seems suspicious but manual doesn't say it d // Double precision division pre-scale. let SchedRW = [WriteDouble, WriteSALU], FPDPRounding = 1 in - defm V_DIV_SCALE_F64 : VOP3Inst_Pseudo_Wrapper <"v_div_scale_f64", VOP3b_F64_I1_F64_F64_F64>; + defm V_DIV_SCALE_F64 : VOP3Inst <
[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP3 dpp support (PR #159654)
rampitec wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#159654** 👈
* **#159641**
* **#159637**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/159654 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc++] Annotate classes with _LIBCXX_PFP to enable pointer field protection (PR #151652)
pcc wrote:
> What is the reasoning behind this? Could we document something when to apply the attribute?

I added this to types which are commonly used, as mentioned in the commit message. I will document that in the coding guidelines. https://github.com/llvm/llvm-project/pull/151652 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE (PR #159632)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/159632 >From 7bb9fb5b3b9a2dfcd1d00f01c86fe26c5d14c30f Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Thu, 18 Sep 2025 08:49:38 -0500 Subject: [PATCH] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList), some semantics-checking functions needed to be updated to operate on a single object at a time. --- flang/include/flang/Parser/openmp-utils.h| 4 +- flang/include/flang/Parser/parse-tree.h | 3 +- flang/include/flang/Semantics/openmp-utils.h | 3 +- flang/lib/Parser/openmp-parsers.cpp | 7 +- flang/lib/Parser/unparse.cpp | 7 +- flang/lib/Semantics/check-omp-structure.cpp | 89 +++- flang/lib/Semantics/check-omp-structure.h| 3 + flang/lib/Semantics/openmp-utils.cpp | 22 +++-- flang/lib/Semantics/resolve-directives.cpp | 11 ++- 9 files changed, 86 insertions(+), 63 deletions(-) diff --git a/flang/include/flang/Parser/openmp-utils.h b/flang/include/flang/Parser/openmp-utils.h index 032fb8996fe48..1372945427955 100644 --- a/flang/include/flang/Parser/openmp-utils.h +++ b/flang/include/flang/Parser/openmp-utils.h @@ -49,7 +49,6 @@ MAKE_CONSTR_ID(OpenMPDeclareSimdConstruct, D::OMPD_declare_simd); MAKE_CONSTR_ID(OpenMPDeclareTargetConstruct, D::OMPD_declare_target); MAKE_CONSTR_ID(OpenMPExecutableAllocate, D::OMPD_allocate); MAKE_CONSTR_ID(OpenMPRequiresConstruct, D::OMPD_requires); -MAKE_CONSTR_ID(OpenMPThreadprivate, D::OMPD_threadprivate); #undef MAKE_CONSTR_ID @@ -111,8 +110,7 @@ struct DirectiveNameScope { std::is_same_v || std::is_same_v || std::is_same_v || - std::is_same_v || - std::is_same_v) { + std::is_same_v) { return MakeName(std::get(x.t).source, ConstructId::id); } else { return GetFromTuple( diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h index 09a45476420df..8cb6d2e744876 100644 --- a/flang/include/flang/Parser/parse-tree.h +++ b/flang/include/flang/Parser/parse-tree.h @@ -5001,9 +5001,8 @@ struct OpenMPRequiresConstruct { // 2.15.2 threadprivate -> THREADPRIVATE (variable-name-list) struct OpenMPThreadprivate { - TUPLE_CLASS_BOILERPLATE(OpenMPThreadprivate); + WRAPPER_CLASS_BOILERPLATE(OpenMPThreadprivate, OmpDirectiveSpecification); CharBlock source; - std::tuple t; }; // 2.11.3 allocate -> ALLOCATE (variable-name-list) [clause] diff --git a/flang/include/flang/Semantics/openmp-utils.h b/flang/include/flang/Semantics/openmp-utils.h index 68318d6093a1e..65441728c5549 100644 --- a/flang/include/flang/Semantics/openmp-utils.h +++ b/flang/include/flang/Semantics/openmp-utils.h @@ -58,9 +58,10 @@ const parser::DataRef *GetDataRefFromObj(const parser::OmpObject &object); const parser::ArrayElement *GetArrayElementFromObj( const parser::OmpObject &object); const Symbol *GetObjectSymbol(const parser::OmpObject &object); -const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument); std::optional GetObjectSource( const parser::OmpObject &object); +const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument); +const parser::OmpObject *GetArgumentObject(const parser::OmpArgument &argument); bool IsCommonBlock(const Symbol &sym); bool IsExtendedListItem(const Symbol &sym); diff --git a/flang/lib/Parser/openmp-parsers.cpp b/flang/lib/Parser/openmp-parsers.cpp index 66526ba00b5ed..60ce71cf983f6 100644 --- a/flang/lib/Parser/openmp-parsers.cpp +++ b/flang/lib/Parser/openmp-parsers.cpp @@ -1791,8 +1791,11 @@ TYPE_PARSER(sourced(construct( verbatim("REQUIRES"_tok), 
Parser{}))) // 2.15.2 Threadprivate directive -TYPE_PARSER(sourced(construct( -verbatim("THREADPRIVATE"_tok), parenthesized(Parser{} +TYPE_PARSER(sourced( // +construct( +predicated(OmpDirectiveNameParser{}, +IsDirective(llvm::omp::Directive::OMPD_threadprivate)) >= +Parser{}))) // 2.11.3 Declarative Allocate directive TYPE_PARSER( diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp index 189a34ee1dc56..db46525ac57b1 100644 --- a/flang/lib/Parser/unparse.cpp +++ b/flang/lib/Parser/unparse.cpp @@ -2611,12 +2611,11 @@ class UnparseVisitor { } void Unparse(const OpenMPThreadprivate &x) { BeginOpenMP(); -Word("!$OMP THREADPRIVATE ("); -Walk(std::get(x.t)); -Put(")\n"); +Word("!$OMP "); +Walk(x.v); +Put("\n"); EndOpenMP(); } - bool Pre(const OmpMessageClause &x) { Walk(x.v); return false; diff --git a/flang/lib/Semantics/check-omp-structure.cpp b/flang/lib/Semantics/check-omp-structure.cpp index 1ee5385fb38a1..507957dfecb3d 100644 --- a/flang/lib/Semantics/check-omp-structure.cpp +++ b/flang/lib/Semantics/check-omp-structure.cpp @@ -669,11 +669,6 @@ template struct DirectiveSpellingVisi
[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/159641 None >From 344bfe15f023e965348da4d92738b48683768887 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 18 Sep 2025 12:58:41 -0700 Subject: [PATCH] [AMDGPU] gfx1251 VOP2 dpp support --- llvm/lib/Target/AMDGPU/VOP2Instructions.td| 79 +++-- llvm/test/CodeGen/AMDGPU/dpp_combine.ll | 6 +- llvm/test/MC/AMDGPU/gfx1251_asm_vop2_dpp16.s | 74 llvm/test/MC/AMDGPU/gfx1251_asm_vop2_err.s| 106 ++ .../AMDGPU/gfx1251_dasm_vop2_dpp16.txt| 37 ++ 5 files changed, 267 insertions(+), 35 deletions(-) create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop2_dpp16.s create mode 100644 llvm/test/MC/AMDGPU/gfx1251_asm_vop2_err.s create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop2_dpp16.txt diff --git a/llvm/lib/Target/AMDGPU/VOP2Instructions.td b/llvm/lib/Target/AMDGPU/VOP2Instructions.td index 46a1a4bf1ab4a..37d92bc5076de 100644 --- a/llvm/lib/Target/AMDGPU/VOP2Instructions.td +++ b/llvm/lib/Target/AMDGPU/VOP2Instructions.td @@ -287,10 +287,14 @@ multiclass VOP2bInst , Commutable_REV; - let SubtargetPredicate = isGFX11Plus in { -if P.HasExtVOP3DPP then - def _e64_dpp : VOP3_DPP_Pseudo ; - } // End SubtargetPredicate = isGFX11Plus + if P.HasExtVOP3DPP then +def _e64_dpp : VOP3_DPP_Pseudo { + let SubtargetPredicate = isGFX11Plus; +} + else if P.HasExt64BitDPP then +def _e64_dpp : VOP3_DPP_Pseudo { + let OtherPredicates = [HasDPALU_DPP]; + } } } @@ -345,10 +349,14 @@ multiclass VOPD_Component; } -let SubtargetPredicate = isGFX11Plus in { - if P.HasExtVOP3DPP then -def _e64_dpp : VOP3_DPP_Pseudo ; -} // End SubtargetPredicate = isGFX11Plus +if P.HasExtVOP3DPP then + def _e64_dpp : VOP3_DPP_Pseudo { +let SubtargetPredicate = isGFX11Plus; + } +else if P.HasExt64BitDPP then + def _e64_dpp : VOP3_DPP_Pseudo { +let OtherPredicates = [HasDPALU_DPP]; + } } } @@ -1607,8 +1615,9 @@ multiclass VOP2_Real_dpp op> { } multiclass VOP2_Real_dpp8 op> { - if !cast(NAME#"_e32").Pfl.HasExtDPP then - def _dpp8#Gen.Suffix : VOP2_DPP8_Gen(NAME#"_e32"), Gen>; + defvar ps = !cast(NAME#"_e32"); + if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then +def _dpp8#Gen.Suffix : VOP2_DPP8_Gen; } //===- VOP2 (with name) -===// @@ -1643,10 +1652,10 @@ multiclass VOP2_Real_dpp_with_name op, string opName, multiclass VOP2_Real_dpp8_with_name op, string opName, string asmName> { defvar ps = !cast(opName#"_e32"); - if ps.Pfl.HasExtDPP then - def _dpp8#Gen.Suffix : VOP2_DPP8_Gen { -let AsmString = asmName # ps.Pfl.AsmDPP8; - } + if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then +def _dpp8#Gen.Suffix : VOP2_DPP8_Gen { + let AsmString = asmName # ps.Pfl.AsmDPP8; +} } //===-- VOP2be --===// @@ -1687,32 +1696,32 @@ multiclass VOP2be_Real_dpp op, string opName, string asmName } } multiclass VOP2be_Real_dpp8 op, string opName, string asmName> { - if !cast(opName#"_e32").Pfl.HasExtDPP then + defvar ps = !cast(opName#"_e32"); + if !and(ps.Pfl.HasExtDPP, !not(ps.Pfl.HasExt64BitDPP)) then { def _dpp8#Gen.Suffix : -VOP2_DPP8_Gen(opName#"_e32"), Gen> { - string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8; +VOP2_DPP8_Gen { + string AsmDPP8 = ps.Pfl.AsmDPP8; let AsmString = asmName # !subst(", vcc", "", AsmDPP8); } - if !cast(opName#"_e32").Pfl.HasExtDPP then def _dpp8_w32#Gen.Suffix : -VOP2_DPP8(opName#"_e32")> { - string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8; +VOP2_DPP8 { + string AsmDPP8 = ps.Pfl.AsmDPP8; let AsmString = asmName # !subst("vcc", "vcc_lo", AsmDPP8); let isAsmParserOnly = 1; let WaveSizePredicate = isWave32; let 
AssemblerPredicate = Gen.AssemblerPredicate; let DecoderNamespace = Gen.DecoderNamespace; } - if !cast(opName#"_e32").Pfl.HasExtDPP then def _dpp8_w64#Gen.Suffix : -VOP2_DPP8(opName#"_e32")> { - string AsmDPP8 = !cast(opName#"_e32").Pfl.AsmDPP8; +VOP2_DPP8 { + string AsmDPP8 = ps.Pfl.AsmDPP8; let AsmString = asmName # AsmDPP8; let isAsmParserOnly = 1; let WaveSizePredicate = isWave64; let AssemblerPredicate = Gen.AssemblerPredicate; let DecoderNamespace = Gen.DecoderNamespace; } + } } // We don't want to override separate decoderNamespaces within these @@ -1777,9 +1786,11 @@ multiclass VOP2_Real_NO_DPP_with_name op, string opName, } } -multiclass VOP2_Real_NO_DPP_with_alias op, string alias> { +multiclass VOP2_Real_with_DPP16_with_alias op, string alias> { defm NAME
[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)
https://github.com/ilovepi approved this pull request. LGTM. IMO this is a much nicer way to test a property of `stdin`'s positioning. Let's get a bit more consensus from other maintainers before landing, though. https://github.com/llvm/llvm-project/pull/157232 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)
https://github.com/svkeerthy edited https://github.com/llvm/llvm-project/pull/158376 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/157232 >From d749f30964e57caa797b3df87ae88ffc3d4a2f54 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Sun, 7 Sep 2025 17:39:19 + Subject: [PATCH 1/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 17 + llvm/test/MC/COFF/stdin.s | 1 - 2 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 llvm/test/MC/COFF/stdin.py delete mode 100644 llvm/test/MC/COFF/stdin.s diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py new file mode 100644 index 0..8b7b6ae1fba13 --- /dev/null +++ b/llvm/test/MC/COFF/stdin.py @@ -0,0 +1,17 @@ +# RUN: echo "// comment" > %t.input +# RUN: which llvm-mc | %python %s %t + +import subprocess +import sys + +llvm_mc_binary = sys.stdin.readlines()[0].strip() +temp_file = sys.argv[1] +input_file = temp_file + ".input" + +with open(temp_file, "w") as mc_stdout: +mc_stdout.seek(4) +subprocess.run( +[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], +stdout=mc_stdout, +check=True, +) diff --git a/llvm/test/MC/COFF/stdin.s b/llvm/test/MC/COFF/stdin.s deleted file mode 100644 index 8ceae7fdef501..0 --- a/llvm/test/MC/COFF/stdin.s +++ /dev/null @@ -1 +0,0 @@ -// RUN: bash -c '(echo "test"; llvm-mc -filetype=obj -triple i686-pc-win32 %s ) > %t' >From 0bfe954d4cd5edf4312e924c278c59e57644d5f1 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Mon, 8 Sep 2025 17:28:59 + Subject: [PATCH 2/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py index 8b7b6ae1fba13..1d9b50c022523 100644 --- a/llvm/test/MC/COFF/stdin.py +++ b/llvm/test/MC/COFF/stdin.py @@ -1,14 +1,22 @@ # RUN: echo "// comment" > %t.input # RUN: which llvm-mc | %python %s %t +import argparse import subprocess import sys +parser = argparse.ArgumentParser() +parser.add_argument("temp_file") +arguments = parser.parse_args() + llvm_mc_binary = sys.stdin.readlines()[0].strip() -temp_file = sys.argv[1] +temp_file = arguments.temp_file input_file = temp_file + ".input" with open(temp_file, "w") as mc_stdout: +## We need to test that starting on an input stream with a non-zero offset +## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek +## past zero for STDOUT. 
mc_stdout.seek(4) subprocess.run( [llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], >From 2ae17e4f18a95c52b53ad5ad45a19c4bf29e5025 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Mon, 8 Sep 2025 17:43:39 + Subject: [PATCH 3/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py index 1d9b50c022523..0da1b4895142b 100644 --- a/llvm/test/MC/COFF/stdin.py +++ b/llvm/test/MC/COFF/stdin.py @@ -1,25 +1,30 @@ # RUN: echo "// comment" > %t.input -# RUN: which llvm-mc | %python %s %t +# RUN: which llvm-mc | %python %s %t.input %t import argparse import subprocess import sys parser = argparse.ArgumentParser() +parser.add_argument("input_file") parser.add_argument("temp_file") arguments = parser.parse_args() llvm_mc_binary = sys.stdin.readlines()[0].strip() -temp_file = arguments.temp_file -input_file = temp_file + ".input" -with open(temp_file, "w") as mc_stdout: +with open(arguments.temp_file, "w") as mc_stdout: ## We need to test that starting on an input stream with a non-zero offset ## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek ## past zero for STDOUT. mc_stdout.seek(4) subprocess.run( -[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], +[ +llvm_mc_binary, +"-filetype=obj", +"-triple", +"i686-pc-win32", +arguments.input_file, +], stdout=mc_stdout, check=True, ) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
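The `seek(4)` above is the crux of the test: it reproduces an output stream whose initial position is not zero, which the patch comment says used to trigger an assertion in WinCOFFObjectWriter.cpp. As a standalone illustration of that general failure mode (a sketch only, not the actual writer code; the class and names are hypothetical), offsets must be computed relative to the stream's starting position rather than assumed absolute:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>

// Sketch only: a writer that records its base offset keeps working when
// the stream it is handed has already been advanced (like stdout above).
class ObjWriter {
  std::ostringstream &OS;
  uint64_t Base; // starting position, instead of asserting tellp() == 0
public:
  explicit ObjWriter(std::ostringstream &OS)
      : OS(OS), Base(static_cast<uint64_t>(OS.tellp())) {}
  uint64_t offset() const { return static_cast<uint64_t>(OS.tellp()) - Base; }
};

int main() {
  std::ostringstream OS;
  OS << "1234"; // simulate a stream already positioned at offset 4
  ObjWriter W(OS);
  assert(W.offset() == 0); // section offsets stay relative to the base
  return 0;
}
```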
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/156610 >From 3b73016ad3984069441409516598caf1161c7448 Mon Sep 17 00:00:00 2001 From: ergawy Date: Tue, 2 Sep 2025 08:36:34 -0500 Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wloop` part. --- .../OpenMP/DoConcurrentConversion.cpp | 117 ++ .../DoConcurrent/reduce_device.mlir | 53 2 files changed, 121 insertions(+), 49 deletions(-) create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index d00a4fdd2cf2e..6e308499100fa 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop, for (mlir::Value local : loop.getLocalVars()) liveIns.push_back(local); + + for (mlir::Value reduce : loop.getReduceVars()) +liveIns.push_back(reduce); } /// Collects values that are local to a loop: "loop-local values". A loop-local @@ -319,7 +322,7 @@ class DoConcurrentConversion targetOp = genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns, targetClauseOps, loopNestClauseOps, liveInShapeInfoMap); - genTeamsOp(doLoop.getLoc(), rewriter); + genTeamsOp(rewriter, loop, mapper); } mlir::omp::ParallelOp parallelOp = @@ -492,46 +495,7 @@ class DoConcurrentConversion if (!mapToDevice) genPrivatizers(rewriter, mapper, loop, wsloopClauseOps); -if (!loop.getReduceVars().empty()) { - for (auto [op, byRef, sym, arg] : llvm::zip_equal( - loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(), - loop.getReduceSymsAttr().getAsRange(), - loop.getRegionReduceArgs())) { -auto firReducer = moduleSymbolTable.lookup( -sym.getLeafReference()); - -mlir::OpBuilder::InsertionGuard guard(rewriter); -rewriter.setInsertionPointAfter(firReducer); -std::string ompReducerName = sym.getLeafReference().str() + ".omp"; - -auto ompReducer = -moduleSymbolTable.lookup( -rewriter.getStringAttr(ompReducerName)); - -if (!ompReducer) { - ompReducer = mlir::omp::DeclareReductionOp::create( - rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); - - cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), - ompReducer.getAllocRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(), - ompReducer.getInitializerRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(), - ompReducer.getReductionRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(), - ompReducer.getAtomicReductionRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(), - ompReducer.getCleanupRegion()); - moduleSymbolTable.insert(ompReducer); -} - -wsloopClauseOps.reductionVars.push_back(op); -wsloopClauseOps.reductionByref.push_back(byRef); -wsloopClauseOps.reductionSyms.push_back( -mlir::SymbolRefAttr::get(ompReducer)); - } -} +genReductions(rewriter, mapper, loop, wsloopClauseOps); auto wsloopOp = mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps); @@ -553,8 +517,6 @@ class DoConcurrentConversion rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back()); mlir::omp::YieldOp::create(rewriter, 
loop->getLoc()); -loop->getParentOfType().print( -llvm::errs(), mlir::OpPrintingFlags().assumeVerified()); return {loopNestOp, wsloopOp}; } @@ -778,15 +740,26 @@ class DoConcurrentConversion liveInName, shape); } - mlir::omp::TeamsOp - genTeamsOp(mlir::Location loc, - mlir::ConversionPatternRewriter &rewriter) const { -auto teamsOp = rewriter.create( -loc, /*clauses=*/mlir::omp::TeamsOperands{}); + mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter, +fir::DoConcurrentLoopOp loop, +mlir::IRMapping &mapper) const { +mlir::omp::TeamsOperands teamsOps; +genReductions(rewriter, mapper, loop, teamsOps); + +mlir::Location loc = loop.getLoc(); +aut
[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Stanislav Mekhanoshin (rampitec) Changes --- Patch is 22.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159641.diff 5 Files Affected: - (modified) llvm/lib/Target/AMDGPU/VOP2Instructions.td (+45-34) - (modified) llvm/test/CodeGen/AMDGPU/dpp_combine.ll (+5-1) - (added) llvm/test/MC/AMDGPU/gfx1251_asm_vop2_dpp16.s (+74) - (added) llvm/test/MC/AMDGPU/gfx1251_asm_vop2_err.s (+106) - (added) llvm/test/MC/Disassembler/AMDGPU/gfx1251_dasm_vop2_dpp16.txt (+37)
[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)
llvmbot wrote: @llvm/pr-subscribers-llvm-transforms Author: Mircea Trofin (mtrofin) Changes --- Patch is 21.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159645.diff 2 Files Affected: - (modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+75-11) - (modified) llvm/test/Transforms/SimplifyCFG/switch-to-select-two-case.ll (+42-30) ``diff diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index a1f759dd1df83..276ca89d715f1 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -84,6 +84,7 @@ #include #include #include +#include #include #include #include @@ -6318,9 +6319,12 @@ static bool initializeUniqueCases(SwitchInst *SI, PHINode *&PHI, // Helper function that checks if it is possible to transform a switch with only // two cases (or two cases + default) that produces a result into a select. // TODO: Handle switches with more than 2 cases that map to the same result. +// The branch weights correspond to the provided Condition (i.e. if Condition is +// modified from the original SwitchInst, the caller must adjust the weights) static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, Constant *DefaultResult, Value *Condition, - IRBuilder<> &Builder, const DataLayout &DL) { + IRBuilder<> &Builder, const DataLayout &DL, + ArrayRef BranchWeights) { // If we are selecting between only two cases transform into a simple // select or a two-way select if default is possible. // Example: @@ -6329,6 +6333,10 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, // case 20: return 2; > %2 = icmp eq i32 %a, 20 // default: return 4; %3 = select i1 %2, i32 2, i32 %1 // } + + const bool HasBranchWeights = + !BranchWeights.empty() && !ProfcheckDisableMetadataFixes; + if (ResultVector.size() == 2 && ResultVector[0].second.size() == 1 && ResultVector[1].second.size() == 1) { ConstantInt *FirstCase = ResultVector[0].second[0]; @@ -6337,13 +6345,37 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, if (DefaultResult) { Value *ValueCompare = Builder.CreateICmpEQ(Condition, SecondCase, "switch.selectcmp"); - SelectValue = Builder.CreateSelect(ValueCompare, ResultVector[1].first, - DefaultResult, "switch.select"); + SelectInst *SelectValueInst = cast(Builder.CreateSelect( + ValueCompare, ResultVector[1].first, DefaultResult, "switch.select")); + SelectValue = SelectValueInst; + if (HasBranchWeights) { +// We start with 3 probabilities, where the numerator is the +// corresponding BranchWeights[i], and the denominator is the sum over +// BranchWeights. We want the probability and negative probability of +// Condition == SecondCase. +assert(BranchWeights.size() == 3); +setBranchWeights(SelectValueInst, BranchWeights[2], + BranchWeights[0] + BranchWeights[1], + /*IsExpected=*/false); +} } Value *ValueCompare = Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp"); -return Builder.CreateSelect(ValueCompare, ResultVector[0].first, -SelectValue, "switch.select"); +SelectInst *Ret = cast(Builder.CreateSelect( +ValueCompare, ResultVector[0].first, SelectValue, "switch.select")); +if (HasBranchWeights) { + // We may have had a DefaultResult. Base the position of the first and + // second's branch weights accordingly. Also the proability that Condition + // != FirstCase needs to take that into account. 
+ assert(BranchWeights.size() >= 2); + size_t FirstCasePos = (Condition != nullptr); + size_t SecondCasePos = FirstCasePos + 1; + uint32_t DefaultCase = (Condition != nullptr) ? BranchWeights[0] : 0; + setBranchWeights(Ret, BranchWeights[FirstCasePos], + DefaultCase + BranchWeights[SecondCasePos], + /*IsExpected=*/false); +} +return Ret; } // Handle the degenerate case where two cases have the same result value. @@ -6379,8 +6411,16 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, Value *And = Builder.CreateAnd(Condition, AndMask); Value *Cmp = Builder.CreateICmpEQ( And, Constant::getIntegerValue(And->getType(), AndMask)); - return Builder.CreateSelect(Cmp, ResultVector[0].first, - DefaultResult); + SelectInst *Ret = cast(Builder.CreateSelect(Cmp, ResultVector[0].first, +
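To make the weight bookkeeping concrete: for the two-cases-plus-default shape, the patch derives each select's weights from the original switch weights {default, case1, case2}. A small standalone sketch of just that arithmetic (the weights are made up; only the formula mirrors the setBranchWeights calls above):

```cpp
#include <array>
#include <cstdint>
#include <iostream>

int main() {
  // Switch profile: {default, case1, case2} out of a total of 100.
  std::array<uint32_t, 3> W = {10, 60, 30};
  // Inner select tests (Cond == case2): true arm = W[2],
  // false arm (default) = W[0] + W[1].
  uint32_t InnerTrue = W[2], InnerFalse = W[0] + W[1];
  // Outer select tests (Cond == case1): true arm = W[1],
  // false arm (default or case2) = W[0] + W[2].
  uint32_t OuterTrue = W[1], OuterFalse = W[0] + W[2];
  std::cout << "inner " << InnerTrue << ":" << InnerFalse   // inner 30:70
            << " outer " << OuterTrue << ":" << OuterFalse  // outer 60:40
            << "\n";
}
```

Each select's weight pair preserves the original total (here 30+70 and 60+40 both sum to 100), so the profile stays consistent after the transform.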
[llvm-branch-commits] [mlir] 301f09f - Revert "[mlir][SCF] Allow using a custom operation to generate loops with `ml…"
Author: MaheshRavishankar Date: 2025-09-18T13:49:24-07:00 New Revision: 301f09f236c1439c9313ebc2dda1193d210ab698 URL: https://github.com/llvm/llvm-project/commit/301f09f236c1439c9313ebc2dda1193d210ab698 DIFF: https://github.com/llvm/llvm-project/commit/301f09f236c1439c9313ebc2dda1193d210ab698.diff LOG: Revert "[mlir][SCF] Allow using a custom operation to generate loops with `ml…" This reverts commit b8649098a7fcf598406d8d8b7d68891d1444e9c8. Added: Modified: mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.cpp mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.td Removed: mlir/test/Interfaces/TilingInterface/tile-using-custom-op.mlir diff --git a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h index 6b05ade37881c..3205da6e448fc 100644 --- a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h +++ b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h @@ -33,14 +33,6 @@ using SCFTileSizeComputationFunction = /// Options to use to control tiling. struct SCFTilingOptions { - /// Specify which loop construct to use for tile and fuse. - enum class LoopType { ForOp, ForallOp, CustomOp }; - LoopType loopType = LoopType::ForOp; - SCFTilingOptions &setLoopType(LoopType type) { -loopType = type; -return *this; - } - /// Computation function that returns the tile sizes to use for each loop. /// Returning a tile size of zero implies no tiling for that loop. If the /// size of the returned vector is smaller than the number of loops, the inner @@ -58,17 +50,6 @@ struct SCFTilingOptions { /// proper interaction with folding. SCFTilingOptions &setTileSizes(ArrayRef tileSizes); - /// The interchange vector to reorder the tiled loops. - SmallVector interchangeVector = {}; - SCFTilingOptions &setInterchange(ArrayRef interchange) { -interchangeVector = llvm::to_vector(interchange); -return *this; - } - - //-// - // Options related to tiling using `scf.forall`. - //-// - /// Computation function that returns the number of threads to use for /// each loop. Returning a num threads of zero implies no tiling for that /// loop. If the size of the returned vector is smaller than the number of @@ -89,6 +70,21 @@ struct SCFTilingOptions { /// function that computes num threads at the point they are needed. SCFTilingOptions &setNumThreads(ArrayRef numThreads); + /// The interchange vector to reorder the tiled loops. + SmallVector interchangeVector = {}; + SCFTilingOptions &setInterchange(ArrayRef interchange) { +interchangeVector = llvm::to_vector(interchange); +return *this; + } + + /// Specify which loop construct to use for tile and fuse. + enum class LoopType { ForOp, ForallOp }; + LoopType loopType = LoopType::ForOp; + SCFTilingOptions &setLoopType(LoopType type) { +loopType = type; +return *this; + } + /// Specify mapping of loops to devices. This is only respected when the loop /// constructs support such a mapping (like `scf.forall`). Will be ignored /// when using loop constructs that dont support such a mapping (like @@ -121,98 +117,6 @@ struct SCFTilingOptions { reductionDims.insert(dims.begin(), dims.end()); return *this; } - - //-// - // Options related to tiling using custom loop. - //-// - - // For generating the inter-tile loops using a custom loop, two callback - // functions are needed - // 1. That generates the "loop header", i.e. 
the loop that iterates over the - // different tiles. - // 2. That generates the loop terminator - // - // For `scf.forall` case the call back to generate loop header would generate - // - // ```mlir - // scf.forall (...) = ... { - // .. - // } - // ``` - // - // and the call back to generate the loop terminator would generate the - // `scf.in_parallel` region - // - // ```mlir - // scf.forall (...) = ... { - // scf.in_parallel { - // tensor.parallel_insert_slice ... - // } - // } - // ``` - // - - // Information that is to be returned by the callback to generate the loop - // header needed for the rest of the tiled codegeneration. - // - `loops`: The generated loops - // - `tileOffset`: The values that represent the offset of the iteration space - // tile - // - `tileSizes` : The values that represent the size of the iteration space - // tile. - // - `destinationTensor
[llvm-branch-commits] [mlir] 7af3f6e - Revert "[mlir][SCF] Allow using a custom operation to generate loops with `ml…"
Author: MaheshRavishankar Date: 2025-09-18T09:29:29-07:00 New Revision: 7af3f6e0317e84900e6683ac0ea3dc60b805904e URL: https://github.com/llvm/llvm-project/commit/7af3f6e0317e84900e6683ac0ea3dc60b805904e DIFF: https://github.com/llvm/llvm-project/commit/7af3f6e0317e84900e6683ac0ea3dc60b805904e.diff LOG: Revert "[mlir][SCF] Allow using a custom operation to generate loops with `ml…" This reverts commit b8649098a7fcf598406d8d8b7d68891d1444e9c8. Added: Modified: mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.cpp mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.td Removed: mlir/test/Interfaces/TilingInterface/tile-using-custom-op.mlir diff --git a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h index 6b05ade37881c..3205da6e448fc 100644 --- a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h +++ b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h @@ -33,14 +33,6 @@ using SCFTileSizeComputationFunction = /// Options to use to control tiling. struct SCFTilingOptions { - /// Specify which loop construct to use for tile and fuse. - enum class LoopType { ForOp, ForallOp, CustomOp }; - LoopType loopType = LoopType::ForOp; - SCFTilingOptions &setLoopType(LoopType type) { -loopType = type; -return *this; - } - /// Computation function that returns the tile sizes to use for each loop. /// Returning a tile size of zero implies no tiling for that loop. If the /// size of the returned vector is smaller than the number of loops, the inner @@ -58,17 +50,6 @@ struct SCFTilingOptions { /// proper interaction with folding. SCFTilingOptions &setTileSizes(ArrayRef tileSizes); - /// The interchange vector to reorder the tiled loops. - SmallVector interchangeVector = {}; - SCFTilingOptions &setInterchange(ArrayRef interchange) { -interchangeVector = llvm::to_vector(interchange); -return *this; - } - - //-// - // Options related to tiling using `scf.forall`. - //-// - /// Computation function that returns the number of threads to use for /// each loop. Returning a num threads of zero implies no tiling for that /// loop. If the size of the returned vector is smaller than the number of @@ -89,6 +70,21 @@ struct SCFTilingOptions { /// function that computes num threads at the point they are needed. SCFTilingOptions &setNumThreads(ArrayRef numThreads); + /// The interchange vector to reorder the tiled loops. + SmallVector interchangeVector = {}; + SCFTilingOptions &setInterchange(ArrayRef interchange) { +interchangeVector = llvm::to_vector(interchange); +return *this; + } + + /// Specify which loop construct to use for tile and fuse. + enum class LoopType { ForOp, ForallOp }; + LoopType loopType = LoopType::ForOp; + SCFTilingOptions &setLoopType(LoopType type) { +loopType = type; +return *this; + } + /// Specify mapping of loops to devices. This is only respected when the loop /// constructs support such a mapping (like `scf.forall`). Will be ignored /// when using loop constructs that dont support such a mapping (like @@ -121,98 +117,6 @@ struct SCFTilingOptions { reductionDims.insert(dims.begin(), dims.end()); return *this; } - - //-// - // Options related to tiling using custom loop. - //-// - - // For generating the inter-tile loops using a custom loop, two callback - // functions are needed - // 1. That generates the "loop header", i.e. 
the loop that iterates over the - // different tiles. - // 2. That generates the loop terminator - // - // For `scf.forall` case the call back to generate loop header would generate - // - // ```mlir - // scf.forall (...) = ... { - // .. - // } - // ``` - // - // and the call back to generate the loop terminator would generate the - // `scf.in_parallel` region - // - // ```mlir - // scf.forall (...) = ... { - // scf.in_parallel { - // tensor.parallel_insert_slice ... - // } - // } - // ``` - // - - // Information that is to be returned by the callback to generate the loop - // header needed for the rest of the tiled codegeneration. - // - `loops`: The generated loops - // - `tileOffset`: The values that represent the offset of the iteration space - // tile - // - `tileSizes` : The values that represent the size of the iteration space - // tile. - // - `destinationTensor
[llvm-branch-commits] [llvm] [AMDGPU] Improve StructurizeCFG pass performance by using SSAUpdaterBulk. (PR #150937)
https://github.com/vpykhtin updated https://github.com/llvm/llvm-project/pull/150937 >From ae3589e2c93351349cd1bbb5586c2dfcb075ea68 Mon Sep 17 00:00:00 2001 From: Valery Pykhtin Date: Thu, 10 Apr 2025 11:58:13 + Subject: [PATCH] amdgpu_use_ssaupdaterbulk_in_structurizecfg --- llvm/lib/Transforms/Scalar/StructurizeCFG.cpp | 25 +++ 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp index 2ee91a9b40026..0f3978f56045e 100644 --- a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp +++ b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp @@ -47,6 +47,7 @@ #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/Local.h" #include "llvm/Transforms/Utils/SSAUpdater.h" +#include "llvm/Transforms/Utils/SSAUpdaterBulk.h" #include #include @@ -321,7 +322,7 @@ class StructurizeCFG { void collectInfos(); - void insertConditions(bool Loops); + void insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter); void simplifyConditions(); @@ -671,10 +672,9 @@ void StructurizeCFG::collectInfos() { } /// Insert the missing branch conditions -void StructurizeCFG::insertConditions(bool Loops) { +void StructurizeCFG::insertConditions(bool Loops, SSAUpdaterBulk &PhiInserter) { BranchVector &Conds = Loops ? LoopConds : Conditions; Value *Default = Loops ? BoolTrue : BoolFalse; - SSAUpdater PhiInserter; for (BranchInst *Term : Conds) { assert(Term->isConditional()); @@ -683,8 +683,9 @@ void StructurizeCFG::insertConditions(bool Loops) { BasicBlock *SuccTrue = Term->getSuccessor(0); BasicBlock *SuccFalse = Term->getSuccessor(1); -PhiInserter.Initialize(Boolean, ""); -PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default); +unsigned Variable = PhiInserter.AddVariable("", Boolean); +PhiInserter.AddAvailableValue(Variable, Loops ? SuccFalse : Parent, + Default); BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue]; @@ -697,7 +698,7 @@ void StructurizeCFG::insertConditions(bool Loops) { ParentInfo = PI; break; } - PhiInserter.AddAvailableValue(BB, PI.Pred); + PhiInserter.AddAvailableValue(Variable, BB, PI.Pred); Dominator.addAndRememberBlock(BB); } @@ -706,9 +707,9 @@ void StructurizeCFG::insertConditions(bool Loops) { CondBranchWeights::setMetadata(*Term, ParentInfo.Weights); } else { if (!Dominator.resultIsRememberedBlock()) -PhiInserter.AddAvailableValue(Dominator.result(), Default); +PhiInserter.AddAvailableValue(Variable, Dominator.result(), Default); - Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent)); + PhiInserter.AddUse(Variable, &Term->getOperandUse(0)); } } } @@ -1414,8 +1415,12 @@ bool StructurizeCFG::run(Region *R, DominatorTree *DT, orderNodes(); collectInfos(); createFlow(); - insertConditions(false); - insertConditions(true); + + SSAUpdaterBulk PhiInserter; + insertConditions(false, PhiInserter); + insertConditions(true, PhiInserter); + PhiInserter.RewriteAndOptimizeAllUses(*DT); + setPhiValues(); simplifyHoistedPhis(); simplifyConditions(); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
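For readers unfamiliar with the API, the batched-update pattern the patch switches to looks roughly like this: a minimal sketch built only from the calls visible in the diff above (the function name, `Boolean`, and the blocks are stand-ins, not the pass's real state):

```cpp
#include "llvm/Transforms/Utils/SSAUpdaterBulk.h"

// Instead of one SSAUpdater per condition, each rewrite recomputing its
// own dominance information, every condition is registered up front and
// all PHI placement happens in a single batched pass over the CFG.
void rewriteConditions(llvm::Type *Boolean, llvm::DominatorTree &DT,
                       llvm::BasicBlock *Parent, llvm::Value *Default,
                       llvm::Use &CondUse) {
  llvm::SSAUpdaterBulk PhiInserter;
  // Each condition becomes a numbered variable instead of a fresh updater.
  unsigned Variable = PhiInserter.AddVariable("", Boolean);
  PhiInserter.AddAvailableValue(Variable, Parent, Default);
  // Uses are only recorded here; nothing is rewritten yet.
  PhiInserter.AddUse(Variable, &CondUse);
  // One rewrite (plus simplification) for all recorded variables at once.
  PhiInserter.RewriteAndOptimizeAllUses(DT);
}
```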
[llvm-branch-commits] [llvm] [Offload] `olGetMemInfo` (PR #157651)
https://github.com/RossBrunton converted_to_draft https://github.com/llvm/llvm-project/pull/157651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DirectX] Validating Root flags are denying shader stage (PR #153287)
https://github.com/joaosaffran updated https://github.com/llvm/llvm-project/pull/153287 >From b1e34ff07fffe96fec438b87027bd2c450b6b36f Mon Sep 17 00:00:00 2001 From: Joao Saffran <{ID}+{username}@users.noreply.github.com> Date: Tue, 12 Aug 2025 13:07:42 -0700 Subject: [PATCH 01/24] adding validaiton and tests --- .../DXILPostOptimizationValidation.cpp| 95 ++- .../rootsignature-validation-deny-shader.ll | 16 ...re-validation-fail-deny-multiple-shader.ll | 17 ...ture-validation-fail-deny-single-shader.ll | 17 4 files changed, 122 insertions(+), 23 deletions(-) create mode 100644 llvm/test/CodeGen/DirectX/rootsignature-validation-deny-shader.ll create mode 100644 llvm/test/CodeGen/DirectX/rootsignature-validation-fail-deny-multiple-shader.ll create mode 100644 llvm/test/CodeGen/DirectX/rootsignature-validation-fail-deny-single-shader.ll diff --git a/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp b/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp index 3721b5f539b8c..251f4a0daf43a 100644 --- a/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp +++ b/llvm/lib/Target/DirectX/DXILPostOptimizationValidation.cpp @@ -21,6 +21,7 @@ #include "llvm/InitializePasses.h" #include "llvm/MC/DXContainerRootSignature.h" #include "llvm/Support/DXILABI.h" +#include "llvm/TargetParser/Triple.h" #include #define DEBUG_TYPE "dxil-post-optimization-validation" @@ -169,15 +170,16 @@ reportDescriptorTableMixingTypes(Module &M, uint32_t Location, M.getContext().diagnose(DiagnosticInfoGeneric(Message)); } -static void reportOverlowingRange(Module &M, const dxbc::RTS0::v2::DescriptorRange &Range) { +static void +reportOverlowingRange(Module &M, const dxbc::RTS0::v2::DescriptorRange &Range) { SmallString<128> Message; raw_svector_ostream OS(Message); - OS << "Cannot append range with implicit lower " - << "bound after an unbounded range " - << getResourceClassName(toResourceClass(static_cast(Range.RangeType))) - << "(register=" << Range.BaseShaderRegister << ", space=" << - Range.RegisterSpace - << ") exceeds maximum allowed value."; + OS << "Cannot append range with implicit lower " + << "bound after an unbounded range " + << getResourceClassName(toResourceClass( +static_cast(Range.RangeType))) + << "(register=" << Range.BaseShaderRegister + << ", space=" << Range.RegisterSpace << ") exceeds maximum allowed value."; M.getContext().diagnose(DiagnosticInfoGeneric(Message)); } @@ -262,12 +264,57 @@ getRootDescriptorsBindingInfo(const mcdxbc::RootSignatureDesc &RSD, return RDs; } +static void reportIfDeniedShaderStageAccess(Module &M, dxbc::RootFlags Flags, +dxbc::RootFlags Mask) { + if ((Flags & Mask) == Mask) { +SmallString<128> Message; +raw_svector_ostream OS(Message); +OS << "Shader has root bindings but root signature uses a DENY flag to " + "disallow root binding access to the shader stage."; +M.getContext().diagnose(DiagnosticInfoGeneric(Message)); + } +} + +static void validateRootFlags(Module &M, const mcdxbc::RootSignatureDesc &RSD, + const dxil::ModuleMetadataInfo &MMI) { + dxbc::RootFlags Flags = dxbc::RootFlags(RSD.Flags); + switch (MMI.ShaderProfile) { + case Triple::Pixel: +reportIfDeniedShaderStageAccess(M, Flags, + dxbc::RootFlags::DenyPixelShaderRootAccess); +break; + case Triple::Vertex: +reportIfDeniedShaderStageAccess( +M, Flags, dxbc::RootFlags::DenyVertexShaderRootAccess); +break; + case Triple::Geometry: +reportIfDeniedShaderStageAccess( +M, Flags, dxbc::RootFlags::DenyGeometryShaderRootAccess); +break; + case Triple::Hull: +reportIfDeniedShaderStageAccess(M, Flags, 
+dxbc::RootFlags::DenyHullShaderRootAccess); +break; + case Triple::Domain: +reportIfDeniedShaderStageAccess( +M, Flags, dxbc::RootFlags::DenyDomainShaderRootAccess); +break; + case Triple::Mesh: +reportIfDeniedShaderStageAccess(M, Flags, +dxbc::RootFlags::DenyMeshShaderRootAccess); +break; + case Triple::Amplification: +reportIfDeniedShaderStageAccess( +M, Flags, dxbc::RootFlags::DenyAmplificationShaderRootAccess); +break; + default: +break; + } +} static void validateDescriptorTables(Module &M, - const mcdxbc::RootSignatureDesc &RSD, - dxil::ModuleMetadataInfo &MMI, - DXILResourceMap &DRM) { + const mcdxbc::RootSignatureDesc &RSD) { for (const mcdxbc::RootParameterInfo &ParamInfo : RSD.ParametersContainer) { if (static_cast(ParamInfo.Header.ParameterType) != dxbc::RootParameterType::DescriptorTable) @@ -2
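The per-stage check above reduces to a simple mask test. As a hypothetical, self-contained illustration of the `(Flags & Mask) == Mask` logic (the numeric values are invented for the example, not the real DXBC encoding):

```cpp
#include <cstdint>
#include <cstdio>

// Mirrors reportIfDeniedShaderStageAccess: diagnose only when every bit
// of the stage's deny mask is present in the root signature flags.
enum class RootFlags : uint32_t {
  DenyVertexShaderRootAccess = 0x2, // illustrative value
  DenyPixelShaderRootAccess = 0x20, // illustrative value
};

static bool isDenied(uint32_t Flags, RootFlags Mask) {
  uint32_t M = static_cast<uint32_t>(Mask);
  return (Flags & M) == M;
}

int main() {
  uint32_t Flags = 0x22; // hypothetical root signature flags
  std::printf("vertex denied: %d\n",
              isDenied(Flags, RootFlags::DenyVertexShaderRootAccess)); // 1
  std::printf("pixel denied: %d\n",
              isDenied(Flags, RootFlags::DenyPixelShaderRootAccess)); // 1
  return 0;
}
```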
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
@@ -47,74 +47,61 @@ static func::FuncOp getOrDeclare(fir::FirOpBuilder &builder, Location loc, return func; } -static bool isZero(Value v) { - if (auto cst = v.getDefiningOp()) -if (auto attr = dyn_cast(cst.getValue())) - return attr.getValue().isZero(); - return false; -} - void ConvertComplexPowPass::runOnOperation() { ModuleOp mod = getOperation(); fir::FirOpBuilder builder(mod, fir::getKindMapping(mod)); - mod.walk([&](complex::PowOp op) { + mod.walk([&](complex::PowiOp op) { builder.setInsertionPoint(op); Location loc = op.getLoc(); auto complexTy = cast(op.getType()); auto elemTy = complexTy.getElementType(); - Value base = op.getLhs(); -Value rhs = op.getRhs(); - -Value intExp; -if (auto create = rhs.getDefiningOp()) { - if (isZero(create.getImaginary())) { -if (auto conv = create.getReal().getDefiningOp()) { - if (auto intTy = dyn_cast(conv.getValue().getType())) -intExp = conv.getValue(); -} - } -} - +Value intExp = op.getRhs(); func::FuncOp callee; -SmallVector args; -if (intExp) { - unsigned realBits = cast(elemTy).getWidth(); - unsigned intBits = cast(intExp.getType()).getWidth(); - auto funcTy = builder.getFunctionType( - {complexTy, builder.getIntegerType(intBits)}, {complexTy}); - if (realBits == 32 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy); - else if (realBits == 32 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy); - else if (realBits == 64 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy); - else if (realBits == 64 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy); - else if (realBits == 128 && intBits == 32) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy); - else if (realBits == 128 && intBits == 64) -callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy); - else -return; - args = {base, intExp}; -} else { - unsigned realBits = cast(elemTy).getWidth(); - auto funcTy = - builder.getFunctionType({complexTy, complexTy}, {complexTy}); - if (realBits == 32) -callee = getOrDeclare(builder, loc, "cpowf", funcTy); - else if (realBits == 64) -callee = getOrDeclare(builder, loc, "cpow", funcTy); - else if (realBits == 128) -callee = getOrDeclare(builder, loc, RTNAME_STRING(CPowF128), funcTy); - else -return; - args = {base, rhs}; -} +unsigned realBits = cast(elemTy).getWidth(); +unsigned intBits = cast(intExp.getType()).getWidth(); +auto funcTy = builder.getFunctionType( +{complexTy, builder.getIntegerType(intBits)}, {complexTy}); +if (realBits == 32 && intBits == 32) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowi), funcTy); +else if (realBits == 32 && intBits == 64) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cpowk), funcTy); +else if (realBits == 64 && intBits == 32) + callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowi), funcTy); +else if (realBits == 64 && intBits == 64) + callee = getOrDeclare(builder, loc, RTNAME_STRING(zpowk), funcTy); +else if (realBits == 128 && intBits == 32) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowi), funcTy); +else if (realBits == 128 && intBits == 64) + callee = getOrDeclare(builder, loc, RTNAME_STRING(cqpowk), funcTy); +else + return; +auto call = fir::CallOp::create(builder, loc, callee, {base, intExp}); +if (auto fmf = op.getFastmathAttr()) + call.setFastmathAttr(fmf); +op.replaceAllUsesWith(call.getResult(0)); +op.erase(); + }); -auto call = fir::CallOp::create(builder, loc, callee, args); + mod.walk([&](complex::PowOp op) { 
joker-eph wrote: We should not walk multiple times if we can do it in a single traversal, can you replace this with a walk on Operation* and dispatch inside the walk? https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
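The single-traversal shape being requested would look roughly like this (a sketch of the review suggestion, not the actual patch; `rewritePowi` and `rewritePow` are hypothetical helpers holding the per-op logic from the diff):

```cpp
#include "mlir/Dialect/Complex/IR/Complex.h"
#include "mlir/IR/BuiltinOps.h"

// Hypothetical helpers wrapping the per-op rewrites shown above.
void rewritePowi(mlir::complex::PowiOp op);
void rewritePow(mlir::complex::PowOp op);

// One walk over the module; dispatch on the concrete op type inside the
// callback instead of walking the module once per op kind.
void convertComplexPow(mlir::ModuleOp mod) {
  mod.walk([&](mlir::Operation *op) {
    if (auto powi = llvm::dyn_cast<mlir::complex::PowiOp>(op))
      rewritePowi(powi);
    else if (auto pow = llvm::dyn_cast<mlir::complex::PowOp>(op))
      rewritePow(pow);
  });
}
```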
[llvm-branch-commits] [clang] [AllocToken, Clang] Implement TypeHashPointerSplit mode (PR #156840)
https://github.com/melver updated https://github.com/llvm/llvm-project/pull/156840 >From 14c75441e84aa32e4f5876598b9a2c59d4ecbe65 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 8 Sep 2025 21:32:21 +0200 Subject: [PATCH 1/2] fixup! fix for incomplete types Created using spr 1.3.8-beta.1 --- clang/lib/CodeGen/CGExpr.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp index 288b41bc42203..455de644daf00 100644 --- a/clang/lib/CodeGen/CGExpr.cpp +++ b/clang/lib/CodeGen/CGExpr.cpp @@ -1289,6 +1289,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, // Check if QualType contains a pointer. Implements a simple DFS to // recursively check if a type contains a pointer type. llvm::SmallPtrSet VisitedRD; + bool IncompleteType = false; auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool { QualType CanonicalType = T.getCanonicalType(); if (CanonicalType->isPointerType()) @@ -1312,6 +1313,10 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, return self(self, AT->getElementType()); // The type is a struct, class, or union. if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) { + if (!RD->isCompleteDefinition()) { +IncompleteType = true; +return false; + } if (!VisitedRD.insert(RD).second) return false; // already visited // Check all fields. @@ -1333,6 +1338,8 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, return false; }; const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType); + if (!ContainsPtr && IncompleteType) +return nullptr; auto *ContainsPtrC = Builder.getInt1(ContainsPtr); auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC); >From 7f706618ddc40375d4085bc2ebe03f02ec78823a Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 8 Sep 2025 21:58:01 +0200 Subject: [PATCH 2/2] fixup! Created using spr 1.3.8-beta.1 --- clang/lib/CodeGen/CGExpr.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp index 455de644daf00..e7a0e7696e204 100644 --- a/clang/lib/CodeGen/CGExpr.cpp +++ b/clang/lib/CodeGen/CGExpr.cpp @@ -1339,7 +1339,7 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, }; const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType); if (!ContainsPtr && IncompleteType) -return nullptr; +return; auto *ContainsPtrC = Builder.getInt1(ContainsPtr); auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
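A side note on the shape of `TypeContainsPtr` in the hunk above: it uses the self-passing recursive-lambda idiom. A stripped-down sketch (hypothetical node type, not the Clang code) shows the mechanics:

```cpp
#include <memory>

struct Node {
  bool isPointerLike = false;
  std::unique_ptr<Node> child;
};

bool containsPointer(const Node &root) {
  // A lambda cannot refer to itself by name, so it receives itself as
  // `self`, the same trick as TypeContainsPtr(TypeContainsPtr, AllocType).
  auto dfs = [](auto &&self, const Node &n) -> bool {
    if (n.isPointerLike)
      return true;
    return n.child && self(self, *n.child);
  };
  return dfs(dfs, root);
}
```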
[llvm-branch-commits] [llvm] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
https://github.com/tobias-stadler updated https://github.com/llvm/llvm-project/pull/156715 >From d33b31f01aeeb9005581b0a2a1f21c898463aa02 Mon Sep 17 00:00:00 2001 From: Tobias Stadler Date: Thu, 18 Sep 2025 12:34:55 +0100 Subject: [PATCH 1/3] Replace bitstream blobs by yaml Created using spr 1.3.7-wip --- llvm/lib/Remarks/BitstreamRemarkParser.cpp| 5 +- .../dsymutil/ARM/remarks-linking-bundle.test | 13 +- .../basic1.macho.remarks.arm64.opt.bitstream | Bin 824 -> 0 bytes .../basic1.macho.remarks.arm64.opt.yaml | 47 + ...c1.macho.remarks.empty.arm64.opt.bitstream | 0 .../basic2.macho.remarks.arm64.opt.bitstream | Bin 1696 -> 0 bytes .../basic2.macho.remarks.arm64.opt.yaml | 194 ++ ...c2.macho.remarks.empty.arm64.opt.bitstream | 0 .../basic3.macho.remarks.arm64.opt.bitstream | Bin 1500 -> 0 bytes .../basic3.macho.remarks.arm64.opt.yaml | 181 ...c3.macho.remarks.empty.arm64.opt.bitstream | 0 .../fat.macho.remarks.x86_64.opt.bitstream| Bin 820 -> 0 bytes .../remarks/fat.macho.remarks.x86_64.opt.yaml | 53 + .../fat.macho.remarks.x86_64h.opt.bitstream | Bin 820 -> 0 bytes .../fat.macho.remarks.x86_64h.opt.yaml| 53 + .../X86/remarks-linking-fat-bundle.test | 8 +- 16 files changed, 543 insertions(+), 11 deletions(-) delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic1.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic2.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/basic3.macho.remarks.empty.arm64.opt.bitstream delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64.opt.yaml delete mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.bitstream create mode 100644 llvm/test/tools/dsymutil/Inputs/private/tmp/remarks/fat.macho.remarks.x86_64h.opt.yaml diff --git a/llvm/lib/Remarks/BitstreamRemarkParser.cpp b/llvm/lib/Remarks/BitstreamRemarkParser.cpp index 63b16bd2df0ec..2b27a0f661d88 100644 --- a/llvm/lib/Remarks/BitstreamRemarkParser.cpp +++ b/llvm/lib/Remarks/BitstreamRemarkParser.cpp @@ -411,9 +411,8 @@ Error BitstreamRemarkParser::processExternalFilePath() { return E; if (ContainerType != BitstreamRemarkContainerType::RemarksFile) -return error( -"Error while parsing external file's BLOCK_META: wrong container " -"type."); +return ParserHelper->MetaHelper.error( +"Wrong container type in external file."); return Error::success(); } diff --git a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test index 09a60d7d044c6..e1b04455b0d9d 100644 --- a/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test +++ 
b/llvm/test/tools/dsymutil/ARM/remarks-linking-bundle.test @@ -1,22 +1,25 @@ RUN: rm -rf %t -RUN: mkdir -p %t +RUN: mkdir -p %t/private/tmp/remarks RUN: cat %p/../Inputs/remarks/basic.macho.remarks.arm64> %t/basic.macho.remarks.arm64 +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic1.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic1.macho.remarks.arm64.opt.bitstream +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic2.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic2.macho.remarks.arm64.opt.bitstream +RUN: llvm-remarkutil yaml2bitstream %p/../Inputs/private/tmp/remarks/basic3.macho.remarks.arm64.opt.yaml -o %t/private/tmp/remarks/basic3.macho.remarks.arm64.opt.bitstream -RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%p/../Inputs %t/basic.macho.remarks.arm64 +RUN: dsymutil -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%t %t/basic.macho.remarks.arm64 Check that the remark file in the bundle exists and is sane: RUN: llvm-bcanalyzer -dump %t/basic.macho.remarks.arm64.dSYM/Contents/Resources/Remarks/basic.macho.remarks.arm64 | FileCheck %s -RUN: dsymutil --linker parallel -oso-prepend-path=%p/../Inputs -remarks-prepend-path=%p/../Inputs %t/basic.macho.r
[llvm-branch-commits] [flang] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE (PR #159632)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/159632 Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList), some semantics-checking functions needed to be updated to operate on a single object at a time. >From 7bb9fb5b3b9a2dfcd1d00f01c86fe26c5d14c30f Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Thu, 18 Sep 2025 08:49:38 -0500 Subject: [PATCH] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList), some semantics-checking functions needed to be updated to operate on a single object at a time. --- flang/include/flang/Parser/openmp-utils.h| 4 +- flang/include/flang/Parser/parse-tree.h | 3 +- flang/include/flang/Semantics/openmp-utils.h | 3 +- flang/lib/Parser/openmp-parsers.cpp | 7 +- flang/lib/Parser/unparse.cpp | 7 +- flang/lib/Semantics/check-omp-structure.cpp | 89 +++- flang/lib/Semantics/check-omp-structure.h| 3 + flang/lib/Semantics/openmp-utils.cpp | 22 +++-- flang/lib/Semantics/resolve-directives.cpp | 11 ++- 9 files changed, 86 insertions(+), 63 deletions(-) diff --git a/flang/include/flang/Parser/openmp-utils.h b/flang/include/flang/Parser/openmp-utils.h index 032fb8996fe48..1372945427955 100644 --- a/flang/include/flang/Parser/openmp-utils.h +++ b/flang/include/flang/Parser/openmp-utils.h @@ -49,7 +49,6 @@ MAKE_CONSTR_ID(OpenMPDeclareSimdConstruct, D::OMPD_declare_simd); MAKE_CONSTR_ID(OpenMPDeclareTargetConstruct, D::OMPD_declare_target); MAKE_CONSTR_ID(OpenMPExecutableAllocate, D::OMPD_allocate); MAKE_CONSTR_ID(OpenMPRequiresConstruct, D::OMPD_requires); -MAKE_CONSTR_ID(OpenMPThreadprivate, D::OMPD_threadprivate); #undef MAKE_CONSTR_ID @@ -111,8 +110,7 @@ struct DirectiveNameScope { std::is_same_v || std::is_same_v || std::is_same_v || - std::is_same_v || - std::is_same_v) { + std::is_same_v) { return MakeName(std::get(x.t).source, ConstructId::id); } else { return GetFromTuple( diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h index 09a45476420df..8cb6d2e744876 100644 --- a/flang/include/flang/Parser/parse-tree.h +++ b/flang/include/flang/Parser/parse-tree.h @@ -5001,9 +5001,8 @@ struct OpenMPRequiresConstruct { // 2.15.2 threadprivate -> THREADPRIVATE (variable-name-list) struct OpenMPThreadprivate { - TUPLE_CLASS_BOILERPLATE(OpenMPThreadprivate); + WRAPPER_CLASS_BOILERPLATE(OpenMPThreadprivate, OmpDirectiveSpecification); CharBlock source; - std::tuple t; }; // 2.11.3 allocate -> ALLOCATE (variable-name-list) [clause] diff --git a/flang/include/flang/Semantics/openmp-utils.h b/flang/include/flang/Semantics/openmp-utils.h index 68318d6093a1e..65441728c5549 100644 --- a/flang/include/flang/Semantics/openmp-utils.h +++ b/flang/include/flang/Semantics/openmp-utils.h @@ -58,9 +58,10 @@ const parser::DataRef *GetDataRefFromObj(const parser::OmpObject &object); const parser::ArrayElement *GetArrayElementFromObj( const parser::OmpObject &object); const Symbol *GetObjectSymbol(const parser::OmpObject &object); -const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument); std::optional GetObjectSource( const parser::OmpObject &object); +const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument); +const parser::OmpObject *GetArgumentObject(const parser::OmpArgument &argument); bool IsCommonBlock(const Symbol &sym); bool IsExtendedListItem(const Symbol &sym); diff --git a/flang/lib/Parser/openmp-parsers.cpp b/flang/lib/Parser/openmp-parsers.cpp index 66526ba00b5ed..60ce71cf983f6 
100644 --- a/flang/lib/Parser/openmp-parsers.cpp +++ b/flang/lib/Parser/openmp-parsers.cpp @@ -1791,8 +1791,11 @@ TYPE_PARSER(sourced(construct( verbatim("REQUIRES"_tok), Parser{}))) // 2.15.2 Threadprivate directive -TYPE_PARSER(sourced(construct( -verbatim("THREADPRIVATE"_tok), parenthesized(Parser{} +TYPE_PARSER(sourced( // +construct( +predicated(OmpDirectiveNameParser{}, +IsDirective(llvm::omp::Directive::OMPD_threadprivate)) >= +Parser{}))) // 2.11.3 Declarative Allocate directive TYPE_PARSER( diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp index 189a34ee1dc56..db46525ac57b1 100644 --- a/flang/lib/Parser/unparse.cpp +++ b/flang/lib/Parser/unparse.cpp @@ -2611,12 +2611,11 @@ class UnparseVisitor { } void Unparse(const OpenMPThreadprivate &x) { BeginOpenMP(); -Word("!$OMP THREADPRIVATE ("); -Walk(std::get(x.t)); -Put(")\n"); +Word("!$OMP "); +Walk(x.v); +Put("\n"); EndOpenMP(); } - bool Pre(const OmpMessageClause &x) { Walk(x.v); return false; diff --git a/flang/lib/Semantics/check-omp-structure.cpp b/flang/lib/Semantics/check-omp-structure.cpp index 1ee5385fb38a1..507957df
[llvm-branch-commits] [flang] [flang][OpenMP] Use OmpDirectiveSpecification in THREADPRIVATE (PR #159632)
llvmbot wrote: @llvm/pr-subscribers-flang-semantics Author: Krzysztof Parzyszek (kparzysz) Changes Since ODS doesn't store a list of OmpObjects (i.e. not as OmpObjectList), some semantics-checking functions needed to be updated to operate on a single object at a time. --- Full diff: https://github.com/llvm/llvm-project/pull/159632.diff 9 Files Affected: - (modified) flang/include/flang/Parser/openmp-utils.h (+1-3) - (modified) flang/include/flang/Parser/parse-tree.h (+1-2) - (modified) flang/include/flang/Semantics/openmp-utils.h (+2-1) - (modified) flang/lib/Parser/openmp-parsers.cpp (+5-2) - (modified) flang/lib/Parser/unparse.cpp (+3-4) - (modified) flang/lib/Semantics/check-omp-structure.cpp (+48-41) - (modified) flang/lib/Semantics/check-omp-structure.h (+3) - (modified) flang/lib/Semantics/openmp-utils.cpp (+15-7) - (modified) flang/lib/Semantics/resolve-directives.cpp (+8-3) ``diff diff --git a/flang/include/flang/Parser/openmp-utils.h b/flang/include/flang/Parser/openmp-utils.h index 032fb8996fe48..1372945427955 100644 --- a/flang/include/flang/Parser/openmp-utils.h +++ b/flang/include/flang/Parser/openmp-utils.h @@ -49,7 +49,6 @@ MAKE_CONSTR_ID(OpenMPDeclareSimdConstruct, D::OMPD_declare_simd); MAKE_CONSTR_ID(OpenMPDeclareTargetConstruct, D::OMPD_declare_target); MAKE_CONSTR_ID(OpenMPExecutableAllocate, D::OMPD_allocate); MAKE_CONSTR_ID(OpenMPRequiresConstruct, D::OMPD_requires); -MAKE_CONSTR_ID(OpenMPThreadprivate, D::OMPD_threadprivate); #undef MAKE_CONSTR_ID @@ -111,8 +110,7 @@ struct DirectiveNameScope { std::is_same_v || std::is_same_v || std::is_same_v || - std::is_same_v || - std::is_same_v) { + std::is_same_v) { return MakeName(std::get(x.t).source, ConstructId::id); } else { return GetFromTuple( diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h index 09a45476420df..8cb6d2e744876 100644 --- a/flang/include/flang/Parser/parse-tree.h +++ b/flang/include/flang/Parser/parse-tree.h @@ -5001,9 +5001,8 @@ struct OpenMPRequiresConstruct { // 2.15.2 threadprivate -> THREADPRIVATE (variable-name-list) struct OpenMPThreadprivate { - TUPLE_CLASS_BOILERPLATE(OpenMPThreadprivate); + WRAPPER_CLASS_BOILERPLATE(OpenMPThreadprivate, OmpDirectiveSpecification); CharBlock source; - std::tuple t; }; // 2.11.3 allocate -> ALLOCATE (variable-name-list) [clause] diff --git a/flang/include/flang/Semantics/openmp-utils.h b/flang/include/flang/Semantics/openmp-utils.h index 68318d6093a1e..65441728c5549 100644 --- a/flang/include/flang/Semantics/openmp-utils.h +++ b/flang/include/flang/Semantics/openmp-utils.h @@ -58,9 +58,10 @@ const parser::DataRef *GetDataRefFromObj(const parser::OmpObject &object); const parser::ArrayElement *GetArrayElementFromObj( const parser::OmpObject &object); const Symbol *GetObjectSymbol(const parser::OmpObject &object); -const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument); std::optional GetObjectSource( const parser::OmpObject &object); +const Symbol *GetArgumentSymbol(const parser::OmpArgument &argument); +const parser::OmpObject *GetArgumentObject(const parser::OmpArgument &argument); bool IsCommonBlock(const Symbol &sym); bool IsExtendedListItem(const Symbol &sym); diff --git a/flang/lib/Parser/openmp-parsers.cpp b/flang/lib/Parser/openmp-parsers.cpp index 66526ba00b5ed..60ce71cf983f6 100644 --- a/flang/lib/Parser/openmp-parsers.cpp +++ b/flang/lib/Parser/openmp-parsers.cpp @@ -1791,8 +1791,11 @@ TYPE_PARSER(sourced(construct( verbatim("REQUIRES"_tok), Parser{}))) // 2.15.2 Threadprivate directive 
-TYPE_PARSER(sourced(construct( -verbatim("THREADPRIVATE"_tok), parenthesized(Parser{} +TYPE_PARSER(sourced( // +construct( +predicated(OmpDirectiveNameParser{}, +IsDirective(llvm::omp::Directive::OMPD_threadprivate)) >= +Parser{}))) // 2.11.3 Declarative Allocate directive TYPE_PARSER( diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp index 189a34ee1dc56..db46525ac57b1 100644 --- a/flang/lib/Parser/unparse.cpp +++ b/flang/lib/Parser/unparse.cpp @@ -2611,12 +2611,11 @@ class UnparseVisitor { } void Unparse(const OpenMPThreadprivate &x) { BeginOpenMP(); -Word("!$OMP THREADPRIVATE ("); -Walk(std::get(x.t)); -Put(")\n"); +Word("!$OMP "); +Walk(x.v); +Put("\n"); EndOpenMP(); } - bool Pre(const OmpMessageClause &x) { Walk(x.v); return false; diff --git a/flang/lib/Semantics/check-omp-structure.cpp b/flang/lib/Semantics/check-omp-structure.cpp index 1ee5385fb38a1..507957dfecb3d 100644 --- a/flang/lib/Semantics/check-omp-structure.cpp +++ b/flang/lib/Semantics/check-omp-structure.cpp @@ -669,11 +669,6 @@ template struct DirectiveSpellingVisitor { checker_(x.v.DirName().source, Directive::OMPD_groupprivate); return false; } - bool
[llvm-branch-commits] [llvm] CodeGen: Keep reference to TargetRegisterInfo in TargetInstrInfo (PR #158224)
@@ -1070,8 +1070,8 @@ void InstrInfoEmitter::run(raw_ostream &OS) { OS << "namespace llvm {\n"; OS << "struct " << ClassName << " : public TargetInstrInfo {\n" << " explicit " << ClassName - << "(const TargetSubtargetInfo &STI, unsigned CFSetupOpcode = ~0u, " -"unsigned CFDestroyOpcode = ~0u, " + << "(const TargetSubtargetInfo &STI, const TargetRegisterInfo &TRI, " arsenm wrote: The other option I considered was having unique_ptr in the generic base class https://github.com/llvm/llvm-project/pull/158224 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
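For comparison, the alternative being described would look something like this; purely an illustrative sketch of the ownership model, not the proposed code:

```cpp
#include <memory>

class TargetRegisterInfo { /* register info, elided */ };

// Illustrative alternative: the generic TargetInstrInfo base owns the
// TRI, instead of every generated subclass taking it by reference in
// its constructor as the patch does.
class TargetInstrInfo {
protected:
  std::unique_ptr<const TargetRegisterInfo> TRI;

public:
  explicit TargetInstrInfo(std::unique_ptr<const TargetRegisterInfo> T)
      : TRI(std::move(T)) {}
  const TargetRegisterInfo &getRegisterInfo() const { return *TRI; }
};
```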
[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)
rampitec wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/159641). See https://graphite.dev/docs/merge-pull-requests to learn more. * **#159641** 👈 (this PR) * **#159637** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking at https://stacking.dev/. https://github.com/llvm/llvm-project/pull/159641 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)
https://github.com/mtrofin created https://github.com/llvm/llvm-project/pull/159645 None >From 92728fa5d41bd5f6ef63837bcb3ea8e85b7a8764 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Mon, 15 Sep 2025 17:49:18 + Subject: [PATCH] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` --- llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 86 --- .../SimplifyCFG/switch-to-select-two-case.ll | 72 +--- 2 files changed, 117 insertions(+), 41 deletions(-) diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index a1f759dd1df83..276ca89d715f1 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -84,6 +84,7 @@ #include #include #include +#include #include #include #include @@ -6318,9 +6319,12 @@ static bool initializeUniqueCases(SwitchInst *SI, PHINode *&PHI, // Helper function that checks if it is possible to transform a switch with only // two cases (or two cases + default) that produces a result into a select. // TODO: Handle switches with more than 2 cases that map to the same result. +// The branch weights correspond to the provided Condition (i.e. if Condition is +// modified from the original SwitchInst, the caller must adjust the weights) static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, Constant *DefaultResult, Value *Condition, - IRBuilder<> &Builder, const DataLayout &DL) { + IRBuilder<> &Builder, const DataLayout &DL, + ArrayRef BranchWeights) { // If we are selecting between only two cases transform into a simple // select or a two-way select if default is possible. // Example: @@ -6329,6 +6333,10 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, // case 20: return 2; > %2 = icmp eq i32 %a, 20 // default: return 4; %3 = select i1 %2, i32 2, i32 %1 // } + + const bool HasBranchWeights = + !BranchWeights.empty() && !ProfcheckDisableMetadataFixes; + if (ResultVector.size() == 2 && ResultVector[0].second.size() == 1 && ResultVector[1].second.size() == 1) { ConstantInt *FirstCase = ResultVector[0].second[0]; @@ -6337,13 +6345,37 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, if (DefaultResult) { Value *ValueCompare = Builder.CreateICmpEQ(Condition, SecondCase, "switch.selectcmp"); - SelectValue = Builder.CreateSelect(ValueCompare, ResultVector[1].first, - DefaultResult, "switch.select"); + SelectInst *SelectValueInst = cast(Builder.CreateSelect( + ValueCompare, ResultVector[1].first, DefaultResult, "switch.select")); + SelectValue = SelectValueInst; + if (HasBranchWeights) { +// We start with 3 probabilities, where the numerator is the +// corresponding BranchWeights[i], and the denominator is the sum over +// BranchWeights. We want the probability and negative probability of +// Condition == SecondCase. +assert(BranchWeights.size() == 3); +setBranchWeights(SelectValueInst, BranchWeights[2], + BranchWeights[0] + BranchWeights[1], + /*IsExpected=*/false); +} } Value *ValueCompare = Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp"); -return Builder.CreateSelect(ValueCompare, ResultVector[0].first, -SelectValue, "switch.select"); +SelectInst *Ret = cast(Builder.CreateSelect( +ValueCompare, ResultVector[0].first, SelectValue, "switch.select")); +if (HasBranchWeights) { + // We may have had a DefaultResult. Base the position of the first and + // second's branch weights accordingly. Also the proability that Condition + // != FirstCase needs to take that into account. 
+ assert(BranchWeights.size() >= 2); + size_t FirstCasePos = (Condition != nullptr); + size_t SecondCasePos = FirstCasePos + 1; + uint32_t DefaultCase = (Condition != nullptr) ? BranchWeights[0] : 0; + setBranchWeights(Ret, BranchWeights[FirstCasePos], + DefaultCase + BranchWeights[SecondCasePos], + /*IsExpected=*/false); +} +return Ret; } // Handle the degenerate case where two cases have the same result value. @@ -6379,8 +6411,16 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, Value *And = Builder.CreateAnd(Condition, AndMask); Value *Cmp = Builder.CreateICmpEQ( And, Constant::getIntegerValue(And->getType(), AndMask)); - return Builder.CreateSelect(Cmp, ResultVector[0].first, - DefaultResult); +
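To make the weight bookkeeping concrete, here is a hypothetical two-case-plus-default switch (the counts are invented) worked through the logic in the hunks above:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // switch (x) { default: ...; case First: ...; case Second: ... }
  // with !prof weights in switch order: [default, first, second].
  uint32_t WDefault = 10, WFirst = 30, WSecond = 60;

  // Inner select: %s1 = select (x == SecondCase), SecondRes, DefaultRes
  // taken with weight WSecond, not taken with WDefault + WFirst.
  uint32_t InnerTrue = WSecond;            // 60
  uint32_t InnerFalse = WDefault + WFirst; // 40

  // Outer select: %s2 = select (x == FirstCase), FirstRes, %s1
  // whose false arm absorbs both the default and second-case counts.
  uint32_t OuterTrue = WFirst;              // 30
  uint32_t OuterFalse = WDefault + WSecond; // 70

  std::printf("inner: %u/%u, outer: %u/%u\n", InnerTrue, InnerFalse,
              OuterTrue, OuterFalse);
  return 0;
}
```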
[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)
https://github.com/mtrofin edited https://github.com/llvm/llvm-project/pull/159645 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)
@@ -1,5 +1,5 @@ -; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals -; RUN: opt < %s -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S | FileCheck %s +; RUN: opt < %s -passes=prof-inject,simplifycfg -profcheck-weights-for-test -simplifycfg-require-and-preserve-domtree=1 -S | FileCheck %s mtrofin wrote: Note: this test is perfect in that it covers all the cases in the change (verified with some appropriately placed `dbgs()`). To avoid cumbersomely adding `!prof` everywhere, we're using the feature introduced in the previous patch. https://github.com/llvm/llvm-project/pull/159645 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command: ```bash git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Transforms/Utils/SimplifyCFG.cpp ``` :warning: The reproduction instructions above might return results for more than one PR in a stack if you are using a stacked PR workflow. You can limit the results by changing `origin/main` to the base branch/commit you want to compare against. :warning: View the diff from clang-format here. ```diff diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index 276ca89d7..f775991b5 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -6357,7 +6357,7 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, setBranchWeights(SelectValueInst, BranchWeights[2], BranchWeights[0] + BranchWeights[1], /*IsExpected=*/false); -} + } } Value *ValueCompare = Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp"); @@ -6411,8 +6411,8 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, Value *And = Builder.CreateAnd(Condition, AndMask); Value *Cmp = Builder.CreateICmpEQ( And, Constant::getIntegerValue(And->getType(), AndMask)); - SelectInst *Ret = cast(Builder.CreateSelect(Cmp, ResultVector[0].first, - DefaultResult)); + SelectInst *Ret = cast( + Builder.CreateSelect(Cmp, ResultVector[0].first, DefaultResult)); if (HasBranchWeights) { // We know there's a Default case. We base the resulting branch // weights off its probability. ``` https://github.com/llvm/llvm-project/pull/159645 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)
https://github.com/mtrofin ready_for_review https://github.com/llvm/llvm-project/pull/159645 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (PR #159645)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/159645 >From 6d3342f397d39e366a06eb6bcabddec0b3d5a963 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Mon, 15 Sep 2025 17:49:18 + Subject: [PATCH] [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` --- llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 86 --- .../SimplifyCFG/switch-to-select-two-case.ll | 72 +--- 2 files changed, 117 insertions(+), 41 deletions(-) diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index a1f759dd1df83..f775991b5ba41 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -84,6 +84,7 @@ #include #include #include +#include #include #include #include @@ -6318,9 +6319,12 @@ static bool initializeUniqueCases(SwitchInst *SI, PHINode *&PHI, // Helper function that checks if it is possible to transform a switch with only // two cases (or two cases + default) that produces a result into a select. // TODO: Handle switches with more than 2 cases that map to the same result. +// The branch weights correspond to the provided Condition (i.e. if Condition is +// modified from the original SwitchInst, the caller must adjust the weights) static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, Constant *DefaultResult, Value *Condition, - IRBuilder<> &Builder, const DataLayout &DL) { + IRBuilder<> &Builder, const DataLayout &DL, + ArrayRef BranchWeights) { // If we are selecting between only two cases transform into a simple // select or a two-way select if default is possible. // Example: @@ -6329,6 +6333,10 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, // case 20: return 2; > %2 = icmp eq i32 %a, 20 // default: return 4; %3 = select i1 %2, i32 2, i32 %1 // } + + const bool HasBranchWeights = + !BranchWeights.empty() && !ProfcheckDisableMetadataFixes; + if (ResultVector.size() == 2 && ResultVector[0].second.size() == 1 && ResultVector[1].second.size() == 1) { ConstantInt *FirstCase = ResultVector[0].second[0]; @@ -6337,13 +6345,37 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, if (DefaultResult) { Value *ValueCompare = Builder.CreateICmpEQ(Condition, SecondCase, "switch.selectcmp"); - SelectValue = Builder.CreateSelect(ValueCompare, ResultVector[1].first, - DefaultResult, "switch.select"); + SelectInst *SelectValueInst = cast(Builder.CreateSelect( + ValueCompare, ResultVector[1].first, DefaultResult, "switch.select")); + SelectValue = SelectValueInst; + if (HasBranchWeights) { +// We start with 3 probabilities, where the numerator is the +// corresponding BranchWeights[i], and the denominator is the sum over +// BranchWeights. We want the probability and negative probability of +// Condition == SecondCase. +assert(BranchWeights.size() == 3); +setBranchWeights(SelectValueInst, BranchWeights[2], + BranchWeights[0] + BranchWeights[1], + /*IsExpected=*/false); + } } Value *ValueCompare = Builder.CreateICmpEQ(Condition, FirstCase, "switch.selectcmp"); -return Builder.CreateSelect(ValueCompare, ResultVector[0].first, -SelectValue, "switch.select"); +SelectInst *Ret = cast(Builder.CreateSelect( +ValueCompare, ResultVector[0].first, SelectValue, "switch.select")); +if (HasBranchWeights) { + // We may have had a DefaultResult. Base the position of the first and + // second's branch weights accordingly. Also the proability that Condition + // != FirstCase needs to take that into account. 
+ assert(BranchWeights.size() >= 2); + size_t FirstCasePos = (Condition != nullptr); + size_t SecondCasePos = FirstCasePos + 1; + uint32_t DefaultCase = (Condition != nullptr) ? BranchWeights[0] : 0; + setBranchWeights(Ret, BranchWeights[FirstCasePos], + DefaultCase + BranchWeights[SecondCasePos], + /*IsExpected=*/false); +} +return Ret; } // Handle the degenerate case where two cases have the same result value. @@ -6379,8 +6411,16 @@ static Value *foldSwitchToSelect(const SwitchCaseResultVectorTy &ResultVector, Value *And = Builder.CreateAnd(Condition, AndMask); Value *Cmp = Builder.CreateICmpEQ( And, Constant::getIntegerValue(And->getType(), AndMask)); - return Builder.CreateSelect(Cmp, ResultVector[0].first, - DefaultResult); + Select
[llvm-branch-commits] [llvm] [AMDGPU] gfx1251 VOP2 dpp support (PR #159641)
https://github.com/rampitec ready_for_review https://github.com/llvm/llvm-project/pull/159641 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] NonUniformResourceIndex implementation (PR #159655)
https://github.com/hekota updated https://github.com/llvm/llvm-project/pull/159655 >From 108bf356e743d36b4eb5d0217720cf47ab85f33f Mon Sep 17 00:00:00 2001 From: Helena Kotas Date: Thu, 18 Sep 2025 14:31:38 -0700 Subject: [PATCH 1/2] [HLSL] NonUniformResourceIndex implementation Adds HLSL function NonUniformResourceIndex to hlsl_intrinsics.h. The function calls a builtin `__builtin_hlsl_resource_nonuniformindex` which gets translated to LLVM intrinsic `llvm.{dx|spv}.resource_nonuniformindex. Depends on #159608 Closes #157923 --- clang/include/clang/Basic/Builtins.td | 6 +++ clang/lib/CodeGen/CGHLSLBuiltins.cpp | 7 clang/lib/CodeGen/CGHLSLRuntime.h | 2 + clang/lib/Headers/hlsl/hlsl_intrinsics.h | 25 .../resources/NonUniformResourceIndex.hlsl| 38 +++ 5 files changed, 78 insertions(+) create mode 100644 clang/test/CodeGenHLSL/resources/NonUniformResourceIndex.hlsl diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td index 27639f06529cb..96676bd810631 100644 --- a/clang/include/clang/Basic/Builtins.td +++ b/clang/include/clang/Basic/Builtins.td @@ -4933,6 +4933,12 @@ def HLSLResourceHandleFromImplicitBinding : LangBuiltin<"HLSL_LANG"> { let Prototype = "void(...)"; } +def HLSLResourceNonUniformIndex : LangBuiltin<"HLSL_LANG"> { + let Spellings = ["__builtin_hlsl_resource_nonuniformindex"]; + let Attributes = [NoThrow]; + let Prototype = "uint32_t(uint32_t)"; +} + def HLSLAll : LangBuiltin<"HLSL_LANG"> { let Spellings = ["__builtin_hlsl_all"]; let Attributes = [NoThrow, Const]; diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp b/clang/lib/CodeGen/CGHLSLBuiltins.cpp index 7b5b924b1fe82..9f87afa5a8a3d 100644 --- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp +++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp @@ -352,6 +352,13 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID, SmallVector Args{OrderID, SpaceOp, RangeOp, IndexOp, Name}; return Builder.CreateIntrinsic(HandleTy, IntrinsicID, Args); } + case Builtin::BI__builtin_hlsl_resource_nonuniformindex: { +Value *IndexOp = EmitScalarExpr(E->getArg(0)); +llvm::Type *RetTy = ConvertType(E->getType()); +return Builder.CreateIntrinsic( +RetTy, CGM.getHLSLRuntime().getNonUniformResourceIndexIntrinsic(), +ArrayRef{IndexOp}); + } case Builtin::BI__builtin_hlsl_all: { Value *Op0 = EmitScalarExpr(E->getArg(0)); return Builder.CreateIntrinsic( diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h b/clang/lib/CodeGen/CGHLSLRuntime.h index 370f3d5c5d30d..f4b410664d60c 100644 --- a/clang/lib/CodeGen/CGHLSLRuntime.h +++ b/clang/lib/CodeGen/CGHLSLRuntime.h @@ -129,6 +129,8 @@ class CGHLSLRuntime { resource_handlefrombinding) GENERATE_HLSL_INTRINSIC_FUNCTION(CreateHandleFromImplicitBinding, resource_handlefromimplicitbinding) + GENERATE_HLSL_INTRINSIC_FUNCTION(NonUniformResourceIndex, + resource_nonuniformindex) GENERATE_HLSL_INTRINSIC_FUNCTION(BufferUpdateCounter, resource_updatecounter) GENERATE_HLSL_INTRINSIC_FUNCTION(GroupMemoryBarrierWithGroupSync, group_memory_barrier_with_group_sync) diff --git a/clang/lib/Headers/hlsl/hlsl_intrinsics.h b/clang/lib/Headers/hlsl/hlsl_intrinsics.h index d9d87c827e6a4..0eab2ff56c519 100644 --- a/clang/lib/Headers/hlsl/hlsl_intrinsics.h +++ b/clang/lib/Headers/hlsl/hlsl_intrinsics.h @@ -422,6 +422,31 @@ constexpr int4 D3DCOLORtoUBYTE4(float4 V) { return __detail::d3d_color_to_ubyte4_impl(V); } +//===--===// +// NonUniformResourceIndex builtin +//===--===// + +/// \fn uint NonUniformResourceIndex(uint I) +/// \brief A compiler hint to indicate that a resource index varies across +/// threads. 
+// / within a wave (i.e., it is non-uniform). +/// \param I [in] Resource array index +/// +/// The return value is the \Index parameter. +/// +/// When indexing into an array of shader resources (e.g., textures, buffers), +/// some GPU hardware and drivers require the compiler to know whether the index +/// is uniform (same for all threads) or non-uniform (varies per thread). +/// +/// Using NonUniformResourceIndex explicitly marks an index as non-uniform, . +/// disabling certain assumptions or optimizations that could lead to incorrect +/// behavior when dynamically accessing resource arrays with non-uniform +/// indices. + +constexpr uint32_t NonUniformResourceIndex(uint32_t Index) { + return __builtin_hlsl_resource_nonuniformindex(Index); +} + //===--===// // reflect builtin //===--===// diff --git a/clang/test/CodeGenHLSL/
[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)
https://github.com/mtrofin approved this pull request. https://github.com/llvm/llvm-project/pull/158376 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] X86: Switch to RegClassByHwMode (PR #158274)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/158274 >From 7d3e2fa03f76098b2f4f90a2c4407e18d59423c5 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 9 Sep 2025 11:15:47 +0900 Subject: [PATCH] X86: Switch to RegClassByHwMode Replace the target uses of PointerLikeRegClass with RegClassByHwMode --- .../X86/MCTargetDesc/X86MCTargetDesc.cpp | 3 ++ llvm/lib/Target/X86/X86.td| 2 ++ llvm/lib/Target/X86/X86InstrInfo.td | 8 ++--- llvm/lib/Target/X86/X86InstrOperands.td | 30 +++- llvm/lib/Target/X86/X86InstrPredicates.td | 14 llvm/lib/Target/X86/X86RegisterInfo.cpp | 35 +-- llvm/lib/Target/X86/X86Subtarget.h| 4 +-- llvm/utils/TableGen/X86FoldTablesEmitter.cpp | 4 +-- 8 files changed, 57 insertions(+), 43 deletions(-) diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp index bb1e716c33ed5..1d5ef8b0996dc 100644 --- a/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp +++ b/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp @@ -55,6 +55,9 @@ std::string X86_MC::ParseX86Triple(const Triple &TT) { else FS = "-64bit-mode,-32bit-mode,+16bit-mode"; + if (TT.isX32()) +FS += ",+x32"; + return FS; } diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td index 7c9e821c02fda..3af8b3e060a16 100644 --- a/llvm/lib/Target/X86/X86.td +++ b/llvm/lib/Target/X86/X86.td @@ -25,6 +25,8 @@ def Is32Bit : SubtargetFeature<"32bit-mode", "Is32Bit", "true", "32-bit mode (80386)">; def Is16Bit : SubtargetFeature<"16bit-mode", "Is16Bit", "true", "16-bit mode (i8086)">; +def IsX32 : SubtargetFeature<"x32", "IsX32", "true", + "64-bit with ILP32 programming model (e.g. x32 ABI)">; //===--===// // X86 Subtarget ISA features diff --git a/llvm/lib/Target/X86/X86InstrInfo.td b/llvm/lib/Target/X86/X86InstrInfo.td index 7f6c5614847e3..0c4abc2c400f6 100644 --- a/llvm/lib/Target/X86/X86InstrInfo.td +++ b/llvm/lib/Target/X86/X86InstrInfo.td @@ -18,14 +18,14 @@ include "X86InstrFragments.td" include "X86InstrFragmentsSIMD.td" //===--===// -// X86 Operand Definitions. +// X86 Predicate Definitions. // -include "X86InstrOperands.td" +include "X86InstrPredicates.td" //===--===// -// X86 Predicate Definitions. +// X86 Operand Definitions. // -include "X86InstrPredicates.td" +include "X86InstrOperands.td" //===--===// // X86 Instruction Format Definitions. diff --git a/llvm/lib/Target/X86/X86InstrOperands.td b/llvm/lib/Target/X86/X86InstrOperands.td index 80843f6bb80e6..5207ecad127a2 100644 --- a/llvm/lib/Target/X86/X86InstrOperands.td +++ b/llvm/lib/Target/X86/X86InstrOperands.td @@ -6,9 +6,15 @@ // //===--===// +def x86_ptr_rc : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32, GR64, LOW32_ADDR_ACCESS]>; + // A version of ptr_rc which excludes SP, ESP, and RSP. This is used for // the index operand of an address, to conform to x86 encoding restrictions. -def ptr_rc_nosp : PointerLikeRegClass<1>; +def ptr_rc_nosp : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32_NOSP, GR64_NOSP, GR32_NOSP]>; // *mem - Operand definitions for the funky X86 addressing mode operands. 
// @@ -53,7 +59,7 @@ class X86MemOperand : Operand { let PrintMethod = printMethod; - let MIOperandInfo = (ops ptr_rc, i8imm, ptr_rc_nosp, i32imm, SEGMENT_REG); + let MIOperandInfo = (ops x86_ptr_rc, i8imm, ptr_rc_nosp, i32imm, SEGMENT_REG); let ParserMatchClass = parserMatchClass; let OperandType = "OPERAND_MEMORY"; int Size = size; @@ -63,7 +69,7 @@ class X86MemOperand : X86MemOperand { - let MIOperandInfo = (ops ptr_rc, i8imm, RC, i32imm, SEGMENT_REG); + let MIOperandInfo = (ops x86_ptr_rc, i8imm, RC, i32imm, SEGMENT_REG); } def anymem : X86MemOperand<"printMemReference">; @@ -113,8 +119,14 @@ def sdmem : X86MemOperand<"printqwordmem", X86Mem64AsmOperand>; // A version of i8mem for use on x86-64 and x32 that uses a NOREX GPR instead // of a plain GPR, so that it doesn't potentially require a REX prefix. -def ptr_rc_norex : PointerLikeRegClass<2>; -def ptr_rc_norex_nosp : PointerLikeRegClass<3>; +def ptr_rc_norex : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32_NOREX, GR64_NOREX, GR32_NOREX]>; + +def ptr_rc_norex_nosp : RegClassByHwMode< + [X86_32, X86_64, X86_64_X32], + [GR32_NOREX_NOSP, GR64_NOREX_NOSP, GR32_NOREX_NOSP]>; + def i8mem_NOREX : X86MemOperand<"printbytemem", X86Mem8AsmOperand, 8> { let MIOpe
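For readers following the PointerLikeRegClass removal: a hedged C++ sketch of the consumer side, mirroring the asm-parser hunks in this patch series. The helper name resolveOperandRegClass is hypothetical; the MCInstrInfo::getOpRegClassID and MCSubtargetInfo::getHwMode calls are the ones the patches themselves use. Instead of reading a fixed MCOperandInfo::RegClass kind, the register-class table is selected by the subtarget's active RegInfo HwMode (e.g. X86_32 vs. X86_64 vs. X86_64_X32 after this patch).

#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"

using namespace llvm;

// Hypothetical helper: resolve the concrete register class of one operand.
static const MCRegisterClass &
resolveOperandRegClass(const MCInstrInfo &MII, const MCRegisterInfo &MRI,
                       const MCSubtargetInfo &STI, unsigned Opcode,
                       unsigned OpIdx) {
  const MCInstrDesc &Desc = MII.get(Opcode);
  // Pick the register-class table for the subtarget's active RegInfo HwMode.
  unsigned HwMode = STI.getHwMode(MCSubtargetInfo::HwMode_RegInfo);
  int16_t RCID = MII.getOpRegClassID(Desc.operands()[OpIdx], HwMode);
  return MRI.getRegClass(RCID);
}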
[llvm-branch-commits] [llvm] SPARC: Use RegClassByHwMode instead of PointerLikeRegClass (PR #158271)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/158271 >From e7ef891fb2c4e21bec4d23af954ad9204f3eb48f Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 8 Sep 2025 14:04:59 +0900 Subject: [PATCH] SPARC: Use RegClassByHwMode instead of PointerLikeRegClass --- .../Sparc/Disassembler/SparcDisassembler.cpp | 8 --- llvm/lib/Target/Sparc/SparcInstrInfo.td | 21 +-- 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp b/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp index c3d60f3689e1f..e585e5af42d32 100644 --- a/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp +++ b/llvm/lib/Target/Sparc/Disassembler/SparcDisassembler.cpp @@ -159,14 +159,6 @@ static DecodeStatus DecodeI64RegsRegisterClass(MCInst &Inst, unsigned RegNo, return DecodeIntRegsRegisterClass(Inst, RegNo, Address, Decoder); } -// This is used for the type "ptr_rc", which is either IntRegs or I64Regs -// depending on SparcRegisterInfo::getPointerRegClass. -static DecodeStatus DecodePointerLikeRegClass0(MCInst &Inst, unsigned RegNo, - uint64_t Address, - const MCDisassembler *Decoder) { - return DecodeIntRegsRegisterClass(Inst, RegNo, Address, Decoder); -} - static DecodeStatus DecodeFPRegsRegisterClass(MCInst &Inst, unsigned RegNo, uint64_t Address, const MCDisassembler *Decoder) { diff --git a/llvm/lib/Target/Sparc/SparcInstrInfo.td b/llvm/lib/Target/Sparc/SparcInstrInfo.td index 53972d6c105a4..97e7fd7769edb 100644 --- a/llvm/lib/Target/Sparc/SparcInstrInfo.td +++ b/llvm/lib/Target/Sparc/SparcInstrInfo.td @@ -95,10 +95,27 @@ def HasFSMULD : Predicate<"!Subtarget->hasNoFSMULD()">; // will pick deprecated instructions. def UseDeprecatedInsts : Predicate<"Subtarget->useV8DeprecatedInsts()">; +//===--===// +// HwModes Pattern Stuff +//===--===// + +defvar SPARC32 = DefaultMode; +def SPARC64 : HwMode<[Is64Bit]>; + //===--===// // Instruction Pattern Stuff //===--===// +def sparc_ptr_rc : RegClassByHwMode< + [SPARC32, SPARC64], + [IntRegs, I64Regs]>; + +// Both cases can use the same decoder method, so avoid the dispatch +// by hwmode by setting an explicit DecoderMethod +def ptr_op : RegisterOperand { + let DecoderMethod = "DecodeIntRegsRegisterClass"; +} + // FIXME these should have AsmOperandClass. def uimm3 : PatLeaf<(imm), [{ return isUInt<3>(N->getZExtValue()); }]>; @@ -178,12 +195,12 @@ def simm13Op : Operand { def MEMrr : Operand { let PrintMethod = "printMemOperand"; - let MIOperandInfo = (ops ptr_rc, ptr_rc); + let MIOperandInfo = (ops ptr_op, ptr_op); let ParserMatchClass = SparcMEMrrAsmOperand; } def MEMri : Operand { let PrintMethod = "printMemOperand"; - let MIOperandInfo = (ops ptr_rc, simm13Op); + let MIOperandInfo = (ops ptr_op, simm13Op); let ParserMatchClass = SparcMEMriAsmOperand; } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
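The "explicit DecoderMethod" trick in the SPARC hunk above generalizes. Below is a distilled TableGen sketch of the same migration pattern with hypothetical definition names (My32, My64, my_ptr_rc, my_ptr_op), assuming the same IntRegs/I64Regs pointer split and Is64Bit predicate as the patch.

// One HwMode per pointer width; DefaultMode covers the 32-bit case.
defvar My32 = DefaultMode;
def My64 : HwMode<[Is64Bit]>;

// Map each mode to a concrete register class instead of an opaque
// PointerLikeRegClass index.
def my_ptr_rc : RegClassByHwMode<[My32, My64], [IntRegs, I64Regs]>;

// An explicit DecoderMethod keeps the disassembler free of per-HwMode
// dispatch when one decoder can handle both register classes.
def my_ptr_op : RegisterOperand<my_ptr_rc> {
  let DecoderMethod = "DecodeIntRegsRegisterClass";
}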
[llvm-branch-commits] [llvm] Mips: Switch to RegClassByHwMode (PR #158273)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/158273 >From 5b8f38bb56b46b9e63fe2031f9b43e4bbba333fb Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 6 Sep 2025 21:14:45 +0900 Subject: [PATCH 1/3] Mips: Switch to RegClassByHwMode --- .../Target/Mips/AsmParser/MipsAsmParser.cpp | 9 +-- .../Mips/Disassembler/MipsDisassembler.cpp| 24 +++ llvm/lib/Target/Mips/MicroMipsInstrInfo.td| 12 +++--- llvm/lib/Target/Mips/Mips.td | 15 llvm/lib/Target/Mips/MipsInstrInfo.td | 20 +++- llvm/lib/Target/Mips/MipsRegisterInfo.cpp | 16 ++--- llvm/lib/Target/Mips/MipsRegisterInfo.td | 16 + 7 files changed, 76 insertions(+), 36 deletions(-) diff --git a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp index 8a5cb517c94c5..ba70c9e6cb9e8 100644 --- a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp +++ b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp @@ -3706,7 +3706,9 @@ void MipsAsmParser::expandMem16Inst(MCInst &Inst, SMLoc IDLoc, MCStreamer &Out, MCRegister TmpReg = DstReg; const MCInstrDesc &Desc = MII.get(OpCode); - int16_t DstRegClass = Desc.operands()[StartOp].RegClass; + int16_t DstRegClass = + MII.getOpRegClassID(Desc.operands()[StartOp], + STI->getHwMode(MCSubtargetInfo::HwMode_RegInfo)); unsigned DstRegClassID = getContext().getRegisterInfo()->getRegClass(DstRegClass).getID(); bool IsGPR = (DstRegClassID == Mips::GPR32RegClassID) || @@ -3834,7 +3836,10 @@ void MipsAsmParser::expandMem9Inst(MCInst &Inst, SMLoc IDLoc, MCStreamer &Out, MCRegister TmpReg = DstReg; const MCInstrDesc &Desc = MII.get(OpCode); - int16_t DstRegClass = Desc.operands()[StartOp].RegClass; + int16_t DstRegClass = + MII.getOpRegClassID(Desc.operands()[StartOp], + STI->getHwMode(MCSubtargetInfo::HwMode_RegInfo)); + unsigned DstRegClassID = getContext().getRegisterInfo()->getRegClass(DstRegClass).getID(); bool IsGPR = (DstRegClassID == Mips::GPR32RegClassID) || diff --git a/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp b/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp index c22b8f61b12dc..705695c74803f 100644 --- a/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp +++ b/llvm/lib/Target/Mips/Disassembler/MipsDisassembler.cpp @@ -916,6 +916,30 @@ DecodeGPRMM16MovePRegisterClass(MCInst &Inst, unsigned RegNo, uint64_t Address, return MCDisassembler::Success; } +static DecodeStatus DecodeGP32RegisterClass(MCInst &Inst, unsigned RegNo, +uint64_t Address, +const MCDisassembler *Decoder) { + llvm_unreachable("this is unused"); +} + +static DecodeStatus DecodeGP64RegisterClass(MCInst &Inst, unsigned RegNo, +uint64_t Address, +const MCDisassembler *Decoder) { + llvm_unreachable("this is unused"); +} + +static DecodeStatus DecodeSP32RegisterClass(MCInst &Inst, unsigned RegNo, +uint64_t Address, +const MCDisassembler *Decoder) { + llvm_unreachable("this is unused"); +} + +static DecodeStatus DecodeSP64RegisterClass(MCInst &Inst, unsigned RegNo, +uint64_t Address, +const MCDisassembler *Decoder) { + llvm_unreachable("this is unused"); +} + static DecodeStatus DecodeGPR32RegisterClass(MCInst &Inst, unsigned RegNo, uint64_t Address, const MCDisassembler *Decoder) { diff --git a/llvm/lib/Target/Mips/MicroMipsInstrInfo.td b/llvm/lib/Target/Mips/MicroMipsInstrInfo.td index b3fd8f422f429..b44bf1391b73e 100644 --- a/llvm/lib/Target/Mips/MicroMipsInstrInfo.td +++ b/llvm/lib/Target/Mips/MicroMipsInstrInfo.td @@ -57,12 +57,6 @@ def MicroMipsMemGPRMM16AsmOperand : AsmOperandClass { let PredicateMethod = "isMemWithGRPMM16Base"; } -// Define the 
classes of pointers used by microMIPS. -// The numbers must match those in MipsRegisterInfo::MipsPtrClass. -def ptr_gpr16mm_rc : PointerLikeRegClass<1>; -def ptr_sp_rc : PointerLikeRegClass<2>; -def ptr_gp_rc : PointerLikeRegClass<3>; - class mem_mm_4_generic : Operand { let PrintMethod = "printMemOperand"; let MIOperandInfo = (ops ptr_gpr16mm_rc, simm4); @@ -114,7 +108,7 @@ def mem_mm_gp_simm7_lsl2 : Operand { def mem_mm_9 : Operand { let PrintMethod = "printMemOperand"; - let MIOperandInfo = (ops ptr_rc, simm9); + let MIOperandInfo = (ops mips_ptr_rc, simm9); let EncoderMethod = "getMemEncodingMMImm9"; let ParserMatchClass = MipsMemSimmAsmOperand<9>; let OperandType = "OPERAND_MEMORY"; @@ -130,7 +124,7 @@ def mem_m