[llvm-branch-commits] [llvm] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (PR #97605)
@@ -68,6 +156,57 @@ struct SLSHardeningInserter : ThunkInserter { } // end anonymous namespace +const ThunkKind ThunkKind::BR = {ThunkBR, "", false, false, AArch64::BR}; +const ThunkKind ThunkKind::BRAA = {ThunkBRAA, "aa_", true, true, AArch64::BRAA}; +const ThunkKind ThunkKind::BRAB = {ThunkBRAB, "ab_", true, true, AArch64::BRAB}; +const ThunkKind ThunkKind::BRAAZ = {ThunkBRAAZ, "aaz_", false, true, +AArch64::BRAAZ}; +const ThunkKind ThunkKind::BRABZ = {ThunkBRABZ, "abz_", false, true, +AArch64::BRABZ}; kbeyls wrote: very minor nitpick: These probably can go directly after the definition of `struct ThunkKind`? If so, it's easier to understand what all the "false" and "true"s mean here without having to scroll a lot to where `struct ThunkKind` is defined. https://github.com/llvm/llvm-project/pull/97605 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)
@@ -1117,9 +1155,11 @@ void ELFObjectFile::getRelocationTypeName( template Expected ELFObjectFile::getRelocationAddend(DataRefImpl Rel) const { - if (getRelSection(Rel)->sh_type != ELF::SHT_RELA) -return createError("Section is not SHT_RELA"); - return (int64_t)getRela(Rel)->r_addend; + if (getRelSection(Rel)->sh_type == ELF::SHT_RELA) +return (int64_t)getRela(Rel)->r_addend; + if (getRelSection(Rel)->sh_type == ELF::SHT_CREL) +return (int64_t)getCrel(Rel).r_addend; + return createError("Section is not SHT_RELA"); jh7370 wrote: I'm not sure this error quite makes sense anymore. Probably needs to say something about addends. https://github.com/llvm/llvm-project/pull/97382 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [CodeGen] Add dump() to MachineTraceMetrics.h (PR #97799)
https://github.com/wangpc-pp created https://github.com/llvm/llvm-project/pull/97799 To enhance debugging. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support select optimization (PR #80124)
https://github.com/wangpc-pp updated https://github.com/llvm/llvm-project/pull/80124 >From e3fb1fe7bdd4b7c24f9361c4d14dd1206fc8c067 Mon Sep 17 00:00:00 2001 From: wangpc Date: Sun, 18 Feb 2024 11:12:16 +0800 Subject: [PATCH 1/2] Move after addIRPasses Created using spr 1.3.4 --- llvm/lib/Target/RISCV/RISCVTargetMachine.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp index fdf1c023fff87..7a26e1956424c 100644 --- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp +++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp @@ -450,15 +450,15 @@ void RISCVPassConfig::addIRPasses() { if (EnableLoopDataPrefetch) addPass(createLoopDataPrefetchPass()); -if (EnableSelectOpt && getOptLevel() == CodeGenOptLevel::Aggressive) - addPass(createSelectOptimizePass()); - addPass(createRISCVGatherScatterLoweringPass()); addPass(createInterleavedAccessPass()); addPass(createRISCVCodeGenPreparePass()); } TargetPassConfig::addIRPasses(); + + if (getOptLevel() == CodeGenOptLevel::Aggressive && EnableSelectOpt) +addPass(createSelectOptimizePass()); } bool RISCVPassConfig::addPreISel() { >From 5d5398596dc30c47c67572ec20137fb3f9434940 Mon Sep 17 00:00:00 2001 From: wangpc Date: Wed, 21 Feb 2024 21:21:28 +0800 Subject: [PATCH 2/2] Fix test Created using spr 1.3.4 --- llvm/test/CodeGen/RISCV/O3-pipeline.ll | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll index 62c1af52e6c20..8b52e3fe7b2f1 100644 --- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll +++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll @@ -34,15 +34,6 @@ ; CHECK-NEXT: Optimization Remark Emitter ; CHECK-NEXT: Scalar Evolution Analysis ; CHECK-NEXT: Loop Data Prefetch -; CHECK-NEXT: Post-Dominator Tree Construction -; CHECK-NEXT: Branch Probability Analysis -; CHECK-NEXT: Block Frequency Analysis -; CHECK-NEXT: Lazy Branch Probability Analysis -; CHECK-NEXT: Lazy Block Frequency Analysis -; CHECK-NEXT: Optimization Remark Emitter -; CHECK-NEXT: Optimize selects -; CHECK-NEXT: Dominator Tree Construction -; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: RISC-V gather/scatter lowering ; CHECK-NEXT: Interleaved Access Pass ; CHECK-NEXT: RISC-V CodeGenPrepare @@ -77,6 +68,15 @@ ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: TLS Variable Hoist +; CHECK-NEXT: Post-Dominator Tree Construction +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Optimization Remark Emitter +; CHECK-NEXT: Optimize selects +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Exception handling preparation ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support select optimization (PR #80124)
https://github.com/wangpc-pp updated https://github.com/llvm/llvm-project/pull/80124 >From e3fb1fe7bdd4b7c24f9361c4d14dd1206fc8c067 Mon Sep 17 00:00:00 2001 From: wangpc Date: Sun, 18 Feb 2024 11:12:16 +0800 Subject: [PATCH 1/2] Move after addIRPasses Created using spr 1.3.4 --- llvm/lib/Target/RISCV/RISCVTargetMachine.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp index fdf1c023fff878..7a26e1956424cb 100644 --- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp +++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp @@ -450,15 +450,15 @@ void RISCVPassConfig::addIRPasses() { if (EnableLoopDataPrefetch) addPass(createLoopDataPrefetchPass()); -if (EnableSelectOpt && getOptLevel() == CodeGenOptLevel::Aggressive) - addPass(createSelectOptimizePass()); - addPass(createRISCVGatherScatterLoweringPass()); addPass(createInterleavedAccessPass()); addPass(createRISCVCodeGenPreparePass()); } TargetPassConfig::addIRPasses(); + + if (getOptLevel() == CodeGenOptLevel::Aggressive && EnableSelectOpt) +addPass(createSelectOptimizePass()); } bool RISCVPassConfig::addPreISel() { >From 5d5398596dc30c47c67572ec20137fb3f9434940 Mon Sep 17 00:00:00 2001 From: wangpc Date: Wed, 21 Feb 2024 21:21:28 +0800 Subject: [PATCH 2/2] Fix test Created using spr 1.3.4 --- llvm/test/CodeGen/RISCV/O3-pipeline.ll | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll index 62c1af52e6c20e..8b52e3fe7b2f15 100644 --- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll +++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll @@ -34,15 +34,6 @@ ; CHECK-NEXT: Optimization Remark Emitter ; CHECK-NEXT: Scalar Evolution Analysis ; CHECK-NEXT: Loop Data Prefetch -; CHECK-NEXT: Post-Dominator Tree Construction -; CHECK-NEXT: Branch Probability Analysis -; CHECK-NEXT: Block Frequency Analysis -; CHECK-NEXT: Lazy Branch Probability Analysis -; CHECK-NEXT: Lazy Block Frequency Analysis -; CHECK-NEXT: Optimization Remark Emitter -; CHECK-NEXT: Optimize selects -; CHECK-NEXT: Dominator Tree Construction -; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: RISC-V gather/scatter lowering ; CHECK-NEXT: Interleaved Access Pass ; CHECK-NEXT: RISC-V CodeGenPrepare @@ -77,6 +68,15 @@ ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: TLS Variable Hoist +; CHECK-NEXT: Post-Dominator Tree Construction +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Optimization Remark Emitter +; CHECK-NEXT: Optimize selects +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Exception handling preparation ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)
ChuanqiXu9 wrote: Sorry for bothering, I tried a new manner to update the patch but it shows I should better : ( https://github.com/llvm/llvm-project/pull/83237 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [libc] [lld] [lldb] [llvm] [Serialization] Introduce OnDiskHashTable for specializations (PR #83233)
https://github.com/ChuanqiXu9 updated https://github.com/llvm/llvm-project/pull/83233 error: too big or took too long to generate ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 82a6d15 - Revert "Revert "[symbolizer] Empty string is not an error" (#94424)"
Author: Serge Pavlov Date: 2024-07-05T09:10:03+07:00 New Revision: 82a6d1572ff4de1491ea58eb167967350eace9fa URL: https://github.com/llvm/llvm-project/commit/82a6d1572ff4de1491ea58eb167967350eace9fa DIFF: https://github.com/llvm/llvm-project/commit/82a6d1572ff4de1491ea58eb167967350eace9fa.diff LOG: Revert "Revert "[symbolizer] Empty string is not an error" (#94424)" This reverts commit c0bb16eaf7a6c16edadfd05ba4168fa536c227e2. Added: Modified: llvm/test/tools/llvm-symbolizer/get-input-file.test llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp Removed: diff --git a/llvm/test/tools/llvm-symbolizer/get-input-file.test b/llvm/test/tools/llvm-symbolizer/get-input-file.test index 8c21816591c81..50eb051968718 100644 --- a/llvm/test/tools/llvm-symbolizer/get-input-file.test +++ b/llvm/test/tools/llvm-symbolizer/get-input-file.test @@ -1,9 +1,9 @@ # If binary input file is not specified, llvm-symbolizer assumes it is the first # item in the command. -# No input items at all, complain about missing input file. +# No input items at all. Report an unknown line, but do not produce any output on stderr. RUN: echo | llvm-symbolizer 2>%t.1.err | FileCheck %s --check-prefix=NOSOURCE -RUN: FileCheck --input-file=%t.1.err --check-prefix=NOFILE %s +RUN: FileCheck --input-file=%t.1.err --implicit-check-not={{.}} --allow-empty %s # Only one input item, complain about missing addresses. RUN: llvm-symbolizer "foo" 2>%t.2.err | FileCheck %s --check-prefix=NOSOURCE @@ -32,8 +32,6 @@ RUN: FileCheck --input-file=%t.7.err --check-prefix=BAD-QUOTE %s NOSOURCE: ?? NOSOURCE-NEXT: ??:0:0 -NOFILE: error: no input filename has been specified - NOADDR: error: 'foo': no module offset has been specified NOTFOUND: error: 'foo': [[MSG]] diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp index b98bdbc388faf..6d7953f3109a5 100644 --- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp +++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp @@ -337,6 +337,14 @@ static void symbolizeInput(const opt::InputArgList , object::BuildID BuildID(IncomingBuildID.begin(), IncomingBuildID.end()); uint64_t Offset = 0; StringRef Symbol; + + // An empty input string may be used to check if the process is alive and + // responding to input. Do not emit a message on stderr in this case but + // respond on stdout. + if (InputString.empty()) { +printUnknownLineInfo(ModuleName, Printer); +return; + } if (Error E = parseCommand(Args.getLastArgValue(OPT_obj_EQ), IsAddr2Line, StringRef(InputString), Cmd, ModuleName, BuildID, Symbol, Offset)) { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (PR #97605)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/97605 >From 84e7eb3c36b99ac7f673d9a7ad0c88c469f7f45d Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 1 Jul 2024 20:13:54 +0300 Subject: [PATCH 1/2] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form __llvm_slsblr_thunk_xNfor BLR thunks __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead. --- .../Target/AArch64/AArch64SLSHardening.cpp| 326 -- .../speculation-hardening-sls-blra.mir| 210 +++ 2 files changed, 432 insertions(+), 104 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/speculation-hardening-sls-blra.mir diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp index 00ba31b3e500d..4ae2e48af337f 100644 --- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp +++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp @@ -13,6 +13,7 @@ #include "AArch64InstrInfo.h" #include "AArch64Subtarget.h" +#include "llvm/ADT/StringSwitch.h" #include "llvm/CodeGen/IndirectThunks.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunction.h" @@ -23,6 +24,7 @@ #include "llvm/IR/DebugLoc.h" #include "llvm/Pass.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/FormatVariadic.h" #include "llvm/Target/TargetMachine.h" #include @@ -32,17 +34,103 @@ using namespace llvm; #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass" -static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_"; +// Common name prefix of all thunks generated by this pass. +// +// The generic form is +// __llvm_slsblr_thunk_xNfor BLR thunks +// __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks +// __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks +static constexpr StringRef CommonNamePrefix = "__llvm_slsblr_thunk_"; namespace { -// Set of inserted thunks: bitmask with bits corresponding to -// indexes in SLSBLRThunks array. -typedef uint32_t ThunksSet; +struct ThunkKind { + enum ThunkKindId { +ThunkBR, +ThunkBRAA, +ThunkBRAB, +ThunkBRAAZ, +ThunkBRABZ, + }; + + ThunkKindId Id; + StringRef NameInfix; + bool HasXmOperand; + bool NeedsPAuth; + + // Opcode to perform indirect jump from inside the thunk. + unsigned BROpcode; + + static const ThunkKind BR; + static const ThunkKind BRAA; + static const ThunkKind BRAB; + static const ThunkKind BRAAZ; + static const ThunkKind BRABZ; +}; + +// Set of inserted thunks. +class ThunksSet { +public: + static constexpr unsigned NumXRegisters = 32; + + // Given Xn register, returns n. + static unsigned indexOfXReg(Register Xn); + // Given n, returns Xn register. + static Register xRegByIndex(unsigned N); + + ThunksSet |=(const ThunksSet ) { +BLRThunks |= Other.BLRThunks; +BLRAAZThunks |= Other.BLRAAZThunks; +BLRABZThunks |= Other.BLRABZThunks; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRAAThunks[I] |= Other.BLRAAThunks[I]; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRABThunks[I] |= Other.BLRABThunks[I]; + +return *this; + } + + bool get(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +return getBitmask(Kind, Xm) & XnBit; + } + + void set(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +getBitmask(Kind, Xm) |= XnBit; + } + +private: + // Bitmasks representing operands used, with n-th bit corresponding to Xn + // register operand. If the instruction has a second operand (Xm), an array + // of bitmasks is used, indexed by m. + // Indexes corresponding to the forbidden x16, x17 and x30 registers are + // always unset, for simplicity there are no holes. + uint32_t BLRThunks = 0; + uint32_t BLRAAZThunks = 0; + uint32_t BLRABZThunks = 0; + uint32_t BLRAAThunks[NumXRegisters] = {}; + uint32_t BLRABThunks[NumXRegisters] = {}; + + uint32_t (ThunkKind::ThunkKindId Kind, Register Xm) { +switch (Kind) { +case ThunkKind::ThunkBR: + return BLRThunks; +case ThunkKind::ThunkBRAAZ: + return BLRAAZThunks; +case ThunkKind::ThunkBRABZ: + return BLRABZThunks; +case ThunkKind::ThunkBRAA: + return BLRAAThunks[indexOfXReg(Xm)]; +case ThunkKind::ThunkBRAB: + return BLRABThunks[indexOfXReg(Xm)]; +} + } +}; struct SLSHardeningInserter : ThunkInserter { public: - const char *getThunkPrefix() { return SLSBLRNamePrefix; } + const char *getThunkPrefix() { return CommonNamePrefix.data(); } bool mayUseThunk(const
[llvm-branch-commits] [llvm] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (PR #97605)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/97605 >From 84e7eb3c36b99ac7f673d9a7ad0c88c469f7f45d Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 1 Jul 2024 20:13:54 +0300 Subject: [PATCH] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form __llvm_slsblr_thunk_xNfor BLR thunks __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead. --- .../Target/AArch64/AArch64SLSHardening.cpp| 326 -- .../speculation-hardening-sls-blra.mir| 210 +++ 2 files changed, 432 insertions(+), 104 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/speculation-hardening-sls-blra.mir diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp index 00ba31b3e500d..4ae2e48af337f 100644 --- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp +++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp @@ -13,6 +13,7 @@ #include "AArch64InstrInfo.h" #include "AArch64Subtarget.h" +#include "llvm/ADT/StringSwitch.h" #include "llvm/CodeGen/IndirectThunks.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunction.h" @@ -23,6 +24,7 @@ #include "llvm/IR/DebugLoc.h" #include "llvm/Pass.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/FormatVariadic.h" #include "llvm/Target/TargetMachine.h" #include @@ -32,17 +34,103 @@ using namespace llvm; #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass" -static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_"; +// Common name prefix of all thunks generated by this pass. +// +// The generic form is +// __llvm_slsblr_thunk_xNfor BLR thunks +// __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks +// __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks +static constexpr StringRef CommonNamePrefix = "__llvm_slsblr_thunk_"; namespace { -// Set of inserted thunks: bitmask with bits corresponding to -// indexes in SLSBLRThunks array. -typedef uint32_t ThunksSet; +struct ThunkKind { + enum ThunkKindId { +ThunkBR, +ThunkBRAA, +ThunkBRAB, +ThunkBRAAZ, +ThunkBRABZ, + }; + + ThunkKindId Id; + StringRef NameInfix; + bool HasXmOperand; + bool NeedsPAuth; + + // Opcode to perform indirect jump from inside the thunk. + unsigned BROpcode; + + static const ThunkKind BR; + static const ThunkKind BRAA; + static const ThunkKind BRAB; + static const ThunkKind BRAAZ; + static const ThunkKind BRABZ; +}; + +// Set of inserted thunks. +class ThunksSet { +public: + static constexpr unsigned NumXRegisters = 32; + + // Given Xn register, returns n. + static unsigned indexOfXReg(Register Xn); + // Given n, returns Xn register. + static Register xRegByIndex(unsigned N); + + ThunksSet |=(const ThunksSet ) { +BLRThunks |= Other.BLRThunks; +BLRAAZThunks |= Other.BLRAAZThunks; +BLRABZThunks |= Other.BLRABZThunks; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRAAThunks[I] |= Other.BLRAAThunks[I]; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRABThunks[I] |= Other.BLRABThunks[I]; + +return *this; + } + + bool get(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +return getBitmask(Kind, Xm) & XnBit; + } + + void set(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +getBitmask(Kind, Xm) |= XnBit; + } + +private: + // Bitmasks representing operands used, with n-th bit corresponding to Xn + // register operand. If the instruction has a second operand (Xm), an array + // of bitmasks is used, indexed by m. + // Indexes corresponding to the forbidden x16, x17 and x30 registers are + // always unset, for simplicity there are no holes. + uint32_t BLRThunks = 0; + uint32_t BLRAAZThunks = 0; + uint32_t BLRABZThunks = 0; + uint32_t BLRAAThunks[NumXRegisters] = {}; + uint32_t BLRABThunks[NumXRegisters] = {}; + + uint32_t (ThunkKind::ThunkKindId Kind, Register Xm) { +switch (Kind) { +case ThunkKind::ThunkBR: + return BLRThunks; +case ThunkKind::ThunkBRAAZ: + return BLRAAZThunks; +case ThunkKind::ThunkBRABZ: + return BLRABZThunks; +case ThunkKind::ThunkBRAA: + return BLRAAThunks[indexOfXReg(Xm)]; +case ThunkKind::ThunkBRAB: + return BLRABThunks[indexOfXReg(Xm)]; +} + } +}; struct SLSHardeningInserter : ThunkInserter { public: - const char *getThunkPrefix() { return SLSBLRNamePrefix; } + const char *getThunkPrefix() { return CommonNamePrefix.data(); } bool mayUseThunk(const MachineFunction )
[llvm-branch-commits] [llvm] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (PR #97605)
https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/97605 >From f49c32c8465e9e68d7345fa82ae1294cc2faf0e7 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 1 Jul 2024 20:13:54 +0300 Subject: [PATCH] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form __llvm_slsblr_thunk_xNfor BLR thunks __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead. --- .../Target/AArch64/AArch64SLSHardening.cpp| 326 -- .../speculation-hardening-sls-blra.mir| 210 +++ 2 files changed, 432 insertions(+), 104 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/speculation-hardening-sls-blra.mir diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp index 24f023a3d70e7..4ae2e48af337f 100644 --- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp +++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp @@ -13,6 +13,7 @@ #include "AArch64InstrInfo.h" #include "AArch64Subtarget.h" +#include "llvm/ADT/StringSwitch.h" #include "llvm/CodeGen/IndirectThunks.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunction.h" @@ -23,6 +24,7 @@ #include "llvm/IR/DebugLoc.h" #include "llvm/Pass.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/FormatVariadic.h" #include "llvm/Target/TargetMachine.h" #include @@ -32,17 +34,103 @@ using namespace llvm; #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass" -static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_"; +// Common name prefix of all thunks generated by this pass. +// +// The generic form is +// __llvm_slsblr_thunk_xNfor BLR thunks +// __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks +// __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks +static constexpr StringRef CommonNamePrefix = "__llvm_slsblr_thunk_"; namespace { -// Set of inserted thunks: bitmask with bits corresponding to -// indexes in SLSBLRThunks array. -typedef uint32_t ThunksSet; +struct ThunkKind { + enum ThunkKindId { +ThunkBR, +ThunkBRAA, +ThunkBRAB, +ThunkBRAAZ, +ThunkBRABZ, + }; + + ThunkKindId Id; + StringRef NameInfix; + bool HasXmOperand; + bool NeedsPAuth; + + // Opcode to perform indirect jump from inside the thunk. + unsigned BROpcode; + + static const ThunkKind BR; + static const ThunkKind BRAA; + static const ThunkKind BRAB; + static const ThunkKind BRAAZ; + static const ThunkKind BRABZ; +}; + +// Set of inserted thunks. +class ThunksSet { +public: + static constexpr unsigned NumXRegisters = 32; + + // Given Xn register, returns n. + static unsigned indexOfXReg(Register Xn); + // Given n, returns Xn register. + static Register xRegByIndex(unsigned N); + + ThunksSet |=(const ThunksSet ) { +BLRThunks |= Other.BLRThunks; +BLRAAZThunks |= Other.BLRAAZThunks; +BLRABZThunks |= Other.BLRABZThunks; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRAAThunks[I] |= Other.BLRAAThunks[I]; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRABThunks[I] |= Other.BLRABThunks[I]; + +return *this; + } + + bool get(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +return getBitmask(Kind, Xm) & XnBit; + } + + void set(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +getBitmask(Kind, Xm) |= XnBit; + } + +private: + // Bitmasks representing operands used, with n-th bit corresponding to Xn + // register operand. If the instruction has a second operand (Xm), an array + // of bitmasks is used, indexed by m. + // Indexes corresponding to the forbidden x16, x17 and x30 registers are + // always unset, for simplicity there are no holes. + uint32_t BLRThunks = 0; + uint32_t BLRAAZThunks = 0; + uint32_t BLRABZThunks = 0; + uint32_t BLRAAThunks[NumXRegisters] = {}; + uint32_t BLRABThunks[NumXRegisters] = {}; + + uint32_t (ThunkKind::ThunkKindId Kind, Register Xm) { +switch (Kind) { +case ThunkKind::ThunkBR: + return BLRThunks; +case ThunkKind::ThunkBRAAZ: + return BLRAAZThunks; +case ThunkKind::ThunkBRABZ: + return BLRABZThunks; +case ThunkKind::ThunkBRAA: + return BLRAAThunks[indexOfXReg(Xm)]; +case ThunkKind::ThunkBRAB: + return BLRABThunks[indexOfXReg(Xm)]; +} + } +}; struct SLSHardeningInserter : ThunkInserter { public: - const char *getThunkPrefix() { return SLSBLRNamePrefix; } + const char *getThunkPrefix() { return CommonNamePrefix.data(); } bool mayUseThunk(const MachineFunction )
[llvm-branch-commits] [flang] [mlir] [Flang][OpenMP] Add lowering support for DO SIMD (PR #97718)
llvmbot wrote: @llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-mlir-openmp Author: Sergio Afonso (skatrak) Changes This patch adds support for lowering 'DO SIMD' constructs to MLIR. SIMD information is now stored in an `omp.simd` loop wrapper, which is currently ignored by the OpenMP dialect to LLVM IR translation stage. The end result is that runtime behavior of compiled 'DO SIMD' constructs does not change after this patch, so 'DO SIMD' still runs like 'DO' (i.e. SIMD width = 1). However, all of the required information is now present in the resulting MLIR representation. To avoid confusion, the previous wsloop-simd.f90 lit test is renamed to wsloop-schedule.f90 and a new wsloop-simd.f90 test is created to check the addition of SIMD clauses to the `omp.simd` operation produced when a 'DO SIMD' construct is lowered to MLIR. --- Full diff: https://github.com/llvm/llvm-project/pull/97718.diff 10 Files Affected: - (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+36-13) - (removed) flang/test/Lower/OpenMP/Todo/omp-do-simd-aligned.f90 (-16) - (modified) flang/test/Lower/OpenMP/Todo/omp-do-simd-linear.f90 (+1-1) - (removed) flang/test/Lower/OpenMP/Todo/omp-do-simd-safelen.f90 (-14) - (removed) flang/test/Lower/OpenMP/Todo/omp-do-simd-simdlen.f90 (-14) - (modified) flang/test/Lower/OpenMP/if-clause.f90 (+31) - (modified) flang/test/Lower/OpenMP/loop-compound.f90 (+3) - (added) flang/test/Lower/OpenMP/wsloop-schedule.f90 (+37) - (modified) flang/test/Lower/OpenMP/wsloop-simd.f90 (+42-32) - (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+3) ``diff diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index 1830e31349cfb..9c23888a87173 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -2060,19 +2060,42 @@ static void genCompositeDoSimd(lower::AbstractConverter , const ConstructQueue , ConstructQueue::iterator item, DataSharingProcessor ) { - ClauseProcessor cp(converter, semaCtx, item->clauses); - cp.processTODO(loc, - llvm::omp::OMPD_do_simd); - // TODO: Add support for vectorization - add vectorization hints inside loop - // body. - // OpenMP standard does not specify the length of vector instructions. - // Currently we safely assume that for !$omp do simd pragma the SIMD length - // is equal to 1 (i.e. we generate standard workshare loop). - // When support for vectorization is enabled, then we need to add handling of - // if clause. Currently if clause can be skipped because we always assume - // SIMD length = 1. - genStandaloneDo(converter, symTable, semaCtx, eval, loc, queue, item, dsp); + lower::StatementContext stmtCtx; + + // Clause processing. + mlir::omp::WsloopClauseOps wsloopClauseOps; + llvm::SmallVector wsloopReductionSyms; + llvm::SmallVector wsloopReductionTypes; + genWsloopClauses(converter, semaCtx, stmtCtx, item->clauses, loc, + wsloopClauseOps, wsloopReductionTypes, wsloopReductionSyms); + + mlir::omp::SimdClauseOps simdClauseOps; + genSimdClauses(converter, semaCtx, item->clauses, loc, simdClauseOps); + + mlir::omp::LoopNestClauseOps loopNestClauseOps; + llvm::SmallVector iv; + genLoopNestClauses(converter, semaCtx, eval, item->clauses, loc, + loopNestClauseOps, iv); + + // Operation creation. + auto wsloopOp = + genWsloopWrapperOp(converter, semaCtx, eval, loc, wsloopClauseOps, + wsloopReductionSyms, wsloopReductionTypes); + + auto simdOp = genSimdWrapperOp(converter, semaCtx, eval, loc, simdClauseOps); + + // Construct wrapper entry block list and associated symbols. It is important + // that the symbol and block argument order match, so that the symbol-value + // bindings created are correct. + // TODO: Add omp.wsloop private and omp.simd private and reduction args. + auto wrapperArgs = llvm::to_vector(llvm::concat( + wsloopOp.getRegion().getArguments(), simdOp.getRegion().getArguments())); + + assert(wsloopReductionSyms.size() == wrapperArgs.size() && + "Number of symbols and wrapper block arguments must match"); + genLoopNestOp(converter, symTable, semaCtx, eval, loc, queue, item, +loopNestClauseOps, iv, wsloopReductionSyms, wrapperArgs, +llvm::omp::Directive::OMPD_do_simd, dsp); } static void genCompositeTaskloopSimd( diff --git a/flang/test/Lower/OpenMP/Todo/omp-do-simd-aligned.f90 b/flang/test/Lower/OpenMP/Todo/omp-do-simd-aligned.f90 deleted file mode 100644 index b62c54182442a..0 --- a/flang/test/Lower/OpenMP/Todo/omp-do-simd-aligned.f90 +++ /dev/null @@ -1,16 +0,0 @@ -! This test checks lowering of OpenMP do simd aligned() pragma - -! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s -! RUN: %not_todo_cmd
[llvm-branch-commits] [flang] [mlir] [Flang][OpenMP] Add lowering support for DO SIMD (PR #97718)
https://github.com/skatrak created https://github.com/llvm/llvm-project/pull/97718 This patch adds support for lowering 'DO SIMD' constructs to MLIR. SIMD information is now stored in an `omp.simd` loop wrapper, which is currently ignored by the OpenMP dialect to LLVM IR translation stage. The end result is that runtime behavior of compiled 'DO SIMD' constructs does not change after this patch, so 'DO SIMD' still runs like 'DO' (i.e. SIMD width = 1). However, all of the required information is now present in the resulting MLIR representation. To avoid confusion, the previous wsloop-simd.f90 lit test is renamed to wsloop-schedule.f90 and a new wsloop-simd.f90 test is created to check the addition of SIMD clauses to the `omp.simd` operation produced when a 'DO SIMD' construct is lowered to MLIR. >From cca42d31ed43dee5521605f50f3fed380d034f70 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Thu, 4 Jul 2024 12:56:43 +0100 Subject: [PATCH] [Flang][OpenMP] Add lowering support for DO SIMD This patch adds support for lowering 'DO SIMD' constructs to MLIR. SIMD information is now stored in an `omp.simd` loop wrapper, which is currently ignored by the OpenMP dialect to LLVM IR translation stage. The end result is that runtime behavior of compiled 'DO SIMD' constructs does not change after this patch, so 'DO SIMD' still runs like 'DO' (i.e. SIMD width = 1). However, all of the required information is now present in the resulting MLIR representation. To avoid confusion, the previous wsloop-simd.f90 lit test is renamed to wsloop-schedule.f90 and a new wsloop-simd.f90 test is created to check the addition of SIMD clauses to the `omp.simd` operation produced when a 'DO SIMD' construct is lowered to MLIR. --- flang/lib/Lower/OpenMP/OpenMP.cpp | 49 .../Lower/OpenMP/Todo/omp-do-simd-aligned.f90 | 16 .../Lower/OpenMP/Todo/omp-do-simd-linear.f90 | 2 +- .../Lower/OpenMP/Todo/omp-do-simd-safelen.f90 | 14 .../Lower/OpenMP/Todo/omp-do-simd-simdlen.f90 | 14 flang/test/Lower/OpenMP/if-clause.f90 | 31 flang/test/Lower/OpenMP/loop-compound.f90 | 3 + flang/test/Lower/OpenMP/wsloop-schedule.f90 | 37 ++ flang/test/Lower/OpenMP/wsloop-simd.f90 | 74 +++ .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 3 + 10 files changed, 153 insertions(+), 90 deletions(-) delete mode 100644 flang/test/Lower/OpenMP/Todo/omp-do-simd-aligned.f90 delete mode 100644 flang/test/Lower/OpenMP/Todo/omp-do-simd-safelen.f90 delete mode 100644 flang/test/Lower/OpenMP/Todo/omp-do-simd-simdlen.f90 create mode 100644 flang/test/Lower/OpenMP/wsloop-schedule.f90 diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index 1830e31349cfb..9c23888a87173 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -2060,19 +2060,42 @@ static void genCompositeDoSimd(lower::AbstractConverter , const ConstructQueue , ConstructQueue::iterator item, DataSharingProcessor ) { - ClauseProcessor cp(converter, semaCtx, item->clauses); - cp.processTODO(loc, - llvm::omp::OMPD_do_simd); - // TODO: Add support for vectorization - add vectorization hints inside loop - // body. - // OpenMP standard does not specify the length of vector instructions. - // Currently we safely assume that for !$omp do simd pragma the SIMD length - // is equal to 1 (i.e. we generate standard workshare loop). - // When support for vectorization is enabled, then we need to add handling of - // if clause. Currently if clause can be skipped because we always assume - // SIMD length = 1. - genStandaloneDo(converter, symTable, semaCtx, eval, loc, queue, item, dsp); + lower::StatementContext stmtCtx; + + // Clause processing. + mlir::omp::WsloopClauseOps wsloopClauseOps; + llvm::SmallVector wsloopReductionSyms; + llvm::SmallVector wsloopReductionTypes; + genWsloopClauses(converter, semaCtx, stmtCtx, item->clauses, loc, + wsloopClauseOps, wsloopReductionTypes, wsloopReductionSyms); + + mlir::omp::SimdClauseOps simdClauseOps; + genSimdClauses(converter, semaCtx, item->clauses, loc, simdClauseOps); + + mlir::omp::LoopNestClauseOps loopNestClauseOps; + llvm::SmallVector iv; + genLoopNestClauses(converter, semaCtx, eval, item->clauses, loc, + loopNestClauseOps, iv); + + // Operation creation. + auto wsloopOp = + genWsloopWrapperOp(converter, semaCtx, eval, loc, wsloopClauseOps, + wsloopReductionSyms, wsloopReductionTypes); + + auto simdOp = genSimdWrapperOp(converter, semaCtx, eval, loc, simdClauseOps); + + // Construct wrapper entry block list and associated symbols. It is important + // that the symbol and block argument order match, so that the symbol-value + // bindings created are correct. +
[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)
@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half> define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) { ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret: ; GFX940: ; %bb.0: -; GFX940-NEXT:s_load_dwordx2 s[0:1], s[0:1], 0x24 +; GFX940-NEXT:s_load_dwordx2 s[2:3], s[0:1], 0x24 cdevadas wrote: Opened https://github.com/llvm/llvm-project/issues/97715. https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] NFC: Share DataSharingProcessor creation logic for all loop directives (PR #97565)
https://github.com/mjklemm approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/97565 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] NFC: Remove unused argument for omp.target lowering (PR #97564)
https://github.com/mjklemm approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/97564 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Prevent allocas from being inserted into loop wrappers (PR #97563)
https://github.com/tblah approved this pull request. Code changes look good. I would prefer to see a lit test for this code path. It is good enough if this is used by a test added later in your PR stack. Otherwise, please could you add a lit test which hits this condition so that we can catch if this gets broken by any changes in the future. https://github.com/llvm/llvm-project/pull/97563 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)
@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half> define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) { ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret: ; GFX940: ; %bb.0: -; GFX940-NEXT:s_load_dwordx2 s[0:1], s[0:1], 0x24 +; GFX940-NEXT:s_load_dwordx2 s[2:3], s[0:1], 0x24 arsenm wrote: Can you open an issue for this https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
@@ -1492,7 +1466,7 @@ genParallelOp(lower::AbstractConverter , lower::SymMap , firOpBuilder.createBlock(, /*insertPt=*/{}, allRegionArgTypes, allRegionArgLocs); -llvm::SmallVector allSymbols = reductionSyms; +llvm::SmallVector allSymbols(reductionSyms); tblah wrote: ultra-nit, feel free to ignore Personally I don't like using `()` here because it can parse (to humans and computers) like a function declaration. ```suggestion llvm::SmallVector allSymbols{reductionSyms}; ``` https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
https://github.com/tblah commented: This looks good overall. Do you expect there to be a lot more wrapper operations in the near future? If so, they look like there is a common pattern that could be further abstracted. Something like ```c++ template static OP genWrapperOp(..., llvm::ArrayRef extraBlockArgTypes) { fir::FirOpBuilder = ...; auto op = firOpBuilder.create(loc, clauseOps); llvm::SmallVector blockArgLocs(extraBlockArgTypes.size(), loc); firOpBuilder.createBlock(...) firOpBuilder.setInsertionPoint(...) return op; } ``` https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
@@ -518,8 +518,8 @@ struct OpWithBodyGenInfo { } OpWithBodyGenInfo & - setReductions(llvm::SmallVectorImpl *value1, -llvm::SmallVectorImpl *value2) { + setReductions(llvm::ArrayRef *value1, +llvm::ArrayRef *value2) { tblah wrote: Can we pass these by value instead of as pointers now? https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support select optimization (PR #80124)
wangpc-pp wrote: https://github.com/llvm/llvm-project/pull/97708 is splitted out for adding `FeaturePredictableSelectIsExpensive`. https://github.com/llvm/llvm-project/pull/80124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] NFC: Share DataSharingProcessor creation logic for all loop directives (PR #97565)
https://github.com/tblah approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/97565 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96444 >From 4590c05051bbf57f0b269b2561aa1a7c74f06fbc Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 23 Jun 2024 17:07:53 +0200 Subject: [PATCH] AMDGPU: Add subtarget feature for memory atomic fadd f64 --- llvm/lib/Target/AMDGPU/AMDGPU.td | 21 ++--- llvm/lib/Target/AMDGPU/BUFInstructions.td | 10 ++ llvm/lib/Target/AMDGPU/FLATInstructions.td | 6 +++--- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 10 +++--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 2 +- 5 files changed, 31 insertions(+), 18 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index bea233bfb27bd..94e8e77b3c052 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureFlatBufferGlobalAtomicFaddF64Inst + : SubtargetFeature<"flat-buffer-global-fadd-f64-inst", + "HasFlatBufferGlobalAtomicFaddF64Inst", + "true", + "Has flat, buffer, and global instructions for f64 atomic fadd" +>; + def FeatureMemoryAtomicFAddF32DenormalSupport : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support", "HasMemoryAtomicFaddF32DenormalSupport", @@ -1390,7 +1397,8 @@ def FeatureISAVersion9_0_A : FeatureSet< FeatureBackOffBarrier, FeatureKernargPreload, FeatureAtomicFMinFMaxF64GlobalInsts, - FeatureAtomicFMinFMaxF64FlatInsts + FeatureAtomicFMinFMaxF64FlatInsts, + FeatureFlatBufferGlobalAtomicFaddF64Inst ])>; def FeatureISAVersion9_0_C : FeatureSet< @@ -1435,7 +1443,8 @@ def FeatureISAVersion9_4_Common : FeatureSet< FeatureAtomicFMinFMaxF64GlobalInsts, FeatureAtomicFMinFMaxF64FlatInsts, FeatureAgentScopeFineGrainedRemoteMemoryAtomics, - FeatureMemoryAtomicFAddF32DenormalSupport + FeatureMemoryAtomicFAddF32DenormalSupport, + FeatureFlatBufferGlobalAtomicFaddF64Inst ]>; def FeatureISAVersion9_4_0 : FeatureSet< @@ -1932,11 +1941,9 @@ def isGFX12Plus : def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">, AssemblerPredicate<(all_of FeatureFlatAddressSpace)>; - -def HasBufferFlatGlobalAtomicsF64 : // FIXME: Rename to show it's only for fadd - Predicate<"Subtarget->hasBufferFlatGlobalAtomicsF64()">, - // FIXME: This is too coarse, and working around using pseudo's predicates on real instruction. - AssemblerPredicate<(any_of FeatureGFX90AInsts, FeatureGFX10Insts, FeatureSouthernIslands, FeatureSeaIslands)>; +def HasFlatBufferGlobalAtomicFaddF64Inst : + Predicate<"Subtarget->hasFlatBufferGlobalAtomicFaddF64Inst()">, + AssemblerPredicate<(any_of FeatureFlatBufferGlobalAtomicFaddF64Inst)>; def HasAtomicFMinFMaxF32GlobalInsts : Predicate<"Subtarget->hasAtomicFMinFMaxF32GlobalInsts()">, diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 3b8d94b744000..a904c8483dbf5 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1312,14 +1312,16 @@ let SubtargetPredicate = isGFX90APlus in { } } // End SubtargetPredicate = isGFX90APlus -let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in { +let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in { defm BUFFER_ATOMIC_ADD_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_add_f64", VReg_64, f64>; +} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst +let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in { // Note the names can be buffer_atomic_fmin_x2/buffer_atomic_fmax_x2 // depending on some subtargets. defm BUFFER_ATOMIC_MIN_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_min_f64", VReg_64, f64>; defm BUFFER_ATOMIC_MAX_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_max_f64", VReg_64, f64>; -} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 +} def BUFFER_INV : MUBUF_Invalidate<"buffer_inv"> { let SubtargetPredicate = isGFX940Plus; @@ -1836,9 +1838,9 @@ let SubtargetPredicate = HasAtomicBufferGlobalPkAddF16Insts in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", v2f16, "BUFFER_ATOMIC_PK_ADD_F16", ["ret"]>; } // End SubtargetPredicate = HasAtomicBufferGlobalPkAddF16Insts -let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in { +let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", f64, "BUFFER_ATOMIC_ADD_F64">; -} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 +} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_MIN_F64">; diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td index 4bf8f20269a15..16dc019ede810 100644 ---
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96443 >From b29740ba988e2286a7edc67b5a96c5dce0e600a6 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 23 Jun 2024 16:44:08 +0200 Subject: [PATCH 1/3] AMDGPU: Add subtarget feature for global atomic fadd denormal support Not sure what the behavior for gfx90a is. The SPG says it always flushes. The instruction documentation says it does not. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 14 -- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 7 +++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 3f35db8883716..51c077598df74 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureMemoryAtomicFaddF32DenormalSupport + : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support", + "HasAtomicMemoryAtomicFaddF32DenormalSupport", + "true", + "global/flat/buffer atomic fadd for float supports denormal handling" +>; + def FeatureAgentScopeFineGrainedRemoteMemoryAtomics : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics", "HasAgentScopeFineGrainedRemoteMemoryAtomics", @@ -1427,7 +1434,8 @@ def FeatureISAVersion9_4_Common : FeatureSet< FeatureKernargPreload, FeatureAtomicFMinFMaxF64GlobalInsts, FeatureAtomicFMinFMaxF64FlatInsts, - FeatureAgentScopeFineGrainedRemoteMemoryAtomics + FeatureAgentScopeFineGrainedRemoteMemoryAtomics, + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion9_4_0 : FeatureSet< @@ -1631,7 +1639,9 @@ def FeatureISAVersion12 : FeatureSet< FeatureScalarDwordx3Loads, FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, - Feature1_5xVGPRs]>; + Feature1_5xVGPRs, + FeatureMemoryAtomicFaddF32DenormalSupport]>; + ]>; def FeatureISAVersion12_Generic: FeatureSet< !listconcat(FeatureISAVersion12.Features, diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h index 9e2a316a9ed28..db0b2b67a0388 100644 --- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h +++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h @@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool HasAtomicFlatPkAdd16Insts = false; bool HasAtomicFaddRtnInsts = false; bool HasAtomicFaddNoRtnInsts = false; + bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false; bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false; bool HasAtomicBufferGlobalPkAddF16Insts = false; bool HasAtomicCSubNoRtnInsts = false; @@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; } + /// \return true if the target's flat, global, and buffer atomic fadd for + /// float supports denormal handling. + bool hasMemoryAtomicFaddF32DenormalSupport() const { +return HasAtomicMemoryAtomicFaddF32DenormalSupport; + } + /// \return true if atomic operations targeting fine-grained memory work /// correctly at device scope, in allocations in host or peer PCIe device /// memory. >From 84819ef58eb12bd3606fbf250b516793dbf36add Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 24 Jun 2024 12:10:37 +0200 Subject: [PATCH 2/3] Add to gfx11. RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm" thought I'm not sure I trust it. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 51c077598df74..370992eb81ff3 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -1547,7 +1547,8 @@ def FeatureISAVersion11_Common : FeatureSet< FeatureFlatAtomicFaddF32Inst, FeatureImageInsts, FeaturePackedTID, - FeatureVcmpxPermlaneHazard]>; + FeatureVcmpxPermlaneHazard, + FeatureMemoryAtomicFaddF32DenormalSupport]>; // There are few workarounds that need to be // added to all targets. This pessimizes codegen @@ -1640,7 +1641,7 @@ def FeatureISAVersion12 : FeatureSet< FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, Feature1_5xVGPRs, - FeatureMemoryAtomicFaddF32DenormalSupport]>; + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion12_Generic: FeatureSet< >From 5ef29a5699514b2c3956d1f44b8a0393b7c87004 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 26 Jun 2024 11:30:51 +0200 Subject: [PATCH 3/3] Rename --- llvm/lib/Target/AMDGPU/AMDGPU.td | 10 +- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 370992eb81ff3..bea233bfb27bd 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96443 >From b633b82d58ecc55bb88eaefd87d5b3030799f2c0 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 23 Jun 2024 16:44:08 +0200 Subject: [PATCH 1/3] AMDGPU: Add subtarget feature for global atomic fadd denormal support Not sure what the behavior for gfx90a is. The SPG says it always flushes. The instruction documentation says it does not. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 14 -- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 7 +++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 3f35db8883716d..51c077598df749 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureMemoryAtomicFaddF32DenormalSupport + : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support", + "HasAtomicMemoryAtomicFaddF32DenormalSupport", + "true", + "global/flat/buffer atomic fadd for float supports denormal handling" +>; + def FeatureAgentScopeFineGrainedRemoteMemoryAtomics : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics", "HasAgentScopeFineGrainedRemoteMemoryAtomics", @@ -1427,7 +1434,8 @@ def FeatureISAVersion9_4_Common : FeatureSet< FeatureKernargPreload, FeatureAtomicFMinFMaxF64GlobalInsts, FeatureAtomicFMinFMaxF64FlatInsts, - FeatureAgentScopeFineGrainedRemoteMemoryAtomics + FeatureAgentScopeFineGrainedRemoteMemoryAtomics, + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion9_4_0 : FeatureSet< @@ -1631,7 +1639,9 @@ def FeatureISAVersion12 : FeatureSet< FeatureScalarDwordx3Loads, FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, - Feature1_5xVGPRs]>; + Feature1_5xVGPRs, + FeatureMemoryAtomicFaddF32DenormalSupport]>; + ]>; def FeatureISAVersion12_Generic: FeatureSet< !listconcat(FeatureISAVersion12.Features, diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h index 9e2a316a9ed28d..db0b2b67a03884 100644 --- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h +++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h @@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool HasAtomicFlatPkAdd16Insts = false; bool HasAtomicFaddRtnInsts = false; bool HasAtomicFaddNoRtnInsts = false; + bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false; bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false; bool HasAtomicBufferGlobalPkAddF16Insts = false; bool HasAtomicCSubNoRtnInsts = false; @@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; } + /// \return true if the target's flat, global, and buffer atomic fadd for + /// float supports denormal handling. + bool hasMemoryAtomicFaddF32DenormalSupport() const { +return HasAtomicMemoryAtomicFaddF32DenormalSupport; + } + /// \return true if atomic operations targeting fine-grained memory work /// correctly at device scope, in allocations in host or peer PCIe device /// memory. >From 55ee6de3f2ac99d981ed57c3f933cc18e9a738a2 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 24 Jun 2024 12:10:37 +0200 Subject: [PATCH 2/3] Add to gfx11. RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm" thought I'm not sure I trust it. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 51c077598df749..370992eb81ff33 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -1547,7 +1547,8 @@ def FeatureISAVersion11_Common : FeatureSet< FeatureFlatAtomicFaddF32Inst, FeatureImageInsts, FeaturePackedTID, - FeatureVcmpxPermlaneHazard]>; + FeatureVcmpxPermlaneHazard, + FeatureMemoryAtomicFaddF32DenormalSupport]>; // There are few workarounds that need to be // added to all targets. This pessimizes codegen @@ -1640,7 +1641,7 @@ def FeatureISAVersion12 : FeatureSet< FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, Feature1_5xVGPRs, - FeatureMemoryAtomicFaddF32DenormalSupport]>; + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion12_Generic: FeatureSet< >From 573e7bcdedc1fd56575f580d0fafd9e1ad9dbc96 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 26 Jun 2024 11:30:51 +0200 Subject: [PATCH 3/3] Rename --- llvm/lib/Target/AMDGPU/AMDGPU.td | 10 +- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 370992eb81ff33..bea233bfb27bd6 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++
[llvm-branch-commits] [llvm] [RISCV] Support select optimization (PR #80124)
wangpc-pp wrote: Ping. I'd like to push this forward because we don't take branch probabilities into consideration now. Example: https://godbolt.org/z/doGhYadKM We should use branches instead of selects in this case and this patch (the enabling of SelectOpt) will optimize this. `clang -O3 -march=rv64gc_zba_zbb_zbc_zbs_zicond -Xclang -target-feature -Xclang +enable-select-opt -Xclang -target-feature -Xclang +predictable-select-expensive` ``` New Select group with %. = select i1 %cmp, i32 5, i32 13, !prof !9 Analyzing select group containing %. = select i1 %cmp, i32 5, i32 13, !prof !9 Converted to branch because of highly predictable branch. ``` ```asm func0: # @func0 li a2, 5 mul a1, a0, a0 bge a2, a0, .LBB0_2 addwa0, a1, a2 ret .LBB0_2:# %select.false li a2, 13 addwa0, a1, a2 ret ``` https://github.com/llvm/llvm-project/pull/80124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support select optimization (PR #80124)
https://github.com/wangpc-pp edited https://github.com/llvm/llvm-project/pull/80124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support select optimization (PR #80124)
https://github.com/wangpc-pp updated https://github.com/llvm/llvm-project/pull/80124 >From e3fb1fe7bdd4b7c24f9361c4d14dd1206fc8c067 Mon Sep 17 00:00:00 2001 From: wangpc Date: Sun, 18 Feb 2024 11:12:16 +0800 Subject: [PATCH 1/2] Move after addIRPasses Created using spr 1.3.4 --- llvm/lib/Target/RISCV/RISCVTargetMachine.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp index fdf1c023fff87..7a26e1956424c 100644 --- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp +++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp @@ -450,15 +450,15 @@ void RISCVPassConfig::addIRPasses() { if (EnableLoopDataPrefetch) addPass(createLoopDataPrefetchPass()); -if (EnableSelectOpt && getOptLevel() == CodeGenOptLevel::Aggressive) - addPass(createSelectOptimizePass()); - addPass(createRISCVGatherScatterLoweringPass()); addPass(createInterleavedAccessPass()); addPass(createRISCVCodeGenPreparePass()); } TargetPassConfig::addIRPasses(); + + if (getOptLevel() == CodeGenOptLevel::Aggressive && EnableSelectOpt) +addPass(createSelectOptimizePass()); } bool RISCVPassConfig::addPreISel() { >From 5d5398596dc30c47c67572ec20137fb3f9434940 Mon Sep 17 00:00:00 2001 From: wangpc Date: Wed, 21 Feb 2024 21:21:28 +0800 Subject: [PATCH 2/2] Fix test Created using spr 1.3.4 --- llvm/test/CodeGen/RISCV/O3-pipeline.ll | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll index 62c1af52e6c20..8b52e3fe7b2f1 100644 --- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll +++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll @@ -34,15 +34,6 @@ ; CHECK-NEXT: Optimization Remark Emitter ; CHECK-NEXT: Scalar Evolution Analysis ; CHECK-NEXT: Loop Data Prefetch -; CHECK-NEXT: Post-Dominator Tree Construction -; CHECK-NEXT: Branch Probability Analysis -; CHECK-NEXT: Block Frequency Analysis -; CHECK-NEXT: Lazy Branch Probability Analysis -; CHECK-NEXT: Lazy Block Frequency Analysis -; CHECK-NEXT: Optimization Remark Emitter -; CHECK-NEXT: Optimize selects -; CHECK-NEXT: Dominator Tree Construction -; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: RISC-V gather/scatter lowering ; CHECK-NEXT: Interleaved Access Pass ; CHECK-NEXT: RISC-V CodeGenPrepare @@ -77,6 +68,15 @@ ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: TLS Variable Hoist +; CHECK-NEXT: Post-Dominator Tree Construction +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Optimization Remark Emitter +; CHECK-NEXT: Optimize selects +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Exception handling preparation ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support select optimization (PR #80124)
https://github.com/wangpc-pp updated https://github.com/llvm/llvm-project/pull/80124 >From e3fb1fe7bdd4b7c24f9361c4d14dd1206fc8c067 Mon Sep 17 00:00:00 2001 From: wangpc Date: Sun, 18 Feb 2024 11:12:16 +0800 Subject: [PATCH 1/2] Move after addIRPasses Created using spr 1.3.4 --- llvm/lib/Target/RISCV/RISCVTargetMachine.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp index fdf1c023fff87..7a26e1956424c 100644 --- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp +++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp @@ -450,15 +450,15 @@ void RISCVPassConfig::addIRPasses() { if (EnableLoopDataPrefetch) addPass(createLoopDataPrefetchPass()); -if (EnableSelectOpt && getOptLevel() == CodeGenOptLevel::Aggressive) - addPass(createSelectOptimizePass()); - addPass(createRISCVGatherScatterLoweringPass()); addPass(createInterleavedAccessPass()); addPass(createRISCVCodeGenPreparePass()); } TargetPassConfig::addIRPasses(); + + if (getOptLevel() == CodeGenOptLevel::Aggressive && EnableSelectOpt) +addPass(createSelectOptimizePass()); } bool RISCVPassConfig::addPreISel() { >From 5d5398596dc30c47c67572ec20137fb3f9434940 Mon Sep 17 00:00:00 2001 From: wangpc Date: Wed, 21 Feb 2024 21:21:28 +0800 Subject: [PATCH 2/2] Fix test Created using spr 1.3.4 --- llvm/test/CodeGen/RISCV/O3-pipeline.ll | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll index 62c1af52e6c20..8b52e3fe7b2f1 100644 --- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll +++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll @@ -34,15 +34,6 @@ ; CHECK-NEXT: Optimization Remark Emitter ; CHECK-NEXT: Scalar Evolution Analysis ; CHECK-NEXT: Loop Data Prefetch -; CHECK-NEXT: Post-Dominator Tree Construction -; CHECK-NEXT: Branch Probability Analysis -; CHECK-NEXT: Block Frequency Analysis -; CHECK-NEXT: Lazy Branch Probability Analysis -; CHECK-NEXT: Lazy Block Frequency Analysis -; CHECK-NEXT: Optimization Remark Emitter -; CHECK-NEXT: Optimize selects -; CHECK-NEXT: Dominator Tree Construction -; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: RISC-V gather/scatter lowering ; CHECK-NEXT: Interleaved Access Pass ; CHECK-NEXT: RISC-V CodeGenPrepare @@ -77,6 +68,15 @@ ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: TLS Variable Hoist +; CHECK-NEXT: Post-Dominator Tree Construction +; CHECK-NEXT: Branch Probability Analysis +; CHECK-NEXT: Block Frequency Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Optimization Remark Emitter +; CHECK-NEXT: Optimize selects +; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Exception handling preparation ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] [RISCV][lld] Support merging RISC-V Atomics ABI attributes (PR #97347)
MaskRay wrote: > [RISCV][lld] ... I usually omit `[RISCV]` when the title already contains `RISC-V` or `RISCV`... https://github.com/llvm/llvm-project/pull/97347 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] [RISCV][lld] Support merging RISC-V Atomics ABI attributes (PR #97347)
https://github.com/MaskRay approved this pull request. https://github.com/llvm/llvm-project/pull/97347 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] [llvm] Reapply "[llvm][RISCV] Enable trailing fences for seq-cst stores by default (#87376)" (PR #90267)
https://github.com/MaskRay approved this pull request. https://github.com/llvm/llvm-project/pull/90267 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] 68b8f5f - Revert "[MLIR][Vector] Generalize DropUnitDimFromElementwiseOps to non leadin…"
Author: Han-Chung Wang Date: 2024-07-03T16:02:17-07:00 New Revision: 68b8f5f684395f5057731f1dc67d27493d7660fa URL: https://github.com/llvm/llvm-project/commit/68b8f5f684395f5057731f1dc67d27493d7660fa DIFF: https://github.com/llvm/llvm-project/commit/68b8f5f684395f5057731f1dc67d27493d7660fa.diff LOG: Revert "[MLIR][Vector] Generalize DropUnitDimFromElementwiseOps to non leadin…" This reverts commit 2c06fb899966b49ff0fe4adf55fceb7d1941fbca. Added: Modified: mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp mlir/test/Dialect/Vector/vector-transfer-flatten.mlir Removed: diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp index c7d3022eff4d3..da5954b70a2ec 100644 --- a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp +++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp @@ -1622,27 +1622,7 @@ struct ChainedReduction final : OpRewritePattern { } }; -// Scalable unit dimensions are not supported. Folding such dimensions would -// require "shifting" the scalable flag onto some other fixed-width dim (e.g. -// vector<[1]x4xf32> -> vector<[4]xf32>). This could be implemented in the -// future. -static VectorType dropNonScalableUnitDimFromType(VectorType inVecTy) { - auto inVecShape = inVecTy.getShape(); - SmallVector newShape; - SmallVector newScalableDims; - for (auto [dim, isScalable] : - llvm::zip_equal(inVecShape, inVecTy.getScalableDims())) { -if (dim == 1 && !isScalable) - continue; - -newShape.push_back(dim); -newScalableDims.push_back(isScalable); - } - - return VectorType::get(newShape, inVecTy.getElementType(), newScalableDims); -} - -/// For vectors with at least an unit dim, replaces: +/// For vectors with either leading or trailing unit dim, replaces: /// elementwise(a, b) /// with: /// sc_a = shape_cast(a) @@ -1654,16 +1634,20 @@ static VectorType dropNonScalableUnitDimFromType(VectorType inVecTy) { /// required to be rank > 1. /// /// Ex: +/// ``` /// %mul = arith.mulf %B_row, %A_row : vector<1x[4]xf32> /// %cast = vector.shape_cast %mul : vector<1x[4]xf32> to vector<[4]xf32> +/// ``` /// /// gets converted to: /// +/// ``` /// %B_row_sc = vector.shape_cast %B_row : vector<1x[4]xf32> to vector<[4]xf32> /// %A_row_sc = vector.shape_cast %A_row : vector<1x[4]xf32> to vector<[4]xf32> /// %mul = arith.mulf %B_row_sc, %A_row_sc : vector<[4]xf32> /// %cast_new = vector.shape_cast %mul : vector<[4]xf32> to vector<1x[4]xf32> /// %cast = vector.shape_cast %cast_new : vector<1x[4]xf32> to vector<[4]xf32> +/// ``` /// /// Patterns for folding shape_casts should instantly eliminate `%cast_new` and /// `%cast`. @@ -1683,29 +1667,42 @@ struct DropUnitDimFromElementwiseOps final // guaranteed to have identical shapes (with some exceptions such as // `arith.select`) and it suffices to only check one of them. auto sourceVectorType = dyn_cast(op->getOperand(0).getType()); -if (!sourceVectorType || sourceVectorType.getRank() < 2) +if (!sourceVectorType) + return failure(); +if (sourceVectorType.getRank() < 2) + return failure(); + +bool hasTrailingDimUnitFixed = +((sourceVectorType.getShape().back() == 1) && + (!sourceVectorType.getScalableDims().back())); +bool hasLeadingDimUnitFixed = +((sourceVectorType.getShape().front() == 1) && + (!sourceVectorType.getScalableDims().front())); +if (!hasLeadingDimUnitFixed && !hasTrailingDimUnitFixed) return failure(); +// Drop leading/trailing unit dim by applying vector.shape_cast to all +// operands +int64_t dim = hasLeadingDimUnitFixed ? 0 : sourceVectorType.getRank() - 1; + SmallVector newOperands; auto loc = op->getLoc(); for (auto operand : op->getOperands()) { auto opVectorType = cast(operand.getType()); - auto newVType = dropNonScalableUnitDimFromType(opVectorType); - if (newVType == opVectorType) -return rewriter.notifyMatchFailure(op, "No unit dimension to remove."); - + VectorType newVType = VectorType::Builder(opVectorType).dropDim(dim); auto opSC = rewriter.create(loc, newVType, operand); newOperands.push_back(opSC); } VectorType newResultVectorType = -dropNonScalableUnitDimFromType(resultVectorType); -// Create an updated elementwise Op without unit dim. +VectorType::Builder(resultVectorType).dropDim(dim); +// Create an updated elementwise Op without leading/trailing unit dim Operation *elementwiseOp = rewriter.create(loc, op->getName().getIdentifier(), newOperands, newResultVectorType, op->getAttrs()); -// Restore the unit dim by applying vector.shape_cast to the result. +// Restore the leading/trailing unit dim by applying vector.shape_cast +//
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96444 >From 308e31175185edc0d1aba78653b137c6a6f53a0e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 23 Jun 2024 17:07:53 +0200 Subject: [PATCH] AMDGPU: Add subtarget feature for memory atomic fadd f64 --- llvm/lib/Target/AMDGPU/AMDGPU.td | 21 ++--- llvm/lib/Target/AMDGPU/BUFInstructions.td | 10 ++ llvm/lib/Target/AMDGPU/FLATInstructions.td | 6 +++--- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 10 +++--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 2 +- 5 files changed, 31 insertions(+), 18 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index bea233bfb27bd..94e8e77b3c052 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureFlatBufferGlobalAtomicFaddF64Inst + : SubtargetFeature<"flat-buffer-global-fadd-f64-inst", + "HasFlatBufferGlobalAtomicFaddF64Inst", + "true", + "Has flat, buffer, and global instructions for f64 atomic fadd" +>; + def FeatureMemoryAtomicFAddF32DenormalSupport : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support", "HasMemoryAtomicFaddF32DenormalSupport", @@ -1390,7 +1397,8 @@ def FeatureISAVersion9_0_A : FeatureSet< FeatureBackOffBarrier, FeatureKernargPreload, FeatureAtomicFMinFMaxF64GlobalInsts, - FeatureAtomicFMinFMaxF64FlatInsts + FeatureAtomicFMinFMaxF64FlatInsts, + FeatureFlatBufferGlobalAtomicFaddF64Inst ])>; def FeatureISAVersion9_0_C : FeatureSet< @@ -1435,7 +1443,8 @@ def FeatureISAVersion9_4_Common : FeatureSet< FeatureAtomicFMinFMaxF64GlobalInsts, FeatureAtomicFMinFMaxF64FlatInsts, FeatureAgentScopeFineGrainedRemoteMemoryAtomics, - FeatureMemoryAtomicFAddF32DenormalSupport + FeatureMemoryAtomicFAddF32DenormalSupport, + FeatureFlatBufferGlobalAtomicFaddF64Inst ]>; def FeatureISAVersion9_4_0 : FeatureSet< @@ -1932,11 +1941,9 @@ def isGFX12Plus : def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">, AssemblerPredicate<(all_of FeatureFlatAddressSpace)>; - -def HasBufferFlatGlobalAtomicsF64 : // FIXME: Rename to show it's only for fadd - Predicate<"Subtarget->hasBufferFlatGlobalAtomicsF64()">, - // FIXME: This is too coarse, and working around using pseudo's predicates on real instruction. - AssemblerPredicate<(any_of FeatureGFX90AInsts, FeatureGFX10Insts, FeatureSouthernIslands, FeatureSeaIslands)>; +def HasFlatBufferGlobalAtomicFaddF64Inst : + Predicate<"Subtarget->hasFlatBufferGlobalAtomicFaddF64Inst()">, + AssemblerPredicate<(any_of FeatureFlatBufferGlobalAtomicFaddF64Inst)>; def HasAtomicFMinFMaxF32GlobalInsts : Predicate<"Subtarget->hasAtomicFMinFMaxF32GlobalInsts()">, diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 3b8d94b744000..a904c8483dbf5 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1312,14 +1312,16 @@ let SubtargetPredicate = isGFX90APlus in { } } // End SubtargetPredicate = isGFX90APlus -let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in { +let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in { defm BUFFER_ATOMIC_ADD_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_add_f64", VReg_64, f64>; +} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst +let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in { // Note the names can be buffer_atomic_fmin_x2/buffer_atomic_fmax_x2 // depending on some subtargets. defm BUFFER_ATOMIC_MIN_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_min_f64", VReg_64, f64>; defm BUFFER_ATOMIC_MAX_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_max_f64", VReg_64, f64>; -} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 +} def BUFFER_INV : MUBUF_Invalidate<"buffer_inv"> { let SubtargetPredicate = isGFX940Plus; @@ -1836,9 +1838,9 @@ let SubtargetPredicate = HasAtomicBufferGlobalPkAddF16Insts in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", v2f16, "BUFFER_ATOMIC_PK_ADD_F16", ["ret"]>; } // End SubtargetPredicate = HasAtomicBufferGlobalPkAddF16Insts -let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in { +let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", f64, "BUFFER_ATOMIC_ADD_F64">; -} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 +} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_MIN_F64">; diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td index 4bf8f20269a15..16dc019ede810 100644 ---
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96443 >From 637bb436aa8472c2380364e573219c2a7524fdb1 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 23 Jun 2024 16:44:08 +0200 Subject: [PATCH 1/3] AMDGPU: Add subtarget feature for global atomic fadd denormal support Not sure what the behavior for gfx90a is. The SPG says it always flushes. The instruction documentation says it does not. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 14 -- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 7 +++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 3f35db8883716..51c077598df74 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureMemoryAtomicFaddF32DenormalSupport + : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support", + "HasAtomicMemoryAtomicFaddF32DenormalSupport", + "true", + "global/flat/buffer atomic fadd for float supports denormal handling" +>; + def FeatureAgentScopeFineGrainedRemoteMemoryAtomics : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics", "HasAgentScopeFineGrainedRemoteMemoryAtomics", @@ -1427,7 +1434,8 @@ def FeatureISAVersion9_4_Common : FeatureSet< FeatureKernargPreload, FeatureAtomicFMinFMaxF64GlobalInsts, FeatureAtomicFMinFMaxF64FlatInsts, - FeatureAgentScopeFineGrainedRemoteMemoryAtomics + FeatureAgentScopeFineGrainedRemoteMemoryAtomics, + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion9_4_0 : FeatureSet< @@ -1631,7 +1639,9 @@ def FeatureISAVersion12 : FeatureSet< FeatureScalarDwordx3Loads, FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, - Feature1_5xVGPRs]>; + Feature1_5xVGPRs, + FeatureMemoryAtomicFaddF32DenormalSupport]>; + ]>; def FeatureISAVersion12_Generic: FeatureSet< !listconcat(FeatureISAVersion12.Features, diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h index 9e2a316a9ed28..db0b2b67a0388 100644 --- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h +++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h @@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool HasAtomicFlatPkAdd16Insts = false; bool HasAtomicFaddRtnInsts = false; bool HasAtomicFaddNoRtnInsts = false; + bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false; bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false; bool HasAtomicBufferGlobalPkAddF16Insts = false; bool HasAtomicCSubNoRtnInsts = false; @@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; } + /// \return true if the target's flat, global, and buffer atomic fadd for + /// float supports denormal handling. + bool hasMemoryAtomicFaddF32DenormalSupport() const { +return HasAtomicMemoryAtomicFaddF32DenormalSupport; + } + /// \return true if atomic operations targeting fine-grained memory work /// correctly at device scope, in allocations in host or peer PCIe device /// memory. >From d954785fffda502d8325cca1ffb6a0adc15dc54a Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 24 Jun 2024 12:10:37 +0200 Subject: [PATCH 2/3] Add to gfx11. RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm" thought I'm not sure I trust it. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 51c077598df74..370992eb81ff3 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -1547,7 +1547,8 @@ def FeatureISAVersion11_Common : FeatureSet< FeatureFlatAtomicFaddF32Inst, FeatureImageInsts, FeaturePackedTID, - FeatureVcmpxPermlaneHazard]>; + FeatureVcmpxPermlaneHazard, + FeatureMemoryAtomicFaddF32DenormalSupport]>; // There are few workarounds that need to be // added to all targets. This pessimizes codegen @@ -1640,7 +1641,7 @@ def FeatureISAVersion12 : FeatureSet< FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, Feature1_5xVGPRs, - FeatureMemoryAtomicFaddF32DenormalSupport]>; + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion12_Generic: FeatureSet< >From deebca23726296fff2892f9c780e3049db64749a Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 26 Jun 2024 11:30:51 +0200 Subject: [PATCH 3/3] Rename --- llvm/lib/Target/AMDGPU/AMDGPU.td | 10 +- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 370992eb81ff3..bea233bfb27bd 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
maksfb wrote: Could you please reword the summary and add an example where the new matching technique helps. https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -456,6 +435,39 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { ++MatchedWithLTOCommonName; } } + return MatchedWithLTOCommonName; +} + +Error YAMLProfileReader::readProfile(BinaryContext ) { + if (opts::Verbosity >= 1) { +outs() << "BOLT-INFO: YAML profile with hash: "; +switch (YamlBP.Header.HashFunction) { +case HashFunction::StdHash: + outs() << "std::hash\n"; + break; +case HashFunction::XXH3: + outs() << "xxh3\n"; + break; +} + } + YamlProfileToFunction.resize(YamlBP.Functions.size() + 1); + + // Computes hash for binary functions. + if (opts::MatchProfileWithFunctionHash) { +for (auto &[_, BF] : BC.getBinaryFunctions()) { + BF.computeHash(YamlBP.Header.IsDFSOrder, YamlBP.Header.HashFunction); +} + } else if (!opts::IgnoreHash) { +for (BinaryFunction *BF : ProfileBFs) { + if (!BF) +continue; + BF->computeHash(YamlBP.Header.IsDFSOrder, YamlBP.Header.HashFunction); +} + } + + size_t MatchedWithExactName = matchWithExactName(); + size_t MatchedWithHash = matchWithHash(BC); + size_t MatchedWithLTOCommonName = matchWithLTOCommonName(); maksfb wrote: nit: make them `const`. https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [llvm-objcopy] Support CREL (PR #97521)
https://github.com/MaskRay updated https://github.com/llvm/llvm-project/pull/97521 >From 9bedda3fa950fbb418a53945f6e36da9a7582e3b Mon Sep 17 00:00:00 2001 From: Fangrui Song Date: Wed, 3 Jul 2024 11:45:26 -0700 Subject: [PATCH] fix header Created using spr 1.3.5-bogner --- llvm/include/llvm/ADT/bit.h| 1 - llvm/include/llvm/MC/MCELFExtras.h | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/include/llvm/ADT/bit.h b/llvm/include/llvm/ADT/bit.h index 1c8bd46648256..c42b5e686bdc9 100644 --- a/llvm/include/llvm/ADT/bit.h +++ b/llvm/include/llvm/ADT/bit.h @@ -14,7 +14,6 @@ #ifndef LLVM_ADT_BIT_H #define LLVM_ADT_BIT_H -#include "llvm/ADT/bit.h" #include "llvm/Support/Compiler.h" #include #include diff --git a/llvm/include/llvm/MC/MCELFExtras.h b/llvm/include/llvm/MC/MCELFExtras.h index 0f0c10edca2cf..498d477fbedc4 100644 --- a/llvm/include/llvm/MC/MCELFExtras.h +++ b/llvm/include/llvm/MC/MCELFExtras.h @@ -10,6 +10,7 @@ #define LLVM_MC_MCELFEXTRAS_H #include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/bit.h" #include "llvm/BinaryFormat/ELF.h" #include "llvm/Support/LEB128.h" #include "llvm/Support/raw_ostream.h" ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
https://github.com/MaskRay updated https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
https://github.com/MaskRay updated https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
MaskRay wrote: > Not that the patch is especially long/complicated, but could be split into > the refactor/move of the MC function, then the new usage, if you like (usual > reasons - smaller patches are easier to root cause, functionality can be > reverted without thrashing the refactored code (or refactored code can be > reverted if issues are found in that before the usage goes in), etc) The body of `encodeCrel` is a simple move. Even with a signature change, the two parts (extract and adapt assembler + support llvm-objcopy) could still be considered separate. However, some reviewers might prefer seeing both parts together for a better understanding of the extracted API. Based on the comments from jh7370 and smithp35, the extraction seems reasonable. **How about I landing the extraction part separately after receiving official feedback? I will then rebase this llvm-objcopy patch.** (I maintain patches in a stack and ensure the final one https://github.com/MaskRay/llvm-project/commits/demo-crel/ passes a local integration test. There is some process inconvenience given that the llvm-objdump PR also modifies Object and has been approved yet...) https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/95884 >From 34652b2eebc62218c50a23509ce99937385c30e6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:42:00 -0700 Subject: [PATCH 1/8] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 73 -- 1 file changed, 56 insertions(+), 17 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 66cabc236f4b2..c9f6d88f0b13a 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -424,36 +424,75 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { // Uses name similarity to match functions that were not matched by name. uint64_t MatchedWithDemangledName = 0; - if (opts::NameSimilarityFunctionMatchingThreshold > 0) { - -std::unordered_map NameToBinaryFunction; -NameToBinaryFunction.reserve(BC.getBinaryFunctions().size()); -for (auto &[_, BF] : BC.getBinaryFunctions()) { + if (opts::NameSimilarityFunctionMatchingThreshold > 0) { +auto DemangleName = [&](const char* String) { int Status = 0; - char *DemangledName = abi::__cxa_demangle(BF.getOneName().str().c_str(), + char *DemangledName = abi::__cxa_demangle(String, nullptr, nullptr, ); - if (Status == 0) -NameToBinaryFunction[std::string(DemangledName)] = + return Status == 0 ? new std::string(DemangledName) : nullptr; +}; + +auto DeriveNameSpace = [&](std::string DemangledName) { + size_t LParen = std::string(DemangledName).find("("); + std::string FunctionName = std::string(DemangledName).substr(0, LParen); + size_t ScopeResolutionOperator = std::string(FunctionName).rfind("::"); + return ScopeResolutionOperator == std::string::npos ? std::string("") : std::string(DemangledName).substr(0, ScopeResolutionOperator); +}; + +std::unordered_map> NamespaceToBFs; +NamespaceToBFs.reserve(BC.getBinaryFunctions().size()); + +for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { + std::string* DemangledName = DemangleName(BF->getOneName().str().c_str()); + if (!DemangledName) +continue; + std::string Namespace = DeriveNameSpace(*DemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) +NamespaceToBFs[Namespace] = {BF}; + else +It->second.push_back(BF); } for (auto YamlBF : YamlBP.Functions) { if (YamlBF.Used) continue; - int Status = 0; - char *DemangledName = - abi::__cxa_demangle(YamlBF.Name.c_str(), nullptr, nullptr, ); - if (Status != 0) + std::string* YamlBFDemangledName = DemangleName(YamlBF.Name.c_str()); + if (!YamlBFDemangledName) continue; - auto It = NameToBinaryFunction.find(DemangledName); - if (It == NameToBinaryFunction.end()) + std::string Namespace = DeriveNameSpace(*YamlBFDemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) continue; - BinaryFunction *BF = It->second; - matchProfileToFunction(YamlBF, *BF); - ++MatchedWithDemangledName; + std::vector BFs = It->second; + + unsigned MinEditDistance = UINT_MAX; + BinaryFunction *ClosestNameBF = nullptr; + + for (BinaryFunction *BF : BFs) { +if (ProfiledFunctions.count(BF)) + continue; +std::string *BFDemangledName = DemangleName(BF->getOneName().str().c_str()); +if (!BFDemangledName) + continue; +unsigned BFEditDistance = StringRef(*BFDemangledName).edit_distance(*YamlBFDemangledName); +if (BFEditDistance < MinEditDistance) { + MinEditDistance = BFEditDistance; + ClosestNameBF = BF; +} + } + + if (ClosestNameBF && +MinEditDistance < opts::NameSimilarityFunctionMatchingThreshold) { +matchProfileToFunction(YamlBF, *ClosestNameBF); +++MatchedWithDemangledName; + } } } + outs() << MatchedWithDemangledName << ": functions matched by name similarity\n"; + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 2d23bbd6b9ce4f0786ae8ceb39b1b008b4ca9c4d Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:45:27 -0700 Subject: [PATCH 2/8] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index c9f6d88f0b13a..cf4a5393df8f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -491,8 +491,6 @@ Error YAMLProfileReader::readProfile(BinaryContext ) {
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/95884 >From 34652b2eebc62218c50a23509ce99937385c30e6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:42:00 -0700 Subject: [PATCH 1/8] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 73 -- 1 file changed, 56 insertions(+), 17 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 66cabc236f4b2..c9f6d88f0b13a 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -424,36 +424,75 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { // Uses name similarity to match functions that were not matched by name. uint64_t MatchedWithDemangledName = 0; - if (opts::NameSimilarityFunctionMatchingThreshold > 0) { - -std::unordered_map NameToBinaryFunction; -NameToBinaryFunction.reserve(BC.getBinaryFunctions().size()); -for (auto &[_, BF] : BC.getBinaryFunctions()) { + if (opts::NameSimilarityFunctionMatchingThreshold > 0) { +auto DemangleName = [&](const char* String) { int Status = 0; - char *DemangledName = abi::__cxa_demangle(BF.getOneName().str().c_str(), + char *DemangledName = abi::__cxa_demangle(String, nullptr, nullptr, ); - if (Status == 0) -NameToBinaryFunction[std::string(DemangledName)] = + return Status == 0 ? new std::string(DemangledName) : nullptr; +}; + +auto DeriveNameSpace = [&](std::string DemangledName) { + size_t LParen = std::string(DemangledName).find("("); + std::string FunctionName = std::string(DemangledName).substr(0, LParen); + size_t ScopeResolutionOperator = std::string(FunctionName).rfind("::"); + return ScopeResolutionOperator == std::string::npos ? std::string("") : std::string(DemangledName).substr(0, ScopeResolutionOperator); +}; + +std::unordered_map> NamespaceToBFs; +NamespaceToBFs.reserve(BC.getBinaryFunctions().size()); + +for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { + std::string* DemangledName = DemangleName(BF->getOneName().str().c_str()); + if (!DemangledName) +continue; + std::string Namespace = DeriveNameSpace(*DemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) +NamespaceToBFs[Namespace] = {BF}; + else +It->second.push_back(BF); } for (auto YamlBF : YamlBP.Functions) { if (YamlBF.Used) continue; - int Status = 0; - char *DemangledName = - abi::__cxa_demangle(YamlBF.Name.c_str(), nullptr, nullptr, ); - if (Status != 0) + std::string* YamlBFDemangledName = DemangleName(YamlBF.Name.c_str()); + if (!YamlBFDemangledName) continue; - auto It = NameToBinaryFunction.find(DemangledName); - if (It == NameToBinaryFunction.end()) + std::string Namespace = DeriveNameSpace(*YamlBFDemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) continue; - BinaryFunction *BF = It->second; - matchProfileToFunction(YamlBF, *BF); - ++MatchedWithDemangledName; + std::vector BFs = It->second; + + unsigned MinEditDistance = UINT_MAX; + BinaryFunction *ClosestNameBF = nullptr; + + for (BinaryFunction *BF : BFs) { +if (ProfiledFunctions.count(BF)) + continue; +std::string *BFDemangledName = DemangleName(BF->getOneName().str().c_str()); +if (!BFDemangledName) + continue; +unsigned BFEditDistance = StringRef(*BFDemangledName).edit_distance(*YamlBFDemangledName); +if (BFEditDistance < MinEditDistance) { + MinEditDistance = BFEditDistance; + ClosestNameBF = BF; +} + } + + if (ClosestNameBF && +MinEditDistance < opts::NameSimilarityFunctionMatchingThreshold) { +matchProfileToFunction(YamlBF, *ClosestNameBF); +++MatchedWithDemangledName; + } } } + outs() << MatchedWithDemangledName << ": functions matched by name similarity\n"; + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 2d23bbd6b9ce4f0786ae8ceb39b1b008b4ca9c4d Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:45:27 -0700 Subject: [PATCH 2/8] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index c9f6d88f0b13a..cf4a5393df8f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -491,8 +491,6 @@ Error YAMLProfileReader::readProfile(BinaryContext ) {
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/96596 >From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 24 Jun 2024 23:00:59 -0700 Subject: [PATCH 01/14] Added call to matchWithCallsAsAnchors Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index aafffac3d4b1c..1a0e5d239d252 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF)) matchProfileToFunction(YamlBF, *BF); + uint64_t MatchedWithCallsAsAnchors = 0; + matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 24 Jun 2024 23:11:43 -0700 Subject: [PATCH 02/14] Changed CallHashToBF representation Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 1a0e5d239d252..91b01a99c7485 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -29,6 +29,10 @@ static llvm::cl::opt cl::desc("ignore hash while reading function profile"), cl::Hidden, cl::cat(BoltOptCategory)); +llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors", + cl::desc("Matches with calls as anchors"), + cl::Hidden, cl::cat(BoltOptCategory)); + llvm::cl::opt ProfileUseDFS("profile-use-dfs", cl::desc("use DFS order for YAML profile"), cl::Hidden, cl::cat(BoltOptCategory)); @@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( llvm_unreachable("Unhandled HashFunction"); }; - std::unordered_map CallHashToBF; + std::unordered_map CallHashToBF; for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { if (ProfiledFunctions.count(BF)) @@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors( for (const std::string : FunctionNames) HashString.append(FunctionName); } -CallHashToBF.emplace(ComputeCallHash(HashString), BF); +CallHashToBF[ComputeCallHash(HashString)] = BF; } std::unordered_map ProfiledFunctionIdToName; - for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions) + for (const yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name; for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) { @@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( auto It = CallHashToBF.find(Hash); if (It == CallHashToBF.end()) continue; -matchProfileToFunction(YamlBF, It->second); +matchProfileToFunction(YamlBF, *It->second); ++MatchedWithCallsAsAnchors; } } @@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { matchProfileToFunction(YamlBF, *BF); uint64_t MatchedWithCallsAsAnchors = 0; - matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); + if (opts::MatchWithCallsAsAnchors) +matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) >From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Tue, 25 Jun 2024 09:28:39 -0700 Subject: [PATCH 03/14] Changed BF called FunctionNames to multiset Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 91b01a99c7485..3b3d73f7af023 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( std::string HashString; for (const auto : BF->blocks()) { - std::set FunctionNames; + std::multiset FunctionNames; for (const MCInst : BB) { // Skip non-call instructions. if (!BC.MIB->isCall(Instr)) @@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors( std::string = ProfiledFunctionIdToName[CallSite.DestId]; FunctionNames.insert(FunctionName); } - for (const std::string : FunctionNames) { + for
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
@@ -220,17 +245,27 @@ class StaleMatcher { return BestBlock; } - /// Returns true if the two basic blocks (in the binary and in the profile) - /// corresponding to the given hashes are matched to each other with a high - /// confidence. - static bool isHighConfidenceMatch(BlendedBlockHash Hash1, -BlendedBlockHash Hash2) { -return Hash1.InstrHash == Hash2.InstrHash; + // Uses CallHash to find the most similar block for a given hash. + const FlowBlock *matchWithCalls(BlendedBlockHash , aaupov wrote: ditto https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
@@ -193,18 +193,43 @@ class StaleMatcher { public: /// Initialize stale matcher. void init(const std::vector , -const std::vector ) { +const std::vector , +const std::vector ) { assert(Blocks.size() == Hashes.size() && + Hashes.size() == CallHashes.size() && "incorrect matcher initialization"); for (size_t I = 0; I < Blocks.size(); I++) { FlowBlock *Block = Blocks[I]; uint16_t OpHash = Hashes[I].OpcodeHash; OpHashToBlocks[OpHash].push_back(std::make_pair(Hashes[I], Block)); + if (CallHashes[I]) +CallHashToBlocks[CallHashes[I]].push_back( +std::make_pair(Hashes[I], Block)); } } /// Find the most similar block for a given hash. - const FlowBlock *matchBlock(BlendedBlockHash BlendedHash) const { + const FlowBlock *matchBlock(BlendedBlockHash , + uint64_t ) const { +const FlowBlock *BestBlock = matchWithOpcodes(BlendedHash); +return BestBlock ? BestBlock : matchWithCalls(BlendedHash, CallHash); + } + + /// Returns true if the two basic blocks (in the binary and in the profile) + /// corresponding to the given hashes are matched to each other with a high + /// confidence. + static bool isHighConfidenceMatch(BlendedBlockHash Hash1, +BlendedBlockHash Hash2) { +return Hash1.InstrHash == Hash2.InstrHash; + } + +private: + using HashBlockPairType = std::pair; + std::unordered_map> OpHashToBlocks; + std::unordered_map> CallHashToBlocks; + + // Uses OpcodeHash to find the most similar block for a given hash. + const FlowBlock *matchWithOpcodes(BlendedBlockHash ) const { aaupov wrote: ditto https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
@@ -193,18 +193,43 @@ class StaleMatcher { public: /// Initialize stale matcher. void init(const std::vector , -const std::vector ) { +const std::vector , +const std::vector ) { assert(Blocks.size() == Hashes.size() && + Hashes.size() == CallHashes.size() && "incorrect matcher initialization"); for (size_t I = 0; I < Blocks.size(); I++) { FlowBlock *Block = Blocks[I]; uint16_t OpHash = Hashes[I].OpcodeHash; OpHashToBlocks[OpHash].push_back(std::make_pair(Hashes[I], Block)); + if (CallHashes[I]) +CallHashToBlocks[CallHashes[I]].push_back( +std::make_pair(Hashes[I], Block)); } } /// Find the most similar block for a given hash. - const FlowBlock *matchBlock(BlendedBlockHash BlendedHash) const { + const FlowBlock *matchBlock(BlendedBlockHash , + uint64_t ) const { aaupov wrote: ```suggestion const FlowBlock *matchBlock(BlendedBlockHash BlendedHash, uint64_t CallHash) const { ``` BlendedBlockHash is aliased to uint64_t, and integral types should be passed by value. https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
@@ -412,33 +447,62 @@ createFlowFunction(const BinaryFunction::BasicBlockOrderType ) { /// of the basic blocks in the binary, the count is "matched" to the block. /// Similarly, if both the source and the target of a count in the profile are /// matched to a jump in the binary, the count is recorded in CFG. -size_t matchWeightsByHashes( -BinaryContext , const BinaryFunction::BasicBlockOrderType , -const yaml::bolt::BinaryFunctionProfile , FlowFunction ) { +size_t +matchWeightsByHashes(BinaryContext , + const DenseMap , + const BinaryFunction::BasicBlockOrderType , + const yaml::bolt::BinaryFunctionProfile , + FlowFunction , HashFunction HashFunction) { aaupov wrote: ```suggestion size_t matchWeightsByHashes(BinaryContext , const BinaryFunction::BasicBlockOrderType , const yaml::bolt::BinaryFunctionProfile , FlowFunction , HashFunction HashFunction, const DenseMap ) { ``` It's customary to add new parameters to the end https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
https://github.com/aaupov commented: Sorry, couple of final comments https://github.com/llvm/llvm-project/pull/96596 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
MaskRay wrote: > [jh7370](https://github.com/jh7370) I've skimmed briefly and the changes look > reasonable - will look more in depth on a separate occasion when I have more > time. Thanks! > Not for this PR, but I wonder if there would be some benefit in a > `--decode-crel` and/or `--encode-crel` option that would convert an object > file to/from using CREL. I feel like this might be useful for > experimentation, or for handling the case where an object was generated with > CREL but needs to be usable by an older tool that doesn't understand CREL. > Equally, it could be useful for retroactively encoding CREL when the feature > wasn't used during original creation of the object. Thoughts? Agreed that the `CREL => RELA` conversion will be useful to make CREL better interchange format - allow old linkers to build new relocatable files and allow other tools for analysis tasks. That is probably a long-term goal. In the short-term I aim for providing a complete toolchain (assembler,linker,objcopy/strip,objdump) for the most important use case (compile + assemble + (strip)? + link). > [smithp35](https://github.com/smithp35) Only a couple of small comments from > me. I'll be out of the office till Monday next week, I'm fine for others to > progress this wihout me. Thanks! Take your time. https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -456,6 +435,39 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { ++MatchedWithLTOCommonName; } } + return MatchedWithLTOCommonName; +} + +Error YAMLProfileReader::readProfile(BinaryContext ) { + if (opts::Verbosity >= 1) { +outs() << "BOLT-INFO: YAML profile with hash: "; +switch (YamlBP.Header.HashFunction) { +case HashFunction::StdHash: + outs() << "std::hash\n"; aaupov wrote: @ayermolo, we didn't switch Profile component to BC logger class. That would be a separate effort. https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
@@ -1861,7 +1886,15 @@ template Error ELFBuilder::readSections(bool EnsureSymtab) { const typename ELFFile::Elf_Shdr *Shdr = Sections->begin() + RelSec->Index; - if (RelSec->Type == SHT_REL) { + if (RelSec->Type == SHT_CREL) { +auto Rels = ElfFile.crels(*Shdr); MaskRay wrote: Agreed. Renamed to `RelsOrRelas` https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
@@ -0,0 +1,60 @@ +//===- MCELFExtras.h - Extra functions for ELF --*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_MC_MCELFEXTRAS_H +#define LLVM_MC_MCELFEXTRAS_H + +#include "llvm/ADT/STLExtras.h" +#include "llvm/BinaryFormat/ELF.h" +#include "llvm/Support/LEB128.h" +#include "llvm/Support/raw_ostream.h" + +#include +#include + +namespace llvm::ELF { MaskRay wrote: Thanks for the comment suggestion. Added. https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)
@@ -207,6 +209,43 @@ bool isSectionInSegment(const typename ELFT::Phdr , checkSectionVMA(Phdr, Sec); } +template +Error decodeCrel(ArrayRef Content, + function_ref HdrHandler, MaskRay wrote: thx for the suggestion. adopted https://github.com/llvm/llvm-project/pull/97382 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)
@@ -207,6 +209,43 @@ bool isSectionInSegment(const typename ELFT::Phdr , checkSectionVMA(Phdr, Sec); } +template MaskRay wrote: thx for the suggestion. adopted https://github.com/llvm/llvm-project/pull/97382 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)
https://github.com/MaskRay updated https://github.com/llvm/llvm-project/pull/97382 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)
https://github.com/MaskRay updated https://github.com/llvm/llvm-project/pull/97382 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/96596 >From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 24 Jun 2024 23:00:59 -0700 Subject: [PATCH 01/13] Added call to matchWithCallsAsAnchors Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index aafffac3d4b1c..1a0e5d239d252 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF)) matchProfileToFunction(YamlBF, *BF); + uint64_t MatchedWithCallsAsAnchors = 0; + matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 24 Jun 2024 23:11:43 -0700 Subject: [PATCH 02/13] Changed CallHashToBF representation Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 1a0e5d239d252..91b01a99c7485 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -29,6 +29,10 @@ static llvm::cl::opt cl::desc("ignore hash while reading function profile"), cl::Hidden, cl::cat(BoltOptCategory)); +llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors", + cl::desc("Matches with calls as anchors"), + cl::Hidden, cl::cat(BoltOptCategory)); + llvm::cl::opt ProfileUseDFS("profile-use-dfs", cl::desc("use DFS order for YAML profile"), cl::Hidden, cl::cat(BoltOptCategory)); @@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( llvm_unreachable("Unhandled HashFunction"); }; - std::unordered_map CallHashToBF; + std::unordered_map CallHashToBF; for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { if (ProfiledFunctions.count(BF)) @@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors( for (const std::string : FunctionNames) HashString.append(FunctionName); } -CallHashToBF.emplace(ComputeCallHash(HashString), BF); +CallHashToBF[ComputeCallHash(HashString)] = BF; } std::unordered_map ProfiledFunctionIdToName; - for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions) + for (const yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name; for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) { @@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( auto It = CallHashToBF.find(Hash); if (It == CallHashToBF.end()) continue; -matchProfileToFunction(YamlBF, It->second); +matchProfileToFunction(YamlBF, *It->second); ++MatchedWithCallsAsAnchors; } } @@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { matchProfileToFunction(YamlBF, *BF); uint64_t MatchedWithCallsAsAnchors = 0; - matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); + if (opts::MatchWithCallsAsAnchors) +matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) >From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Tue, 25 Jun 2024 09:28:39 -0700 Subject: [PATCH 03/13] Changed BF called FunctionNames to multiset Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 91b01a99c7485..3b3d73f7af023 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( std::string HashString; for (const auto : BF->blocks()) { - std::set FunctionNames; + std::multiset FunctionNames; for (const MCInst : BB) { // Skip non-call instructions. if (!BC.MIB->isCall(Instr)) @@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors( std::string = ProfiledFunctionIdToName[CallSite.DestId]; FunctionNames.insert(FunctionName); } - for (const std::string : FunctionNames) { + for
[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/96596 >From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 24 Jun 2024 23:00:59 -0700 Subject: [PATCH 01/13] Added call to matchWithCallsAsAnchors Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index aafffac3d4b1c..1a0e5d239d252 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF)) matchProfileToFunction(YamlBF, *BF); + uint64_t MatchedWithCallsAsAnchors = 0; + matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 24 Jun 2024 23:11:43 -0700 Subject: [PATCH 02/13] Changed CallHashToBF representation Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 1a0e5d239d252..91b01a99c7485 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -29,6 +29,10 @@ static llvm::cl::opt cl::desc("ignore hash while reading function profile"), cl::Hidden, cl::cat(BoltOptCategory)); +llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors", + cl::desc("Matches with calls as anchors"), + cl::Hidden, cl::cat(BoltOptCategory)); + llvm::cl::opt ProfileUseDFS("profile-use-dfs", cl::desc("use DFS order for YAML profile"), cl::Hidden, cl::cat(BoltOptCategory)); @@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( llvm_unreachable("Unhandled HashFunction"); }; - std::unordered_map CallHashToBF; + std::unordered_map CallHashToBF; for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { if (ProfiledFunctions.count(BF)) @@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors( for (const std::string : FunctionNames) HashString.append(FunctionName); } -CallHashToBF.emplace(ComputeCallHash(HashString), BF); +CallHashToBF[ComputeCallHash(HashString)] = BF; } std::unordered_map ProfiledFunctionIdToName; - for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions) + for (const yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name; for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) { @@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( auto It = CallHashToBF.find(Hash); if (It == CallHashToBF.end()) continue; -matchProfileToFunction(YamlBF, It->second); +matchProfileToFunction(YamlBF, *It->second); ++MatchedWithCallsAsAnchors; } } @@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { matchProfileToFunction(YamlBF, *BF); uint64_t MatchedWithCallsAsAnchors = 0; - matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); + if (opts::MatchWithCallsAsAnchors) +matchWithCallsAsAnchors(BC, MatchedWithCallsAsAnchors); for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) >From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Tue, 25 Jun 2024 09:28:39 -0700 Subject: [PATCH 03/13] Changed BF called FunctionNames to multiset Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 91b01a99c7485..3b3d73f7af023 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors( std::string HashString; for (const auto : BF->blocks()) { - std::set FunctionNames; + std::multiset FunctionNames; for (const MCInst : BB) { // Skip non-call instructions. if (!BC.MIB->isCall(Instr)) @@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors( std::string = ProfiledFunctionIdToName[CallSite.DestId]; FunctionNames.insert(FunctionName); } - for (const std::string : FunctionNames) { + for
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -456,6 +435,39 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { ++MatchedWithLTOCommonName; } } + return MatchedWithLTOCommonName; +} + +Error YAMLProfileReader::readProfile(BinaryContext ) { + if (opts::Verbosity >= 1) { +outs() << "BOLT-INFO: YAML profile with hash: "; +switch (YamlBP.Header.HashFunction) { +case HashFunction::StdHash: + outs() << "std::hash\n"; shawbyoung wrote: I'm erring on the side of making minimal code change - although it's showing up on gh as code added, I haven't touched the prologue of readProfile. If you see the large "deleted" section above (starting on the original line 353 of YAMLProfileReader.cpp) it's the exact same. So, I'd like to keep this PR just about refactoring function matching. https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/shawbyoung edited https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -456,6 +435,39 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { ++MatchedWithLTOCommonName; } } + return MatchedWithLTOCommonName; +} + +Error YAMLProfileReader::readProfile(BinaryContext ) { + if (opts::Verbosity >= 1) { +outs() << "BOLT-INFO: YAML profile with hash: "; +switch (YamlBP.Header.HashFunction) { +case HashFunction::StdHash: + outs() << "std::hash\n"; + break; +case HashFunction::XXH3: + outs() << "xxh3\n"; + break; +} + } + YamlProfileToFunction.resize(YamlBP.Functions.size() + 1); + + // Computes hash for binary functions. + if (opts::MatchProfileWithFunctionHash) { +for (auto &[_, BF] : BC.getBinaryFunctions()) { + BF.computeHash(YamlBP.Header.IsDFSOrder, YamlBP.Header.HashFunction); +} + } else if (!opts::IgnoreHash) { +for (BinaryFunction *BF : ProfileBFs) { + if (!BF) +continue; + BF->computeHash(YamlBP.Header.IsDFSOrder, YamlBP.Header.HashFunction); +} + } + + size_t MatchedWithExactName = matchWithExactName(); shawbyoung wrote: In lines 481 - 487 > if (opts::Verbosity >= 1) { >outs() << "BOLT-INFO: matched " << MatchedWithExactName > << " functions with identical names\n"; > ... Match counts are directed to outs() https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -456,6 +435,39 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { ++MatchedWithLTOCommonName; } } + return MatchedWithLTOCommonName; +} + +Error YAMLProfileReader::readProfile(BinaryContext ) { + if (opts::Verbosity >= 1) { +outs() << "BOLT-INFO: YAML profile with hash: "; +switch (YamlBP.Header.HashFunction) { +case HashFunction::StdHash: + outs() << "std::hash\n"; ayermolo wrote: BC.outs() https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -456,6 +435,39 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { ++MatchedWithLTOCommonName; } } + return MatchedWithLTOCommonName; +} + +Error YAMLProfileReader::readProfile(BinaryContext ) { + if (opts::Verbosity >= 1) { +outs() << "BOLT-INFO: YAML profile with hash: "; +switch (YamlBP.Header.HashFunction) { +case HashFunction::StdHash: + outs() << "std::hash\n"; + break; +case HashFunction::XXH3: + outs() << "xxh3\n"; + break; +} + } + YamlProfileToFunction.resize(YamlBP.Functions.size() + 1); + + // Computes hash for binary functions. + if (opts::MatchProfileWithFunctionHash) { +for (auto &[_, BF] : BC.getBinaryFunctions()) { + BF.computeHash(YamlBP.Header.IsDFSOrder, YamlBP.Header.HashFunction); +} + } else if (!opts::IgnoreHash) { +for (BinaryFunction *BF : ProfileBFs) { + if (!BF) +continue; + BF->computeHash(YamlBP.Header.IsDFSOrder, YamlBP.Header.HashFunction); +} + } + + size_t MatchedWithExactName = matchWithExactName(); ayermolo wrote: This doesn't look like it's used anywhere? https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/97502 >From c6212e4b26b0f0d8abde323fa5fc04ecc6dd34fd Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Wed, 3 Jul 2024 09:45:46 -0700 Subject: [PATCH 1/2] Changed profileMatches comment Created using spr 1.3.4 --- bolt/include/bolt/Profile/YAMLProfileReader.h | 2 +- bolt/lib/Profile/YAMLProfileReader.cpp| 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/bolt/include/bolt/Profile/YAMLProfileReader.h b/bolt/include/bolt/Profile/YAMLProfileReader.h index a5bd3544bd999..627cebf5d9453 100644 --- a/bolt/include/bolt/Profile/YAMLProfileReader.h +++ b/bolt/include/bolt/Profile/YAMLProfileReader.h @@ -73,7 +73,7 @@ class YAMLProfileReader : public ProfileReaderBase { bool parseFunctionProfile(BinaryFunction , const yaml::bolt::BinaryFunctionProfile ); - /// Returns block cnt equality if IgnoreHash is true, otherwise, hash equality + /// Checks if a function profile matches a binary function. bool profileMatches(const yaml::bolt::BinaryFunctionProfile , BinaryFunction ); diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index e8ce187367899..91628d950e9f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -333,6 +333,7 @@ Error YAMLProfileReader::preprocessProfile(BinaryContext ) { return Error::success(); } + bool YAMLProfileReader::profileMatches( const yaml::bolt::BinaryFunctionProfile , BinaryFunction ) { if (opts::IgnoreHash) >From 1f48f09228b54e410910c2186cf0c3a73400bfd3 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Wed, 3 Jul 2024 10:26:27 -0700 Subject: [PATCH 2/2] Changing profileMatches BF param to const Created using spr 1.3.4 --- bolt/include/bolt/Profile/YAMLProfileReader.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bolt/include/bolt/Profile/YAMLProfileReader.h b/bolt/include/bolt/Profile/YAMLProfileReader.h index 627cebf5d9453..fe9f349de278d 100644 --- a/bolt/include/bolt/Profile/YAMLProfileReader.h +++ b/bolt/include/bolt/Profile/YAMLProfileReader.h @@ -75,7 +75,7 @@ class YAMLProfileReader : public ProfileReaderBase { /// Checks if a function profile matches a binary function. bool profileMatches(const yaml::bolt::BinaryFunctionProfile , - BinaryFunction ); + const BinaryFunction ); /// Infer function profile from stale data (collected on older binaries). bool inferStaleProfile(BinaryFunction , ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/aaupov approved this pull request. LG % nit https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -334,6 +334,13 @@ Error YAMLProfileReader::preprocessProfile(BinaryContext ) { return Error::success(); } +bool YAMLProfileReader::profileMatches( +const yaml::bolt::BinaryFunctionProfile , BinaryFunction ) { aaupov wrote: ```suggestion const yaml::bolt::BinaryFunctionProfile , const BinaryFunction ) { ``` https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96444 >From 5945915a9a9f0caf3ed890ce450a25cff58ef608 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 23 Jun 2024 17:07:53 +0200 Subject: [PATCH] AMDGPU: Add subtarget feature for memory atomic fadd f64 --- llvm/lib/Target/AMDGPU/AMDGPU.td | 21 ++--- llvm/lib/Target/AMDGPU/BUFInstructions.td | 10 ++ llvm/lib/Target/AMDGPU/FLATInstructions.td | 6 +++--- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 10 +++--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 2 +- 5 files changed, 31 insertions(+), 18 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index bea233bfb27bd..94e8e77b3c052 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureFlatBufferGlobalAtomicFaddF64Inst + : SubtargetFeature<"flat-buffer-global-fadd-f64-inst", + "HasFlatBufferGlobalAtomicFaddF64Inst", + "true", + "Has flat, buffer, and global instructions for f64 atomic fadd" +>; + def FeatureMemoryAtomicFAddF32DenormalSupport : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support", "HasMemoryAtomicFaddF32DenormalSupport", @@ -1390,7 +1397,8 @@ def FeatureISAVersion9_0_A : FeatureSet< FeatureBackOffBarrier, FeatureKernargPreload, FeatureAtomicFMinFMaxF64GlobalInsts, - FeatureAtomicFMinFMaxF64FlatInsts + FeatureAtomicFMinFMaxF64FlatInsts, + FeatureFlatBufferGlobalAtomicFaddF64Inst ])>; def FeatureISAVersion9_0_C : FeatureSet< @@ -1435,7 +1443,8 @@ def FeatureISAVersion9_4_Common : FeatureSet< FeatureAtomicFMinFMaxF64GlobalInsts, FeatureAtomicFMinFMaxF64FlatInsts, FeatureAgentScopeFineGrainedRemoteMemoryAtomics, - FeatureMemoryAtomicFAddF32DenormalSupport + FeatureMemoryAtomicFAddF32DenormalSupport, + FeatureFlatBufferGlobalAtomicFaddF64Inst ]>; def FeatureISAVersion9_4_0 : FeatureSet< @@ -1932,11 +1941,9 @@ def isGFX12Plus : def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">, AssemblerPredicate<(all_of FeatureFlatAddressSpace)>; - -def HasBufferFlatGlobalAtomicsF64 : // FIXME: Rename to show it's only for fadd - Predicate<"Subtarget->hasBufferFlatGlobalAtomicsF64()">, - // FIXME: This is too coarse, and working around using pseudo's predicates on real instruction. - AssemblerPredicate<(any_of FeatureGFX90AInsts, FeatureGFX10Insts, FeatureSouthernIslands, FeatureSeaIslands)>; +def HasFlatBufferGlobalAtomicFaddF64Inst : + Predicate<"Subtarget->hasFlatBufferGlobalAtomicFaddF64Inst()">, + AssemblerPredicate<(any_of FeatureFlatBufferGlobalAtomicFaddF64Inst)>; def HasAtomicFMinFMaxF32GlobalInsts : Predicate<"Subtarget->hasAtomicFMinFMaxF32GlobalInsts()">, diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 3b8d94b744000..a904c8483dbf5 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1312,14 +1312,16 @@ let SubtargetPredicate = isGFX90APlus in { } } // End SubtargetPredicate = isGFX90APlus -let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in { +let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in { defm BUFFER_ATOMIC_ADD_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_add_f64", VReg_64, f64>; +} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst +let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in { // Note the names can be buffer_atomic_fmin_x2/buffer_atomic_fmax_x2 // depending on some subtargets. defm BUFFER_ATOMIC_MIN_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_min_f64", VReg_64, f64>; defm BUFFER_ATOMIC_MAX_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_max_f64", VReg_64, f64>; -} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 +} def BUFFER_INV : MUBUF_Invalidate<"buffer_inv"> { let SubtargetPredicate = isGFX940Plus; @@ -1836,9 +1838,9 @@ let SubtargetPredicate = HasAtomicBufferGlobalPkAddF16Insts in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", v2f16, "BUFFER_ATOMIC_PK_ADD_F16", ["ret"]>; } // End SubtargetPredicate = HasAtomicBufferGlobalPkAddF16Insts -let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in { +let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", f64, "BUFFER_ATOMIC_ADD_F64">; -} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 +} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in { defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_MIN_F64">; diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td index 4bf8f20269a15..16dc019ede810 100644 ---
[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96443 >From dfefb503c35bb1744bffed759221d12f654c99d8 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 23 Jun 2024 16:44:08 +0200 Subject: [PATCH 1/3] AMDGPU: Add subtarget feature for global atomic fadd denormal support Not sure what the behavior for gfx90a is. The SPG says it always flushes. The instruction documentation says it does not. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 14 -- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 7 +++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 3f35db8883716..51c077598df74 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst "Has flat_atomic_add_f32 instruction" >; +def FeatureMemoryAtomicFaddF32DenormalSupport + : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support", + "HasAtomicMemoryAtomicFaddF32DenormalSupport", + "true", + "global/flat/buffer atomic fadd for float supports denormal handling" +>; + def FeatureAgentScopeFineGrainedRemoteMemoryAtomics : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics", "HasAgentScopeFineGrainedRemoteMemoryAtomics", @@ -1427,7 +1434,8 @@ def FeatureISAVersion9_4_Common : FeatureSet< FeatureKernargPreload, FeatureAtomicFMinFMaxF64GlobalInsts, FeatureAtomicFMinFMaxF64FlatInsts, - FeatureAgentScopeFineGrainedRemoteMemoryAtomics + FeatureAgentScopeFineGrainedRemoteMemoryAtomics, + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion9_4_0 : FeatureSet< @@ -1631,7 +1639,9 @@ def FeatureISAVersion12 : FeatureSet< FeatureScalarDwordx3Loads, FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, - Feature1_5xVGPRs]>; + Feature1_5xVGPRs, + FeatureMemoryAtomicFaddF32DenormalSupport]>; + ]>; def FeatureISAVersion12_Generic: FeatureSet< !listconcat(FeatureISAVersion12.Features, diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h index 9e2a316a9ed28..db0b2b67a0388 100644 --- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h +++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h @@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool HasAtomicFlatPkAdd16Insts = false; bool HasAtomicFaddRtnInsts = false; bool HasAtomicFaddNoRtnInsts = false; + bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false; bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false; bool HasAtomicBufferGlobalPkAddF16Insts = false; bool HasAtomicCSubNoRtnInsts = false; @@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo, bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; } + /// \return true if the target's flat, global, and buffer atomic fadd for + /// float supports denormal handling. + bool hasMemoryAtomicFaddF32DenormalSupport() const { +return HasAtomicMemoryAtomicFaddF32DenormalSupport; + } + /// \return true if atomic operations targeting fine-grained memory work /// correctly at device scope, in allocations in host or peer PCIe device /// memory. >From 09c73116a884c6de72f98fa859c7c56295f5b8eb Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 24 Jun 2024 12:10:37 +0200 Subject: [PATCH 2/3] Add to gfx11. RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm" thought I'm not sure I trust it. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 51c077598df74..370992eb81ff3 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -1547,7 +1547,8 @@ def FeatureISAVersion11_Common : FeatureSet< FeatureFlatAtomicFaddF32Inst, FeatureImageInsts, FeaturePackedTID, - FeatureVcmpxPermlaneHazard]>; + FeatureVcmpxPermlaneHazard, + FeatureMemoryAtomicFaddF32DenormalSupport]>; // There are few workarounds that need to be // added to all targets. This pessimizes codegen @@ -1640,7 +1641,7 @@ def FeatureISAVersion12 : FeatureSet< FeatureDPPSrc1SGPR, FeatureMaxHardClauseLength32, Feature1_5xVGPRs, - FeatureMemoryAtomicFaddF32DenormalSupport]>; + FeatureMemoryAtomicFaddF32DenormalSupport ]>; def FeatureISAVersion12_Generic: FeatureSet< >From 9cf93c6ce502adf460d2432f29cc2aa3c0ccdd68 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 26 Jun 2024 11:30:51 +0200 Subject: [PATCH 3/3] Rename --- llvm/lib/Target/AMDGPU/AMDGPU.td | 10 +- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 370992eb81ff3..bea233bfb27bd 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/95884 >From 34652b2eebc62218c50a23509ce99937385c30e6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:42:00 -0700 Subject: [PATCH 1/8] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 73 -- 1 file changed, 56 insertions(+), 17 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 66cabc236f4b2..c9f6d88f0b13a 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -424,36 +424,75 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { // Uses name similarity to match functions that were not matched by name. uint64_t MatchedWithDemangledName = 0; - if (opts::NameSimilarityFunctionMatchingThreshold > 0) { - -std::unordered_map NameToBinaryFunction; -NameToBinaryFunction.reserve(BC.getBinaryFunctions().size()); -for (auto &[_, BF] : BC.getBinaryFunctions()) { + if (opts::NameSimilarityFunctionMatchingThreshold > 0) { +auto DemangleName = [&](const char* String) { int Status = 0; - char *DemangledName = abi::__cxa_demangle(BF.getOneName().str().c_str(), + char *DemangledName = abi::__cxa_demangle(String, nullptr, nullptr, ); - if (Status == 0) -NameToBinaryFunction[std::string(DemangledName)] = + return Status == 0 ? new std::string(DemangledName) : nullptr; +}; + +auto DeriveNameSpace = [&](std::string DemangledName) { + size_t LParen = std::string(DemangledName).find("("); + std::string FunctionName = std::string(DemangledName).substr(0, LParen); + size_t ScopeResolutionOperator = std::string(FunctionName).rfind("::"); + return ScopeResolutionOperator == std::string::npos ? std::string("") : std::string(DemangledName).substr(0, ScopeResolutionOperator); +}; + +std::unordered_map> NamespaceToBFs; +NamespaceToBFs.reserve(BC.getBinaryFunctions().size()); + +for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { + std::string* DemangledName = DemangleName(BF->getOneName().str().c_str()); + if (!DemangledName) +continue; + std::string Namespace = DeriveNameSpace(*DemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) +NamespaceToBFs[Namespace] = {BF}; + else +It->second.push_back(BF); } for (auto YamlBF : YamlBP.Functions) { if (YamlBF.Used) continue; - int Status = 0; - char *DemangledName = - abi::__cxa_demangle(YamlBF.Name.c_str(), nullptr, nullptr, ); - if (Status != 0) + std::string* YamlBFDemangledName = DemangleName(YamlBF.Name.c_str()); + if (!YamlBFDemangledName) continue; - auto It = NameToBinaryFunction.find(DemangledName); - if (It == NameToBinaryFunction.end()) + std::string Namespace = DeriveNameSpace(*YamlBFDemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) continue; - BinaryFunction *BF = It->second; - matchProfileToFunction(YamlBF, *BF); - ++MatchedWithDemangledName; + std::vector BFs = It->second; + + unsigned MinEditDistance = UINT_MAX; + BinaryFunction *ClosestNameBF = nullptr; + + for (BinaryFunction *BF : BFs) { +if (ProfiledFunctions.count(BF)) + continue; +std::string *BFDemangledName = DemangleName(BF->getOneName().str().c_str()); +if (!BFDemangledName) + continue; +unsigned BFEditDistance = StringRef(*BFDemangledName).edit_distance(*YamlBFDemangledName); +if (BFEditDistance < MinEditDistance) { + MinEditDistance = BFEditDistance; + ClosestNameBF = BF; +} + } + + if (ClosestNameBF && +MinEditDistance < opts::NameSimilarityFunctionMatchingThreshold) { +matchProfileToFunction(YamlBF, *ClosestNameBF); +++MatchedWithDemangledName; + } } } + outs() << MatchedWithDemangledName << ": functions matched by name similarity\n"; + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 2d23bbd6b9ce4f0786ae8ceb39b1b008b4ca9c4d Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:45:27 -0700 Subject: [PATCH 2/8] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index c9f6d88f0b13a..cf4a5393df8f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -491,8 +491,6 @@ Error YAMLProfileReader::readProfile(BinaryContext ) {
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/95884 >From 34652b2eebc62218c50a23509ce99937385c30e6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:42:00 -0700 Subject: [PATCH 1/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 73 -- 1 file changed, 56 insertions(+), 17 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 66cabc236f4b2..c9f6d88f0b13a 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -424,36 +424,75 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { // Uses name similarity to match functions that were not matched by name. uint64_t MatchedWithDemangledName = 0; - if (opts::NameSimilarityFunctionMatchingThreshold > 0) { - -std::unordered_map NameToBinaryFunction; -NameToBinaryFunction.reserve(BC.getBinaryFunctions().size()); -for (auto &[_, BF] : BC.getBinaryFunctions()) { + if (opts::NameSimilarityFunctionMatchingThreshold > 0) { +auto DemangleName = [&](const char* String) { int Status = 0; - char *DemangledName = abi::__cxa_demangle(BF.getOneName().str().c_str(), + char *DemangledName = abi::__cxa_demangle(String, nullptr, nullptr, ); - if (Status == 0) -NameToBinaryFunction[std::string(DemangledName)] = + return Status == 0 ? new std::string(DemangledName) : nullptr; +}; + +auto DeriveNameSpace = [&](std::string DemangledName) { + size_t LParen = std::string(DemangledName).find("("); + std::string FunctionName = std::string(DemangledName).substr(0, LParen); + size_t ScopeResolutionOperator = std::string(FunctionName).rfind("::"); + return ScopeResolutionOperator == std::string::npos ? std::string("") : std::string(DemangledName).substr(0, ScopeResolutionOperator); +}; + +std::unordered_map> NamespaceToBFs; +NamespaceToBFs.reserve(BC.getBinaryFunctions().size()); + +for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { + std::string* DemangledName = DemangleName(BF->getOneName().str().c_str()); + if (!DemangledName) +continue; + std::string Namespace = DeriveNameSpace(*DemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) +NamespaceToBFs[Namespace] = {BF}; + else +It->second.push_back(BF); } for (auto YamlBF : YamlBP.Functions) { if (YamlBF.Used) continue; - int Status = 0; - char *DemangledName = - abi::__cxa_demangle(YamlBF.Name.c_str(), nullptr, nullptr, ); - if (Status != 0) + std::string* YamlBFDemangledName = DemangleName(YamlBF.Name.c_str()); + if (!YamlBFDemangledName) continue; - auto It = NameToBinaryFunction.find(DemangledName); - if (It == NameToBinaryFunction.end()) + std::string Namespace = DeriveNameSpace(*YamlBFDemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) continue; - BinaryFunction *BF = It->second; - matchProfileToFunction(YamlBF, *BF); - ++MatchedWithDemangledName; + std::vector BFs = It->second; + + unsigned MinEditDistance = UINT_MAX; + BinaryFunction *ClosestNameBF = nullptr; + + for (BinaryFunction *BF : BFs) { +if (ProfiledFunctions.count(BF)) + continue; +std::string *BFDemangledName = DemangleName(BF->getOneName().str().c_str()); +if (!BFDemangledName) + continue; +unsigned BFEditDistance = StringRef(*BFDemangledName).edit_distance(*YamlBFDemangledName); +if (BFEditDistance < MinEditDistance) { + MinEditDistance = BFEditDistance; + ClosestNameBF = BF; +} + } + + if (ClosestNameBF && +MinEditDistance < opts::NameSimilarityFunctionMatchingThreshold) { +matchProfileToFunction(YamlBF, *ClosestNameBF); +++MatchedWithDemangledName; + } } } + outs() << MatchedWithDemangledName << ": functions matched by name similarity\n"; + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 2d23bbd6b9ce4f0786ae8ceb39b1b008b4ca9c4d Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:45:27 -0700 Subject: [PATCH 2/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index c9f6d88f0b13a..cf4a5393df8f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -491,8 +491,6 @@ Error YAMLProfileReader::readProfile(BinaryContext ) {
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/95884 >From 34652b2eebc62218c50a23509ce99937385c30e6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:42:00 -0700 Subject: [PATCH 1/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 73 -- 1 file changed, 56 insertions(+), 17 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 66cabc236f4b2..c9f6d88f0b13a 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -424,36 +424,75 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { // Uses name similarity to match functions that were not matched by name. uint64_t MatchedWithDemangledName = 0; - if (opts::NameSimilarityFunctionMatchingThreshold > 0) { - -std::unordered_map NameToBinaryFunction; -NameToBinaryFunction.reserve(BC.getBinaryFunctions().size()); -for (auto &[_, BF] : BC.getBinaryFunctions()) { + if (opts::NameSimilarityFunctionMatchingThreshold > 0) { +auto DemangleName = [&](const char* String) { int Status = 0; - char *DemangledName = abi::__cxa_demangle(BF.getOneName().str().c_str(), + char *DemangledName = abi::__cxa_demangle(String, nullptr, nullptr, ); - if (Status == 0) -NameToBinaryFunction[std::string(DemangledName)] = + return Status == 0 ? new std::string(DemangledName) : nullptr; +}; + +auto DeriveNameSpace = [&](std::string DemangledName) { + size_t LParen = std::string(DemangledName).find("("); + std::string FunctionName = std::string(DemangledName).substr(0, LParen); + size_t ScopeResolutionOperator = std::string(FunctionName).rfind("::"); + return ScopeResolutionOperator == std::string::npos ? std::string("") : std::string(DemangledName).substr(0, ScopeResolutionOperator); +}; + +std::unordered_map> NamespaceToBFs; +NamespaceToBFs.reserve(BC.getBinaryFunctions().size()); + +for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { + std::string* DemangledName = DemangleName(BF->getOneName().str().c_str()); + if (!DemangledName) +continue; + std::string Namespace = DeriveNameSpace(*DemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) +NamespaceToBFs[Namespace] = {BF}; + else +It->second.push_back(BF); } for (auto YamlBF : YamlBP.Functions) { if (YamlBF.Used) continue; - int Status = 0; - char *DemangledName = - abi::__cxa_demangle(YamlBF.Name.c_str(), nullptr, nullptr, ); - if (Status != 0) + std::string* YamlBFDemangledName = DemangleName(YamlBF.Name.c_str()); + if (!YamlBFDemangledName) continue; - auto It = NameToBinaryFunction.find(DemangledName); - if (It == NameToBinaryFunction.end()) + std::string Namespace = DeriveNameSpace(*YamlBFDemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) continue; - BinaryFunction *BF = It->second; - matchProfileToFunction(YamlBF, *BF); - ++MatchedWithDemangledName; + std::vector BFs = It->second; + + unsigned MinEditDistance = UINT_MAX; + BinaryFunction *ClosestNameBF = nullptr; + + for (BinaryFunction *BF : BFs) { +if (ProfiledFunctions.count(BF)) + continue; +std::string *BFDemangledName = DemangleName(BF->getOneName().str().c_str()); +if (!BFDemangledName) + continue; +unsigned BFEditDistance = StringRef(*BFDemangledName).edit_distance(*YamlBFDemangledName); +if (BFEditDistance < MinEditDistance) { + MinEditDistance = BFEditDistance; + ClosestNameBF = BF; +} + } + + if (ClosestNameBF && +MinEditDistance < opts::NameSimilarityFunctionMatchingThreshold) { +matchProfileToFunction(YamlBF, *ClosestNameBF); +++MatchedWithDemangledName; + } } } + outs() << MatchedWithDemangledName << ": functions matched by name similarity\n"; + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 2d23bbd6b9ce4f0786ae8ceb39b1b008b4ca9c4d Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:45:27 -0700 Subject: [PATCH 2/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index c9f6d88f0b13a..cf4a5393df8f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -491,8 +491,6 @@ Error YAMLProfileReader::readProfile(BinaryContext ) {
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/95884 >From 34652b2eebc62218c50a23509ce99937385c30e6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:42:00 -0700 Subject: [PATCH 1/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 73 -- 1 file changed, 56 insertions(+), 17 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 66cabc236f4b2..c9f6d88f0b13a 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -424,36 +424,75 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { // Uses name similarity to match functions that were not matched by name. uint64_t MatchedWithDemangledName = 0; - if (opts::NameSimilarityFunctionMatchingThreshold > 0) { - -std::unordered_map NameToBinaryFunction; -NameToBinaryFunction.reserve(BC.getBinaryFunctions().size()); -for (auto &[_, BF] : BC.getBinaryFunctions()) { + if (opts::NameSimilarityFunctionMatchingThreshold > 0) { +auto DemangleName = [&](const char* String) { int Status = 0; - char *DemangledName = abi::__cxa_demangle(BF.getOneName().str().c_str(), + char *DemangledName = abi::__cxa_demangle(String, nullptr, nullptr, ); - if (Status == 0) -NameToBinaryFunction[std::string(DemangledName)] = + return Status == 0 ? new std::string(DemangledName) : nullptr; +}; + +auto DeriveNameSpace = [&](std::string DemangledName) { + size_t LParen = std::string(DemangledName).find("("); + std::string FunctionName = std::string(DemangledName).substr(0, LParen); + size_t ScopeResolutionOperator = std::string(FunctionName).rfind("::"); + return ScopeResolutionOperator == std::string::npos ? std::string("") : std::string(DemangledName).substr(0, ScopeResolutionOperator); +}; + +std::unordered_map> NamespaceToBFs; +NamespaceToBFs.reserve(BC.getBinaryFunctions().size()); + +for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { + std::string* DemangledName = DemangleName(BF->getOneName().str().c_str()); + if (!DemangledName) +continue; + std::string Namespace = DeriveNameSpace(*DemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) +NamespaceToBFs[Namespace] = {BF}; + else +It->second.push_back(BF); } for (auto YamlBF : YamlBP.Functions) { if (YamlBF.Used) continue; - int Status = 0; - char *DemangledName = - abi::__cxa_demangle(YamlBF.Name.c_str(), nullptr, nullptr, ); - if (Status != 0) + std::string* YamlBFDemangledName = DemangleName(YamlBF.Name.c_str()); + if (!YamlBFDemangledName) continue; - auto It = NameToBinaryFunction.find(DemangledName); - if (It == NameToBinaryFunction.end()) + std::string Namespace = DeriveNameSpace(*YamlBFDemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) continue; - BinaryFunction *BF = It->second; - matchProfileToFunction(YamlBF, *BF); - ++MatchedWithDemangledName; + std::vector BFs = It->second; + + unsigned MinEditDistance = UINT_MAX; + BinaryFunction *ClosestNameBF = nullptr; + + for (BinaryFunction *BF : BFs) { +if (ProfiledFunctions.count(BF)) + continue; +std::string *BFDemangledName = DemangleName(BF->getOneName().str().c_str()); +if (!BFDemangledName) + continue; +unsigned BFEditDistance = StringRef(*BFDemangledName).edit_distance(*YamlBFDemangledName); +if (BFEditDistance < MinEditDistance) { + MinEditDistance = BFEditDistance; + ClosestNameBF = BF; +} + } + + if (ClosestNameBF && +MinEditDistance < opts::NameSimilarityFunctionMatchingThreshold) { +matchProfileToFunction(YamlBF, *ClosestNameBF); +++MatchedWithDemangledName; + } } } + outs() << MatchedWithDemangledName << ": functions matched by name similarity\n"; + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 2d23bbd6b9ce4f0786ae8ceb39b1b008b4ca9c4d Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:45:27 -0700 Subject: [PATCH 2/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index c9f6d88f0b13a..cf4a5393df8f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -491,8 +491,6 @@ Error YAMLProfileReader::readProfile(BinaryContext ) {
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/95884 >From 34652b2eebc62218c50a23509ce99937385c30e6 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:42:00 -0700 Subject: [PATCH 1/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 73 -- 1 file changed, 56 insertions(+), 17 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index 66cabc236f4b2..c9f6d88f0b13a 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -424,36 +424,75 @@ Error YAMLProfileReader::readProfile(BinaryContext ) { // Uses name similarity to match functions that were not matched by name. uint64_t MatchedWithDemangledName = 0; - if (opts::NameSimilarityFunctionMatchingThreshold > 0) { - -std::unordered_map NameToBinaryFunction; -NameToBinaryFunction.reserve(BC.getBinaryFunctions().size()); -for (auto &[_, BF] : BC.getBinaryFunctions()) { + if (opts::NameSimilarityFunctionMatchingThreshold > 0) { +auto DemangleName = [&](const char* String) { int Status = 0; - char *DemangledName = abi::__cxa_demangle(BF.getOneName().str().c_str(), + char *DemangledName = abi::__cxa_demangle(String, nullptr, nullptr, ); - if (Status == 0) -NameToBinaryFunction[std::string(DemangledName)] = + return Status == 0 ? new std::string(DemangledName) : nullptr; +}; + +auto DeriveNameSpace = [&](std::string DemangledName) { + size_t LParen = std::string(DemangledName).find("("); + std::string FunctionName = std::string(DemangledName).substr(0, LParen); + size_t ScopeResolutionOperator = std::string(FunctionName).rfind("::"); + return ScopeResolutionOperator == std::string::npos ? std::string("") : std::string(DemangledName).substr(0, ScopeResolutionOperator); +}; + +std::unordered_map> NamespaceToBFs; +NamespaceToBFs.reserve(BC.getBinaryFunctions().size()); + +for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { + std::string* DemangledName = DemangleName(BF->getOneName().str().c_str()); + if (!DemangledName) +continue; + std::string Namespace = DeriveNameSpace(*DemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) +NamespaceToBFs[Namespace] = {BF}; + else +It->second.push_back(BF); } for (auto YamlBF : YamlBP.Functions) { if (YamlBF.Used) continue; - int Status = 0; - char *DemangledName = - abi::__cxa_demangle(YamlBF.Name.c_str(), nullptr, nullptr, ); - if (Status != 0) + std::string* YamlBFDemangledName = DemangleName(YamlBF.Name.c_str()); + if (!YamlBFDemangledName) continue; - auto It = NameToBinaryFunction.find(DemangledName); - if (It == NameToBinaryFunction.end()) + std::string Namespace = DeriveNameSpace(*YamlBFDemangledName); + auto It = NamespaceToBFs.find(Namespace); + if (It == NamespaceToBFs.end()) continue; - BinaryFunction *BF = It->second; - matchProfileToFunction(YamlBF, *BF); - ++MatchedWithDemangledName; + std::vector BFs = It->second; + + unsigned MinEditDistance = UINT_MAX; + BinaryFunction *ClosestNameBF = nullptr; + + for (BinaryFunction *BF : BFs) { +if (ProfiledFunctions.count(BF)) + continue; +std::string *BFDemangledName = DemangleName(BF->getOneName().str().c_str()); +if (!BFDemangledName) + continue; +unsigned BFEditDistance = StringRef(*BFDemangledName).edit_distance(*YamlBFDemangledName); +if (BFEditDistance < MinEditDistance) { + MinEditDistance = BFEditDistance; + ClosestNameBF = BF; +} + } + + if (ClosestNameBF && +MinEditDistance < opts::NameSimilarityFunctionMatchingThreshold) { +matchProfileToFunction(YamlBF, *ClosestNameBF); +++MatchedWithDemangledName; + } } } + outs() << MatchedWithDemangledName << ": functions matched by name similarity\n"; + for (yaml::bolt::BinaryFunctionProfile : YamlBP.Functions) if (!YamlBF.Used && opts::Verbosity >= 1) errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name >From 2d23bbd6b9ce4f0786ae8ceb39b1b008b4ca9c4d Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Thu, 20 Jun 2024 23:45:27 -0700 Subject: [PATCH 2/7] spr amend Created using spr 1.3.4 --- bolt/lib/Profile/YAMLProfileReader.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index c9f6d88f0b13a..cf4a5393df8f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -491,8 +491,6 @@ Error YAMLProfileReader::readProfile(BinaryContext ) {
[llvm-branch-commits] [llvm] [AArch64] Only create called thunks when hardening against SLS (PR #97472)
@@ -36,38 +32,43 @@ using namespace llvm; #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass" +static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_"; + namespace { -class AArch64SLSHardening : public MachineFunctionPass { -public: - const TargetInstrInfo *TII; - const TargetRegisterInfo *TRI; - const AArch64Subtarget *ST; +// Set of inserted thunks: bitmask with bits corresponding to +// indexes in SLSBLRThunks array. +typedef uint32_t ThunksSet; atrosinenko wrote: Here is the PR: #97605 https://github.com/llvm/llvm-project/pull/97472 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/97502 >From c6212e4b26b0f0d8abde323fa5fc04ecc6dd34fd Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Wed, 3 Jul 2024 09:45:46 -0700 Subject: [PATCH] Changed profileMatches comment Created using spr 1.3.4 --- bolt/include/bolt/Profile/YAMLProfileReader.h | 2 +- bolt/lib/Profile/YAMLProfileReader.cpp| 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/bolt/include/bolt/Profile/YAMLProfileReader.h b/bolt/include/bolt/Profile/YAMLProfileReader.h index a5bd3544bd999..627cebf5d9453 100644 --- a/bolt/include/bolt/Profile/YAMLProfileReader.h +++ b/bolt/include/bolt/Profile/YAMLProfileReader.h @@ -73,7 +73,7 @@ class YAMLProfileReader : public ProfileReaderBase { bool parseFunctionProfile(BinaryFunction , const yaml::bolt::BinaryFunctionProfile ); - /// Returns block cnt equality if IgnoreHash is true, otherwise, hash equality + /// Checks if a function profile matches a binary function. bool profileMatches(const yaml::bolt::BinaryFunctionProfile , BinaryFunction ); diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp index e8ce187367899..91628d950e9f4 100644 --- a/bolt/lib/Profile/YAMLProfileReader.cpp +++ b/bolt/lib/Profile/YAMLProfileReader.cpp @@ -333,6 +333,7 @@ Error YAMLProfileReader::preprocessProfile(BinaryContext ) { return Error::success(); } + bool YAMLProfileReader::profileMatches( const yaml::bolt::BinaryFunctionProfile , BinaryFunction ) { if (opts::IgnoreHash) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (PR #97605)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: Anatoly Trosinenko (atrosinenko) Changes Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form __llvm_slsblr_thunk_xNfor BLR thunks __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead. --- Patch is 23.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97605.diff 2 Files Affected: - (modified) llvm/lib/Target/AArch64/AArch64SLSHardening.cpp (+222-104) - (added) llvm/test/CodeGen/AArch64/speculation-hardening-sls-blra.mir (+210) ``diff diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp index feb166f30127a..d93fe2a875845 100644 --- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp +++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp @@ -13,6 +13,7 @@ #include "AArch64InstrInfo.h" #include "AArch64Subtarget.h" +#include "llvm/ADT/StringSwitch.h" #include "llvm/CodeGen/IndirectThunks.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunction.h" @@ -23,6 +24,7 @@ #include "llvm/IR/DebugLoc.h" #include "llvm/Pass.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/FormatVariadic.h" #include "llvm/Target/TargetMachine.h" #include @@ -32,17 +34,103 @@ using namespace llvm; #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass" -static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_"; +// Common name prefix of all thunks generated by this pass. +// +// The generic form is +// __llvm_slsblr_thunk_xNfor BLR thunks +// __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks +// __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks +static constexpr StringRef CommonNamePrefix = "__llvm_slsblr_thunk_"; namespace { -// Set of inserted thunks: bitmask with bits corresponding to -// indexes in SLSBLRThunks array. -typedef uint32_t ThunksSet; +struct ThunkKind { + enum ThunkKindId { +ThunkBR, +ThunkBRAA, +ThunkBRAB, +ThunkBRAAZ, +ThunkBRABZ, + }; + + ThunkKindId Id; + StringRef NameInfix; + bool HasXmOperand; + bool NeedsPAuth; + + // Opcode to perform indirect jump from inside the thunk. + unsigned BROpcode; + + static const ThunkKind BR; + static const ThunkKind BRAA; + static const ThunkKind BRAB; + static const ThunkKind BRAAZ; + static const ThunkKind BRABZ; +}; + +// Set of inserted thunks. +class ThunksSet { +public: + static constexpr unsigned NumXRegisters = 32; + + // Given Xn register, returns n. + static unsigned indexOfXReg(Register Xn); + // Given n, returns Xn register. + static Register xRegByIndex(unsigned N); + + ThunksSet |=(const ThunksSet ) { +BLRThunks |= Other.BLRThunks; +BLRAAZThunks |= Other.BLRAAZThunks; +BLRABZThunks |= Other.BLRABZThunks; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRAAThunks[I] |= Other.BLRAAThunks[I]; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRABThunks[I] |= Other.BLRABThunks[I]; + +return *this; + } + + bool get(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +return getBitmask(Kind, Xm) & XnBit; + } + + void set(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +getBitmask(Kind, Xm) |= XnBit; + } + +private: + // Bitmasks representing operands used, with n-th bit corresponding to Xn + // register operand. If the instruction has a second operand (Xm), an array + // of bitmasks is used, indexed by m. + // Indexes corresponding to the forbidden x16, x17 and x30 registers are + // always unset, for simplicity there are no holes. + uint32_t BLRThunks = 0; + uint32_t BLRAAZThunks = 0; + uint32_t BLRABZThunks = 0; + uint32_t BLRAAThunks[NumXRegisters] = {}; + uint32_t BLRABThunks[NumXRegisters] = {}; + + uint32_t (ThunkKind::ThunkKindId Kind, Register Xm) { +switch (Kind) { +case ThunkKind::ThunkBR: + return BLRThunks; +case ThunkKind::ThunkBRAAZ: + return BLRAAZThunks; +case ThunkKind::ThunkBRABZ: + return BLRABZThunks; +case ThunkKind::ThunkBRAA: + return BLRAAThunks[indexOfXReg(Xm)]; +case ThunkKind::ThunkBRAB: + return BLRABThunks[indexOfXReg(Xm)]; +} + } +}; struct SLSHardeningInserter : ThunkInserter { public: - const char *getThunkPrefix() { return SLSBLRNamePrefix; } + const char *getThunkPrefix() { return CommonNamePrefix.data(); } bool mayUseThunk(const MachineFunction ) { // FIXME: ComdatThunks is only accumulated until the first thunk is created. ComdatThunks &= !MF.getSubtarget().hardenSlsNoComdat(); @@ -69,6
[llvm-branch-commits] [llvm] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (PR #97605)
https://github.com/atrosinenko created https://github.com/llvm/llvm-project/pull/97605 Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form __llvm_slsblr_thunk_xNfor BLR thunks __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead. >From b389284b8e92f5bf09cea38f3f9a53974a84dc29 Mon Sep 17 00:00:00 2001 From: Anatoly Trosinenko Date: Mon, 1 Jul 2024 20:13:54 +0300 Subject: [PATCH] [AArch64][PAC] Support BLRA* instructions in SLS Hardening pass Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form __llvm_slsblr_thunk_xNfor BLR thunks __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead. --- .../Target/AArch64/AArch64SLSHardening.cpp| 326 -- .../speculation-hardening-sls-blra.mir| 210 +++ 2 files changed, 432 insertions(+), 104 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/speculation-hardening-sls-blra.mir diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp index feb166f30127a..d93fe2a875845 100644 --- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp +++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp @@ -13,6 +13,7 @@ #include "AArch64InstrInfo.h" #include "AArch64Subtarget.h" +#include "llvm/ADT/StringSwitch.h" #include "llvm/CodeGen/IndirectThunks.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunction.h" @@ -23,6 +24,7 @@ #include "llvm/IR/DebugLoc.h" #include "llvm/Pass.h" #include "llvm/Support/ErrorHandling.h" +#include "llvm/Support/FormatVariadic.h" #include "llvm/Target/TargetMachine.h" #include @@ -32,17 +34,103 @@ using namespace llvm; #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass" -static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_"; +// Common name prefix of all thunks generated by this pass. +// +// The generic form is +// __llvm_slsblr_thunk_xNfor BLR thunks +// __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks +// __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks +static constexpr StringRef CommonNamePrefix = "__llvm_slsblr_thunk_"; namespace { -// Set of inserted thunks: bitmask with bits corresponding to -// indexes in SLSBLRThunks array. -typedef uint32_t ThunksSet; +struct ThunkKind { + enum ThunkKindId { +ThunkBR, +ThunkBRAA, +ThunkBRAB, +ThunkBRAAZ, +ThunkBRABZ, + }; + + ThunkKindId Id; + StringRef NameInfix; + bool HasXmOperand; + bool NeedsPAuth; + + // Opcode to perform indirect jump from inside the thunk. + unsigned BROpcode; + + static const ThunkKind BR; + static const ThunkKind BRAA; + static const ThunkKind BRAB; + static const ThunkKind BRAAZ; + static const ThunkKind BRABZ; +}; + +// Set of inserted thunks. +class ThunksSet { +public: + static constexpr unsigned NumXRegisters = 32; + + // Given Xn register, returns n. + static unsigned indexOfXReg(Register Xn); + // Given n, returns Xn register. + static Register xRegByIndex(unsigned N); + + ThunksSet |=(const ThunksSet ) { +BLRThunks |= Other.BLRThunks; +BLRAAZThunks |= Other.BLRAAZThunks; +BLRABZThunks |= Other.BLRABZThunks; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRAAThunks[I] |= Other.BLRAAThunks[I]; +for (unsigned I = 0; I < NumXRegisters; ++I) + BLRABThunks[I] |= Other.BLRABThunks[I]; + +return *this; + } + + bool get(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +return getBitmask(Kind, Xm) & XnBit; + } + + void set(ThunkKind::ThunkKindId Kind, Register Xn, Register Xm) { +uint32_t XnBit = 1u << indexOfXReg(Xn); +getBitmask(Kind, Xm) |= XnBit; + } + +private: + // Bitmasks representing operands used, with n-th bit corresponding to Xn + // register operand. If the instruction has a second operand (Xm), an array + // of bitmasks is used, indexed by m. + // Indexes corresponding to the forbidden x16, x17 and x30 registers are + // always unset, for simplicity there are no holes. + uint32_t BLRThunks = 0; + uint32_t BLRAAZThunks = 0; + uint32_t BLRABZThunks = 0; + uint32_t BLRAAThunks[NumXRegisters] = {}; + uint32_t BLRABThunks[NumXRegisters] = {}; + + uint32_t (ThunkKind::ThunkKindId Kind, Register Xm) { +switch (Kind) { +case ThunkKind::ThunkBR: + return BLRThunks; +case ThunkKind::ThunkBRAAZ: + return BLRAAZThunks; +case ThunkKind::ThunkBRABZ:
[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/shawbyoung edited https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -73,13 +73,26 @@ class YAMLProfileReader : public ProfileReaderBase { bool parseFunctionProfile(BinaryFunction , const yaml::bolt::BinaryFunctionProfile ); + /// Returns block cnt equality if IgnoreHash is true, otherwise, hash equality + bool profileMatches(const yaml::bolt::BinaryFunctionProfile , + BinaryFunction ); + /// Infer function profile from stale data (collected on older binaries). bool inferStaleProfile(BinaryFunction , const yaml::bolt::BinaryFunctionProfile ); /// Initialize maps for profile matching. void buildNameMaps(BinaryContext ); + /// Matches functions using exact name. + size_t matchWithExactName(); shawbyoung wrote: I'm moving the different matching techniques into separate functions because it'll be easier to understand and prevent the YAMLProfileReader::readProfile function from getting abhorrently large as I'll be adding call graph function matching to it in a subsequent PR. I'll add this explanation to the description. https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/aaupov approved this pull request. LG with a couple of nits. https://github.com/llvm/llvm-project/pull/95884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const BinaryFunction ) { return false; } +uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext ) { + uint64_t MatchedWithNameSimilarity = 0; + ItaniumPartialDemangler Demangler; + + // Demangle and derive namespace from function name. + auto DemangleName = [&](std::string ) { +StringRef RestoredName = NameResolver::restore(FunctionName); +return demangle(RestoredName); + }; + auto DeriveNameSpace = [&](std::string ) { +if (Demangler.partialDemangle(DemangledName.c_str())) + return std::string(""); +std::vector Buffer(DemangledName.begin(), DemangledName.end()); +size_t BufferSize = Buffer.size(); aaupov wrote: ```suggestion size_t BufferSize; ``` https://github.com/llvm/llvm-project/pull/95884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const BinaryFunction ) { return false; } +uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext ) { + uint64_t MatchedWithNameSimilarity = 0; + ItaniumPartialDemangler Demangler; + + // Demangle and derive namespace from function name. + auto DemangleName = [&](std::string ) { +StringRef RestoredName = NameResolver::restore(FunctionName); +return demangle(RestoredName); + }; + auto DeriveNameSpace = [&](std::string ) { +if (Demangler.partialDemangle(DemangledName.c_str())) + return std::string(""); +std::vector Buffer(DemangledName.begin(), DemangledName.end()); +size_t BufferSize = Buffer.size(); +char *NameSpace = +Demangler.getFunctionDeclContextName([0], ); +return std::string(NameSpace, BufferSize); + }; + + // Maps namespaces to associated function block counts and gets profile + // function names and namespaces to minimize the number of BFs to process and + // avoid repeated name demangling/namespace derivation. + StringMap> NamespaceToProfiledBFSizes; + std::vector ProfileBFDemangledNames; + ProfileBFDemangledNames.reserve(YamlBP.Functions.size()); + std::vector ProfiledBFNamespaces; + ProfiledBFNamespaces.reserve(YamlBP.Functions.size()); + + for (auto : YamlBP.Functions) { +std::string YamlBFDemangledName = DemangleName(YamlBF.Name); +ProfileBFDemangledNames.push_back(YamlBFDemangledName); +std::string YamlBFNamespace = DeriveNameSpace(YamlBFDemangledName); +ProfiledBFNamespaces.push_back(YamlBFNamespace); +NamespaceToProfiledBFSizes[YamlBFNamespace].insert(YamlBF.NumBasicBlocks); + } + + StringMap> NamespaceToBFs; + + // Maps namespaces to BFs excluding binary functions with no equal sized + // profiled functions belonging to the same namespace. + for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { +std::string DemangledName = BF->getDemangledName(); +std::string Namespace = DeriveNameSpace(DemangledName); + +auto NamespaceToProfiledBFSizesIt = +NamespaceToProfiledBFSizes.find(Namespace); +if (NamespaceToProfiledBFSizesIt == NamespaceToProfiledBFSizes.end()) aaupov wrote: ```suggestion // Skip if there are no ProfileBFs with a given \p Namespace. if (NamespaceToProfiledBFSizesIt == NamespaceToProfiledBFSizes.end()) ``` https://github.com/llvm/llvm-project/pull/95884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const BinaryFunction ) { return false; } +uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext ) { + uint64_t MatchedWithNameSimilarity = 0; + ItaniumPartialDemangler Demangler; + + // Demangle and derive namespace from function name. + auto DemangleName = [&](std::string ) { +StringRef RestoredName = NameResolver::restore(FunctionName); +return demangle(RestoredName); + }; + auto DeriveNameSpace = [&](std::string ) { +if (Demangler.partialDemangle(DemangledName.c_str())) + return std::string(""); +std::vector Buffer(DemangledName.begin(), DemangledName.end()); +size_t BufferSize = Buffer.size(); +char *NameSpace = +Demangler.getFunctionDeclContextName([0], ); +return std::string(NameSpace, BufferSize); + }; + + // Maps namespaces to associated function block counts and gets profile + // function names and namespaces to minimize the number of BFs to process and + // avoid repeated name demangling/namespace derivation. + StringMap> NamespaceToProfiledBFSizes; + std::vector ProfileBFDemangledNames; + ProfileBFDemangledNames.reserve(YamlBP.Functions.size()); + std::vector ProfiledBFNamespaces; + ProfiledBFNamespaces.reserve(YamlBP.Functions.size()); + + for (auto : YamlBP.Functions) { +std::string YamlBFDemangledName = DemangleName(YamlBF.Name); +ProfileBFDemangledNames.push_back(YamlBFDemangledName); +std::string YamlBFNamespace = DeriveNameSpace(YamlBFDemangledName); +ProfiledBFNamespaces.push_back(YamlBFNamespace); +NamespaceToProfiledBFSizes[YamlBFNamespace].insert(YamlBF.NumBasicBlocks); + } + + StringMap> NamespaceToBFs; + + // Maps namespaces to BFs excluding binary functions with no equal sized + // profiled functions belonging to the same namespace. + for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { +std::string DemangledName = BF->getDemangledName(); +std::string Namespace = DeriveNameSpace(DemangledName); + +auto NamespaceToProfiledBFSizesIt = +NamespaceToProfiledBFSizes.find(Namespace); +if (NamespaceToProfiledBFSizesIt == NamespaceToProfiledBFSizes.end()) + continue; +if (NamespaceToProfiledBFSizesIt->second.count(BF->size()) == 0) aaupov wrote: ```suggestion // Skip if there are no ProfileBFs in a given \p Namespace with // equal number of blocks. if (NamespaceToProfiledBFSizesIt->second.count(BF->size()) == 0) ``` https://github.com/llvm/llvm-project/pull/95884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const BinaryFunction ) { return false; } +uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext ) { + uint64_t MatchedWithNameSimilarity = 0; + ItaniumPartialDemangler Demangler; + + // Demangle and derive namespace from function name. + auto DemangleName = [&](std::string ) { +StringRef RestoredName = NameResolver::restore(FunctionName); +return demangle(RestoredName); + }; + auto DeriveNameSpace = [&](std::string ) { +if (Demangler.partialDemangle(DemangledName.c_str())) + return std::string(""); +std::vector Buffer(DemangledName.begin(), DemangledName.end()); +size_t BufferSize = Buffer.size(); +char *NameSpace = +Demangler.getFunctionDeclContextName([0], ); +return std::string(NameSpace, BufferSize); + }; + + // Maps namespaces to associated function block counts and gets profile + // function names and namespaces to minimize the number of BFs to process and + // avoid repeated name demangling/namespace derivation. + StringMap> NamespaceToProfiledBFSizes; + std::vector ProfileBFDemangledNames; + ProfileBFDemangledNames.reserve(YamlBP.Functions.size()); + std::vector ProfiledBFNamespaces; + ProfiledBFNamespaces.reserve(YamlBP.Functions.size()); + + for (auto : YamlBP.Functions) { +std::string YamlBFDemangledName = DemangleName(YamlBF.Name); +ProfileBFDemangledNames.push_back(YamlBFDemangledName); +std::string YamlBFNamespace = DeriveNameSpace(YamlBFDemangledName); +ProfiledBFNamespaces.push_back(YamlBFNamespace); +NamespaceToProfiledBFSizes[YamlBFNamespace].insert(YamlBF.NumBasicBlocks); + } + + StringMap> NamespaceToBFs; + + // Maps namespaces to BFs excluding binary functions with no equal sized + // profiled functions belonging to the same namespace. + for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { +std::string DemangledName = BF->getDemangledName(); +std::string Namespace = DeriveNameSpace(DemangledName); + +auto NamespaceToProfiledBFSizesIt = +NamespaceToProfiledBFSizes.find(Namespace); +if (NamespaceToProfiledBFSizesIt == NamespaceToProfiledBFSizes.end()) + continue; +if (NamespaceToProfiledBFSizesIt->second.count(BF->size()) == 0) + continue; +auto NamespaceToBFsIt = NamespaceToBFs.find(Namespace); +if (NamespaceToBFsIt == NamespaceToBFs.end()) + NamespaceToBFs[Namespace] = {BF}; +else + NamespaceToBFsIt->second.push_back(BF); + } + + // Iterates through all profiled functions and binary functions belonging to + // the same namespace and matches based on edit distance thresehold. aaupov wrote: ```suggestion // the same namespace and matches based on edit distance threshold. ``` https://github.com/llvm/llvm-project/pull/95884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const BinaryFunction ) { return false; } +uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext ) { + uint64_t MatchedWithNameSimilarity = 0; + ItaniumPartialDemangler Demangler; + + // Demangle and derive namespace from function name. + auto DemangleName = [&](std::string ) { +StringRef RestoredName = NameResolver::restore(FunctionName); +return demangle(RestoredName); + }; + auto DeriveNameSpace = [&](std::string ) { +if (Demangler.partialDemangle(DemangledName.c_str())) + return std::string(""); +std::vector Buffer(DemangledName.begin(), DemangledName.end()); +size_t BufferSize = Buffer.size(); +char *NameSpace = +Demangler.getFunctionDeclContextName([0], ); +return std::string(NameSpace, BufferSize); + }; + + // Maps namespaces to associated function block counts and gets profile + // function names and namespaces to minimize the number of BFs to process and + // avoid repeated name demangling/namespace derivation. + StringMap> NamespaceToProfiledBFSizes; + std::vector ProfileBFDemangledNames; + ProfileBFDemangledNames.reserve(YamlBP.Functions.size()); + std::vector ProfiledBFNamespaces; + ProfiledBFNamespaces.reserve(YamlBP.Functions.size()); + + for (auto : YamlBP.Functions) { +std::string YamlBFDemangledName = DemangleName(YamlBF.Name); +ProfileBFDemangledNames.push_back(YamlBFDemangledName); +std::string YamlBFNamespace = DeriveNameSpace(YamlBFDemangledName); +ProfiledBFNamespaces.push_back(YamlBFNamespace); +NamespaceToProfiledBFSizes[YamlBFNamespace].insert(YamlBF.NumBasicBlocks); + } + + StringMap> NamespaceToBFs; + + // Maps namespaces to BFs excluding binary functions with no equal sized + // profiled functions belonging to the same namespace. + for (BinaryFunction *BF : BC.getAllBinaryFunctions()) { +std::string DemangledName = BF->getDemangledName(); +std::string Namespace = DeriveNameSpace(DemangledName); + +auto NamespaceToProfiledBFSizesIt = +NamespaceToProfiledBFSizes.find(Namespace); +if (NamespaceToProfiledBFSizesIt == NamespaceToProfiledBFSizes.end()) + continue; +if (NamespaceToProfiledBFSizesIt->second.count(BF->size()) == 0) + continue; +auto NamespaceToBFsIt = NamespaceToBFs.find(Namespace); +if (NamespaceToBFsIt == NamespaceToBFs.end()) + NamespaceToBFs[Namespace] = {BF}; +else + NamespaceToBFsIt->second.push_back(BF); + } + + // Iterates through all profiled functions and binary functions belonging to + // the same namespace and matches based on edit distance thresehold. + assert(YamlBP.Functions.size() == ProfiledBFNamespaces.size() && + ProfiledBFNamespaces.size() == ProfileBFDemangledNames.size()); + for (size_t I = 0; I < YamlBP.Functions.size(); ++I) { +yaml::bolt::BinaryFunctionProfile = YamlBP.Functions[I]; +std::string = ProfiledBFNamespaces[I]; +if (YamlBF.Used) + continue; +auto It = NamespaceToBFs.find(YamlBFNamespace); aaupov wrote: ```suggestion // Skip if there are no BFs in a given \p Namespace. auto It = NamespaceToBFs.find(YamlBFNamespace); ``` https://github.com/llvm/llvm-project/pull/95884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)
https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/95884 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)
dcci wrote: > I have a couple of general comments about this. Can you also please add a > description explaining what this patch does? i.e. why we're refactoring these functions. https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/dcci edited https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)
https://github.com/dcci commented: I have a couple of general comments about this. Can you also please add a description explaining what this patch does? https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -73,13 +73,26 @@ class YAMLProfileReader : public ProfileReaderBase { bool parseFunctionProfile(BinaryFunction , const yaml::bolt::BinaryFunctionProfile ); + /// Returns block cnt equality if IgnoreHash is true, otherwise, hash equality + bool profileMatches(const yaml::bolt::BinaryFunctionProfile , dcci wrote: I think this comment talks about the implementation more than the definition. Can you rephrase it? https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)
@@ -73,13 +73,26 @@ class YAMLProfileReader : public ProfileReaderBase { bool parseFunctionProfile(BinaryFunction , const yaml::bolt::BinaryFunctionProfile ); + /// Returns block cnt equality if IgnoreHash is true, otherwise, hash equality + bool profileMatches(const yaml::bolt::BinaryFunctionProfile , + BinaryFunction ); + /// Infer function profile from stale data (collected on older binaries). bool inferStaleProfile(BinaryFunction , const yaml::bolt::BinaryFunctionProfile ); /// Initialize maps for profile matching. void buildNameMaps(BinaryContext ); + /// Matches functions using exact name. + size_t matchWithExactName(); dcci wrote: why you need these 3 different functions? https://github.com/llvm/llvm-project/pull/97502 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
@@ -1861,7 +1886,15 @@ template Error ELFBuilder::readSections(bool EnsureSymtab) { const typename ELFFile::Elf_Shdr *Shdr = Sections->begin() + RelSec->Index; - if (RelSec->Type == SHT_REL) { + if (RelSec->Type == SHT_CREL) { +auto Rels = ElfFile.crels(*Shdr); smithp35 wrote: Would `RelsOrRelas` be a better name as it will make the meaning of first and second more obvious at the point of use? https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
https://github.com/smithp35 edited https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)
https://github.com/smithp35 commented: Only a couple of small comments from me. I'll be out of the office till Monday next week, I'm fine for others to progress this wihout me. https://github.com/llvm/llvm-project/pull/97521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits