https://github.com/mingmingl-llvm created
https://github.com/llvm/llvm-project/pull/174921
The motivation is to support partitioning only the read-only data sections
(i.e., .rodata, .data.rel.ro) using data access profiles, skipping the
read-write data sections.
* Load tests show that some workloads favor partitioning only the read-only
data sections, while others favor partitioning all four static data sections.
One hypothesis is that packing hot variables in {.data, .bss} may cause
secondary effects such as false sharing. This option allows enabling the
read-only mode generally while we explore finer-grained data layout (e.g.,
reordering global variables in .data and .bss to reduce any false-sharing
penalty) or come to understand the load tests better.
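As a reading aid, the intended behavior of the three option values can be sketched as a small standalone model. This is only an illustration under stated assumptions: `AnnotateMode`, `DataSection`, `isReadOnlySection`, and `shouldAnnotate` are hypothetical names, not LLVM APIs, and the real patch derives the section from `SectionKind` rather than from an enum like this one.

```cpp
// Simplified, hypothetical model of the annotation decision. These enums are
// illustrative stand-ins and are NOT LLVM's SectionKind or cl::opt types.
enum class AnnotateMode { None, ReadOnly, ReadWrite };
enum class DataSection { Rodata, DataRelRo, Data, Bss };

// True for the two read-only sections named in the description.
constexpr bool isReadOnlySection(DataSection S) {
  return S == DataSection::Rodata || S == DataSection::DataRelRo;
}

// True if a variable placed in section S would be annotated under Mode.
// "readwrite" covers all four sections in this model; "none" covers nothing.
constexpr bool shouldAnnotate(AnnotateMode Mode, DataSection S) {
  return Mode == AnnotateMode::ReadWrite ||
         (Mode == AnnotateMode::ReadOnly && isReadOnlySection(S));
}
```

In this model, `readonly` annotates a variable in `.data.rel.ro` but leaves one in `.bss` alone, while `readwrite` annotates both.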
From c9a69970c0dad984c46f3728849b6d3db1b50fb2 Mon Sep 17 00:00:00 2001
From: mingmingl <[email protected]>
Date: Wed, 7 Jan 2026 18:53:18 -0800
Subject: [PATCH] [StaticDataLayout][MemProf]Introduce an LLVM option to
specify one of read-only vs read-write
---
clang/lib/Driver/ToolChains/Clang.cpp | 2 +-
.../Driver/fpartition-static-data-sections.c | 2 +-
llvm/docs/MemProf.rst | 11 ++-
.../Transforms/Instrumentation/MemProfUse.h | 4 +-
llvm/lib/Passes/PassBuilderPipelines.cpp | 2 +-
llvm/lib/Passes/PassRegistry.def | 2 +-
.../Transforms/Instrumentation/MemProfUse.cpp | 51 ++++++++++--
.../PGOProfile/data-access-profile.ll | 78 ++++++++++++++-----
8 files changed, 118 insertions(+), 34 deletions(-)
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp
b/clang/lib/Driver/ToolChains/Clang.cpp
index 699fc31f23946..ddb38a8e65d7f 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6205,7 +6205,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction
&JA,
if ((Triple.isX86() || Triple.isAArch64()) && Triple.isOSBinFormatELF())
{
A->render(Args, CmdArgs);
CmdArgs.push_back("-mllvm");
- CmdArgs.push_back("-memprof-annotate-static-data-prefix");
+ CmdArgs.push_back("-memprof-annotate-static-data-type=readonly");
} else
D.Diag(diag::err_drv_unsupported_opt_for_target)
<< A->getAsString(Args) << TripleStr;
diff --git a/clang/test/Driver/fpartition-static-data-sections.c
b/clang/test/Driver/fpartition-static-data-sections.c
index b200d673bb7fa..f4999c218dbc5 100644
--- a/clang/test/Driver/fpartition-static-data-sections.c
+++ b/clang/test/Driver/fpartition-static-data-sections.c
@@ -9,7 +9,7 @@
// RUN: %clang -### --target=x86_64-linux -flto
-fpartition-static-data-sections -fno-partition-static-data-sections %s 2>&1 |
FileCheck %s --implicit-check-not="-plugin-opt=-fpartition-static-data-sections"
// OPT: "-fpartition-static-data-sections"
-// OPT: "-mllvm" "-memprof-annotate-static-data-prefix"
+// OPT: "-mllvm" "-memprof-annotate-static-data-type=readonly"
// ERR: error: unsupported option '-fpartition-static-data-sections' for target
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index a1a6d51ee6d2a..7b56a7e80c099 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -138,15 +138,22 @@ This feature uses a hybrid approach:
1. **Symbolizable Data:** Data with external or local linkage (tracked by the
symbol table) is partitioned based on data access profiles collected via
instrumentation (`PR <https://github.com/llvm/llvm-project/pull/142884>`_) or
hardware performance counters (e.g., Intel PEBS events such as
``MEM_INST_RETIRED.ALL_LOADS``).
2. **Module-Internal Data:** Data not tracked by the symbol table (e.g., jump
tables, constant pools, internal globals) has its hotness inferred from
standard PGO code execution profiles.
+.. FIXME: Update this with Clang driver option
-fpartition-static-data-sections and how it works with -profile-use and lld.
+
To enable this feature, pass the following flags to the compiler:
-* ``-memprof-annotate-static-data-prefix``: Enables annotation of global
variables in IR.
+* ``-memprof-annotate-static-data-type={none, readonly, readwrite}``
+ : Specifies which types of static data sections to annotate. Options are:
+
+ * ``none``: Do not annotate static data sections.
+ * ``readonly``: Annotate read-only static data sections (i.e.,
``.rodata``, ``.data.rel.ro``).
+ * ``readwrite``: Annotate both read-only (i.e., ``.rodata``,
``.data.rel.ro``) and read-write static data sections (i.e., ``.data``,
``.bss``).
* ``-split-static-data``: Enables partitioning of other data (like jump
tables) in the backend.
* ``-Wl,-z,keep-data-section-prefix``: Instructs the linker (LLD) to group
hot and cold data sections together.
.. code-block:: bash
- clang++ -fmemory-profile-use=memprof.memprofdata -mllvm
-memprof-annotate-static-data-prefix -mllvm -split-static-data -fuse-ld=lld
-Wl,-z,keep-data-section-prefix -O2 source.cpp -o optimized_app
+ clang++ -fmemory-profile-use=memprof.memprofdata -mllvm
-memprof-annotate-static-data-type=readonly -mllvm -split-static-data
-fuse-ld=lld -Wl,-z,keep-data-section-prefix -O2 source.cpp -o optimized_app
The optimized layout clusters hot static data, improving dTLB and cache
efficiency.
diff --git a/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
b/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
index 1fbb2bcb194ef..fd851e4a0be7a 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
@@ -24,6 +24,7 @@ namespace llvm {
class IndexedInstrProfReader;
class Module;
class TargetLibraryInfo;
+class TargetMachine;
namespace vfs {
class FileSystem;
@@ -32,7 +33,7 @@ class FileSystem;
class MemProfUsePass : public PassInfoMixin<MemProfUsePass> {
public:
LLVM_ABI explicit MemProfUsePass(
- std::string MemoryProfileFile,
+ std::string MemoryProfileFile, TargetMachine *TM,
IntrusiveRefCntPtr<vfs::FileSystem> FS = nullptr);
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
@@ -43,6 +44,7 @@ class MemProfUsePass : public PassInfoMixin<MemProfUsePass> {
annotateGlobalVariables(Module &M,
const memprof::DataAccessProfData *DataAccessProf);
std::string MemoryProfileFileName;
+ TargetMachine *TM;
IntrusiveRefCntPtr<vfs::FileSystem> FS;
};
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1584d30875570..7df2dc888d669 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1283,7 +1283,7 @@
PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
EnableSampledInstr));
if (IsMemprofUse)
- MPM.addPass(MemProfUsePass(PGOOpt->MemoryProfile, FS));
+ MPM.addPass(MemProfUsePass(PGOOpt->MemoryProfile, TM, FS));
if (PGOOpt && (PGOOpt->Action == PGOOptions::IRUse ||
PGOOpt->Action == PGOOptions::SampleUse))
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 2cfb5b2592601..3e1a6e0bc5649 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -243,7 +243,7 @@ MODULE_PASS_WITH_PARAMS(
parseLoopExtractorPassOptions, "single")
MODULE_PASS_WITH_PARAMS(
"memprof-use", "MemProfUsePass",
- [](std::string Opts) { return MemProfUsePass(Opts); },
+ [this](std::string Opts) { return MemProfUsePass(Opts, this->TM); },
parseMemProfUsePassOptions, "profile-filename=S")
MODULE_PASS_WITH_PARAMS(
"msan", "MemorySanitizerPass",
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
index 1a55021c5d3f7..ca6d509912c7e 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
@@ -32,6 +32,8 @@
#include "llvm/Support/Debug.h"
#include "llvm/Support/HashBuilder.h"
#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Target/TargetLoweringObjectFile.h"
+#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/Utils/LongestCommonSequence.h"
#include <map>
#include <set>
@@ -88,9 +90,24 @@ static cl::opt<unsigned> MinMatchedColdBytePercent(
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden,
cl::desc("Min percent of cold bytes matched to hint allocation cold"));
-static cl::opt<bool> AnnotateStaticDataSectionPrefix(
- "memprof-annotate-static-data-prefix", cl::init(false), cl::Hidden,
- cl::desc("If true, annotate the static data section prefix"));
+enum class AnnotateDataType {
+ None,
+ ReadOnly,
+ ReadWrite,
+};
+static cl::opt<enum AnnotateDataType> AnnotateStaticDataType(
+ "memprof-annotate-static-data-type",
+ cl::values(
+ clEnumValN(AnnotateDataType::None, "none",
+ "Do not annotate static data sections in this pass"),
+ clEnumValN(AnnotateDataType::ReadOnly, "readonly",
+ "Annotate read-only static data sections (i.e., .rodata and "
+ ".data.rel.ro)"),
+ clEnumValN(AnnotateDataType::ReadWrite, "readwrite",
+ "Annotate both read-only (i.e., .rodata and .data.rel.ro) "
+ "and read-write (.data and .bss) static data sections")),
+ cl::init(AnnotateDataType::None), cl::Hidden,
+ cl::desc("Specify the type of static data sections to annotate"));
// Matching statistics
STATISTIC(NumOfMemProfMissing, "Number of functions without memory profile.");
@@ -233,6 +250,20 @@ static void
HandleUnsupportedAnnotationKinds(GlobalVariable &GVar,
<< Reason << ".\n");
}
+// Returns true if the global variable GV should be annotated based on
+// AnnotateStaticDataType and its section kind.
+static bool ShouldAnnotateStaticDataSection(const GlobalVariable &GV,
+ const TargetMachine &TM) {
+ SectionKind Kind = TargetLoweringObjectFile::getKindForGlobal(&GV, TM);
+ const bool IsReadOnlyData = Kind.isReadOnly() || Kind.isReadOnlyWithRel();
+ if (AnnotateStaticDataType == AnnotateDataType::ReadOnly) {
+ return IsReadOnlyData;
+ }
+ assert(AnnotateStaticDataType == AnnotateDataType::ReadWrite &&
+ "Unknown static data annotation type");
+ return IsReadOnlyData || Kind.isData() || Kind.isBSS();
+}
+
// Structure for tracking info about matched allocation contexts for use with
// -memprof-print-match-info and -memprof-print-matched-alloc-stack.
struct AllocMatchInfo {
@@ -797,9 +828,9 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader
*MemProfReader,
}
}
-MemProfUsePass::MemProfUsePass(std::string MemoryProfileFile,
+MemProfUsePass::MemProfUsePass(std::string MemoryProfileFile, TargetMachine
*TM,
IntrusiveRefCntPtr<vfs::FileSystem> FS)
- : MemoryProfileFileName(MemoryProfileFile), FS(FS) {
+ : MemoryProfileFileName(MemoryProfileFile), TM(TM), FS(FS) {
if (!FS)
this->FS = vfs::getRealFileSystem();
}
@@ -905,7 +936,7 @@ PreservedAnalyses MemProfUsePass::run(Module &M,
ModuleAnalysisManager &AM) {
bool MemProfUsePass::annotateGlobalVariables(
Module &M, const memprof::DataAccessProfData *DataAccessProf) {
- if (!AnnotateStaticDataSectionPrefix || M.globals().empty())
+ if (AnnotateStaticDataType == AnnotateDataType::None || M.globals().empty())
return false;
if (!DataAccessProf) {
@@ -913,7 +944,7 @@ bool MemProfUsePass::annotateGlobalVariables(
M.getContext().diagnose(DiagnosticInfoPGOProfile(
MemoryProfileFileName.data(),
StringRef("Data access profiles not found in memprof. Ignore "
- "-memprof-annotate-static-data-prefix."),
+ "-memprof-annotate-static-data-type."),
DS_Warning));
return false;
}
@@ -934,6 +965,12 @@ bool MemProfUsePass::annotateGlobalVariables(
continue;
}
+ if (!ShouldAnnotateStaticDataSection(GVar, *TM)) {
+ LLVM_DEBUG(dbgs() << "Skip annotating global variable " << GVar.getName()
+ << "\n");
+ continue;
+ }
+
StringRef Name = GVar.getName();
// Skip string literals as their mangled names don't stay stable across
// binary releases.
diff --git a/llvm/test/Transforms/PGOProfile/data-access-profile.ll
b/llvm/test/Transforms/PGOProfile/data-access-profile.ll
index 205184bdd7156..9999b0d99f6a0 100644
--- a/llvm/test/Transforms/PGOProfile/data-access-profile.ll
+++ b/llvm/test/Transforms/PGOProfile/data-access-profile.ll
@@ -7,42 +7,65 @@
; RUN: llvm-profdata merge --memprof-version=4 memprof.yaml -o memprof.profdata
; RUN: llvm-profdata merge --memprof-version=4 memprof-no-dap.yaml -o
memprof-no-dap.profdata
-;; Run optimizer pass on an IR module without IR functions, and test that
global
-;; variables in the module could be annotated (i.e., no early return),
-; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>'
-memprof-annotate-static-data-prefix \
+;; The following opt RUNs sets '-relocation-model=pic' so that var6 is in
+;; .data.rel.ro section.
+
+;; When opt takes 'funcless-module.ll' as input, RUN lines test that global
+;; variables in the module could be annotated even if there are no IR functions
+;; in the module.
+
+;; Tests that readonly (including .data.rel.ro) sections are annotated but
+;; .data and .bss ones are not with
-memprof-annotate-static-data-type=readonly.
+; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>'
-memprof-annotate-static-data-type=readonly \
+; RUN: -debug-only=memprof -stats -S funcless-module.ll -o - 2>&1 | FileCheck
%s --check-prefixes=IR-READONLY
+
+; RUN: opt -relocation-model=pic
-passes='memprof-use<profile-filename=memprof.profdata>'
-memprof-annotate-static-data-type=readwrite \
; RUN: -debug-only=memprof -stats -S funcless-module.ll -o - 2>&1 | FileCheck
%s --check-prefixes=LOG,IR,STAT
;; Run optimizer pass on the IR, and check the section prefix.
-; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>'
-memprof-annotate-static-data-prefix \
+; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>'
-memprof-annotate-static-data-type=readwrite \
; RUN: -debug-only=memprof -stats -S input.ll -o - 2>&1 | FileCheck %s
--check-prefixes=LOG,IR,STAT
;; Run memprof without providing memprof data. Test that IR has module flag
;; `EnableDataAccessProf` as 0.
-; RUN: opt -passes='memprof-use<profile-filename=memprof-no-dap.profdata>'
-memprof-annotate-static-data-prefix \
+; RUN: opt -passes='memprof-use<profile-filename=memprof-no-dap.profdata>'
-memprof-annotate-static-data-type=readwrite \
; RUN: -debug-only=memprof -stats -S input.ll -o - 2>&1 | FileCheck %s
--check-prefix=FLAG
-;; Run memprof without explicitly setting -memprof-annotate-static-data-prefix.
+;; Run memprof without explicitly setting -memprof-annotate-static-data-type.
;; The output text IR shouldn't have `section_prefix` or EnableDataAccessProf
module flag.
; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' \
; RUN: -debug-only=memprof -stats -S input.ll -o - | FileCheck %s
--check-prefix=FLAGLESS --implicit-check-not="section_prefix"
+; IR-READONLY: @var1_readonly = constant i32 123, !section_prefix !0
+; IR-READONLY: @var2_bss.llvm.125 = global i64 0
+; IR-READONLY-NOT: section_prefix
+; IR-READONLY-SAME: {{.*}}
+
+; IR-READONLY: @var5_data = global i64 1
+; IR-READONLY-NOT: section_prefix
+; IR-READONLY-SAME: {{.*}}
+
+; IR-READONLY: @var6 = constant [2 x ptr] [ptr @var2_bss.llvm.125, ptr
@var5_data], !section_prefix !0
+
; LOG: Skip annotating string literal .str
-; LOG: Global variable var1 is annotated as hot
-; LOG: Global variable var2.llvm.125 is annotated as hot
+; LOG: Global variable var1_readonly is annotated as hot
+; LOG: Global variable var2_bss.llvm.125 is annotated as hot
; LOG: Global variable bar is not annotated
; LOG: Global variable foo is annotated as unlikely
; LOG: Skip annotation for var3 due to explicit section name.
; LOG: Skip annotation for var4 due to explicit section name.
; LOG: Skip annotation for llvm.fake_var due to name starts with `llvm.`.
; LOG: Skip annotation for qux due to linker declaration.
+; LOG: Global variable var5_data is annotated as hot
+; LOG: Global variable var6 is annotated as hot
;; String literals are not annotated.
; IR: @.str = unnamed_addr constant [5 x i8] c"abcde"
; IR-NOT: section_prefix
-; IR: @var1 = global i32 123, !section_prefix !0
+; IR: @var1_readonly = constant i32 123, !section_prefix !0
-;; @var.llvm.125 will be canonicalized to @var2 for profile look-up.
-; IR-NEXT: @var2.llvm.125 = global i64 0, !section_prefix !0
+;; @var2_bss.llvm.125 will be canonicalized to @var2 for profile look-up.
+; IR-NEXT: @var2_bss.llvm.125 = global i64 0, !section_prefix !0
;; @bar is not seen in hot symbol or known symbol set, so it won't get a
section
;; prefix. Test this by testing that there is no section_prefix between @bar
and
@@ -58,8 +81,14 @@
; IR: @llvm.fake_var = global i32 123
; IR-NOT: !section_prefix
+; IR-SAME: {{.*}}
; IR: @qux = external global i64
; IR-NOT: !section_prefix
+; IR-SAME: {{.*}}
+
+; IR: @var5_data = global i64 1, !section_prefix !0
+; IR: @var6 = constant [2 x ptr] [ptr @var2_bss.llvm.125, ptr @var5_data],
!section_prefix !0
+
; IR: attributes #0 = { "rodata-section"="sec2" }
@@ -72,19 +101,24 @@
; STAT: 1 memprof - Number of global vars annotated with 'unlikely' section
prefix.
; STAT: 2 memprof - Number of global vars with user-specified section (not
annotated).
-; STAT: 2 memprof - Number of global vars annotated with 'hot' section prefix.
+; STAT: 4 memprof - Number of global vars annotated with 'hot' section prefix.
; STAT: 1 memprof - Number of global vars with unknown hotness (no section
prefix).
;--- memprof.yaml
---
DataAccessProfiles:
SampledRecords:
- - Symbol: var1
+ - Symbol: var1_readonly
AccessCount: 1000
- - Symbol: var2
+ - Symbol: var5_data
+ AccessCount: 999
+ - Symbol: var6
+ AccessCount: 998
+ - Symbol: var2_bss
AccessCount: 5
- Hash: 101010
AccessCount: 145
+
KnownColdSymbols:
- foo
KnownColdStrHashes: [ 999, 1001 ]
@@ -113,18 +147,20 @@ target datalayout =
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:
target triple = "x86_64-unknown-linux-gnu"
@.str = unnamed_addr constant [5 x i8] c"abcde"
-@var1 = global i32 123
-@var2.llvm.125 = global i64 0
+@var1_readonly = constant i32 123
+@var2_bss.llvm.125 = global i64 0
@bar = global i16 3
@foo = global i8 2
@var3 = constant [2 x i32][i32 12345, i32 6789], section "sec1"
@var4 = constant [1 x i64][i64 98765] #0
@llvm.fake_var = global i32 123
@qux = external global i64
+@var5_data = global i64 1
+@var6 = constant [2 x ptr][ptr @var2_bss.llvm.125, ptr @var5_data]
define i32 @func() {
- %a = load i32, ptr @var1
- %b = load i32, ptr @var2.llvm.125
+ %a = load i32, ptr @var1_readonly
+ %b = load i32, ptr @var2_bss.llvm.125
%c = load i32, ptr @llvm.fake_var
%ret = call i32 (...) @func_taking_arbitrary_param(i32 %a, i32 %b, i32 %c)
ret i32 %ret
@@ -140,14 +176,16 @@ target datalayout =
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:
target triple = "x86_64-unknown-linux-gnu"
@.str = unnamed_addr constant [5 x i8] c"abcde"
-@var1 = global i32 123
-@var2.llvm.125 = global i64 0
+@var1_readonly = constant i32 123
+@var2_bss.llvm.125 = global i64 0
@bar = global i16 3
@foo = global i8 2
@var3 = constant [2 x i32][i32 12345, i32 6789], section "sec1"
@var4 = constant [1 x i64][i64 98765] #0
@llvm.fake_var = global i32 123
@qux = external global i64
+@var5_data = global i64 1
+@var6 = constant [2 x ptr] [ptr @var2_bss.llvm.125, ptr @var5_data]
attributes #0 = { "rodata-section"="sec2" }