[clang] [llvm] [CGData][ThinLTO][NFC] Prep for two-codegen rounds (PR #90934)

2024-09-16 Thread Mingming Liu via cfe-commits


@@ -56,6 +56,7 @@ Error thinBackend(const Config &C, unsigned Task, AddStreamFn 
AddStream,
   const FunctionImporter::ImportMapTy &ImportList,
   const GVSummaryMapTy &DefinedGlobals,
   MapVector *ModuleMap,
+  bool CodeGenOnly,

minglotus-6 wrote:

nit: add a brief comment for `CodeGenOnly` parameter before method 
`thinBackend`.

https://github.com/llvm/llvm-project/pull/90934
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-06 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

The unused variable error is fixed in 
https://github.com/llvm/llvm-project/commit/a7f152f59b1d64c6e1698fc6840c4911e11a

https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-06 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 closed 
https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-06 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 updated 
https://github.com/llvm/llvm-project/pull/107471

>From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001
From: mingmingl 
Date: Thu, 5 Sep 2024 14:36:28 -0700
Subject: [PATCH 1/6] [NFCI]Clean up synthetic count

---
 clang/docs/tools/clang-formatted-files.txt|   4 -
 llvm/include/llvm/IR/ModuleSummaryIndex.h |  21 +--
 llvm/include/llvm/IR/ModuleSummaryIndexYAML.h |   2 +-
 .../llvm/LTO/SummaryBasedOptimizations.h  |  18 ---
 .../IPO/SyntheticCountsPropagation.h  |  23 ---
 llvm/lib/Analysis/ModuleSummaryAnalysis.cpp   |   9 +-
 llvm/lib/AsmParser/LLParser.cpp   |   4 +-
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp |   8 +-
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp |   2 +-
 llvm/lib/LTO/CMakeLists.txt   |   1 -
 llvm/lib/LTO/LTO.cpp  |   3 -
 llvm/lib/LTO/SummaryBasedOptimizations.cpp|  88 ---
 llvm/lib/LTO/ThinLTOCodeGenerator.cpp |   3 -
 llvm/lib/Passes/PassBuilderPipelines.cpp  |   9 --
 llvm/lib/Passes/PassRegistry.def  |   1 -
 llvm/lib/Transforms/IPO/CMakeLists.txt|   1 -
 .../IPO/SyntheticCountsPropagation.cpp| 139 --
 .../Transforms/Utils/FunctionImportUtils.cpp  |  18 +--
 .../SyntheticCountsPropagation/initial.ll |  79 --
 .../SyntheticCountsPropagation/prop.ll|  50 ---
 .../SyntheticCountsPropagation/scc.ll |  19 ---
 llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn |   1 -
 .../llvm/lib/Transforms/IPO/BUILD.gn  |   1 -
 23 files changed, 17 insertions(+), 487 deletions(-)
 delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 delete mode 100644 
llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp
 delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll

diff --git a/clang/docs/tools/clang-formatted-files.txt 
b/clang/docs/tools/clang-formatted-files.txt
index 62871133a68075..28623554d35a11 100644
--- a/clang/docs/tools/clang-formatted-files.txt
+++ b/clang/docs/tools/clang-formatted-files.txt
@@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h
 llvm/include/llvm/IR/StructuralHash.h
 llvm/include/llvm/IR/TrackingMDRef.h
 llvm/include/llvm/IR/UseListOrder.h
-llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 llvm/include/llvm/MC/MCAsmInfoCOFF.h
 llvm/include/llvm/MC/MCAsmInfoDarwin.h
 llvm/include/llvm/MC/MCAsmInfoELF.h
@@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h
 llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
 llvm/include/llvm/Transforms/IPO/SCCP.h
 llvm/include/llvm/Transforms/IPO/StripSymbols.h
-llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h
 llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
 llvm/include/llvm/Transforms/Scalar/ADCE.h
@@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp
 llvm/lib/IR/Statepoint.cpp
 llvm/lib/IR/StructuralHash.cpp
 llvm/lib/IR/ValueSymbolTable.cpp
-llvm/lib/LTO/SummaryBasedOptimizations.cpp
 llvm/lib/MC/MCAsmInfoCOFF.cpp
 llvm/lib/MC/MCAsmInfoELF.cpp
 llvm/lib/MC/MCAsmInfoGOFF.cpp
@@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp
 llvm/lib/Transforms/IPO/SampleContextTracker.cpp
 llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
 llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp
-llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 llvm/lib/Transforms/ObjCARC/BlotMapVector.h
 llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp
 llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h 
b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 00934cc1ce6f2d..5bcf3e88622a25 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary {
 GlobalValue::DefaultVisibility,
 /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false,
 /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition),
-/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0,
-std::vector(), std::move(Edges),
-std::vector(),
+/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(),
+std::move(Edges), std::vector(),
 std::vector(),
 std::vector(),
 std::vector(),
@@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary {
   /// Function summary specific flags.
   FFlags FunFlags;
 
-  /// The synthesized entry count of the function.
-  /// This is only populated during ThinLink phase and remains unused while
-  /// generating per-module su

[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-06 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 updated 
https://github.com/llvm/llvm-project/pull/107471

>From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001
From: mingmingl 
Date: Thu, 5 Sep 2024 14:36:28 -0700
Subject: [PATCH 1/5] [NFCI]Clean up synthetic count

---
 clang/docs/tools/clang-formatted-files.txt|   4 -
 llvm/include/llvm/IR/ModuleSummaryIndex.h |  21 +--
 llvm/include/llvm/IR/ModuleSummaryIndexYAML.h |   2 +-
 .../llvm/LTO/SummaryBasedOptimizations.h  |  18 ---
 .../IPO/SyntheticCountsPropagation.h  |  23 ---
 llvm/lib/Analysis/ModuleSummaryAnalysis.cpp   |   9 +-
 llvm/lib/AsmParser/LLParser.cpp   |   4 +-
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp |   8 +-
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp |   2 +-
 llvm/lib/LTO/CMakeLists.txt   |   1 -
 llvm/lib/LTO/LTO.cpp  |   3 -
 llvm/lib/LTO/SummaryBasedOptimizations.cpp|  88 ---
 llvm/lib/LTO/ThinLTOCodeGenerator.cpp |   3 -
 llvm/lib/Passes/PassBuilderPipelines.cpp  |   9 --
 llvm/lib/Passes/PassRegistry.def  |   1 -
 llvm/lib/Transforms/IPO/CMakeLists.txt|   1 -
 .../IPO/SyntheticCountsPropagation.cpp| 139 --
 .../Transforms/Utils/FunctionImportUtils.cpp  |  18 +--
 .../SyntheticCountsPropagation/initial.ll |  79 --
 .../SyntheticCountsPropagation/prop.ll|  50 ---
 .../SyntheticCountsPropagation/scc.ll |  19 ---
 llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn |   1 -
 .../llvm/lib/Transforms/IPO/BUILD.gn  |   1 -
 23 files changed, 17 insertions(+), 487 deletions(-)
 delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 delete mode 100644 
llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp
 delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll

diff --git a/clang/docs/tools/clang-formatted-files.txt 
b/clang/docs/tools/clang-formatted-files.txt
index 62871133a68075..28623554d35a11 100644
--- a/clang/docs/tools/clang-formatted-files.txt
+++ b/clang/docs/tools/clang-formatted-files.txt
@@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h
 llvm/include/llvm/IR/StructuralHash.h
 llvm/include/llvm/IR/TrackingMDRef.h
 llvm/include/llvm/IR/UseListOrder.h
-llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 llvm/include/llvm/MC/MCAsmInfoCOFF.h
 llvm/include/llvm/MC/MCAsmInfoDarwin.h
 llvm/include/llvm/MC/MCAsmInfoELF.h
@@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h
 llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
 llvm/include/llvm/Transforms/IPO/SCCP.h
 llvm/include/llvm/Transforms/IPO/StripSymbols.h
-llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h
 llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
 llvm/include/llvm/Transforms/Scalar/ADCE.h
@@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp
 llvm/lib/IR/Statepoint.cpp
 llvm/lib/IR/StructuralHash.cpp
 llvm/lib/IR/ValueSymbolTable.cpp
-llvm/lib/LTO/SummaryBasedOptimizations.cpp
 llvm/lib/MC/MCAsmInfoCOFF.cpp
 llvm/lib/MC/MCAsmInfoELF.cpp
 llvm/lib/MC/MCAsmInfoGOFF.cpp
@@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp
 llvm/lib/Transforms/IPO/SampleContextTracker.cpp
 llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
 llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp
-llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 llvm/lib/Transforms/ObjCARC/BlotMapVector.h
 llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp
 llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h 
b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 00934cc1ce6f2d..5bcf3e88622a25 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary {
 GlobalValue::DefaultVisibility,
 /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false,
 /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition),
-/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0,
-std::vector(), std::move(Edges),
-std::vector(),
+/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(),
+std::move(Edges), std::vector(),
 std::vector(),
 std::vector(),
 std::vector(),
@@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary {
   /// Function summary specific flags.
   FFlags FunFlags;
 
-  /// The synthesized entry count of the function.
-  /// This is only populated during ThinLink phase and remains unused while
-  /// generating per-module su

[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 updated 
https://github.com/llvm/llvm-project/pull/107471

>From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001
From: mingmingl 
Date: Thu, 5 Sep 2024 14:36:28 -0700
Subject: [PATCH 1/4] [NFCI]Clean up synthetic count

---
 clang/docs/tools/clang-formatted-files.txt|   4 -
 llvm/include/llvm/IR/ModuleSummaryIndex.h |  21 +--
 llvm/include/llvm/IR/ModuleSummaryIndexYAML.h |   2 +-
 .../llvm/LTO/SummaryBasedOptimizations.h  |  18 ---
 .../IPO/SyntheticCountsPropagation.h  |  23 ---
 llvm/lib/Analysis/ModuleSummaryAnalysis.cpp   |   9 +-
 llvm/lib/AsmParser/LLParser.cpp   |   4 +-
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp |   8 +-
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp |   2 +-
 llvm/lib/LTO/CMakeLists.txt   |   1 -
 llvm/lib/LTO/LTO.cpp  |   3 -
 llvm/lib/LTO/SummaryBasedOptimizations.cpp|  88 ---
 llvm/lib/LTO/ThinLTOCodeGenerator.cpp |   3 -
 llvm/lib/Passes/PassBuilderPipelines.cpp  |   9 --
 llvm/lib/Passes/PassRegistry.def  |   1 -
 llvm/lib/Transforms/IPO/CMakeLists.txt|   1 -
 .../IPO/SyntheticCountsPropagation.cpp| 139 --
 .../Transforms/Utils/FunctionImportUtils.cpp  |  18 +--
 .../SyntheticCountsPropagation/initial.ll |  79 --
 .../SyntheticCountsPropagation/prop.ll|  50 ---
 .../SyntheticCountsPropagation/scc.ll |  19 ---
 llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn |   1 -
 .../llvm/lib/Transforms/IPO/BUILD.gn  |   1 -
 23 files changed, 17 insertions(+), 487 deletions(-)
 delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 delete mode 100644 
llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp
 delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll

diff --git a/clang/docs/tools/clang-formatted-files.txt 
b/clang/docs/tools/clang-formatted-files.txt
index 62871133a68075..28623554d35a11 100644
--- a/clang/docs/tools/clang-formatted-files.txt
+++ b/clang/docs/tools/clang-formatted-files.txt
@@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h
 llvm/include/llvm/IR/StructuralHash.h
 llvm/include/llvm/IR/TrackingMDRef.h
 llvm/include/llvm/IR/UseListOrder.h
-llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 llvm/include/llvm/MC/MCAsmInfoCOFF.h
 llvm/include/llvm/MC/MCAsmInfoDarwin.h
 llvm/include/llvm/MC/MCAsmInfoELF.h
@@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h
 llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
 llvm/include/llvm/Transforms/IPO/SCCP.h
 llvm/include/llvm/Transforms/IPO/StripSymbols.h
-llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h
 llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
 llvm/include/llvm/Transforms/Scalar/ADCE.h
@@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp
 llvm/lib/IR/Statepoint.cpp
 llvm/lib/IR/StructuralHash.cpp
 llvm/lib/IR/ValueSymbolTable.cpp
-llvm/lib/LTO/SummaryBasedOptimizations.cpp
 llvm/lib/MC/MCAsmInfoCOFF.cpp
 llvm/lib/MC/MCAsmInfoELF.cpp
 llvm/lib/MC/MCAsmInfoGOFF.cpp
@@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp
 llvm/lib/Transforms/IPO/SampleContextTracker.cpp
 llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
 llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp
-llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 llvm/lib/Transforms/ObjCARC/BlotMapVector.h
 llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp
 llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h 
b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 00934cc1ce6f2d..5bcf3e88622a25 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary {
 GlobalValue::DefaultVisibility,
 /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false,
 /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition),
-/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0,
-std::vector(), std::move(Edges),
-std::vector(),
+/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(),
+std::move(Edges), std::vector(),
 std::vector(),
 std::vector(),
 std::vector(),
@@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary {
   /// Function summary specific flags.
   FFlags FunFlags;
 
-  /// The synthesized entry count of the function.
-  /// This is only populated during ThinLink phase and remains unused while
-  /// generating per-module su

[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 ready_for_review 
https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Clean up synthetic count pass (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 updated 
https://github.com/llvm/llvm-project/pull/107471

>From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001
From: mingmingl 
Date: Thu, 5 Sep 2024 14:36:28 -0700
Subject: [PATCH 1/3] [NFCI]Clean up synthetic count

---
 clang/docs/tools/clang-formatted-files.txt|   4 -
 llvm/include/llvm/IR/ModuleSummaryIndex.h |  21 +--
 llvm/include/llvm/IR/ModuleSummaryIndexYAML.h |   2 +-
 .../llvm/LTO/SummaryBasedOptimizations.h  |  18 ---
 .../IPO/SyntheticCountsPropagation.h  |  23 ---
 llvm/lib/Analysis/ModuleSummaryAnalysis.cpp   |   9 +-
 llvm/lib/AsmParser/LLParser.cpp   |   4 +-
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp |   8 +-
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp |   2 +-
 llvm/lib/LTO/CMakeLists.txt   |   1 -
 llvm/lib/LTO/LTO.cpp  |   3 -
 llvm/lib/LTO/SummaryBasedOptimizations.cpp|  88 ---
 llvm/lib/LTO/ThinLTOCodeGenerator.cpp |   3 -
 llvm/lib/Passes/PassBuilderPipelines.cpp  |   9 --
 llvm/lib/Passes/PassRegistry.def  |   1 -
 llvm/lib/Transforms/IPO/CMakeLists.txt|   1 -
 .../IPO/SyntheticCountsPropagation.cpp| 139 --
 .../Transforms/Utils/FunctionImportUtils.cpp  |  18 +--
 .../SyntheticCountsPropagation/initial.ll |  79 --
 .../SyntheticCountsPropagation/prop.ll|  50 ---
 .../SyntheticCountsPropagation/scc.ll |  19 ---
 llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn |   1 -
 .../llvm/lib/Transforms/IPO/BUILD.gn  |   1 -
 23 files changed, 17 insertions(+), 487 deletions(-)
 delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 delete mode 100644 
llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp
 delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll

diff --git a/clang/docs/tools/clang-formatted-files.txt 
b/clang/docs/tools/clang-formatted-files.txt
index 62871133a68075..28623554d35a11 100644
--- a/clang/docs/tools/clang-formatted-files.txt
+++ b/clang/docs/tools/clang-formatted-files.txt
@@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h
 llvm/include/llvm/IR/StructuralHash.h
 llvm/include/llvm/IR/TrackingMDRef.h
 llvm/include/llvm/IR/UseListOrder.h
-llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 llvm/include/llvm/MC/MCAsmInfoCOFF.h
 llvm/include/llvm/MC/MCAsmInfoDarwin.h
 llvm/include/llvm/MC/MCAsmInfoELF.h
@@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h
 llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
 llvm/include/llvm/Transforms/IPO/SCCP.h
 llvm/include/llvm/Transforms/IPO/StripSymbols.h
-llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h
 llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
 llvm/include/llvm/Transforms/Scalar/ADCE.h
@@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp
 llvm/lib/IR/Statepoint.cpp
 llvm/lib/IR/StructuralHash.cpp
 llvm/lib/IR/ValueSymbolTable.cpp
-llvm/lib/LTO/SummaryBasedOptimizations.cpp
 llvm/lib/MC/MCAsmInfoCOFF.cpp
 llvm/lib/MC/MCAsmInfoELF.cpp
 llvm/lib/MC/MCAsmInfoGOFF.cpp
@@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp
 llvm/lib/Transforms/IPO/SampleContextTracker.cpp
 llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
 llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp
-llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 llvm/lib/Transforms/ObjCARC/BlotMapVector.h
 llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp
 llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h 
b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 00934cc1ce6f2d..5bcf3e88622a25 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary {
 GlobalValue::DefaultVisibility,
 /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false,
 /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition),
-/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0,
-std::vector(), std::move(Edges),
-std::vector(),
+/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(),
+std::move(Edges), std::vector(),
 std::vector(),
 std::vector(),
 std::vector(),
@@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary {
   /// Function summary specific flags.
   FFlags FunFlags;
 
-  /// The synthesized entry count of the function.
-  /// This is only populated during ThinLink phase and remains unused while
-  /// generating per-module su

[clang] [llvm] [NFCI]Clean up synthetic count pass (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Clean up synthetic count pass (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/107471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [NFCI]Clean up synthetic count (PR #107471)

2024-09-05 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 updated 
https://github.com/llvm/llvm-project/pull/107471

>From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001
From: mingmingl 
Date: Thu, 5 Sep 2024 14:36:28 -0700
Subject: [PATCH 1/2] [NFCI]Clean up synthetic count

---
 clang/docs/tools/clang-formatted-files.txt|   4 -
 llvm/include/llvm/IR/ModuleSummaryIndex.h |  21 +--
 llvm/include/llvm/IR/ModuleSummaryIndexYAML.h |   2 +-
 .../llvm/LTO/SummaryBasedOptimizations.h  |  18 ---
 .../IPO/SyntheticCountsPropagation.h  |  23 ---
 llvm/lib/Analysis/ModuleSummaryAnalysis.cpp   |   9 +-
 llvm/lib/AsmParser/LLParser.cpp   |   4 +-
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp |   8 +-
 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp |   2 +-
 llvm/lib/LTO/CMakeLists.txt   |   1 -
 llvm/lib/LTO/LTO.cpp  |   3 -
 llvm/lib/LTO/SummaryBasedOptimizations.cpp|  88 ---
 llvm/lib/LTO/ThinLTOCodeGenerator.cpp |   3 -
 llvm/lib/Passes/PassBuilderPipelines.cpp  |   9 --
 llvm/lib/Passes/PassRegistry.def  |   1 -
 llvm/lib/Transforms/IPO/CMakeLists.txt|   1 -
 .../IPO/SyntheticCountsPropagation.cpp| 139 --
 .../Transforms/Utils/FunctionImportUtils.cpp  |  18 +--
 .../SyntheticCountsPropagation/initial.ll |  79 --
 .../SyntheticCountsPropagation/prop.ll|  50 ---
 .../SyntheticCountsPropagation/scc.ll |  19 ---
 llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn |   1 -
 .../llvm/lib/Transforms/IPO/BUILD.gn  |   1 -
 23 files changed, 17 insertions(+), 487 deletions(-)
 delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 delete mode 100644 
llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp
 delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll
 delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll

diff --git a/clang/docs/tools/clang-formatted-files.txt 
b/clang/docs/tools/clang-formatted-files.txt
index 62871133a68075..28623554d35a11 100644
--- a/clang/docs/tools/clang-formatted-files.txt
+++ b/clang/docs/tools/clang-formatted-files.txt
@@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h
 llvm/include/llvm/IR/StructuralHash.h
 llvm/include/llvm/IR/TrackingMDRef.h
 llvm/include/llvm/IR/UseListOrder.h
-llvm/include/llvm/LTO/SummaryBasedOptimizations.h
 llvm/include/llvm/MC/MCAsmInfoCOFF.h
 llvm/include/llvm/MC/MCAsmInfoDarwin.h
 llvm/include/llvm/MC/MCAsmInfoELF.h
@@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h
 llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
 llvm/include/llvm/Transforms/IPO/SCCP.h
 llvm/include/llvm/Transforms/IPO/StripSymbols.h
-llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h
 llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h
 llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
 llvm/include/llvm/Transforms/Scalar/ADCE.h
@@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp
 llvm/lib/IR/Statepoint.cpp
 llvm/lib/IR/StructuralHash.cpp
 llvm/lib/IR/ValueSymbolTable.cpp
-llvm/lib/LTO/SummaryBasedOptimizations.cpp
 llvm/lib/MC/MCAsmInfoCOFF.cpp
 llvm/lib/MC/MCAsmInfoELF.cpp
 llvm/lib/MC/MCAsmInfoGOFF.cpp
@@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp
 llvm/lib/Transforms/IPO/SampleContextTracker.cpp
 llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
 llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp
-llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp
 llvm/lib/Transforms/ObjCARC/BlotMapVector.h
 llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp
 llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h 
b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 00934cc1ce6f2d..5bcf3e88622a25 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary {
 GlobalValue::DefaultVisibility,
 /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false,
 /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition),
-/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0,
-std::vector(), std::move(Edges),
-std::vector(),
+/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(),
+std::move(Edges), std::vector(),
 std::vector(),
 std::vector(),
 std::vector(),
@@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary {
   /// Function summary specific flags.
   FFlags FunFlags;
 
-  /// The synthesized entry count of the function.
-  /// This is only populated during ThinLink phase and remains unused while
-  /// generating per-module su

[clang] [llvm] [LTO] Reduce memory usage for import lists (PR #106772)

2024-08-30 Thread Mingming Liu via cfe-commits


@@ -215,13 +215,49 @@ class FunctionImporter {
 SmallVector getSourceModules() const;
 
 std::optional
-getImportType(const FunctionsToImportTy &GUIDToImportType,
-  GlobalValue::GUID GUID) const;
+getImportType(StringRef FromModule, GlobalValue::GUID GUID) const;
+
+// Iterate over the import list.  The caller gets tuples of FromModule,
+// GUID, and ImportKind instead of import IDs.
+auto begin() const { return map_iterator(Imports.begin(), IDs); }
+auto end() const { return map_iterator(Imports.end(), IDs); }
+
+friend class SortedImportList;
+
+  private:
+ImportIDTable &IDs;
+DenseSet Imports;
+  };
+
+  // A read-only copy of ImportMapTy with its contents sorted according to the
+  // given comparison function.
+  class SortedImportList {
+  public:
+SortedImportList(const ImportMapTy &ImportMap,
+ llvm::function_ref<
+ bool(const std::pair &,
+  const std::pair &)>
+ Comp)
+: IDs(ImportMap.IDs), Imports(iterator_range(ImportMap.Imports)) {
+  llvm::sort(Imports, [&](ImportIDTable::ImportIDTy L,

minglotus-6 wrote:

I suggest using `llvm::stable_sort` to make it explicit it's a stable sort 
(i.e., even if the existing implementation is stable already)

https://github.com/llvm/llvm-project/pull/106772
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LTO] Reduce memory usage for import lists (PR #106772)

2024-08-30 Thread Mingming Liu via cfe-commits


@@ -1827,15 +1831,13 @@ Expected FunctionImporter::importFunctions(
 if (Error Err = SrcModule->materializeMetadata())
   return std::move(Err);
 
-auto &ImportGUIDs = FunctionsToImportPerModule->second;
-
 // Find the globals to import
 SetVector GlobalsToImport;
 for (Function &F : *SrcModule) {
   if (!F.hasName())
 continue;
   auto GUID = F.getGUID();
-  auto MaybeImportType = ImportList.getImportType(ImportGUIDs, GUID);
+  auto MaybeImportType = ImportList.getImportType(Name, GUID);

minglotus-6 wrote:

nit pick: s/Name/ModName

https://github.com/llvm/llvm-project/pull/106772
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-30 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

Fixing https://lab.llvm.org/buildbot/#/builders/95/builds/672 in 
https://github.com/llvm/llvm-project/pull/97228

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-29 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 closed 
https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-29 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

Going to merge as the failed tests are not related to this PR.

```
cat github-pull-requests_build_76910_linux-linux-x64.log | grep -B 2 -A 10 
"Failed Tests "
--

Failed Tests (1):
  BOLT :: X86/reader-stale-yaml-std.test


Testing Time: 385.93s

Total Discovered Tests: 120952
  Skipped  : 47 (0.04%)
  Unsupported  :   3335 (2.76%)
  Passed   : 117251 (96.94%)
  Expectedly Failed:318 (0.26%)
--
--

Failed Tests (1):
  BOLT :: X86/reader-stale-yaml-std.test


Testing Time: 1.98s

Total Discovered Tests: 454
  Skipped  :   7 (1.54%)
  Unsupported  :  13 (2.86%)
  Passed   : 431 (94.93%)
  Expectedly Failed:   2 (0.44%)
```

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// After a virtual call candidate gets promoted, update the vtable's counts
+// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+// a vtable from which the virtual call is loaded. Compute the sum and use
+// 128-bit APInt to improve accuracy.
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= 
APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  if (!EnableVTableProfileUse || VPtr == nullptr ||
+  !VPtr->getMetadata(LLVMContext::MD_prof))
+return;
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+
+  for (const auto &Candidate : Candidates) {
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableGUIDCounts[GUID] -= Count;
+
+// 'OriginalBB' is the basic block of indirect call. After each candidate
+// is promoted, a new basic block is created for the indirect fallback 
basic
+// block and indirect call `CB` is moved into this new BB.
+BasicBlock *OriginalBB = CB.getParent();
+promoteCallWithVTableCmp(
+CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints,
+createBranchWeights(CB.getContext(), Candidate.Count,
+TotalFuncCount - Candidate.Count));
+
+int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent());
+
+ORE.emi

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -322,14 +796,133 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const CallBase &CB, const std::vector &Candidates,
+uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+auto &Candidate = Candidates[I];
+uint64_t CandidateVTableCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  CandidateVTableCount += Count;
+
+if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) 
{
+  LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+<< "-th candidate, function count " << Candidate.Count
+<< " and its vtable count " << CandidateVTableCount
+<< " have discrepancies\n");
+  return false;
+}
+
+RemainingVTableCount -= Candidate.Count;
+
+// 'MaxNumVTable' limits the number of vtables to make vtable comparison
+// profitable. Comparing multiple vtables for one function candidate will
+// insert additional instructions on the hot path, and allowing more than
+// one vtable for non last candidates may or may not elongates dependency

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -322,14 +796,133 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const CallBase &CB, const std::vector &Candidates,
+uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+auto &Candidate = Candidates[I];
+uint64_t CandidateVTableCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  CandidateVTableCount += Count;
+
+if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) 
{
+  LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+<< "-th candidate, function count " << Candidate.Count
+<< " and its vtable count " << CandidateVTableCount
+<< " have discrepancies\n");
+  return false;
+}
+
+RemainingVTableCount -= Candidate.Count;
+
+// 'MaxNumVTable' limits the number of vtables to make vtable comparison
+// profitable. Comparing multiple vtables for one function candidate will
+// insert additional instructions on the hot path, and allowing more than
+// one vtable for non last candidates may or may not elongates dependency
+// chain for the subsequent candidates. Set its value to 1 for non-last
+// candidate and allow option to override it for the last candidate.
+int MaxNumVTable = 1;
+if (I == CandidateSize - 1)
+  MaxNumVTable = ICPMaxNumVTableLastCandidate;
+
+if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {
+  return false;
+}
+  }
+
+  // If the indirect fallback is not cold, don't compare vtables.
+  if (PSI && PSI->hasProfileSummary() &&
+  !PSI->isColdCount(RemainingVTableCount))
+return false;

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt ICPMaxNumVTableLastCandidate(
+"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
 
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is 
a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+SmallDenseMap>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer 
to

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) {
   return F.getMetadata(getPGOFuncNameMetadataName());
 }
 
-void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) {
-  // Only for internal linkage functions.
-  if (PGOFuncName == F.getName())
-  return;
-  // Don't create duplicated meta-data.
-  if (getPGOFuncNameMetadata(F))
+static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
+  StringRef PGOName) {
+  // For internal linkage objects, its name is not the same as its PGO name.

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt ICPMaxNumVTableLastCandidate(
+"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
 
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is 
a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+SmallDenseMap>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer 
to
+// this vtable, for example, `Base*` can be a pointer to an instantiated type
+// but not vice versa. See also https://llvm.org/docs/TypeMetadata.html
+static std::optional
+getAddressPointOffset(const GlobalVariable &VTableVar,
+  StringRef CompatibleType) {
+  SmallVector Types;
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` is used via its `UserInst`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Guaranteed by ICP transformation");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)

minglotus-6 wrote:

make sense. I hoisted this check to caller `tryToSinkInstructions`, and updated 
callee assert with `DestBB->getUniquePredecessor() == BB`.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// After a virtual call candidate gets promoted, update the vtable's counts
+// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+// a vtable from which the virtual call is loaded. Compute the sum and use
+// 128-bit APInt to improve accuracy.
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= 
APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  if (!EnableVTableProfileUse || VPtr == nullptr ||
+  !VPtr->getMetadata(LLVMContext::MD_prof))
+return;
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+
+  for (const auto &Candidate : Candidates) {
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableGUIDCounts[GUID] -= Count;
+
+// 'OriginalBB' is the basic block of indirect call. After each candidate
+// is promoted, a new basic block is created for the indirect fallback 
basic
+// block and indirect call `CB` is moved into this new BB.
+BasicBlock *OriginalBB = CB.getParent();
+promoteCallWithVTableCmp(
+CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints,
+createBranchWeights(CB.getContext(), Candidate.Count,
+TotalFuncCount - Candidate.Count));
+
+int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent());
+
+ORE.emi

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -322,14 +796,133 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const CallBase &CB, const std::vector &Candidates,
+uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+auto &Candidate = Candidates[I];
+uint64_t CandidateVTableCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  CandidateVTableCount += Count;
+
+if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) 
{
+  LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+<< "-th candidate, function count " << Candidate.Count
+<< " and its vtable count " << CandidateVTableCount
+<< " have discrepancies\n");
+  return false;
+}
+
+RemainingVTableCount -= Candidate.Count;
+
+// 'MaxNumVTable' limits the number of vtables to make vtable comparison
+// profitable. Comparing multiple vtables for one function candidate will
+// insert additional instructions on the hot path, and allowing more than
+// one vtable for non last candidates may or may not elongates dependency
+// chain for the subsequent candidates. Set its value to 1 for non-last
+// candidate and allow option to override it for the last candidate.
+int MaxNumVTable = 1;
+if (I == CandidateSize - 1)
+  MaxNumVTable = ICPMaxNumVTableLastCandidate;
+
+if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {

minglotus-6 wrote:

done by `LLVM_DEBUG`. 
* Didn't do missed-opt remark mainly because function comparison might be 
applied when vtable comparison is not profitable, and readers need to join 
missed vtable message and applied function one to make sense of the remark 
messages.

Now `opt llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll 
-passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom 
-enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -debug -S` 
gives the following log


```
Work on callsite   call void %1(ptr %d), !prof !10 Num_targets: 3
 Candidate 0 Count=600  Target_func: 3827408714133779784
 Candidate 1 Count=500  Target_func: 5837445539218476403
 Candidate 2 Count=400  Target_func: 9381788221313981078
 
Work on callsite #0  call void %1(ptr %d), !prof !10 Num_targets: 3 
Num_candidates: 3
 Candidate 0 Count=600  Target_func: 3827408714133779784
 Candidate 1 Count=500  Target_func: 5837445539218476403
 Candidate 2 Count=400  Target_func: 9381788221313981078

Computing vtable infos for callsite #1
  Cannot find vtable definition for 12345678; maybe the vtable isn't imported

Evaluating vtable profitability for callsite #1  call void %1(ptr %d), !prof !10
  Candidate 0 FunctionCount: 600, VTableCounts: {Derived1, 600}
  Candidate 1 FunctionCount: 500, VTableCounts: {Derived2, 500}
  Candidate 2 FunctionCount: 400, VTableCounts: {Base1, 200} {Derived3, 200}
allow at most 1 and got 2 vtables. Bail out for vtable comparison.
```

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -140,14 +348,56 @@ class IndirectCallPromoter {
   // indirect callee with functions. Returns true if there are IR
   // transformations and false otherwise.
   bool tryToPromoteWithFuncCmp(
-  CallBase &CB, const std::vector &Candidates,
-  uint64_t TotalCount, ArrayRef ICallProfDataRef,
-  uint32_t NumCandidates);
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates, uint64_t TotalCount,
+  ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Promote a list of targets for one indirect call by comparing vtables with
+  // functions. Returns true if there are IR transformations and false
+  // otherwise.
+  bool tryToPromoteWithVTableCmp(
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates,
+  uint64_t TotalFuncCount, uint32_t NumCandidates,
+  MutableArrayRef ICallProfDataRef,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Returns true if it's profitable to compare vtables for the callsite.
+  bool isProfitableToCompareVTables(
+  const CallBase &CB, const std::vector &Candidates,
+  uint64_t TotalCount);
+
+  // Given an indirect callsite and the list of function candidates, compute
+  // the following vtable information in output parameters and returns vtable
+  // pointer if type profiles exist.
+  // - Populate `VTableGUIDCounts` with  using !prof
+  // metadata attached on the vtable pointer.
+  // - For each function candidate, finds out the vtables from which it get

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt ICPMaxNumVTableLastCandidate(
+"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
 
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is 
a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+SmallDenseMap>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer 
to
+// this vtable, for example, `Base*` can be a pointer to an instantiated type

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt ICPMaxNumVTableLastCandidate(
+"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
 
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is 
a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+SmallDenseMap>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer 
to
+// this vtable, for example, `Base*` can be a pointer to an instantiated type
+// but not vice versa. See also https://llvm.org/docs/TypeMetadata.html
+static std::optional
+getAddressPointOffset(const GlobalVariable &VTableVar,
+  StringRef CompatibleType) {
+  SmallVector Types;
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` is used via its `UserInst`.

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -322,14 +796,133 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison

minglotus-6 wrote:

> I think what you mean is that doing so for an earlier candidate delays the 
> comparisons for later candidates, but that for the last candidate, only the 
> fallback path is affected?

Yes. I updated the comment.

> Do we expect to set this parameter above 1?

Yes. Setting it to 1 is to make the default parameter conservative.  Based on 
my tests on `-pie` or `pie` binaries , setting it to 2 gives measurable 
performance win compared with 1, and setting it to 3 doesn't give stable 
performance wins across different binaries or across runs.
 
One interesting thing is the actual cost of materializing one vtable address 
point depends on compile option `fpic/fpie`, and the cost of materializing a 
vtable address point and a function is comparable if `fpie/fpic` option is the 
same.
  * For non-pie binaries, `@vtable + address-point-offset` is lowered to an 
immediate representing vtable address point. It could be folded into `icmp` IR 
after lowering, something like `icmp #imm, `. For pie (but non-pic) 
binaries, `@vtable + address-point-offset` is lowered to a pc-relative address. 
So it takes one instruction to materialize the pc-relative address 
itself(something like `leaq  2890849(%rip), %rdx # 0x30fe50 
<_ZTV8Derived1>` for x86). 
 



https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-26 Thread Mingming Liu via cfe-commits


@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) {
   return F.getMetadata(getPGOFuncNameMetadataName());
 }
 
-void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) {
-  // Only for internal linkage functions.
-  if (PGOFuncName == F.getName())
-  return;
-  // Don't create duplicated meta-data.
-  if (getPGOFuncNameMetadata(F))
+static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
+  StringRef PGOName) {
+  // For internal linkage objects, its name is not the same as its PGO name.
+  if (GO.getName() == PGOName)
 return;
-  LLVMContext &C = F.getContext();
-  MDNode *N = MDNode::get(C, MDString::get(C, PGOFuncName));
-  F.setMetadata(getPGOFuncNameMetadataName(), N);
+
+  // Don't created duplicated metadata.

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -103,27 +114,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.

minglotus-6 wrote:

I updated this patch with  `LLVM_DEBUG` to catch the pathological case.

In my internal change I do have such `LLVM_DEBUG` logs when testing this change 
and didn't find any. The threshold is to prevent backsliding.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -321,14 +790,114 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const std::vector &Candidates, uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  for (size_t I = 0; I < Candidates.size(); I++) {
+auto &Candidate = Candidates[I];
+uint64_t VTableSumCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableSumCount += Count;
+
+if (VTableSumCount < Candidate.Count * ICPVTablePercentageThreshold)
+  return false;
+
+RemainingVTableCount -= Candidate.Count;
+
+int MaxNumVTable = 1;

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -1393,16 +1398,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) {
   return F.getMetadata(getPGOFuncNameMetadataName());
 }
 
-void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) {
-  // Only for internal linkage functions.
-  if (PGOFuncName == F.getName())
-  return;
-  // Don't create duplicated meta-data.
-  if (getPGOFuncNameMetadata(F))
+static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
+  StringRef PGOName) {
+  // For internal linkage objects, its name is not the same as its PGO name.
+  if (GO.getName() == PGOName)
 return;
-  LLVMContext &C = F.getContext();
-  MDNode *N = MDNode::get(C, MDString::get(C, PGOFuncName));
-  F.setMetadata(getPGOFuncNameMetadataName(), N);
+
+  // Don't created duplictaed metadata.

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee(
 uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount;
 for (auto Entry : *VMap) {
   if (isa(Entry.first))
-if (auto *CI = dyn_cast_or_null(Entry.second))
+if (auto *CI = dyn_cast_or_null(Entry.second)) {

minglotus-6 wrote:

I used a lambda function to improve the code duplication.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -321,14 +790,114 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const std::vector &Candidates, uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  for (size_t I = 0; I < Candidates.size(); I++) {
+auto &Candidate = Candidates[I];
+uint64_t VTableSumCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableSumCount += Count;
+
+if (VTableSumCount < Candidate.Count * ICPVTablePercentageThreshold)
+  return false;
+
+RemainingVTableCount -= Candidate.Count;
+
+int MaxNumVTable = 1;
+if (I == Candidates.size() - 1)

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -321,14 +746,127 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const std::vector &Candidates, uint64_t TotalCount) {
+  if (!ICPEnableVTableCmp || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  for (size_t I = 0; I < Candidates.size(); I++) {
+auto &Candidate = Candidates[I];
+uint64_t VTableSumCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableSumCount += Count;
+
+if (VTableSumCount < Candidate.Count * ICPVTableCountPercentage)
+  return false;
+
+RemainingVTableCount -= Candidate.Count;
+
+int NumAdditionalVTable = 0;
+if (I == Candidates.size() - 1)
+  NumAdditionalVTable = ICPNumAdditionalVTableLast;
+
+int ActualNumAdditionalInst = Candidate.AddressPoints.size() - 1;
+if (ActualNumAdditionalInst > NumAdditionalVTable) {
+  return false;
+}
+  }
+
+  // If the indirect fallback is not cold, don't compare vtables.
+  if (PSI && PSI->hasProfileSummary() &&
+  !PSI->isColdCount(RemainingVTableCount))
+return false;
+
+  return true;
+}
+
+static void
+computeVirtualCallSiteTypeInfoMap(Module &M, ModuleAnalysisManager &MAM,
+  VirtualCallSiteTypeInfoMap &VirtualCSInfo) {
+  auto &FAM = 
MAM.getResult(M).getManager();
+  auto LookupDomTree = [&FAM](Function &F) -> DominatorTree & {
+return FAM.getResult(F);
+  };
+
+  auto compute = [&](Function *Func) {
+if (!Func || Func->use_empty())
+  return;
+// Iterate all type.test calls and find all indirect calls.
+// TODO: Add llvm.public.type.test
+for (Use &U : llvm::make_early_inc_range(Func->uses())) {
+  auto *CI = dyn_cast(U.getUser());
+  if (!CI)
+continue;
+  auto *TypeMDVal = cast(CI->getArgOperand(1));
+  if (!TypeMDVal)
+continue;
+  auto *CompatibleTypeId = dyn_cast(TypeMDVal->getMetadata());
+  if (!CompatibleTypeId)
+continue;
+
+  // Find out all devirtualizable call sites given a llvm.type.test
+  // intrinsic call.
+  SmallVector DevirtCalls;
+  SmallVector Assumes;
+  auto &DT = LookupDomTree(*CI->getFunction());
+  findDevirtualizableCallsForTypeTest(DevirtCalls, Assumes, CI, DT);
+
+  // type-id, offset from the address point
+  // combined with type metadata to compute function offset
+  for (auto &DevirtCall : DevirtCalls) {
+CallBase &CB = DevirtCall.CB;
+// Given an indirect call, try find the instruction which loads a
+// pointer to virtual table.
+Instruction *VTablePtr =
+PGOIndirectCallVisitor::tryGetVTableInstruction(&CB);
+if (!VTablePtr)
+  continue;
+VirtualCSInfo[&CB] = {DevirtCall.Offset, VTablePtr,
+  CompatibleTypeId->getString()};
+  }
+}
+  };
+
+  // Right now only llvm.type.test is used to find out virtual call sites.
+  // With ThinLTO and whole-program-devirtualization, llvm.type.test and
+  // llvm.public.type.test are emitted, and llvm.public.type.test is either
+  // refined to llvm.type.test or dropped before indirect-call-promotion pass.

minglotus-6 wrote:

> Briefly explain in the comment why type test is needed for vtable based 
> indirect call promotion?

Iterating `llvm.type.test` users find virtual calls out of all indirect calls 
in a more compile-time efficient manner. Moreover, the first parameter of 
`llvm.type.test` is the compatible type string. Since one vtable definition can 
be compatible with multiple vtables [1], compatible type string together with 
`getelementptr` are important to calculate the funct

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee(
 uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount;
 for (auto Entry : *VMap) {
   if (isa(Entry.first))
-if (auto *CI = dyn_cast_or_null(Entry.second))
+if (auto *CI = dyn_cast_or_null(Entry.second)) {

minglotus-6 wrote:

As discussed offline, I'll keep the current code for now as `CallBase` is a 
load bearing class.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -103,27 +114,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.

minglotus-6 wrote:

> Is this the sum counts of all vtables corresponding to a candidate function?

For hot basic blocks, the counters are on the order of O(100k) or higher, and 
the diff between vtable value profile counters and function value profile 
counters is minuscule.

For iFDO, the value profile counters are collected per instruction (as opposed 
to per basic block) by partially thread-safe function 
['instrumentTargetValueImpl'](https://github.com/llvm/llvm-project/blob/31440738bd6b1345ea978914fe01d2e19f4aa373/compiler-rt/lib/profile/InstrProfilingValue.c#L150-L239)
 , so the code doesn't assume them to be identical.

For SamplePGO type profile (WIP), the total count of one instruction is the 
same as its basic block. So as long as 'vtable' and indirect call are in the 
same basic block, the sum would be identical (both are the count of basic block)

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -276,35 +626,154 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// After a virtual call candidate gets promoted, update the vtable's counts
+// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+// a vtable from which the virtual call is loaded. Compute the sum and use
+// 128-bit APInt to improve accuracy.
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= 
APFuncCount.udiv(SumVTableCount).getZExtValue();

minglotus-6 wrote:

The code updates sum count and individual `` pairs within one 
`!prof` metadata, so used the sum count from the `!prof` metadata.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -276,35 +626,154 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// After a virtual call candidate gets promoted, update the vtable's counts
+// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+// a vtable from which the virtual call is loaded. Compute the sum and use
+// 128-bit APInt to improve accuracy.
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= 
APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers`
+  // is enabled and a vtable is used to load multiple virtual functions.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  // Update value profiles on the vtable pointer if it exists.
+  if (VPtr)
+updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+
+  for (const auto &Candidate : Candidates) {
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableGUIDCounts[GUID] -= Count;
+
+// 'OriginalBB' is the basic block of indirect call before indirect call
+// promotion.
+BasicBlock *OriginalBB = CB.getParent();

minglotus-6 wrote:

After each candidate is promoted per loop iteration,  a new basic block is 
created for the indirect fallback basic block. Indirect call `CB` is moved so 
we use it to find the new (per-loop-iteration) basic block. 

I updated the comment to make this (per-loop-iteration new basic block) m

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -321,14 +790,114 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const std::vector &Candidates, uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  for (size_t I = 0; I < Candidates.size(); I++) {
+auto &Candidate = Candidates[I];
+uint64_t VTableSumCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableSumCount += Count;
+
+if (VTableSumCount < Candidate.Count * ICPVTablePercentageThreshold)
+  return false;
+
+RemainingVTableCount -= Candidate.Count;
+
+int MaxNumVTable = 1;

minglotus-6 wrote:

I added a comment for it.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -276,35 +626,154 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// After a virtual call candidate gets promoted, update the vtable's counts
+// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+// a vtable from which the virtual call is loaded. Compute the sum and use
+// 128-bit APInt to improve accuracy.
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= 
APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers`
+  // is enabled and a vtable is used to load multiple virtual functions.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  // Update value profiles on the vtable pointer if it exists.
+  if (VPtr)
+updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+
+  for (const auto &Candidate : Candidates) {
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableGUIDCounts[GUID] -= Count;
+
+// 'OriginalBB' is the basic block of indirect call before indirect call
+// promotion.
+BasicBlock *OriginalBB = CB.getParent();
+promoteCallWithVTableCmp(
+CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints,
+createBranchWeights(CB.getContext(), Candidate.Count,
+TotalFuncCount - Candidate.Count));
+
+int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent());
+
+ORE.emit([&]() {

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Use, UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) {
+  assert(!I->isTerminator());
+  if (!isDestBBSuitableForSink(I, DestBlock))
+return false;
+
+  assert(DestBlock->getUniquePre

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 commented:

PTAL, thanks!

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-12 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -2538,6 +2538,190 @@ Value *getSalvageOpsForIcmpOp(ICmpInst *Icmp, uint64_t 
CurrentLocOps,
   return Icmp->getOperand(0);
 }
 
+void llvm::tryToSinkInstructionDbgValues(

minglotus-6 wrote:

The `gep` and `load` sequence doesn't have debug values (so far) so it's not 
easy to test within this PR.

IMO helper functions that handles debug value along instruction sink (e.g. 
https://github.com/llvm/llvm-project/blob/1752740f4b4b752bbe2987a0de398c6f671ceb71/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L4769)
 will help other passes (e.g., simplify-cfg, etc) that moves instruction 
around. And test cases can be added for these shared helper functions.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -316,6 +316,15 @@ void salvageDebugInfoForDbgValues(Instruction &I,
   ArrayRef Insns,
   ArrayRef DPInsns);
 
+void tryToSinkInstructionDbgValues(

minglotus-6 wrote:

Debug intrinsics are not handled in this PR, I un-do `Local.h/cpp` changes (and 
overlooked this comment).

The updated compiler-rt test (instrprof-vtable-value-prof.cpp) enabled `-g` and 
tested IR for the first callsite. 

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Use, UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) {
+  assert(!I->isTerminator());
+  if (!isDestBBSuitableForSink(I, DestBlock))
+return false;
+
+  assert(DestBlock->getUniquePre

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -240,27 +474,102 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
 Ret.push_back(PromotionCandidate(TargetFunction, Count));
 TotalCount -= Count;
   }
+
   return Ret;
 }
 
+Constant *IndirectCallPromoter::getOrCreateVTableAddressPointVar(
+GlobalVariable *GV, uint64_t AddressPointOffset) {
+  auto [Iter, Inserted] =
+  VTableAddressPointOffsetVal[GV].try_emplace(AddressPointOffset, nullptr);
+  if (Inserted)
+Iter->second = getVTableAddressPointOffset(GV, AddressPointOffset);
+  return Iter->second;
+}
+
+Instruction *IndirectCallPromoter::computeVTableInfos(

minglotus-6 wrote:

Document output parameters and return value around class method declaration, 
and added more comments in the function (e.g., one example with detailed steps) 
as suggested.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -321,14 +746,127 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const std::vector &Candidates, uint64_t TotalCount) {
+  if (!ICPEnableVTableCmp || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  for (size_t I = 0; I < Candidates.size(); I++) {
+auto &Candidate = Candidates[I];
+uint64_t VTableSumCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableSumCount += Count;
+
+if (VTableSumCount < Candidate.Count * ICPVTableCountPercentage)
+  return false;
+
+RemainingVTableCount -= Candidate.Count;
+
+int NumAdditionalVTable = 0;
+if (I == Candidates.size() - 1)
+  NumAdditionalVTable = ICPNumAdditionalVTableLast;
+
+int ActualNumAdditionalInst = Candidate.AddressPoints.size() - 1;
+if (ActualNumAdditionalInst > NumAdditionalVTable) {
+  return false;
+}
+  }
+
+  // If the indirect fallback is not cold, don't compare vtables.
+  if (PSI && PSI->hasProfileSummary() &&
+  !PSI->isColdCount(RemainingVTableCount))
+return false;
+
+  return true;
+}
+
+static void
+computeVirtualCallSiteTypeInfoMap(Module &M, ModuleAnalysisManager &MAM,
+  VirtualCallSiteTypeInfoMap &VirtualCSInfo) {
+  auto &FAM = 
MAM.getResult(M).getManager();
+  auto LookupDomTree = [&FAM](Function &F) -> DominatorTree & {
+return FAM.getResult(F);
+  };
+
+  auto compute = [&](Function *Func) {
+if (!Func || Func->use_empty())
+  return;
+// Iterate all type.test calls and find all indirect calls.
+// TODO: Add llvm.public.type.test
+for (Use &U : llvm::make_early_inc_range(Func->uses())) {
+  auto *CI = dyn_cast(U.getUser());
+  if (!CI)
+continue;
+  auto *TypeMDVal = cast(CI->getArgOperand(1));
+  if (!TypeMDVal)
+continue;
+  auto *CompatibleTypeId = dyn_cast(TypeMDVal->getMetadata());
+  if (!CompatibleTypeId)
+continue;
+
+  // Find out all devirtualizable call sites given a llvm.type.test
+  // intrinsic call.
+  SmallVector DevirtCalls;
+  SmallVector Assumes;
+  auto &DT = LookupDomTree(*CI->getFunction());
+  findDevirtualizableCallsForTypeTest(DevirtCalls, Assumes, CI, DT);
+
+  // type-id, offset from the address point
+  // combined with type metadata to compute function offset
+  for (auto &DevirtCall : DevirtCalls) {
+CallBase &CB = DevirtCall.CB;
+// Given an indirect call, try find the instruction which loads a
+// pointer to virtual table.
+Instruction *VTablePtr =
+PGOIndirectCallVisitor::tryGetVTableInstruction(&CB);
+if (!VTablePtr)
+  continue;
+VirtualCSInfo[&CB] = {DevirtCall.Offset, VTablePtr,
+  CompatibleTypeId->getString()};
+  }
+}
+  };
+
+  // Right now only llvm.type.test is used to find out virtual call sites.
+  // With ThinLTO and whole-program-devirtualization, llvm.type.test and
+  // llvm.public.type.test are emitted, and llvm.public.type.test is either
+  // refined to llvm.type.test or dropped before indirect-call-promotion pass.

minglotus-6 wrote:

If ICP pass is disabled in prelink, we don't need to analyze 
`llvm.public.type.test` as you point out. `llvm.public.type.test` was analyzed 
here because I didn't disable prelink ICP for compiler-rt test.

Now I updated compiler-rt test to disable prelink ICP and enable postlink ICP, 
and skip the analysis for `llvm.public.type.test`.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commi

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Use, UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) {
+  assert(!I->isTerminator());
+  if (!isDestBBSuitableForSink(I, DestBlock))
+return false;
+
+  assert(DestBlock->getUniquePre

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -140,14 +337,51 @@ class IndirectCallPromoter {
   // indirect callee with functions. Returns true if there are IR
   // transformations and false otherwise.
   bool tryToPromoteWithFuncCmp(
-  CallBase &CB, const std::vector &Candidates,
-  uint64_t TotalCount, ArrayRef ICallProfDataRef,
-  uint32_t NumCandidates);
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates, uint64_t TotalCount,
+  ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Promote a list of targets for one indirect call by comparing vtables with
+  // functions. Returns true if there are IR transformations and false
+  // otherwise.
+  bool tryToPromoteWithVTableCmp(
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates,
+  uint64_t TotalFuncCount, uint32_t NumCandidates,
+  MutableArrayRef ICallProfDataRef,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Returns true if it's profitable to compare vtables.
+  bool isProfitableToCompareVTables(
+  const std::vector &Candidates, uint64_t TotalCount);
+
+  // Populate `VTableGUIDCounts` vtable GUIDs and their counts and each

minglotus-6 wrote:

I updated the function comment.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.

minglotus-6 wrote:

The original comment is broken. I updated it.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,
+ Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(
+PHINode::getIncomingValueNumForOperand(OperandNo));
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Inst, Use.getOperandNo(), UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, Basic

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// Update VTableGUIDCounts
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -341,6 +879,17 @@ static bool promoteIndirectCalls(Module &M, 
ProfileSummaryInfo *PSI, bool InLTO,
 return false;
   }
   bool Changed = false;
+  VirtualCallSiteTypeInfoMap VirtualCSInfo;
+
+  computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo);
+
+  // This map records states across functions in an LLVM IR module.

minglotus-6 wrote:

`VTableAddressPointOffsetVal` stores the vtable address points. The vtable 
address point of a given `` is static (doesn't 
change after being computed once).

`IndirectCallPromoter::getOrCreateVTableAddressPointVar` creates the map entry 
the first time a `` pair is seen,  as `promoteIndirectCalls`  
processes an IR module and calls `IndirectCallPromoter` repeatedly on each 
function.

I reworded the comments. PTAL.


https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Use, UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) {
+  assert(!I->isTerminator());
+  if (!isDestBBSuitableForSink(I, DestBlock))
+return false;
+
+  assert(DestBlock->getUniquePre

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// Update VTableGUIDCounts
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= 
APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers`
+  // is enabled and a vtable is used to load multiple virtual functions.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  // Update value profiles on the vtable pointer if it exists.
+  if (VPtr)
+updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+  for (const auto &Candidate : Candidates) {
+uint64_t IfCount = 0;

minglotus-6 wrote:

This variable represents the 'if-branch' count of the branch weights for 
conditional devirtualization. I removed it. Now `Candidate.Count` represents if 
branch count and `TotalFuncCount - Candidate.Count` represents else branch 
count.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-06 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)

minglotus-6 wrote:

I updated the function comment based on Teresa's input, and renamed the 
function to `getAddressPointOffset`

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,

minglotus-6 wrote:

Meanwhile, I can find a few places where the gist of helper function is used. 
I'm planning to have some canonical helper functions maybe in 
`llvm/lib/Transforms/Util` directory.

Examples:
1) 
https://github.com/llvm/llvm-project/blob/ce5b371606422ed21cda0e24cdc89cb41cdc5600/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp#L1139-L1143
2) 
https://github.com/llvm/llvm-project/blob/ce5b371606422ed21cda0e24cdc89cb41cdc5600/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp#L3517-L3520
3) (made closer to this function by my change 
https://github.com/llvm/llvm-project/pull/93249/)  
https://github.com/llvm/llvm-project/blob/5f243b3fffca42ed320529a54aefd86087aa85f8/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L5011-L5014

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)

minglotus-6 wrote:

Type metadata are generated for compatible vtables and virtual functions.

https://gcc.godbolt.org/z/rr5YY7cdv is an example to show the `!type` metadata 
list. 

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,
+ Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(
+PHINode::getIncomingValueNumForOperand(OperandNo));
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)

minglotus-6 wrote:

In this context we can have `assert(DestBB->getUniquePredecessor() == BB`. I'm 
leaning towards not handling more generic cases if those doesn't show up.

- `DestBB` is the indirect call fall back and created by 
`versionCallSiteWithCond`.  `versionCallsiteWithCond` creates basic blocks and 
edges like this comment shows 
https://github.com/llvm/llvm-project/blob/5f243b3fffca42ed320529a54aefd86087aa85f8/llvm/lib/Transforms/Utils/CallPromotionUtils.cpp#L197-L223

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,
+ Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(
+PHINode::getIncomingValueNumForOperand(OperandNo));
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Inst, Use.getOperandNo(), UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, Basic

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -89,41 +93,50 @@
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
-// ICTEXT: _ZN8Derived15func1Eii:250
+// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii:150
+// ICTEXT: _ZN8Derived14funcEii:50
 // ICTEXT: 2
-// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
-// ICTEXT: _ZN8Derived15func2Eii:250
+// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev:750
+// ICTEXT: _ZN8Derived1D0Ev:250
 // ICTEXT: # ValueKind = IPVK_VTableTarget:
 // ICTEXT: 2
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
-// ICTEXT: _ZTV8Derived1:250
+// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:150
+// ICTEXT: _ZTV8Derived1:50
 // ICTEXT: 2
 // ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
 // ICTEXT: _ZTV8Derived1:250
 
+// Test indirect call promotion transformation using vtable profiles.
+// RUN: %clangxx -fprofile-use=test.profdata -fuse-ld=lld -flto=thin 
-fwhole-program-vtables -O2 -mllvm -enable-vtable-value-profiling -mllvm 
-icp-enable-vtable-cmp -Rpass=pgo-icall-prom %s 2>&1 | FileCheck %s 
--check-prefix=REMARK --implicit-check-not="!VP"
+
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with 
count 150 out of 200, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 
50, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived2D0Ev with count 
750 out of 1000, compare 1 vtables and sink 2 instructions
+// REMARK: Promote indirect call to _ZN8Derived1D0Ev with count 250 out of 
250, compare 1 vtables and sink 2 instructions

minglotus-6 wrote:

As explained above, only the first candidate of an indirect call site sees the 
original function count.

This is consistent with existing remarks, as shown in this regression test 
(https://github.com/llvm/llvm-project/blob/cc2fafa1788908f69366821a04407083f770483e/llvm/test/Transforms/PGOProfile/indirect_call_promotion.ll#L5-L8)

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -89,41 +93,50 @@
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
-// ICTEXT: _ZN8Derived15func1Eii:250
+// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii:150
+// ICTEXT: _ZN8Derived14funcEii:50
 // ICTEXT: 2
-// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
-// ICTEXT: _ZN8Derived15func2Eii:250
+// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev:750
+// ICTEXT: _ZN8Derived1D0Ev:250
 // ICTEXT: # ValueKind = IPVK_VTableTarget:
 // ICTEXT: 2
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
-// ICTEXT: _ZTV8Derived1:250
+// ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:150
+// ICTEXT: _ZTV8Derived1:50
 // ICTEXT: 2
 // ICTEXT: 
{{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
 // ICTEXT: _ZTV8Derived1:250
 
+// Test indirect call promotion transformation using vtable profiles.
+// RUN: %clangxx -fprofile-use=test.profdata -fuse-ld=lld -flto=thin 
-fwhole-program-vtables -O2 -mllvm -enable-vtable-value-profiling -mllvm 
-icp-enable-vtable-cmp -Rpass=pgo-icall-prom %s 2>&1 | FileCheck %s 
--check-prefix=REMARK --implicit-check-not="!VP"
+
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with 
count 150 out of 200, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 
50, compare 1 vtables and sink 1 instructions

minglotus-6 wrote:

The first candidate's remark says 150 out of 200. After the first callee gets 
promoted, the total function count becomes 50 (200 - 50), so the second 
candidate's remark says 50 out of 50.

Updated the remark message with the filename and line number to make it more 
explicit.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,

minglotus-6 wrote:

thanks for the catch. I removed it.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-30 Thread Mingming Liu via cfe-commits


@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee(
 uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount;
 for (auto Entry : *VMap) {
   if (isa(Entry.first))
-if (auto *CI = dyn_cast_or_null(Entry.second))
+if (auto *CI = dyn_cast_or_null(Entry.second)) {

minglotus-6 wrote:

Makes sense to not further duplicate the code.  How about something like this?

```
if (isa(Entry.first)) {
  if(auto* CB = dyn_cast_or_null(Entry.Base)) {
CB->updateProfWeight(...
Instruction* VPtr = PGOIndirectCallVisitor::tryGetVTableInstruction(CB);
if (VPtr)
  scaleProfData(VPtr...)
}
```

`CallBase` is the base class of five IR instructions [1] now.  Among them 
`CoroAwaitSuspendInst` [1] and `GCStatePointInst` [2] are relatively new; the 
rest three are well understood. Despite more derived classes of `CallBase` 
might be added for new cases, the code above should be future-proof since 
`updateProfWeight` and `scaleProfData` are programmed to `!prof` (not 
`CallBase`).

[1] 
https://github.com/llvm/llvm-project/blob/692ae5443b1778e138527ef55d799a4b535a36f9/llvm/lib/Transforms/Coroutines/CoroInstr.h#L85-L88
[2]
https://github.com/llvm/llvm-project/blob/692ae5443b1778e138527ef55d799a4b535a36f9/llvm/include/llvm/IR/Statepoint.h#L61-L63

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-05-28 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 edited 
https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [IRPGO] [Draft] Import vtable definitions in ThinTLO and use more efficient vtable comparison sequence with cost-benefit analysis (PR #69141)

2024-05-24 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

Close this stale patch and won't merge conflicts in this one.

Working on https://github.com/llvm/llvm-project/pull/81442 now.

https://github.com/llvm/llvm-project/pull/69141
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [IRPGO] [Draft] Import vtable definitions in ThinTLO and use more efficient vtable comparison sequence with cost-benefit analysis (PR #69141)

2024-05-24 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 closed 
https://github.com/llvm/llvm-project/pull/69141
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-11 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

> I think this might be breaking: FAIL: LLVM :: tools/gold/X86/thinlto.ll

fixed in 
https://github.com/llvm/llvm-project/commit/20ed5b1f45871612570d3bd447121ac43e083c6a

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-11 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

> I think this might be breaking:
> FAIL: LLVM :: tools/gold/X86/thinlto.ll

Thanks for reporting and sorry for this. I'll send out a fix soon.

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [lld] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-10 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 closed 
https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-10 Thread Mingming Liu via cfe-commits


@@ -564,6 +581,12 @@ class GlobalValueSummary {
 
   bool canAutoHide() const { return Flags.CanAutoHide; }
 
+  bool shouldImportAsDec() const {

minglotus-6 wrote:

this makes sense, done.

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-10 Thread Mingming Liu via cfe-commits


@@ -16,13 +16,13 @@
 ^3 = gv: (guid: 2, summaries: (function: (module: ^1, flags: (linkage: 
external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), 
insts: 10, calls: ((callee: ^15, relbf: 256, tail: 1)
 
 ; Summaries with different linkage types.
-^4 = gv: (guid: 3, summaries: (function: (module: ^0, flags: (linkage: 
internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1), 
insts: 1)))
+^4 = gv: (guid: 3, summaries: (function: (module: ^0, flags: (linkage: 
internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, 
importType: definition), insts: 1)))
 ; Make this one an alias with a forward reference to aliasee.
-^5 = gv: (guid: 4, summaries: (alias: (module: ^0, flags: (linkage: private, 
visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1), aliasee: 
^14)))
+^5 = gv: (guid: 4, summaries: (alias: (module: ^0, flags: (linkage: private, 
visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1,  importType: 
definition), aliasee: ^14)))
 ^6 = gv: (guid: 5, summaries: (function: (module: ^0, flags: (linkage: 
available_externally, visibility: default, notEligibleToImport: 0, live: 0, 
dsoLocal: 0), insts: 1)))
 ^7 = gv: (guid: 6, summaries: (function: (module: ^0, flags: (linkage: 
linkonce, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), 
insts: 1)))
 ^8 = gv: (guid: 7, summaries: (function: (module: ^0, flags: (linkage: 
linkonce_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 
0), insts: 1)))
-^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: 
weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, 
canAutoHide: 1), insts: 1)))
+^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: 
weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, 
canAutoHide: 0, importType: declaration), insts: 1)))

minglotus-6 wrote:

ah `canAutoHide:0` -> `canAutoHide:1` is not intentional. But I agree it's 
better to add a new record.

Done by taking `^24` and incrementing `^ID` for the rest.

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-10 Thread Mingming Liu via cfe-commits


@@ -2072,6 +2072,23 @@ void LLParser::parseOptionalVisibility(unsigned &Res) {
   Lex.Lex();
 }
 
+static GlobalValueSummary::ImportKind
+parseOptionalImportType(lltok::Kind Kind) {
+  GlobalValueSummary::ImportKind Res;
+  switch (Kind) {
+  default:
+Res = GlobalValueSummary::Definition;

minglotus-6 wrote:

done, by changing `parseOptionalImportType` to a member function of `LLParser` 
class (from a static free function) so it could return `tokError`.

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-10 Thread Mingming Liu via cfe-commits


@@ -432,6 +432,18 @@ class GlobalValueSummary {
   /// Sububclass discriminator (for dyn_cast<> et al.)
   enum SummaryKind : unsigned { AliasKind, FunctionKind, GlobalVarKind };
 
+  enum ImportKind : unsigned {
+// The global value definition corresponding to the summary should be
+// imported from source module
+Definition = 0,
+
+// When its definition doesn't exist in the destination module and not
+// imported (e.g., function is large to be inlined), the global value

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-10 Thread Mingming Liu via cfe-commits


@@ -635,7 +635,8 @@ static void computeFunctionSummary(
   HasIndirBranchToBlockAddress || HasIFuncCall;
   GlobalValueSummary::GVFlags Flags(
   F.getLinkage(), F.getVisibility(), NotEligibleForImport,
-  /* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable());
+  /* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable(),
+  /*ImportType=*/GlobalValueSummary::ImportKind::Definition);

minglotus-6 wrote:

done. Fixed all callsites of `GVFlags` constructor.

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)

2024-04-10 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

Resolve review feedback, and 'backfilled' three affected tests in 
`clang/test/CodeGen`. Now `ninja check-llvm check-clang check-compiler-rt` 
passed locally.

https://github.com/llvm/llvm-project/pull/87597
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [thinlto] Implement the import of vtable definitions. (PR #72551)

2024-04-01 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

Superseded by https://github.com/llvm/llvm-project/pull/79381

https://github.com/llvm/llvm-project/pull/72551
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [thinlto] Implement the import of vtable definitions. (PR #72551)

2024-04-01 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 closed 
https://github.com/llvm/llvm-project/pull/72551
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-04-01 Thread Mingming Liu via cfe-commits

https://github.com/minglotus-6 closed 
https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -1362,8 +1372,8 @@ remapSamples(const sampleprof::FunctionSamples &Samples,
   BodySample.second.getSamples());
 for (const auto &Target : BodySample.second.getCallTargets()) {
   Result.addCalledTargetSamples(BodySample.first.LineOffset,
-MaskedDiscriminator,
-Remapper(Target.first), Target.second);
+MaskedDiscriminator, 
Remapper(Target.first),

minglotus-6 wrote:

undo it.

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -0,0 +1,145 @@
+// REQUIRES: lld-available
+
+// RUN: rm -rf %t && split-file %s %t && cd %t

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -0,0 +1,145 @@
+// REQUIRES: lld-available
+
+// RUN: rm -rf %t && split-file %s %t && cd %t
+//
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm 
-enable-vtable-value-profiling test.cpp -o test
+// RUN: env LLVM_PROFILE_FILE=test.profraw ./test
+
+// Show vtable profiles from raw profile.
+// RUN: llvm-profdata show --function=main --ic-targets -show-vtables 
test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
+
+// Generate indexed profile from raw profile and show the data.
+// RUN: llvm-profdata merge test.profraw -o test.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables 
test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+
+// Generate text profile from raw and indexed profiles respectively and show 
the data.
+// RUN: llvm-profdata merge --text test.profraw -o raw.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text 
raw.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text test.profdata -o indexed.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text 
indexed.proftext | FileCheck %s --check-prefix=ICTEXT
+
+// Generate indexed profile from text profiles and show the data
+// RUN: llvm-profdata merge --binary raw.proftext -o text.profraw
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables 
text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary indexed.proftext -o text.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables 
text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+
+// COMMON: Counters:
+// COMMON-NEXT:  main:
+// COMMON-NEXT:  Hash: 0x0f9a16fe6d398548
+// COMMON-NEXT:  Counters: 2
+// COMMON-NEXT:  Indirect Call Site Count: 2
+// COMMON-NEXT:  Number of instrumented vtables: 2
+// RAW:  Indirect Target Results:
+// RAW-NEXT:   [  0, _ZN8Derived15func1Eii,250 ] (25.00%)
+// RAW-NEXT:   [  0, test.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii,
750 ] (75.00%)
+// RAW-NEXT:   [  1, _ZN8Derived15func2Eii,250 ] (25.00%)
+// RAW-NEXT:   [  1, test.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii,
750 ] (75.00%)
+// RAW-NEXT:  VTable Results:
+// RAW-NEXT:   [  0, _ZTV8Derived1,250 ] (25.00%)
+// RAW-NEXT:   [  0, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E,750 ] 
(75.00%)
+// RAW-NEXT:   [  1, _ZTV8Derived1,250 ] (25.00%)
+// RAW-NEXT:   [  1, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E,750 ] 
(75.00%)
+// INDEXED: Indirect Target Results:
+// INDEXED-NEXT: [  0, test.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii,   
 750 ] (75.00%)
+// INDEXED-NEXT: [  0, _ZN8Derived15func1Eii,250 ] (25.00%)
+// INDEXED-NEXT: [  1, test.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii,   
 750 ] (75.00%)
+// INDEXED-NEXT: [  1, _ZN8Derived15func2Eii,250 ] (25.00%)
+// INDEXED-NEXT: VTable Results:
+// INDEXED-NEXT: [  0, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E,
750 ] (75.00%)
+// INDEXED-NEXT: [  0, _ZTV8Derived1,250 ] (25.00%)
+// INDEXED-NEXT: [  1, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E,
750 ] (75.00%)
+// INDEXED-NEXT: [  1, _ZTV8Derived1,250 ] (25.00%)
+// COMMON: Instrumentation level: IR  entry_first = 0
+// COMMON-NEXT: Functions shown: 1
+// COMMON-NEXT: Total functions: 6
+// COMMON-NEXT: Maximum function count: 1000
+// COMMON-NEXT: Maximum internal block count: 250
+// COMMON-NEXT: Statistics for indirect call sites profile:
+// COMMON-NEXT:   Total number of sites: 2
+// COMMON-NEXT:   Total number of sites with values: 2
+// COMMON-NEXT:   Total number of profiled values: 4
+// COMMON-NEXT:   Value sites histogram:
+// COMMON-NEXT: NumTargets, SiteCount
+// COMMON-NEXT: 2, 2
+// COMMON-NEXT: Statistics for vtable profile:
+// COMMON-NEXT:   Total number of sites: 2
+// COMMON-NEXT:   Total number of sites with values: 2
+// COMMON-NEXT:   Total number of profiled values: 4
+// COMMON-NEXT:   Value sites histogram:
+// COMMON-NEXT: NumTargets, SiteCount
+// COMMON-NEXT: 2, 2
+
+// ICTEXT: :ir
+// ICTEXT: main
+// ICTEXT: # Func Hash:
+// ICTEXT: 1124236338992350536
+// ICTEXT: # Num Counters:
+// ICTEXT: 2
+// ICTEXT: # Counter Values:
+// ICTEXT: 1000
+// ICTEXT: 1
+// ICTEXT: # Num Value Kinds:
+// ICTEXT: 2
+// ICTEXT: # ValueKind = IPVK_IndirectCallTarget:
+// ICTEXT: 0
+// ICTEXT: # NumValueSites:
+// ICTEXT: 2
+// ICTEXT: 2
+// ICTEXT: test.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
+// ICTEXT: _ZN8Derived15func1Eii:250
+// ICTEXT: 2
+// ICTEXT: test.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
+// ICTEXT: _ZN8Derived15func2Eii:250
+// ICTEXT: # ValueKind = IPVK_VTableTarget:
+// ICTEXT: 2
+// ICTEXT: # NumValueSites:
+// ICTEXT: 2
+// ICTEXT: 2
+// ICTEXT: test.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
+/

[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -676,24 +722,66 @@ TEST_P(InstrProfReaderWriterTest, icall_data_read_write) {
 
   Expected R = Reader->getInstrProfRecord("caller", 0x1234);
   ASSERT_THAT_ERROR(R.takeError(), Succeeded());
+
+  // Test the number of instrumented indirect call sites and the number of
+  // profiled values at each site.
   ASSERT_EQ(4U, R->getNumValueSites(IPVK_IndirectCallTarget));
   EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 0));
   EXPECT_EQ(0U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 1));
   EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 2));
-  EXPECT_EQ(1U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3));
+  EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3));
+
+  // Test the number of instrumented vtable sites and the number of profiled
+  // values at each site.
+  ASSERT_EQ(2U, R->getNumValueSites(IPVK_VTableTarget));
+  EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_VTableTarget, 0));
+  EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_VTableTarget, 1));
+
+  // First indirect site.
+  {
+uint64_t TotalC;
+std::unique_ptr VD =
+R->getValueForSite(IPVK_IndirectCallTarget, 0, &TotalC);
+
+EXPECT_EQ(3U * getProfWeight(), VD[0].Count);
+EXPECT_EQ(2U * getProfWeight(), VD[1].Count);
+EXPECT_EQ(1U * getProfWeight(), VD[2].Count);
+EXPECT_EQ(6U * getProfWeight(), TotalC);
+
+EXPECT_EQ(StringRef((const char *)VD[0].Value, 7), StringRef("callee3"));

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -0,0 +1,145 @@
+// REQUIRES: lld-available
+
+// RUN: rm -rf %t && split-file %s %t && cd %t
+//
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm 
-enable-vtable-value-profiling test.cpp -o test
+// RUN: env LLVM_PROFILE_FILE=test.profraw ./test
+
+// Show vtable profiles from raw profile.
+// RUN: llvm-profdata show --function=main --ic-targets -show-vtables 
test.profraw | FileCheck %s --check-prefixes=COMMON,RAW

minglotus-6 wrote:

fixed it.

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -676,24 +722,66 @@ TEST_P(InstrProfReaderWriterTest, icall_data_read_write) {
 
   Expected R = Reader->getInstrProfRecord("caller", 0x1234);
   ASSERT_THAT_ERROR(R.takeError(), Succeeded());
+
+  // Test the number of instrumented indirect call sites and the number of
+  // profiled values at each site.
   ASSERT_EQ(4U, R->getNumValueSites(IPVK_IndirectCallTarget));
   EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 0));
   EXPECT_EQ(0U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 1));
   EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 2));
-  EXPECT_EQ(1U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3));
+  EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3));
+
+  // Test the number of instrumented vtable sites and the number of profiled
+  // values at each site.
+  ASSERT_EQ(2U, R->getNumValueSites(IPVK_VTableTarget));
+  EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_VTableTarget, 0));
+  EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_VTableTarget, 1));
+
+  // First indirect site.
+  {
+uint64_t TotalC;
+std::unique_ptr VD =
+R->getValueForSite(IPVK_IndirectCallTarget, 0, &TotalC);
+
+EXPECT_EQ(3U * getProfWeight(), VD[0].Count);

minglotus-6 wrote:

 > Gunit expectations should be written as EXPECT_EQ(actual, expected).

I see. Used `EXPECT_EQ(actual, expected)` in new lines, and updated some 
surrounding existing lines in this style in this PR. Will send out a patch 
later to update the rest of `EXPECT_EQ` to make the style consistent.

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -431,23 +441,34 @@ class InstrProfSymtab {
   using AddrHashMap = std::vector>;
 
 private:
+  using AddrIntervalMap =
+  IntervalMap>;
   StringRef Data;
   uint64_t Address = 0;
-  // Unique name strings.
+  // Unique name strings. Used to ensure entries in MD5NameMap (a vector that's
+  // going to be sorted) has unique MD5 keys in the first place.
   StringSet<> NameTab;
+  // Records the unique virtual table names. This is used by InstrProfWriter to
+  // write out an on-disk chained hash table of virtual table names.
+  // InstrProfWriter stores per function profile data (keyed by function names)
+  // so it doesn't use a StringSet for function names.
+  StringSet<> VTableNames;
   // A map from MD5 keys to function name strings.
   std::vector> MD5NameMap;

minglotus-6 wrote:

Sure. i'll send out another patch later.

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-29 Thread Mingming Liu via cfe-commits


@@ -0,0 +1,145 @@
+// REQUIRES: lld-available
+
+// RUN: rm -rf %t && split-file %s %t && cd %t
+//
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm 
-enable-vtable-value-profiling test.cpp -o test
+// RUN: env LLVM_PROFILE_FILE=test.profraw ./test

minglotus-6 wrote:

done.

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-28 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

> I think we should resort to scripts and profraw in LLVM if we don't have 
> support for textual format. This is the case for memprof for example. 

Got it. I updated the test cases. Now compiler-rt test provides raw-related 
test coverage, and IR test provides test coverage for the {show, merge} related 
with {indexed, text} formats.

> It seems cleanest if the allocator and the map are owned by the 
> InstrProfSymtab class.

Agree. With https://github.com/llvm/llvm-project/pull/86882, InstrProfSymtab 
could own both. Thanks for the discussion!
- Added a few 
[lines](https://github.com/llvm/llvm-project/pull/66825/commits/01585236e7c350830544482c9917320a75e9ac57#diff-f959bf0ff4ce781c36e6eddb19d46fadb78e0c79fa13f7df384bb72f38fcf405)
 in unit test to have some test coverage for the IntervalMap usage.


https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)

2024-03-26 Thread Mingming Liu via cfe-commits

minglotus-6 wrote:

> My concern with this approach is that compiler-rt is treated as a different 
> project and updating the code within LLVM makes it easy to miss running the 
> test locally for the other project. I think such issues will be caught by the 
> buildbot but having it flagged earlier is better for the developer. What do 
> you think?

Yeah presubmit test coverage makes sense.

If it's desirable to get rid of 'update_vtable_value_prof_inputs.sh' and 
'vtable-value-prof-basic.profraw' in the repo, here is another way:
* For LLVM IR tests, store textual profiles in the repo, and run `llvm-profdata 
merge` to convert it to indexed profiles.
* To have test coverage on raw profiles (generate raw profiles or convert it 
into other formats), have a compiler-rt test.

I wonder if that is preferred over the current status (with script and 
`.profraw` in the repo).

https://github.com/llvm/llvm-project/pull/66825
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   >