[clang] [llvm] [CGData][ThinLTO][NFC] Prep for two-codegen rounds (PR #90934)
@@ -56,6 +56,7 @@ Error thinBackend(const Config &C, unsigned Task, AddStreamFn AddStream, const FunctionImporter::ImportMapTy &ImportList, const GVSummaryMapTy &DefinedGlobals, MapVector *ModuleMap, + bool CodeGenOnly, minglotus-6 wrote: nit: add a brief comment for `CodeGenOnly` parameter before method `thinBackend`. https://github.com/llvm/llvm-project/pull/90934 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
minglotus-6 wrote: The unused variable error is fixed in https://github.com/llvm/llvm-project/commit/a7f152f59b1d64c6e1698fc6840c4911e11a https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 closed https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 updated https://github.com/llvm/llvm-project/pull/107471 >From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001 From: mingmingl Date: Thu, 5 Sep 2024 14:36:28 -0700 Subject: [PATCH 1/6] [NFCI]Clean up synthetic count --- clang/docs/tools/clang-formatted-files.txt| 4 - llvm/include/llvm/IR/ModuleSummaryIndex.h | 21 +-- llvm/include/llvm/IR/ModuleSummaryIndexYAML.h | 2 +- .../llvm/LTO/SummaryBasedOptimizations.h | 18 --- .../IPO/SyntheticCountsPropagation.h | 23 --- llvm/lib/Analysis/ModuleSummaryAnalysis.cpp | 9 +- llvm/lib/AsmParser/LLParser.cpp | 4 +- llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 8 +- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 2 +- llvm/lib/LTO/CMakeLists.txt | 1 - llvm/lib/LTO/LTO.cpp | 3 - llvm/lib/LTO/SummaryBasedOptimizations.cpp| 88 --- llvm/lib/LTO/ThinLTOCodeGenerator.cpp | 3 - llvm/lib/Passes/PassBuilderPipelines.cpp | 9 -- llvm/lib/Passes/PassRegistry.def | 1 - llvm/lib/Transforms/IPO/CMakeLists.txt| 1 - .../IPO/SyntheticCountsPropagation.cpp| 139 -- .../Transforms/Utils/FunctionImportUtils.cpp | 18 +-- .../SyntheticCountsPropagation/initial.ll | 79 -- .../SyntheticCountsPropagation/prop.ll| 50 --- .../SyntheticCountsPropagation/scc.ll | 19 --- llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn | 1 - .../llvm/lib/Transforms/IPO/BUILD.gn | 1 - 23 files changed, 17 insertions(+), 487 deletions(-) delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h delete mode 100644 llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll diff --git a/clang/docs/tools/clang-formatted-files.txt b/clang/docs/tools/clang-formatted-files.txt index 62871133a68075..28623554d35a11 100644 --- a/clang/docs/tools/clang-formatted-files.txt +++ b/clang/docs/tools/clang-formatted-files.txt @@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h llvm/include/llvm/IR/StructuralHash.h llvm/include/llvm/IR/TrackingMDRef.h llvm/include/llvm/IR/UseListOrder.h -llvm/include/llvm/LTO/SummaryBasedOptimizations.h llvm/include/llvm/MC/MCAsmInfoCOFF.h llvm/include/llvm/MC/MCAsmInfoDarwin.h llvm/include/llvm/MC/MCAsmInfoELF.h @@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h llvm/include/llvm/Transforms/IPO/SCCP.h llvm/include/llvm/Transforms/IPO/StripSymbols.h -llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h llvm/include/llvm/Transforms/Scalar/ADCE.h @@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp llvm/lib/IR/Statepoint.cpp llvm/lib/IR/StructuralHash.cpp llvm/lib/IR/ValueSymbolTable.cpp -llvm/lib/LTO/SummaryBasedOptimizations.cpp llvm/lib/MC/MCAsmInfoCOFF.cpp llvm/lib/MC/MCAsmInfoELF.cpp llvm/lib/MC/MCAsmInfoGOFF.cpp @@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/lib/Transforms/IPO/SampleContextTracker.cpp llvm/lib/Transforms/IPO/SampleProfileProbe.cpp llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp -llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp llvm/lib/Transforms/ObjCARC/BlotMapVector.h llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h index 00934cc1ce6f2d..5bcf3e88622a25 100644 --- a/llvm/include/llvm/IR/ModuleSummaryIndex.h +++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary { GlobalValue::DefaultVisibility, /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false, /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition), -/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0, -std::vector(), std::move(Edges), -std::vector(), +/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(), +std::move(Edges), std::vector(), std::vector(), std::vector(), std::vector(), @@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary { /// Function summary specific flags. FFlags FunFlags; - /// The synthesized entry count of the function. - /// This is only populated during ThinLink phase and remains unused while - /// generating per-module su
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 updated https://github.com/llvm/llvm-project/pull/107471 >From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001 From: mingmingl Date: Thu, 5 Sep 2024 14:36:28 -0700 Subject: [PATCH 1/5] [NFCI]Clean up synthetic count --- clang/docs/tools/clang-formatted-files.txt| 4 - llvm/include/llvm/IR/ModuleSummaryIndex.h | 21 +-- llvm/include/llvm/IR/ModuleSummaryIndexYAML.h | 2 +- .../llvm/LTO/SummaryBasedOptimizations.h | 18 --- .../IPO/SyntheticCountsPropagation.h | 23 --- llvm/lib/Analysis/ModuleSummaryAnalysis.cpp | 9 +- llvm/lib/AsmParser/LLParser.cpp | 4 +- llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 8 +- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 2 +- llvm/lib/LTO/CMakeLists.txt | 1 - llvm/lib/LTO/LTO.cpp | 3 - llvm/lib/LTO/SummaryBasedOptimizations.cpp| 88 --- llvm/lib/LTO/ThinLTOCodeGenerator.cpp | 3 - llvm/lib/Passes/PassBuilderPipelines.cpp | 9 -- llvm/lib/Passes/PassRegistry.def | 1 - llvm/lib/Transforms/IPO/CMakeLists.txt| 1 - .../IPO/SyntheticCountsPropagation.cpp| 139 -- .../Transforms/Utils/FunctionImportUtils.cpp | 18 +-- .../SyntheticCountsPropagation/initial.ll | 79 -- .../SyntheticCountsPropagation/prop.ll| 50 --- .../SyntheticCountsPropagation/scc.ll | 19 --- llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn | 1 - .../llvm/lib/Transforms/IPO/BUILD.gn | 1 - 23 files changed, 17 insertions(+), 487 deletions(-) delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h delete mode 100644 llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll diff --git a/clang/docs/tools/clang-formatted-files.txt b/clang/docs/tools/clang-formatted-files.txt index 62871133a68075..28623554d35a11 100644 --- a/clang/docs/tools/clang-formatted-files.txt +++ b/clang/docs/tools/clang-formatted-files.txt @@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h llvm/include/llvm/IR/StructuralHash.h llvm/include/llvm/IR/TrackingMDRef.h llvm/include/llvm/IR/UseListOrder.h -llvm/include/llvm/LTO/SummaryBasedOptimizations.h llvm/include/llvm/MC/MCAsmInfoCOFF.h llvm/include/llvm/MC/MCAsmInfoDarwin.h llvm/include/llvm/MC/MCAsmInfoELF.h @@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h llvm/include/llvm/Transforms/IPO/SCCP.h llvm/include/llvm/Transforms/IPO/StripSymbols.h -llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h llvm/include/llvm/Transforms/Scalar/ADCE.h @@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp llvm/lib/IR/Statepoint.cpp llvm/lib/IR/StructuralHash.cpp llvm/lib/IR/ValueSymbolTable.cpp -llvm/lib/LTO/SummaryBasedOptimizations.cpp llvm/lib/MC/MCAsmInfoCOFF.cpp llvm/lib/MC/MCAsmInfoELF.cpp llvm/lib/MC/MCAsmInfoGOFF.cpp @@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/lib/Transforms/IPO/SampleContextTracker.cpp llvm/lib/Transforms/IPO/SampleProfileProbe.cpp llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp -llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp llvm/lib/Transforms/ObjCARC/BlotMapVector.h llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h index 00934cc1ce6f2d..5bcf3e88622a25 100644 --- a/llvm/include/llvm/IR/ModuleSummaryIndex.h +++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary { GlobalValue::DefaultVisibility, /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false, /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition), -/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0, -std::vector(), std::move(Edges), -std::vector(), +/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(), +std::move(Edges), std::vector(), std::vector(), std::vector(), std::vector(), @@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary { /// Function summary specific flags. FFlags FunFlags; - /// The synthesized entry count of the function. - /// This is only populated during ThinLink phase and remains unused while - /// generating per-module su
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 updated https://github.com/llvm/llvm-project/pull/107471 >From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001 From: mingmingl Date: Thu, 5 Sep 2024 14:36:28 -0700 Subject: [PATCH 1/4] [NFCI]Clean up synthetic count --- clang/docs/tools/clang-formatted-files.txt| 4 - llvm/include/llvm/IR/ModuleSummaryIndex.h | 21 +-- llvm/include/llvm/IR/ModuleSummaryIndexYAML.h | 2 +- .../llvm/LTO/SummaryBasedOptimizations.h | 18 --- .../IPO/SyntheticCountsPropagation.h | 23 --- llvm/lib/Analysis/ModuleSummaryAnalysis.cpp | 9 +- llvm/lib/AsmParser/LLParser.cpp | 4 +- llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 8 +- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 2 +- llvm/lib/LTO/CMakeLists.txt | 1 - llvm/lib/LTO/LTO.cpp | 3 - llvm/lib/LTO/SummaryBasedOptimizations.cpp| 88 --- llvm/lib/LTO/ThinLTOCodeGenerator.cpp | 3 - llvm/lib/Passes/PassBuilderPipelines.cpp | 9 -- llvm/lib/Passes/PassRegistry.def | 1 - llvm/lib/Transforms/IPO/CMakeLists.txt| 1 - .../IPO/SyntheticCountsPropagation.cpp| 139 -- .../Transforms/Utils/FunctionImportUtils.cpp | 18 +-- .../SyntheticCountsPropagation/initial.ll | 79 -- .../SyntheticCountsPropagation/prop.ll| 50 --- .../SyntheticCountsPropagation/scc.ll | 19 --- llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn | 1 - .../llvm/lib/Transforms/IPO/BUILD.gn | 1 - 23 files changed, 17 insertions(+), 487 deletions(-) delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h delete mode 100644 llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll diff --git a/clang/docs/tools/clang-formatted-files.txt b/clang/docs/tools/clang-formatted-files.txt index 62871133a68075..28623554d35a11 100644 --- a/clang/docs/tools/clang-formatted-files.txt +++ b/clang/docs/tools/clang-formatted-files.txt @@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h llvm/include/llvm/IR/StructuralHash.h llvm/include/llvm/IR/TrackingMDRef.h llvm/include/llvm/IR/UseListOrder.h -llvm/include/llvm/LTO/SummaryBasedOptimizations.h llvm/include/llvm/MC/MCAsmInfoCOFF.h llvm/include/llvm/MC/MCAsmInfoDarwin.h llvm/include/llvm/MC/MCAsmInfoELF.h @@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h llvm/include/llvm/Transforms/IPO/SCCP.h llvm/include/llvm/Transforms/IPO/StripSymbols.h -llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h llvm/include/llvm/Transforms/Scalar/ADCE.h @@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp llvm/lib/IR/Statepoint.cpp llvm/lib/IR/StructuralHash.cpp llvm/lib/IR/ValueSymbolTable.cpp -llvm/lib/LTO/SummaryBasedOptimizations.cpp llvm/lib/MC/MCAsmInfoCOFF.cpp llvm/lib/MC/MCAsmInfoELF.cpp llvm/lib/MC/MCAsmInfoGOFF.cpp @@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/lib/Transforms/IPO/SampleContextTracker.cpp llvm/lib/Transforms/IPO/SampleProfileProbe.cpp llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp -llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp llvm/lib/Transforms/ObjCARC/BlotMapVector.h llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h index 00934cc1ce6f2d..5bcf3e88622a25 100644 --- a/llvm/include/llvm/IR/ModuleSummaryIndex.h +++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary { GlobalValue::DefaultVisibility, /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false, /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition), -/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0, -std::vector(), std::move(Edges), -std::vector(), +/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(), +std::move(Edges), std::vector(), std::vector(), std::vector(), std::vector(), @@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary { /// Function summary specific flags. FFlags FunFlags; - /// The synthesized entry count of the function. - /// This is only populated during ThinLink phase and remains unused while - /// generating per-module su
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 ready_for_review https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (PR #107471)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Clean up synthetic count pass (PR #107471)
https://github.com/minglotus-6 updated https://github.com/llvm/llvm-project/pull/107471 >From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001 From: mingmingl Date: Thu, 5 Sep 2024 14:36:28 -0700 Subject: [PATCH 1/3] [NFCI]Clean up synthetic count --- clang/docs/tools/clang-formatted-files.txt| 4 - llvm/include/llvm/IR/ModuleSummaryIndex.h | 21 +-- llvm/include/llvm/IR/ModuleSummaryIndexYAML.h | 2 +- .../llvm/LTO/SummaryBasedOptimizations.h | 18 --- .../IPO/SyntheticCountsPropagation.h | 23 --- llvm/lib/Analysis/ModuleSummaryAnalysis.cpp | 9 +- llvm/lib/AsmParser/LLParser.cpp | 4 +- llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 8 +- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 2 +- llvm/lib/LTO/CMakeLists.txt | 1 - llvm/lib/LTO/LTO.cpp | 3 - llvm/lib/LTO/SummaryBasedOptimizations.cpp| 88 --- llvm/lib/LTO/ThinLTOCodeGenerator.cpp | 3 - llvm/lib/Passes/PassBuilderPipelines.cpp | 9 -- llvm/lib/Passes/PassRegistry.def | 1 - llvm/lib/Transforms/IPO/CMakeLists.txt| 1 - .../IPO/SyntheticCountsPropagation.cpp| 139 -- .../Transforms/Utils/FunctionImportUtils.cpp | 18 +-- .../SyntheticCountsPropagation/initial.ll | 79 -- .../SyntheticCountsPropagation/prop.ll| 50 --- .../SyntheticCountsPropagation/scc.ll | 19 --- llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn | 1 - .../llvm/lib/Transforms/IPO/BUILD.gn | 1 - 23 files changed, 17 insertions(+), 487 deletions(-) delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h delete mode 100644 llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll diff --git a/clang/docs/tools/clang-formatted-files.txt b/clang/docs/tools/clang-formatted-files.txt index 62871133a68075..28623554d35a11 100644 --- a/clang/docs/tools/clang-formatted-files.txt +++ b/clang/docs/tools/clang-formatted-files.txt @@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h llvm/include/llvm/IR/StructuralHash.h llvm/include/llvm/IR/TrackingMDRef.h llvm/include/llvm/IR/UseListOrder.h -llvm/include/llvm/LTO/SummaryBasedOptimizations.h llvm/include/llvm/MC/MCAsmInfoCOFF.h llvm/include/llvm/MC/MCAsmInfoDarwin.h llvm/include/llvm/MC/MCAsmInfoELF.h @@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h llvm/include/llvm/Transforms/IPO/SCCP.h llvm/include/llvm/Transforms/IPO/StripSymbols.h -llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h llvm/include/llvm/Transforms/Scalar/ADCE.h @@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp llvm/lib/IR/Statepoint.cpp llvm/lib/IR/StructuralHash.cpp llvm/lib/IR/ValueSymbolTable.cpp -llvm/lib/LTO/SummaryBasedOptimizations.cpp llvm/lib/MC/MCAsmInfoCOFF.cpp llvm/lib/MC/MCAsmInfoELF.cpp llvm/lib/MC/MCAsmInfoGOFF.cpp @@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/lib/Transforms/IPO/SampleContextTracker.cpp llvm/lib/Transforms/IPO/SampleProfileProbe.cpp llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp -llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp llvm/lib/Transforms/ObjCARC/BlotMapVector.h llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h index 00934cc1ce6f2d..5bcf3e88622a25 100644 --- a/llvm/include/llvm/IR/ModuleSummaryIndex.h +++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary { GlobalValue::DefaultVisibility, /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false, /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition), -/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0, -std::vector(), std::move(Edges), -std::vector(), +/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(), +std::move(Edges), std::vector(), std::vector(), std::vector(), std::vector(), @@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary { /// Function summary specific flags. FFlags FunFlags; - /// The synthesized entry count of the function. - /// This is only populated during ThinLink phase and remains unused while - /// generating per-module su
[clang] [llvm] [NFCI]Clean up synthetic count pass (PR #107471)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Clean up synthetic count pass (PR #107471)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/107471 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [NFCI]Clean up synthetic count (PR #107471)
https://github.com/minglotus-6 updated https://github.com/llvm/llvm-project/pull/107471 >From a18e042a0d03321d3ffca9ede92e976343a1b4c7 Mon Sep 17 00:00:00 2001 From: mingmingl Date: Thu, 5 Sep 2024 14:36:28 -0700 Subject: [PATCH 1/2] [NFCI]Clean up synthetic count --- clang/docs/tools/clang-formatted-files.txt| 4 - llvm/include/llvm/IR/ModuleSummaryIndex.h | 21 +-- llvm/include/llvm/IR/ModuleSummaryIndexYAML.h | 2 +- .../llvm/LTO/SummaryBasedOptimizations.h | 18 --- .../IPO/SyntheticCountsPropagation.h | 23 --- llvm/lib/Analysis/ModuleSummaryAnalysis.cpp | 9 +- llvm/lib/AsmParser/LLParser.cpp | 4 +- llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 8 +- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | 2 +- llvm/lib/LTO/CMakeLists.txt | 1 - llvm/lib/LTO/LTO.cpp | 3 - llvm/lib/LTO/SummaryBasedOptimizations.cpp| 88 --- llvm/lib/LTO/ThinLTOCodeGenerator.cpp | 3 - llvm/lib/Passes/PassBuilderPipelines.cpp | 9 -- llvm/lib/Passes/PassRegistry.def | 1 - llvm/lib/Transforms/IPO/CMakeLists.txt| 1 - .../IPO/SyntheticCountsPropagation.cpp| 139 -- .../Transforms/Utils/FunctionImportUtils.cpp | 18 +-- .../SyntheticCountsPropagation/initial.ll | 79 -- .../SyntheticCountsPropagation/prop.ll| 50 --- .../SyntheticCountsPropagation/scc.ll | 19 --- llvm/utils/gn/secondary/llvm/lib/LTO/BUILD.gn | 1 - .../llvm/lib/Transforms/IPO/BUILD.gn | 1 - 23 files changed, 17 insertions(+), 487 deletions(-) delete mode 100644 llvm/include/llvm/LTO/SummaryBasedOptimizations.h delete mode 100644 llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h delete mode 100644 llvm/lib/LTO/SummaryBasedOptimizations.cpp delete mode 100644 llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/initial.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/prop.ll delete mode 100644 llvm/test/Transforms/SyntheticCountsPropagation/scc.ll diff --git a/clang/docs/tools/clang-formatted-files.txt b/clang/docs/tools/clang-formatted-files.txt index 62871133a68075..28623554d35a11 100644 --- a/clang/docs/tools/clang-formatted-files.txt +++ b/clang/docs/tools/clang-formatted-files.txt @@ -5349,7 +5349,6 @@ llvm/include/llvm/IR/SSAContext.h llvm/include/llvm/IR/StructuralHash.h llvm/include/llvm/IR/TrackingMDRef.h llvm/include/llvm/IR/UseListOrder.h -llvm/include/llvm/LTO/SummaryBasedOptimizations.h llvm/include/llvm/MC/MCAsmInfoCOFF.h llvm/include/llvm/MC/MCAsmInfoDarwin.h llvm/include/llvm/MC/MCAsmInfoELF.h @@ -5586,7 +5585,6 @@ llvm/include/llvm/Transforms/IPO/SampleProfile.h llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h llvm/include/llvm/Transforms/IPO/SCCP.h llvm/include/llvm/Transforms/IPO/StripSymbols.h -llvm/include/llvm/Transforms/IPO/SyntheticCountsPropagation.h llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h llvm/include/llvm/Transforms/Scalar/ADCE.h @@ -6070,7 +6068,6 @@ llvm/lib/IR/SSAContext.cpp llvm/lib/IR/Statepoint.cpp llvm/lib/IR/StructuralHash.cpp llvm/lib/IR/ValueSymbolTable.cpp -llvm/lib/LTO/SummaryBasedOptimizations.cpp llvm/lib/MC/MCAsmInfoCOFF.cpp llvm/lib/MC/MCAsmInfoELF.cpp llvm/lib/MC/MCAsmInfoGOFF.cpp @@ -6861,7 +6858,6 @@ llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/lib/Transforms/IPO/SampleContextTracker.cpp llvm/lib/Transforms/IPO/SampleProfileProbe.cpp llvm/lib/Transforms/IPO/StripDeadPrototypes.cpp -llvm/lib/Transforms/IPO/SyntheticCountsPropagation.cpp llvm/lib/Transforms/ObjCARC/BlotMapVector.h llvm/lib/Transforms/ObjCARC/ObjCARCExpand.cpp llvm/lib/Transforms/ObjCARC/ProvenanceAnalysis.h diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h index 00934cc1ce6f2d..5bcf3e88622a25 100644 --- a/llvm/include/llvm/IR/ModuleSummaryIndex.h +++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h @@ -856,9 +856,8 @@ class FunctionSummary : public GlobalValueSummary { GlobalValue::DefaultVisibility, /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false, /*CanAutoHide=*/false, GlobalValueSummary::ImportKind::Definition), -/*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0, -std::vector(), std::move(Edges), -std::vector(), +/*NumInsts=*/0, FunctionSummary::FFlags{}, std::vector(), +std::move(Edges), std::vector(), std::vector(), std::vector(), std::vector(), @@ -878,11 +877,6 @@ class FunctionSummary : public GlobalValueSummary { /// Function summary specific flags. FFlags FunFlags; - /// The synthesized entry count of the function. - /// This is only populated during ThinLink phase and remains unused while - /// generating per-module su
[clang] [llvm] [LTO] Reduce memory usage for import lists (PR #106772)
@@ -215,13 +215,49 @@ class FunctionImporter { SmallVector getSourceModules() const; std::optional -getImportType(const FunctionsToImportTy &GUIDToImportType, - GlobalValue::GUID GUID) const; +getImportType(StringRef FromModule, GlobalValue::GUID GUID) const; + +// Iterate over the import list. The caller gets tuples of FromModule, +// GUID, and ImportKind instead of import IDs. +auto begin() const { return map_iterator(Imports.begin(), IDs); } +auto end() const { return map_iterator(Imports.end(), IDs); } + +friend class SortedImportList; + + private: +ImportIDTable &IDs; +DenseSet Imports; + }; + + // A read-only copy of ImportMapTy with its contents sorted according to the + // given comparison function. + class SortedImportList { + public: +SortedImportList(const ImportMapTy &ImportMap, + llvm::function_ref< + bool(const std::pair &, + const std::pair &)> + Comp) +: IDs(ImportMap.IDs), Imports(iterator_range(ImportMap.Imports)) { + llvm::sort(Imports, [&](ImportIDTable::ImportIDTy L, minglotus-6 wrote: I suggest using `llvm::stable_sort` to make it explicit it's a stable sort (i.e., even if the existing implementation is stable already) https://github.com/llvm/llvm-project/pull/106772 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LTO] Reduce memory usage for import lists (PR #106772)
@@ -1827,15 +1831,13 @@ Expected FunctionImporter::importFunctions( if (Error Err = SrcModule->materializeMetadata()) return std::move(Err); -auto &ImportGUIDs = FunctionsToImportPerModule->second; - // Find the globals to import SetVector GlobalsToImport; for (Function &F : *SrcModule) { if (!F.hasName()) continue; auto GUID = F.getGUID(); - auto MaybeImportType = ImportList.getImportType(ImportGUIDs, GUID); + auto MaybeImportType = ImportList.getImportType(Name, GUID); minglotus-6 wrote: nit pick: s/Name/ModName https://github.com/llvm/llvm-project/pull/106772 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
minglotus-6 wrote: Fixing https://lab.llvm.org/buildbot/#/builders/95/builds/672 in https://github.com/llvm/llvm-project/pull/97228 https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/minglotus-6 closed https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
minglotus-6 wrote: Going to merge as the failed tests are not related to this PR. ``` cat github-pull-requests_build_76910_linux-linux-x64.log | grep -B 2 -A 10 "Failed Tests " -- Failed Tests (1): BOLT :: X86/reader-stale-yaml-std.test Testing Time: 385.93s Total Discovered Tests: 120952 Skipped : 47 (0.04%) Unsupported : 3335 (2.76%) Passed : 117251 (96.94%) Expectedly Failed:318 (0.26%) -- -- Failed Tests (1): BOLT :: X86/reader-stale-yaml-std.test Testing Time: 1.98s Total Discovered Tests: 454 Skipped : 7 (1.54%) Unsupported : 13 (2.86%) Passed : 431 (94.93%) Expectedly Failed: 2 (0.44%) ``` https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty()) + continue; + +// After a virtual call candidate gets promoted, update the vtable's counts +// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents +// a vtable from which the virtual call is loaded. Compute the sum and use +// 128-bit APInt to improve accuracy. +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); +} + } if (NumPromoted == 0) return false; - // Adjust the MD.prof metadata. First delete the old one. - CB.setMetadata(LLVMContext::MD_prof, nullptr); - assert(NumPromoted <= ICallProfDataRef.size() && "Number of promoted functions should not be greater than the number " "of values in profile metadata"); + + // Update value profiles on the indirect call. + updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount, + NumCandidates); + updateVPtrValueProfiles(VPtr, VTableGUIDCounts); + return true; +} + +void IndirectCallPromoter::updateFuncValueProfiles( +CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount, +uint32_t MaxMDCount) { + // First clear the existing !prof. + CB.setMetadata(LLVMContext::MD_prof, nullptr); // Annotate the remaining value profiles if counter is not zero. if (TotalCount != 0) -annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted), - TotalCount, IPVK_IndirectCallTarget, NumCandidates); +annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget, + MaxMDCount); +} + +void IndirectCallPromoter::updateVPtrValueProfiles( +Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) { + if (!EnableVTableProfileUse || VPtr == nullptr || + !VPtr->getMetadata(LLVMContext::MD_prof)) +return; + VPtr->setMetadata(LLVMContext::MD_prof, nullptr); + std::vector VTableValueProfiles; + uint64_t TotalVTableCount = 0; + for (auto [GUID, Count] : VTableGUIDCounts) { +if (Count == 0) + continue; + +VTableValueProfiles.push_back({GUID, Count}); +TotalVTableCount += Count; + } + llvm::sort(VTableValueProfiles, + [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) { + return LHS.Count > RHS.Count; + }); + + annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount, +IPVK_VTableTarget, VTableValueProfiles.size()); +} + +bool IndirectCallPromoter::tryToPromoteWithVTableCmp( +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalFuncCount, +uint32_t NumCandidates, +MutableArrayRef ICallProfDataRef, +VTableGUIDCountsMap &VTableGUIDCounts) { + SmallVector PromotedFuncCount; + + for (const auto &Candidate : Candidates) { +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableGUIDCounts[GUID] -= Count; + +// 'OriginalBB' is the basic block of indirect call. After each candidate +// is promoted, a new basic block is created for the indirect fallback basic +// block and indirect call `CB` is moved into this new BB. +BasicBlock *OriginalBB = CB.getParent(); +promoteCallWithVTableCmp( +CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints, +createBranchWeights(CB.getContext(), Candidate.Count, +TotalFuncCount - Candidate.Count)); + +int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent()); + +ORE.emi
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const CallBase &CB, const std::vector &Candidates, +uint64_t TotalCount) { + if (!EnableVTableProfileUse || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + const size_t CandidateSize = Candidates.size(); + for (size_t I = 0; I < CandidateSize; I++) { +auto &Candidate = Candidates[I]; +uint64_t CandidateVTableCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + CandidateVTableCount += Count; + +if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) { + LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I +<< "-th candidate, function count " << Candidate.Count +<< " and its vtable count " << CandidateVTableCount +<< " have discrepancies\n"); + return false; +} + +RemainingVTableCount -= Candidate.Count; + +// 'MaxNumVTable' limits the number of vtables to make vtable comparison +// profitable. Comparing multiple vtables for one function candidate will +// insert additional instructions on the hot path, and allowing more than +// one vtable for non last candidates may or may not elongates dependency minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const CallBase &CB, const std::vector &Candidates, +uint64_t TotalCount) { + if (!EnableVTableProfileUse || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + const size_t CandidateSize = Candidates.size(); + for (size_t I = 0; I < CandidateSize; I++) { +auto &Candidate = Candidates[I]; +uint64_t CandidateVTableCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + CandidateVTableCount += Count; + +if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) { + LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I +<< "-th candidate, function count " << Candidate.Count +<< " and its vtable count " << CandidateVTableCount +<< " have discrepancies\n"); + return false; +} + +RemainingVTableCount -= Candidate.Count; + +// 'MaxNumVTable' limits the number of vtables to make vtable comparison +// profitable. Comparing multiple vtables for one function candidate will +// insert additional instructions on the hot path, and allowing more than +// one vtable for non last candidates may or may not elongates dependency +// chain for the subsequent candidates. Set its value to 1 for non-last +// candidate and allow option to override it for the last candidate. +int MaxNumVTable = 1; +if (I == CandidateSize - 1) + MaxNumVTable = ICPMaxNumVTableLastCandidate; + +if ((int)Candidate.AddressPoints.size() > MaxNumVTable) { + return false; +} + } + + // If the indirect fallback is not cold, don't compare vtables. + if (PSI && PSI->hasProfileSummary() && + !PSI->isColdCount(RemainingVTableCount)) +return false; minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. +static cl::opt ICPVTablePercentageThreshold( +"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden, +cl::desc("The percentage threshold of vtable-count / function-count for " + "cost-benefit analysis. ")); + +// Although comparing vtables can save a vtable load, we may need to compare +// vtable pointer with multiple vtable address points due to class inheritance. +// Comparing with multiple vtables inserts additional instructions on hot code +// path; and doing so for earlier candidate of one icall can affect later +// function candidate in an undesired way. We allow multiple vtable comparison +// for the last function candidate and use the option below to cap the number +// of vtables. +static cl::opt ICPMaxNumVTableLastCandidate( +"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, +cl::desc("The maximum number of vtable for the last candidate.")); + namespace { +// The key is a vtable global variable, and the value is a map. +// In the inner map, the key represents address point offsets and the value is a +// constant for this address point. +using VTableAddressPointOffsetValMap = +SmallDenseMap>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// The key is vtable GUID, and value is its value profile count. +using VTableGUIDCountsMap = SmallDenseMap; + +// Returns the address point offset of the given compatible type. +// +// Type metadata of a vtable specifies the types that can container a pointer to minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) { return F.getMetadata(getPGOFuncNameMetadataName()); } -void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) { - // Only for internal linkage functions. - if (PGOFuncName == F.getName()) - return; - // Don't create duplicated meta-data. - if (getPGOFuncNameMetadata(F)) +static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName, + StringRef PGOName) { + // For internal linkage objects, its name is not the same as its PGO name. minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. +static cl::opt ICPVTablePercentageThreshold( +"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden, +cl::desc("The percentage threshold of vtable-count / function-count for " + "cost-benefit analysis. ")); + +// Although comparing vtables can save a vtable load, we may need to compare +// vtable pointer with multiple vtable address points due to class inheritance. +// Comparing with multiple vtables inserts additional instructions on hot code +// path; and doing so for earlier candidate of one icall can affect later +// function candidate in an undesired way. We allow multiple vtable comparison +// for the last function candidate and use the option below to cap the number +// of vtables. +static cl::opt ICPMaxNumVTableLastCandidate( +"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, +cl::desc("The maximum number of vtable for the last candidate.")); + namespace { +// The key is a vtable global variable, and the value is a map. +// In the inner map, the key represents address point offsets and the value is a +// constant for this address point. +using VTableAddressPointOffsetValMap = +SmallDenseMap>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// The key is vtable GUID, and value is its value profile count. +using VTableGUIDCountsMap = SmallDenseMap; + +// Returns the address point offset of the given compatible type. +// +// Type metadata of a vtable specifies the types that can container a pointer to +// this vtable, for example, `Base*` can be a pointer to an instantiated type +// but not vice versa. See also https://llvm.org/docs/TypeMetadata.html +static std::optional +getAddressPointOffset(const GlobalVariable &VTableVar, + StringRef CompatibleType) { + SmallVector Types; + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` is used via its `UserInst`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Guaranteed by ICP transformation"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) minglotus-6 wrote: make sense. I hoisted this check to caller `tryToSinkInstructions`, and updated callee assert with `DestBB->getUniquePredecessor() == BB`. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty()) + continue; + +// After a virtual call candidate gets promoted, update the vtable's counts +// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents +// a vtable from which the virtual call is loaded. Compute the sum and use +// 128-bit APInt to improve accuracy. +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); +} + } if (NumPromoted == 0) return false; - // Adjust the MD.prof metadata. First delete the old one. - CB.setMetadata(LLVMContext::MD_prof, nullptr); - assert(NumPromoted <= ICallProfDataRef.size() && "Number of promoted functions should not be greater than the number " "of values in profile metadata"); + + // Update value profiles on the indirect call. + updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount, + NumCandidates); + updateVPtrValueProfiles(VPtr, VTableGUIDCounts); + return true; +} + +void IndirectCallPromoter::updateFuncValueProfiles( +CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount, +uint32_t MaxMDCount) { + // First clear the existing !prof. + CB.setMetadata(LLVMContext::MD_prof, nullptr); // Annotate the remaining value profiles if counter is not zero. if (TotalCount != 0) -annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted), - TotalCount, IPVK_IndirectCallTarget, NumCandidates); +annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget, + MaxMDCount); +} + +void IndirectCallPromoter::updateVPtrValueProfiles( +Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) { + if (!EnableVTableProfileUse || VPtr == nullptr || + !VPtr->getMetadata(LLVMContext::MD_prof)) +return; + VPtr->setMetadata(LLVMContext::MD_prof, nullptr); + std::vector VTableValueProfiles; + uint64_t TotalVTableCount = 0; + for (auto [GUID, Count] : VTableGUIDCounts) { +if (Count == 0) + continue; + +VTableValueProfiles.push_back({GUID, Count}); +TotalVTableCount += Count; + } + llvm::sort(VTableValueProfiles, + [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) { + return LHS.Count > RHS.Count; + }); + + annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount, +IPVK_VTableTarget, VTableValueProfiles.size()); +} + +bool IndirectCallPromoter::tryToPromoteWithVTableCmp( +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalFuncCount, +uint32_t NumCandidates, +MutableArrayRef ICallProfDataRef, +VTableGUIDCountsMap &VTableGUIDCounts) { + SmallVector PromotedFuncCount; + + for (const auto &Candidate : Candidates) { +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableGUIDCounts[GUID] -= Count; + +// 'OriginalBB' is the basic block of indirect call. After each candidate +// is promoted, a new basic block is created for the indirect fallback basic +// block and indirect call `CB` is moved into this new BB. +BasicBlock *OriginalBB = CB.getParent(); +promoteCallWithVTableCmp( +CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints, +createBranchWeights(CB.getContext(), Candidate.Count, +TotalFuncCount - Candidate.Count)); + +int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent()); + +ORE.emi
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const CallBase &CB, const std::vector &Candidates, +uint64_t TotalCount) { + if (!EnableVTableProfileUse || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + const size_t CandidateSize = Candidates.size(); + for (size_t I = 0; I < CandidateSize; I++) { +auto &Candidate = Candidates[I]; +uint64_t CandidateVTableCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + CandidateVTableCount += Count; + +if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) { + LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I +<< "-th candidate, function count " << Candidate.Count +<< " and its vtable count " << CandidateVTableCount +<< " have discrepancies\n"); + return false; +} + +RemainingVTableCount -= Candidate.Count; + +// 'MaxNumVTable' limits the number of vtables to make vtable comparison +// profitable. Comparing multiple vtables for one function candidate will +// insert additional instructions on the hot path, and allowing more than +// one vtable for non last candidates may or may not elongates dependency +// chain for the subsequent candidates. Set its value to 1 for non-last +// candidate and allow option to override it for the last candidate. +int MaxNumVTable = 1; +if (I == CandidateSize - 1) + MaxNumVTable = ICPMaxNumVTableLastCandidate; + +if ((int)Candidate.AddressPoints.size() > MaxNumVTable) { minglotus-6 wrote: done by `LLVM_DEBUG`. * Didn't do missed-opt remark mainly because function comparison might be applied when vtable comparison is not profitable, and readers need to join missed vtable message and applied function one to make sense of the remark messages. Now `opt llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -debug -S` gives the following log ``` Work on callsite call void %1(ptr %d), !prof !10 Num_targets: 3 Candidate 0 Count=600 Target_func: 3827408714133779784 Candidate 1 Count=500 Target_func: 5837445539218476403 Candidate 2 Count=400 Target_func: 9381788221313981078 Work on callsite #0 call void %1(ptr %d), !prof !10 Num_targets: 3 Num_candidates: 3 Candidate 0 Count=600 Target_func: 3827408714133779784 Candidate 1 Count=500 Target_func: 5837445539218476403 Candidate 2 Count=400 Target_func: 9381788221313981078 Computing vtable infos for callsite #1 Cannot find vtable definition for 12345678; maybe the vtable isn't imported Evaluating vtable profitability for callsite #1 call void %1(ptr %d), !prof !10 Candidate 0 FunctionCount: 600, VTableCounts: {Derived1, 600} Candidate 1 FunctionCount: 500, VTableCounts: {Derived2, 500} Candidate 2 FunctionCount: 400, VTableCounts: {Base1, 200} {Derived3, 200} allow at most 1 and got 2 vtables. Bail out for vtable comparison. ``` https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -140,14 +348,56 @@ class IndirectCallPromoter { // indirect callee with functions. Returns true if there are IR // transformations and false otherwise. bool tryToPromoteWithFuncCmp( - CallBase &CB, const std::vector &Candidates, - uint64_t TotalCount, ArrayRef ICallProfDataRef, - uint32_t NumCandidates); + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, uint64_t TotalCount, + ArrayRef ICallProfDataRef, uint32_t NumCandidates, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Promote a list of targets for one indirect call by comparing vtables with + // functions. Returns true if there are IR transformations and false + // otherwise. + bool tryToPromoteWithVTableCmp( + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, + uint64_t TotalFuncCount, uint32_t NumCandidates, + MutableArrayRef ICallProfDataRef, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Returns true if it's profitable to compare vtables for the callsite. + bool isProfitableToCompareVTables( + const CallBase &CB, const std::vector &Candidates, + uint64_t TotalCount); + + // Given an indirect callsite and the list of function candidates, compute + // the following vtable information in output parameters and returns vtable + // pointer if type profiles exist. + // - Populate `VTableGUIDCounts` with using !prof + // metadata attached on the vtable pointer. + // - For each function candidate, finds out the vtables from which it get minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. +static cl::opt ICPVTablePercentageThreshold( +"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden, +cl::desc("The percentage threshold of vtable-count / function-count for " + "cost-benefit analysis. ")); + +// Although comparing vtables can save a vtable load, we may need to compare +// vtable pointer with multiple vtable address points due to class inheritance. +// Comparing with multiple vtables inserts additional instructions on hot code +// path; and doing so for earlier candidate of one icall can affect later +// function candidate in an undesired way. We allow multiple vtable comparison +// for the last function candidate and use the option below to cap the number +// of vtables. +static cl::opt ICPMaxNumVTableLastCandidate( +"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, +cl::desc("The maximum number of vtable for the last candidate.")); + namespace { +// The key is a vtable global variable, and the value is a map. +// In the inner map, the key represents address point offsets and the value is a +// constant for this address point. +using VTableAddressPointOffsetValMap = +SmallDenseMap>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// The key is vtable GUID, and value is its value profile count. +using VTableGUIDCountsMap = SmallDenseMap; + +// Returns the address point offset of the given compatible type. +// +// Type metadata of a vtable specifies the types that can container a pointer to +// this vtable, for example, `Base*` can be a pointer to an instantiated type minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. +static cl::opt ICPVTablePercentageThreshold( +"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden, +cl::desc("The percentage threshold of vtable-count / function-count for " + "cost-benefit analysis. ")); + +// Although comparing vtables can save a vtable load, we may need to compare +// vtable pointer with multiple vtable address points due to class inheritance. +// Comparing with multiple vtables inserts additional instructions on hot code +// path; and doing so for earlier candidate of one icall can affect later +// function candidate in an undesired way. We allow multiple vtable comparison +// for the last function candidate and use the option below to cap the number +// of vtables. +static cl::opt ICPMaxNumVTableLastCandidate( +"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, +cl::desc("The maximum number of vtable for the last candidate.")); + namespace { +// The key is a vtable global variable, and the value is a map. +// In the inner map, the key represents address point offsets and the value is a +// constant for this address point. +using VTableAddressPointOffsetValMap = +SmallDenseMap>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// The key is vtable GUID, and value is its value profile count. +using VTableGUIDCountsMap = SmallDenseMap; + +// Returns the address point offset of the given compatible type. +// +// Type metadata of a vtable specifies the types that can container a pointer to +// this vtable, for example, `Base*` can be a pointer to an instantiated type +// but not vice versa. See also https://llvm.org/docs/TypeMetadata.html +static std::optional +getAddressPointOffset(const GlobalVariable &VTableVar, + StringRef CompatibleType) { + SmallVector Types; + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` is used via its `UserInst`. minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. +static cl::opt ICPVTablePercentageThreshold( +"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden, +cl::desc("The percentage threshold of vtable-count / function-count for " + "cost-benefit analysis. ")); + +// Although comparing vtables can save a vtable load, we may need to compare +// vtable pointer with multiple vtable address points due to class inheritance. +// Comparing with multiple vtables inserts additional instructions on hot code +// path; and doing so for earlier candidate of one icall can affect later +// function candidate in an undesired way. We allow multiple vtable comparison minglotus-6 wrote: > I think what you mean is that doing so for an earlier candidate delays the > comparisons for later candidates, but that for the last candidate, only the > fallback path is affected? Yes. I updated the comment. > Do we expect to set this parameter above 1? Yes. Setting it to 1 is to make the default parameter conservative. Based on my tests on `-pie` or `pie` binaries , setting it to 2 gives measurable performance win compared with 1, and setting it to 3 doesn't give stable performance wins across different binaries or across runs. One interesting thing is the actual cost of materializing one vtable address point depends on compile option `fpic/fpie`, and the cost of materializing a vtable address point and a function is comparable if `fpie/fpic` option is the same. * For non-pie binaries, `@vtable + address-point-offset` is lowered to an immediate representing vtable address point. It could be folded into `icmp` IR after lowering, something like `icmp #imm, `. For pie (but non-pic) binaries, `@vtable + address-point-offset` is lowered to a pc-relative address. So it takes one instruction to materialize the pc-relative address itself(something like `leaq 2890849(%rip), %rdx # 0x30fe50 <_ZTV8Derived1>` for x86). https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) { return F.getMetadata(getPGOFuncNameMetadataName()); } -void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) { - // Only for internal linkage functions. - if (PGOFuncName == F.getName()) - return; - // Don't create duplicated meta-data. - if (getPGOFuncNameMetadata(F)) +static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName, + StringRef PGOName) { + // For internal linkage objects, its name is not the same as its PGO name. + if (GO.getName() == PGOName) return; - LLVMContext &C = F.getContext(); - MDNode *N = MDNode::get(C, MDString::get(C, PGOFuncName)); - F.setMetadata(getPGOFuncNameMetadataName(), N); + + // Don't created duplicated metadata. minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +114,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. minglotus-6 wrote: I updated this patch with `LLVM_DEBUG` to catch the pathological case. In my internal change I do have such `LLVM_DEBUG` logs when testing this change and didn't find any. The threshold is to prevent backsliding. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -321,14 +790,114 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const std::vector &Candidates, uint64_t TotalCount) { + if (!EnableVTableProfileUse || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + for (size_t I = 0; I < Candidates.size(); I++) { +auto &Candidate = Candidates[I]; +uint64_t VTableSumCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableSumCount += Count; + +if (VTableSumCount < Candidate.Count * ICPVTablePercentageThreshold) + return false; + +RemainingVTableCount -= Candidate.Count; + +int MaxNumVTable = 1; minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1393,16 +1398,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) { return F.getMetadata(getPGOFuncNameMetadataName()); } -void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) { - // Only for internal linkage functions. - if (PGOFuncName == F.getName()) - return; - // Don't create duplicated meta-data. - if (getPGOFuncNameMetadata(F)) +static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName, + StringRef PGOName) { + // For internal linkage objects, its name is not the same as its PGO name. + if (GO.getName() == PGOName) return; - LLVMContext &C = F.getContext(); - MDNode *N = MDNode::get(C, MDString::get(C, PGOFuncName)); - F.setMetadata(getPGOFuncNameMetadataName(), N); + + // Don't created duplictaed metadata. minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee( uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount; for (auto Entry : *VMap) { if (isa(Entry.first)) -if (auto *CI = dyn_cast_or_null(Entry.second)) +if (auto *CI = dyn_cast_or_null(Entry.second)) { minglotus-6 wrote: I used a lambda function to improve the code duplication. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -321,14 +790,114 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const std::vector &Candidates, uint64_t TotalCount) { + if (!EnableVTableProfileUse || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + for (size_t I = 0; I < Candidates.size(); I++) { +auto &Candidate = Candidates[I]; +uint64_t VTableSumCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableSumCount += Count; + +if (VTableSumCount < Candidate.Count * ICPVTablePercentageThreshold) + return false; + +RemainingVTableCount -= Candidate.Count; + +int MaxNumVTable = 1; +if (I == Candidates.size() - 1) minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -321,14 +746,127 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const std::vector &Candidates, uint64_t TotalCount) { + if (!ICPEnableVTableCmp || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + for (size_t I = 0; I < Candidates.size(); I++) { +auto &Candidate = Candidates[I]; +uint64_t VTableSumCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableSumCount += Count; + +if (VTableSumCount < Candidate.Count * ICPVTableCountPercentage) + return false; + +RemainingVTableCount -= Candidate.Count; + +int NumAdditionalVTable = 0; +if (I == Candidates.size() - 1) + NumAdditionalVTable = ICPNumAdditionalVTableLast; + +int ActualNumAdditionalInst = Candidate.AddressPoints.size() - 1; +if (ActualNumAdditionalInst > NumAdditionalVTable) { + return false; +} + } + + // If the indirect fallback is not cold, don't compare vtables. + if (PSI && PSI->hasProfileSummary() && + !PSI->isColdCount(RemainingVTableCount)) +return false; + + return true; +} + +static void +computeVirtualCallSiteTypeInfoMap(Module &M, ModuleAnalysisManager &MAM, + VirtualCallSiteTypeInfoMap &VirtualCSInfo) { + auto &FAM = MAM.getResult(M).getManager(); + auto LookupDomTree = [&FAM](Function &F) -> DominatorTree & { +return FAM.getResult(F); + }; + + auto compute = [&](Function *Func) { +if (!Func || Func->use_empty()) + return; +// Iterate all type.test calls and find all indirect calls. +// TODO: Add llvm.public.type.test +for (Use &U : llvm::make_early_inc_range(Func->uses())) { + auto *CI = dyn_cast(U.getUser()); + if (!CI) +continue; + auto *TypeMDVal = cast(CI->getArgOperand(1)); + if (!TypeMDVal) +continue; + auto *CompatibleTypeId = dyn_cast(TypeMDVal->getMetadata()); + if (!CompatibleTypeId) +continue; + + // Find out all devirtualizable call sites given a llvm.type.test + // intrinsic call. + SmallVector DevirtCalls; + SmallVector Assumes; + auto &DT = LookupDomTree(*CI->getFunction()); + findDevirtualizableCallsForTypeTest(DevirtCalls, Assumes, CI, DT); + + // type-id, offset from the address point + // combined with type metadata to compute function offset + for (auto &DevirtCall : DevirtCalls) { +CallBase &CB = DevirtCall.CB; +// Given an indirect call, try find the instruction which loads a +// pointer to virtual table. +Instruction *VTablePtr = +PGOIndirectCallVisitor::tryGetVTableInstruction(&CB); +if (!VTablePtr) + continue; +VirtualCSInfo[&CB] = {DevirtCall.Offset, VTablePtr, + CompatibleTypeId->getString()}; + } +} + }; + + // Right now only llvm.type.test is used to find out virtual call sites. + // With ThinLTO and whole-program-devirtualization, llvm.type.test and + // llvm.public.type.test are emitted, and llvm.public.type.test is either + // refined to llvm.type.test or dropped before indirect-call-promotion pass. minglotus-6 wrote: > Briefly explain in the comment why type test is needed for vtable based > indirect call promotion? Iterating `llvm.type.test` users find virtual calls out of all indirect calls in a more compile-time efficient manner. Moreover, the first parameter of `llvm.type.test` is the compatible type string. Since one vtable definition can be compatible with multiple vtables [1], compatible type string together with `getelementptr` are important to calculate the funct
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee( uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount; for (auto Entry : *VMap) { if (isa(Entry.first)) -if (auto *CI = dyn_cast_or_null(Entry.second)) +if (auto *CI = dyn_cast_or_null(Entry.second)) { minglotus-6 wrote: As discussed offline, I'll keep the current code for now as `CallBase` is a load bearing class. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +114,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. minglotus-6 wrote: > Is this the sum counts of all vtables corresponding to a candidate function? For hot basic blocks, the counters are on the order of O(100k) or higher, and the diff between vtable value profile counters and function value profile counters is minuscule. For iFDO, the value profile counters are collected per instruction (as opposed to per basic block) by partially thread-safe function ['instrumentTargetValueImpl'](https://github.com/llvm/llvm-project/blob/31440738bd6b1345ea978914fe01d2e19f4aa373/compiler-rt/lib/profile/InstrProfilingValue.c#L150-L239) , so the code doesn't assume them to be identical. For SamplePGO type profile (WIP), the total count of one instruction is the same as its basic block. So as long as 'vtable' and indirect call are in the same basic block, the sum would be identical (both are the count of basic block) https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -276,35 +626,154 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty()) + continue; + +// After a virtual call candidate gets promoted, update the vtable's counts +// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents +// a vtable from which the virtual call is loaded. Compute the sum and use +// 128-bit APInt to improve accuracy. +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); minglotus-6 wrote: The code updates sum count and individual `` pairs within one `!prof` metadata, so used the sum count from the `!prof` metadata. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -276,35 +626,154 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty()) + continue; + +// After a virtual call candidate gets promoted, update the vtable's counts +// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents +// a vtable from which the virtual call is loaded. Compute the sum and use +// 128-bit APInt to improve accuracy. +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); +} + } if (NumPromoted == 0) return false; - // Adjust the MD.prof metadata. First delete the old one. - CB.setMetadata(LLVMContext::MD_prof, nullptr); - assert(NumPromoted <= ICallProfDataRef.size() && "Number of promoted functions should not be greater than the number " "of values in profile metadata"); + + // Update value profiles on the indirect call. + // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers` + // is enabled and a vtable is used to load multiple virtual functions. + updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount, + NumCandidates); + // Update value profiles on the vtable pointer if it exists. + if (VPtr) +updateVPtrValueProfiles(VPtr, VTableGUIDCounts); + return true; +} + +void IndirectCallPromoter::updateFuncValueProfiles( +CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount, +uint32_t MaxMDCount) { + // First clear the existing !prof. + CB.setMetadata(LLVMContext::MD_prof, nullptr); // Annotate the remaining value profiles if counter is not zero. if (TotalCount != 0) -annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted), - TotalCount, IPVK_IndirectCallTarget, NumCandidates); +annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget, + MaxMDCount); +} + +void IndirectCallPromoter::updateVPtrValueProfiles( +Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) { + VPtr->setMetadata(LLVMContext::MD_prof, nullptr); + std::vector VTableValueProfiles; + uint64_t TotalVTableCount = 0; + for (auto [GUID, Count] : VTableGUIDCounts) { +if (Count == 0) + continue; + +VTableValueProfiles.push_back({GUID, Count}); +TotalVTableCount += Count; + } + llvm::sort(VTableValueProfiles, + [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) { + return LHS.Count > RHS.Count; + }); + + annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount, +IPVK_VTableTarget, VTableValueProfiles.size()); +} + +bool IndirectCallPromoter::tryToPromoteWithVTableCmp( +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalFuncCount, +uint32_t NumCandidates, +MutableArrayRef ICallProfDataRef, +VTableGUIDCountsMap &VTableGUIDCounts) { + SmallVector PromotedFuncCount; + + for (const auto &Candidate : Candidates) { +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableGUIDCounts[GUID] -= Count; + +// 'OriginalBB' is the basic block of indirect call before indirect call +// promotion. +BasicBlock *OriginalBB = CB.getParent(); minglotus-6 wrote: After each candidate is promoted per loop iteration, a new basic block is created for the indirect fallback basic block. Indirect call `CB` is moved so we use it to find the new (per-loop-iteration) basic block. I updated the comment to make this (per-loop-iteration new basic block) m
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -321,14 +790,114 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const std::vector &Candidates, uint64_t TotalCount) { + if (!EnableVTableProfileUse || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + for (size_t I = 0; I < Candidates.size(); I++) { +auto &Candidate = Candidates[I]; +uint64_t VTableSumCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableSumCount += Count; + +if (VTableSumCount < Candidate.Count * ICPVTablePercentageThreshold) + return false; + +RemainingVTableCount -= Candidate.Count; + +int MaxNumVTable = 1; minglotus-6 wrote: I added a comment for it. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -276,35 +626,154 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty()) + continue; + +// After a virtual call candidate gets promoted, update the vtable's counts +// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents +// a vtable from which the virtual call is loaded. Compute the sum and use +// 128-bit APInt to improve accuracy. +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); +} + } if (NumPromoted == 0) return false; - // Adjust the MD.prof metadata. First delete the old one. - CB.setMetadata(LLVMContext::MD_prof, nullptr); - assert(NumPromoted <= ICallProfDataRef.size() && "Number of promoted functions should not be greater than the number " "of values in profile metadata"); + + // Update value profiles on the indirect call. + // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers` + // is enabled and a vtable is used to load multiple virtual functions. + updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount, + NumCandidates); + // Update value profiles on the vtable pointer if it exists. + if (VPtr) +updateVPtrValueProfiles(VPtr, VTableGUIDCounts); + return true; +} + +void IndirectCallPromoter::updateFuncValueProfiles( +CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount, +uint32_t MaxMDCount) { + // First clear the existing !prof. + CB.setMetadata(LLVMContext::MD_prof, nullptr); // Annotate the remaining value profiles if counter is not zero. if (TotalCount != 0) -annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted), - TotalCount, IPVK_IndirectCallTarget, NumCandidates); +annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget, + MaxMDCount); +} + +void IndirectCallPromoter::updateVPtrValueProfiles( +Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) { + VPtr->setMetadata(LLVMContext::MD_prof, nullptr); + std::vector VTableValueProfiles; + uint64_t TotalVTableCount = 0; + for (auto [GUID, Count] : VTableGUIDCounts) { +if (Count == 0) + continue; + +VTableValueProfiles.push_back({GUID, Count}); +TotalVTableCount += Count; + } + llvm::sort(VTableValueProfiles, + [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) { + return LHS.Count > RHS.Count; + }); + + annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount, +IPVK_VTableTarget, VTableValueProfiles.size()); +} + +bool IndirectCallPromoter::tryToPromoteWithVTableCmp( +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalFuncCount, +uint32_t NumCandidates, +MutableArrayRef ICallProfDataRef, +VTableGUIDCountsMap &VTableGUIDCounts) { + SmallVector PromotedFuncCount; + + for (const auto &Candidate : Candidates) { +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableGUIDCounts[GUID] -= Count; + +// 'OriginalBB' is the basic block of indirect call before indirect call +// promotion. +BasicBlock *OriginalBB = CB.getParent(); +promoteCallWithVTableCmp( +CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints, +createBranchWeights(CB.getContext(), Candidate.Count, +TotalFuncCount - Candidate.Count)); + +int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent()); + +ORE.emit([&]() {
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. + BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Use, UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { + assert(!I->isTerminator()); + if (!isDestBBSuitableForSink(I, DestBlock)) +return false; + + assert(DestBlock->getUniquePre
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/minglotus-6 commented: PTAL, thanks! https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -2538,6 +2538,190 @@ Value *getSalvageOpsForIcmpOp(ICmpInst *Icmp, uint64_t CurrentLocOps, return Icmp->getOperand(0); } +void llvm::tryToSinkInstructionDbgValues( minglotus-6 wrote: The `gep` and `load` sequence doesn't have debug values (so far) so it's not easy to test within this PR. IMO helper functions that handles debug value along instruction sink (e.g. https://github.com/llvm/llvm-project/blob/1752740f4b4b752bbe2987a0de398c6f671ceb71/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L4769) will help other passes (e.g., simplify-cfg, etc) that moves instruction around. And test cases can be added for these shared helper functions. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -316,6 +316,15 @@ void salvageDebugInfoForDbgValues(Instruction &I, ArrayRef Insns, ArrayRef DPInsns); +void tryToSinkInstructionDbgValues( minglotus-6 wrote: Debug intrinsics are not handled in this PR, I un-do `Local.h/cpp` changes (and overlooked this comment). The updated compiler-rt test (instrprof-vtable-value-prof.cpp) enabled `-g` and tested IR for the first callsite. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. + BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Use, UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { + assert(!I->isTerminator()); + if (!isDestBBSuitableForSink(I, DestBlock)) +return false; + + assert(DestBlock->getUniquePre
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -240,27 +474,102 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite( Ret.push_back(PromotionCandidate(TargetFunction, Count)); TotalCount -= Count; } + return Ret; } +Constant *IndirectCallPromoter::getOrCreateVTableAddressPointVar( +GlobalVariable *GV, uint64_t AddressPointOffset) { + auto [Iter, Inserted] = + VTableAddressPointOffsetVal[GV].try_emplace(AddressPointOffset, nullptr); + if (Inserted) +Iter->second = getVTableAddressPointOffset(GV, AddressPointOffset); + return Iter->second; +} + +Instruction *IndirectCallPromoter::computeVTableInfos( minglotus-6 wrote: Document output parameters and return value around class method declaration, and added more comments in the function (e.g., one example with detailed steps) as suggested. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -321,14 +746,127 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const std::vector &Candidates, uint64_t TotalCount) { + if (!ICPEnableVTableCmp || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + for (size_t I = 0; I < Candidates.size(); I++) { +auto &Candidate = Candidates[I]; +uint64_t VTableSumCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableSumCount += Count; + +if (VTableSumCount < Candidate.Count * ICPVTableCountPercentage) + return false; + +RemainingVTableCount -= Candidate.Count; + +int NumAdditionalVTable = 0; +if (I == Candidates.size() - 1) + NumAdditionalVTable = ICPNumAdditionalVTableLast; + +int ActualNumAdditionalInst = Candidate.AddressPoints.size() - 1; +if (ActualNumAdditionalInst > NumAdditionalVTable) { + return false; +} + } + + // If the indirect fallback is not cold, don't compare vtables. + if (PSI && PSI->hasProfileSummary() && + !PSI->isColdCount(RemainingVTableCount)) +return false; + + return true; +} + +static void +computeVirtualCallSiteTypeInfoMap(Module &M, ModuleAnalysisManager &MAM, + VirtualCallSiteTypeInfoMap &VirtualCSInfo) { + auto &FAM = MAM.getResult(M).getManager(); + auto LookupDomTree = [&FAM](Function &F) -> DominatorTree & { +return FAM.getResult(F); + }; + + auto compute = [&](Function *Func) { +if (!Func || Func->use_empty()) + return; +// Iterate all type.test calls and find all indirect calls. +// TODO: Add llvm.public.type.test +for (Use &U : llvm::make_early_inc_range(Func->uses())) { + auto *CI = dyn_cast(U.getUser()); + if (!CI) +continue; + auto *TypeMDVal = cast(CI->getArgOperand(1)); + if (!TypeMDVal) +continue; + auto *CompatibleTypeId = dyn_cast(TypeMDVal->getMetadata()); + if (!CompatibleTypeId) +continue; + + // Find out all devirtualizable call sites given a llvm.type.test + // intrinsic call. + SmallVector DevirtCalls; + SmallVector Assumes; + auto &DT = LookupDomTree(*CI->getFunction()); + findDevirtualizableCallsForTypeTest(DevirtCalls, Assumes, CI, DT); + + // type-id, offset from the address point + // combined with type metadata to compute function offset + for (auto &DevirtCall : DevirtCalls) { +CallBase &CB = DevirtCall.CB; +// Given an indirect call, try find the instruction which loads a +// pointer to virtual table. +Instruction *VTablePtr = +PGOIndirectCallVisitor::tryGetVTableInstruction(&CB); +if (!VTablePtr) + continue; +VirtualCSInfo[&CB] = {DevirtCall.Offset, VTablePtr, + CompatibleTypeId->getString()}; + } +} + }; + + // Right now only llvm.type.test is used to find out virtual call sites. + // With ThinLTO and whole-program-devirtualization, llvm.type.test and + // llvm.public.type.test are emitted, and llvm.public.type.test is either + // refined to llvm.type.test or dropped before indirect-call-promotion pass. minglotus-6 wrote: If ICP pass is disabled in prelink, we don't need to analyze `llvm.public.type.test` as you point out. `llvm.public.type.test` was analyzed here because I didn't disable prelink ICP for compiler-rt test. Now I updated compiler-rt test to disable prelink ICP and enable postlink ICP, and skip the analysis for `llvm.public.type.test`. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commi
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. + BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Use, UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { + assert(!I->isTerminator()); + if (!isDestBBSuitableForSink(I, DestBlock)) +return false; + + assert(DestBlock->getUniquePre
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -140,14 +337,51 @@ class IndirectCallPromoter { // indirect callee with functions. Returns true if there are IR // transformations and false otherwise. bool tryToPromoteWithFuncCmp( - CallBase &CB, const std::vector &Candidates, - uint64_t TotalCount, ArrayRef ICallProfDataRef, - uint32_t NumCandidates); + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, uint64_t TotalCount, + ArrayRef ICallProfDataRef, uint32_t NumCandidates, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Promote a list of targets for one indirect call by comparing vtables with + // functions. Returns true if there are IR transformations and false + // otherwise. + bool tryToPromoteWithVTableCmp( + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, + uint64_t TotalFuncCount, uint32_t NumCandidates, + MutableArrayRef ICallProfDataRef, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Returns true if it's profitable to compare vtables. + bool isProfitableToCompareVTables( + const std::vector &Candidates, uint64_t TotalCount); + + // Populate `VTableGUIDCounts` vtable GUIDs and their counts and each minglotus-6 wrote: I updated the function comment. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. minglotus-6 wrote: The original comment is broken. I updated it. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo, + Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock( +PHINode::getIncomingValueNumForOperand(OperandNo)); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. + BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Inst, Use.getOperandNo(), UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, Basic
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty()) + continue; + +// Update VTableGUIDCounts +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -341,6 +879,17 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO, return false; } bool Changed = false; + VirtualCallSiteTypeInfoMap VirtualCSInfo; + + computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo); + + // This map records states across functions in an LLVM IR module. minglotus-6 wrote: `VTableAddressPointOffsetVal` stores the vtable address points. The vtable address point of a given `` is static (doesn't change after being computed once). `IndirectCallPromoter::getOrCreateVTableAddressPointVar` creates the map entry the first time a `` pair is seen, as `promoteIndirectCalls` processes an IR module and calls `IndirectCallPromoter` repeatedly on each function. I reworded the comments. PTAL. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. + BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Use, UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { + assert(!I->isTerminator()); + if (!isDestBBSuitableForSink(I, DestBlock)) +return false; + + assert(DestBlock->getUniquePre
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty()) + continue; + +// Update VTableGUIDCounts +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); +} + } if (NumPromoted == 0) return false; - // Adjust the MD.prof metadata. First delete the old one. - CB.setMetadata(LLVMContext::MD_prof, nullptr); - assert(NumPromoted <= ICallProfDataRef.size() && "Number of promoted functions should not be greater than the number " "of values in profile metadata"); + + // Update value profiles on the indirect call. + // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers` + // is enabled and a vtable is used to load multiple virtual functions. + updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount, + NumCandidates); + // Update value profiles on the vtable pointer if it exists. + if (VPtr) +updateVPtrValueProfiles(VPtr, VTableGUIDCounts); + return true; +} + +void IndirectCallPromoter::updateFuncValueProfiles( +CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount, +uint32_t MaxMDCount) { + // First clear the existing !prof. + CB.setMetadata(LLVMContext::MD_prof, nullptr); // Annotate the remaining value profiles if counter is not zero. if (TotalCount != 0) -annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted), - TotalCount, IPVK_IndirectCallTarget, NumCandidates); +annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget, + MaxMDCount); +} + +void IndirectCallPromoter::updateVPtrValueProfiles( +Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) { + VPtr->setMetadata(LLVMContext::MD_prof, nullptr); + std::vector VTableValueProfiles; + uint64_t TotalVTableCount = 0; + for (auto [GUID, Count] : VTableGUIDCounts) { +if (Count == 0) + continue; + +VTableValueProfiles.push_back({GUID, Count}); +TotalVTableCount += Count; + } + llvm::sort(VTableValueProfiles, + [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) { + return LHS.Count > RHS.Count; + }); + + annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount, +IPVK_VTableTarget, VTableValueProfiles.size()); +} + +bool IndirectCallPromoter::tryToPromoteWithVTableCmp( +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalFuncCount, +uint32_t NumCandidates, +MutableArrayRef ICallProfDataRef, +VTableGUIDCountsMap &VTableGUIDCounts) { + SmallVector PromotedFuncCount; + for (const auto &Candidate : Candidates) { +uint64_t IfCount = 0; minglotus-6 wrote: This variable represents the 'if-branch' count of the branch weights for conditional devirtualization. I removed it. Now `Candidate.Count` represents if branch count and `TotalFuncCount - Candidate.Count` represents else branch count. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) minglotus-6 wrote: I updated the function comment based on Teresa's input, and renamed the function to `getAddressPointOffset` https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo, minglotus-6 wrote: Meanwhile, I can find a few places where the gist of helper function is used. I'm planning to have some canonical helper functions maybe in `llvm/lib/Transforms/Util` directory. Examples: 1) https://github.com/llvm/llvm-project/blob/ce5b371606422ed21cda0e24cdc89cb41cdc5600/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp#L1139-L1143 2) https://github.com/llvm/llvm-project/blob/ce5b371606422ed21cda0e24cdc89cb41cdc5600/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp#L3517-L3520 3) (made closer to this function by my change https://github.com/llvm/llvm-project/pull/93249/) https://github.com/llvm/llvm-project/blob/5f243b3fffca42ed320529a54aefd86087aa85f8/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L5011-L5014 https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) minglotus-6 wrote: Type metadata are generated for compatible vtables and virtual functions. https://gcc.godbolt.org/z/rr5YY7cdv is an example to show the `!type` metadata list. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo, + Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock( +PHINode::getIncomingValueNumForOperand(OperandNo)); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) minglotus-6 wrote: In this context we can have `assert(DestBB->getUniquePredecessor() == BB`. I'm leaning towards not handling more generic cases if those doesn't show up. - `DestBB` is the indirect call fall back and created by `versionCallSiteWithCond`. `versionCallsiteWithCond` creates basic blocks and edges like this comment shows https://github.com/llvm/llvm-project/blob/5f243b3fffca42ed320529a54aefd86087aa85f8/llvm/lib/Transforms/Utils/CallPromotionUtils.cpp#L197-L223 https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo, + Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock( +PHINode::getIncomingValueNumForOperand(OperandNo)); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. + BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Inst, Use.getOperandNo(), UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, Basic
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -89,41 +93,50 @@ // ICTEXT: # NumValueSites: // ICTEXT: 2 // ICTEXT: 2 -// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750 -// ICTEXT: _ZN8Derived15func1Eii:250 +// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii:150 +// ICTEXT: _ZN8Derived14funcEii:50 // ICTEXT: 2 -// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750 -// ICTEXT: _ZN8Derived15func2Eii:250 +// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev:750 +// ICTEXT: _ZN8Derived1D0Ev:250 // ICTEXT: # ValueKind = IPVK_VTableTarget: // ICTEXT: 2 // ICTEXT: # NumValueSites: // ICTEXT: 2 // ICTEXT: 2 -// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750 -// ICTEXT: _ZTV8Derived1:250 +// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:150 +// ICTEXT: _ZTV8Derived1:50 // ICTEXT: 2 // ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750 // ICTEXT: _ZTV8Derived1:250 +// Test indirect call promotion transformation using vtable profiles. +// RUN: %clangxx -fprofile-use=test.profdata -fuse-ld=lld -flto=thin -fwhole-program-vtables -O2 -mllvm -enable-vtable-value-profiling -mllvm -icp-enable-vtable-cmp -Rpass=pgo-icall-prom %s 2>&1 | FileCheck %s --check-prefix=REMARK --implicit-check-not="!VP" + +// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with count 150 out of 200, compare 1 vtables and sink 1 instructions +// REMARK: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 50, compare 1 vtables and sink 1 instructions +// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived2D0Ev with count 750 out of 1000, compare 1 vtables and sink 2 instructions +// REMARK: Promote indirect call to _ZN8Derived1D0Ev with count 250 out of 250, compare 1 vtables and sink 2 instructions minglotus-6 wrote: As explained above, only the first candidate of an indirect call site sees the original function count. This is consistent with existing remarks, as shown in this regression test (https://github.com/llvm/llvm-project/blob/cc2fafa1788908f69366821a04407083f770483e/llvm/test/Transforms/PGOProfile/indirect_call_promotion.ll#L5-L8) https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -89,41 +93,50 @@ // ICTEXT: # NumValueSites: // ICTEXT: 2 // ICTEXT: 2 -// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750 -// ICTEXT: _ZN8Derived15func1Eii:250 +// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii:150 +// ICTEXT: _ZN8Derived14funcEii:50 // ICTEXT: 2 -// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750 -// ICTEXT: _ZN8Derived15func2Eii:250 +// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev:750 +// ICTEXT: _ZN8Derived1D0Ev:250 // ICTEXT: # ValueKind = IPVK_VTableTarget: // ICTEXT: 2 // ICTEXT: # NumValueSites: // ICTEXT: 2 // ICTEXT: 2 -// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750 -// ICTEXT: _ZTV8Derived1:250 +// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:150 +// ICTEXT: _ZTV8Derived1:50 // ICTEXT: 2 // ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750 // ICTEXT: _ZTV8Derived1:250 +// Test indirect call promotion transformation using vtable profiles. +// RUN: %clangxx -fprofile-use=test.profdata -fuse-ld=lld -flto=thin -fwhole-program-vtables -O2 -mllvm -enable-vtable-value-profiling -mllvm -icp-enable-vtable-cmp -Rpass=pgo-icall-prom %s 2>&1 | FileCheck %s --check-prefix=REMARK --implicit-check-not="!VP" + +// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with count 150 out of 200, compare 1 vtables and sink 1 instructions +// REMARK: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 50, compare 1 vtables and sink 1 instructions minglotus-6 wrote: The first candidate's remark says 150 out of 200. After the first callee gets promoted, the total function count becomes 50 (200 - 50), so the second candidate's remark says 50 out of 50. Updated the remark message with the filename and line number to make it more explicit. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo, minglotus-6 wrote: thanks for the catch. I removed it. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee( uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount; for (auto Entry : *VMap) { if (isa(Entry.first)) -if (auto *CI = dyn_cast_or_null(Entry.second)) +if (auto *CI = dyn_cast_or_null(Entry.second)) { minglotus-6 wrote: Makes sense to not further duplicate the code. How about something like this? ``` if (isa(Entry.first)) { if(auto* CB = dyn_cast_or_null(Entry.Base)) { CB->updateProfWeight(... Instruction* VPtr = PGOIndirectCallVisitor::tryGetVTableInstruction(CB); if (VPtr) scaleProfData(VPtr...) } ``` `CallBase` is the base class of five IR instructions [1] now. Among them `CoroAwaitSuspendInst` [1] and `GCStatePointInst` [2] are relatively new; the rest three are well understood. Despite more derived classes of `CallBase` might be added for new cases, the code above should be future-proof since `updateProfWeight` and `scaleProfData` are programmed to `!prof` (not `CallBase`). [1] https://github.com/llvm/llvm-project/blob/692ae5443b1778e138527ef55d799a4b535a36f9/llvm/lib/Transforms/Coroutines/CoroInstr.h#L85-L88 [2] https://github.com/llvm/llvm-project/blob/692ae5443b1778e138527ef55d799a4b535a36f9/llvm/include/llvm/IR/Statepoint.h#L61-L63 https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/minglotus-6 edited https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [IRPGO] [Draft] Import vtable definitions in ThinTLO and use more efficient vtable comparison sequence with cost-benefit analysis (PR #69141)
minglotus-6 wrote: Close this stale patch and won't merge conflicts in this one. Working on https://github.com/llvm/llvm-project/pull/81442 now. https://github.com/llvm/llvm-project/pull/69141 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [IRPGO] [Draft] Import vtable definitions in ThinTLO and use more efficient vtable comparison sequence with cost-benefit analysis (PR #69141)
https://github.com/minglotus-6 closed https://github.com/llvm/llvm-project/pull/69141 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
minglotus-6 wrote: > I think this might be breaking: FAIL: LLVM :: tools/gold/X86/thinlto.ll fixed in https://github.com/llvm/llvm-project/commit/20ed5b1f45871612570d3bd447121ac43e083c6a https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
minglotus-6 wrote: > I think this might be breaking: > FAIL: LLVM :: tools/gold/X86/thinlto.ll Thanks for reporting and sorry for this. I'll send out a fix soon. https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
https://github.com/minglotus-6 closed https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
@@ -564,6 +581,12 @@ class GlobalValueSummary { bool canAutoHide() const { return Flags.CanAutoHide; } + bool shouldImportAsDec() const { minglotus-6 wrote: this makes sense, done. https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
@@ -16,13 +16,13 @@ ^3 = gv: (guid: 2, summaries: (function: (module: ^1, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 10, calls: ((callee: ^15, relbf: 256, tail: 1) ; Summaries with different linkage types. -^4 = gv: (guid: 3, summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1), insts: 1))) +^4 = gv: (guid: 3, summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, importType: definition), insts: 1))) ; Make this one an alias with a forward reference to aliasee. -^5 = gv: (guid: 4, summaries: (alias: (module: ^0, flags: (linkage: private, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1), aliasee: ^14))) +^5 = gv: (guid: 4, summaries: (alias: (module: ^0, flags: (linkage: private, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, importType: definition), aliasee: ^14))) ^6 = gv: (guid: 5, summaries: (function: (module: ^0, flags: (linkage: available_externally, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1))) ^7 = gv: (guid: 6, summaries: (function: (module: ^0, flags: (linkage: linkonce, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1))) ^8 = gv: (guid: 7, summaries: (function: (module: ^0, flags: (linkage: linkonce_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1))) -^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 1), insts: 1))) +^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0, importType: declaration), insts: 1))) minglotus-6 wrote: ah `canAutoHide:0` -> `canAutoHide:1` is not intentional. But I agree it's better to add a new record. Done by taking `^24` and incrementing `^ID` for the rest. https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
@@ -2072,6 +2072,23 @@ void LLParser::parseOptionalVisibility(unsigned &Res) { Lex.Lex(); } +static GlobalValueSummary::ImportKind +parseOptionalImportType(lltok::Kind Kind) { + GlobalValueSummary::ImportKind Res; + switch (Kind) { + default: +Res = GlobalValueSummary::Definition; minglotus-6 wrote: done, by changing `parseOptionalImportType` to a member function of `LLParser` class (from a static free function) so it could return `tokError`. https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
@@ -432,6 +432,18 @@ class GlobalValueSummary { /// Sububclass discriminator (for dyn_cast<> et al.) enum SummaryKind : unsigned { AliasKind, FunctionKind, GlobalVarKind }; + enum ImportKind : unsigned { +// The global value definition corresponding to the summary should be +// imported from source module +Definition = 0, + +// When its definition doesn't exist in the destination module and not +// imported (e.g., function is large to be inlined), the global value minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
@@ -635,7 +635,8 @@ static void computeFunctionSummary( HasIndirBranchToBlockAddress || HasIFuncCall; GlobalValueSummary::GVFlags Flags( F.getLinkage(), F.getVisibility(), NotEligibleForImport, - /* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable()); + /* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable(), + /*ImportType=*/GlobalValueSummary::ImportKind::Definition); minglotus-6 wrote: done. Fixed all callsites of `GVFlags` constructor. https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ThinLTO]Record import type in GlobalValueSummary::GVFlags (PR #87597)
minglotus-6 wrote: Resolve review feedback, and 'backfilled' three affected tests in `clang/test/CodeGen`. Now `ninja check-llvm check-clang check-compiler-rt` passed locally. https://github.com/llvm/llvm-project/pull/87597 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [thinlto] Implement the import of vtable definitions. (PR #72551)
minglotus-6 wrote: Superseded by https://github.com/llvm/llvm-project/pull/79381 https://github.com/llvm/llvm-project/pull/72551 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [thinlto] Implement the import of vtable definitions. (PR #72551)
https://github.com/minglotus-6 closed https://github.com/llvm/llvm-project/pull/72551 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
https://github.com/minglotus-6 closed https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -1362,8 +1372,8 @@ remapSamples(const sampleprof::FunctionSamples &Samples, BodySample.second.getSamples()); for (const auto &Target : BodySample.second.getCallTargets()) { Result.addCalledTargetSamples(BodySample.first.LineOffset, -MaskedDiscriminator, -Remapper(Target.first), Target.second); +MaskedDiscriminator, Remapper(Target.first), minglotus-6 wrote: undo it. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -0,0 +1,145 @@ +// REQUIRES: lld-available + +// RUN: rm -rf %t && split-file %s %t && cd %t minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -0,0 +1,145 @@ +// REQUIRES: lld-available + +// RUN: rm -rf %t && split-file %s %t && cd %t +// +// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling test.cpp -o test +// RUN: env LLVM_PROFILE_FILE=test.profraw ./test + +// Show vtable profiles from raw profile. +// RUN: llvm-profdata show --function=main --ic-targets -show-vtables test.profraw | FileCheck %s --check-prefixes=COMMON,RAW + +// Generate indexed profile from raw profile and show the data. +// RUN: llvm-profdata merge test.profraw -o test.profdata +// RUN: llvm-profdata show --function=main --ic-targets --show-vtables test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED + +// Generate text profile from raw and indexed profiles respectively and show the data. +// RUN: llvm-profdata merge --text test.profraw -o raw.proftext +// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text raw.proftext | FileCheck %s --check-prefix=ICTEXT +// RUN: llvm-profdata merge --text test.profdata -o indexed.proftext +// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text indexed.proftext | FileCheck %s --check-prefix=ICTEXT + +// Generate indexed profile from text profiles and show the data +// RUN: llvm-profdata merge --binary raw.proftext -o text.profraw +// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED +// RUN: llvm-profdata merge --binary indexed.proftext -o text.profdata +// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED + +// COMMON: Counters: +// COMMON-NEXT: main: +// COMMON-NEXT: Hash: 0x0f9a16fe6d398548 +// COMMON-NEXT: Counters: 2 +// COMMON-NEXT: Indirect Call Site Count: 2 +// COMMON-NEXT: Number of instrumented vtables: 2 +// RAW: Indirect Target Results: +// RAW-NEXT: [ 0, _ZN8Derived15func1Eii,250 ] (25.00%) +// RAW-NEXT: [ 0, test.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%) +// RAW-NEXT: [ 1, _ZN8Derived15func2Eii,250 ] (25.00%) +// RAW-NEXT: [ 1, test.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%) +// RAW-NEXT: VTable Results: +// RAW-NEXT: [ 0, _ZTV8Derived1,250 ] (25.00%) +// RAW-NEXT: [ 0, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E,750 ] (75.00%) +// RAW-NEXT: [ 1, _ZTV8Derived1,250 ] (25.00%) +// RAW-NEXT: [ 1, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E,750 ] (75.00%) +// INDEXED: Indirect Target Results: +// INDEXED-NEXT: [ 0, test.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%) +// INDEXED-NEXT: [ 0, _ZN8Derived15func1Eii,250 ] (25.00%) +// INDEXED-NEXT: [ 1, test.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%) +// INDEXED-NEXT: [ 1, _ZN8Derived15func2Eii,250 ] (25.00%) +// INDEXED-NEXT: VTable Results: +// INDEXED-NEXT: [ 0, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%) +// INDEXED-NEXT: [ 0, _ZTV8Derived1,250 ] (25.00%) +// INDEXED-NEXT: [ 1, test.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%) +// INDEXED-NEXT: [ 1, _ZTV8Derived1,250 ] (25.00%) +// COMMON: Instrumentation level: IR entry_first = 0 +// COMMON-NEXT: Functions shown: 1 +// COMMON-NEXT: Total functions: 6 +// COMMON-NEXT: Maximum function count: 1000 +// COMMON-NEXT: Maximum internal block count: 250 +// COMMON-NEXT: Statistics for indirect call sites profile: +// COMMON-NEXT: Total number of sites: 2 +// COMMON-NEXT: Total number of sites with values: 2 +// COMMON-NEXT: Total number of profiled values: 4 +// COMMON-NEXT: Value sites histogram: +// COMMON-NEXT: NumTargets, SiteCount +// COMMON-NEXT: 2, 2 +// COMMON-NEXT: Statistics for vtable profile: +// COMMON-NEXT: Total number of sites: 2 +// COMMON-NEXT: Total number of sites with values: 2 +// COMMON-NEXT: Total number of profiled values: 4 +// COMMON-NEXT: Value sites histogram: +// COMMON-NEXT: NumTargets, SiteCount +// COMMON-NEXT: 2, 2 + +// ICTEXT: :ir +// ICTEXT: main +// ICTEXT: # Func Hash: +// ICTEXT: 1124236338992350536 +// ICTEXT: # Num Counters: +// ICTEXT: 2 +// ICTEXT: # Counter Values: +// ICTEXT: 1000 +// ICTEXT: 1 +// ICTEXT: # Num Value Kinds: +// ICTEXT: 2 +// ICTEXT: # ValueKind = IPVK_IndirectCallTarget: +// ICTEXT: 0 +// ICTEXT: # NumValueSites: +// ICTEXT: 2 +// ICTEXT: 2 +// ICTEXT: test.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750 +// ICTEXT: _ZN8Derived15func1Eii:250 +// ICTEXT: 2 +// ICTEXT: test.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750 +// ICTEXT: _ZN8Derived15func2Eii:250 +// ICTEXT: # ValueKind = IPVK_VTableTarget: +// ICTEXT: 2 +// ICTEXT: # NumValueSites: +// ICTEXT: 2 +// ICTEXT: 2 +// ICTEXT: test.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750 +/
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -676,24 +722,66 @@ TEST_P(InstrProfReaderWriterTest, icall_data_read_write) { Expected R = Reader->getInstrProfRecord("caller", 0x1234); ASSERT_THAT_ERROR(R.takeError(), Succeeded()); + + // Test the number of instrumented indirect call sites and the number of + // profiled values at each site. ASSERT_EQ(4U, R->getNumValueSites(IPVK_IndirectCallTarget)); EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 0)); EXPECT_EQ(0U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 1)); EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 2)); - EXPECT_EQ(1U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3)); + EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3)); + + // Test the number of instrumented vtable sites and the number of profiled + // values at each site. + ASSERT_EQ(2U, R->getNumValueSites(IPVK_VTableTarget)); + EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_VTableTarget, 0)); + EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_VTableTarget, 1)); + + // First indirect site. + { +uint64_t TotalC; +std::unique_ptr VD = +R->getValueForSite(IPVK_IndirectCallTarget, 0, &TotalC); + +EXPECT_EQ(3U * getProfWeight(), VD[0].Count); +EXPECT_EQ(2U * getProfWeight(), VD[1].Count); +EXPECT_EQ(1U * getProfWeight(), VD[2].Count); +EXPECT_EQ(6U * getProfWeight(), TotalC); + +EXPECT_EQ(StringRef((const char *)VD[0].Value, 7), StringRef("callee3")); minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -0,0 +1,145 @@ +// REQUIRES: lld-available + +// RUN: rm -rf %t && split-file %s %t && cd %t +// +// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling test.cpp -o test +// RUN: env LLVM_PROFILE_FILE=test.profraw ./test + +// Show vtable profiles from raw profile. +// RUN: llvm-profdata show --function=main --ic-targets -show-vtables test.profraw | FileCheck %s --check-prefixes=COMMON,RAW minglotus-6 wrote: fixed it. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -676,24 +722,66 @@ TEST_P(InstrProfReaderWriterTest, icall_data_read_write) { Expected R = Reader->getInstrProfRecord("caller", 0x1234); ASSERT_THAT_ERROR(R.takeError(), Succeeded()); + + // Test the number of instrumented indirect call sites and the number of + // profiled values at each site. ASSERT_EQ(4U, R->getNumValueSites(IPVK_IndirectCallTarget)); EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 0)); EXPECT_EQ(0U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 1)); EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 2)); - EXPECT_EQ(1U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3)); + EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_IndirectCallTarget, 3)); + + // Test the number of instrumented vtable sites and the number of profiled + // values at each site. + ASSERT_EQ(2U, R->getNumValueSites(IPVK_VTableTarget)); + EXPECT_EQ(3U, R->getNumValueDataForSite(IPVK_VTableTarget, 0)); + EXPECT_EQ(2U, R->getNumValueDataForSite(IPVK_VTableTarget, 1)); + + // First indirect site. + { +uint64_t TotalC; +std::unique_ptr VD = +R->getValueForSite(IPVK_IndirectCallTarget, 0, &TotalC); + +EXPECT_EQ(3U * getProfWeight(), VD[0].Count); minglotus-6 wrote: > Gunit expectations should be written as EXPECT_EQ(actual, expected). I see. Used `EXPECT_EQ(actual, expected)` in new lines, and updated some surrounding existing lines in this style in this PR. Will send out a patch later to update the rest of `EXPECT_EQ` to make the style consistent. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -431,23 +441,34 @@ class InstrProfSymtab { using AddrHashMap = std::vector>; private: + using AddrIntervalMap = + IntervalMap>; StringRef Data; uint64_t Address = 0; - // Unique name strings. + // Unique name strings. Used to ensure entries in MD5NameMap (a vector that's + // going to be sorted) has unique MD5 keys in the first place. StringSet<> NameTab; + // Records the unique virtual table names. This is used by InstrProfWriter to + // write out an on-disk chained hash table of virtual table names. + // InstrProfWriter stores per function profile data (keyed by function names) + // so it doesn't use a StringSet for function names. + StringSet<> VTableNames; // A map from MD5 keys to function name strings. std::vector> MD5NameMap; minglotus-6 wrote: Sure. i'll send out another patch later. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
@@ -0,0 +1,145 @@ +// REQUIRES: lld-available + +// RUN: rm -rf %t && split-file %s %t && cd %t +// +// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling test.cpp -o test +// RUN: env LLVM_PROFILE_FILE=test.profraw ./test minglotus-6 wrote: done. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
minglotus-6 wrote: > I think we should resort to scripts and profraw in LLVM if we don't have > support for textual format. This is the case for memprof for example. Got it. I updated the test cases. Now compiler-rt test provides raw-related test coverage, and IR test provides test coverage for the {show, merge} related with {indexed, text} formats. > It seems cleanest if the allocator and the map are owned by the > InstrProfSymtab class. Agree. With https://github.com/llvm/llvm-project/pull/86882, InstrProfSymtab could own both. Thanks for the discussion! - Added a few [lines](https://github.com/llvm/llvm-project/pull/66825/commits/01585236e7c350830544482c9917320a75e9ac57#diff-f959bf0ff4ce781c36e6eddb19d46fadb78e0c79fa13f7df384bb72f38fcf405) in unit test to have some test coverage for the IntervalMap usage. https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (PR #66825)
minglotus-6 wrote: > My concern with this approach is that compiler-rt is treated as a different > project and updating the code within LLVM makes it easy to miss running the > test locally for the other project. I think such issues will be caught by the > buildbot but having it flagged earlier is better for the developer. What do > you think? Yeah presubmit test coverage makes sense. If it's desirable to get rid of 'update_vtable_value_prof_inputs.sh' and 'vtable-value-prof-basic.profraw' in the repo, here is another way: * For LLVM IR tests, store textual profiles in the repo, and run `llvm-profdata merge` to convert it to indexed profiles. * To have test coverage on raw profiles (generate raw profiles or convert it into other formats), have a compiler-rt test. I wonder if that is preferred over the current status (with script and `.profraw` in the repo). https://github.com/llvm/llvm-project/pull/66825 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits