[clang] [llvm] [CGData][ThinLTO][NFC] Prep for two-codegen rounds (PR #90934)
https://github.com/teresajohnson approved this pull request. https://github.com/llvm/llvm-project/pull/90934 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [CGData][ThinLTO][NFC] Prep for two-codegen rounds (PR #90934)
@@ -586,7 +586,9 @@ Error lto::thinBackend(const Config &Conf, unsigned Task, AddStreamFn AddStream,
   Mod.setPartialSampleProfileRatio(CombinedIndex);

   LLVM_DEBUG(dbgs() << "Running ThinLTO\n");
-  if (Conf.CodeGenOnly) {
+  if (CodeGenOnly) {

teresajohnson wrote:

Can you add a comment that this value may be different than the one in the provided Config.

https://github.com/llvm/llvm-project/pull/90934
[clang] [llvm] [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (PR #90933)
teresajohnson wrote:

> > * Looking at the NFC, this seems like it has very similar issues to Propeller, which wants to redo just the codegen with a new injected profile and BB ordering. It would be good to see if we can converge to similar approaches. I asked @rlavaee to take a look and he is reading through the background on this work. @rlavaee do you think Propeller could use a similar approach to this where it saves the pre-codegen bitcode and re-loads it instead of redoing opt? This isn't necessarily an action item for this PR, but I wanted Rahman to take a look since he is more familiar with codegen.
>
> It's interesting to know that Propeller wants to redo the codegen. I'm happy to align with this work. We've already started discussing this and have shared some details from our side. Here's the link for more info: https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/11?u=kyulee-com.

Great, I hadn't looked at the RFC again and didn't realize Rahman already responded there.

> > * I think this should be doable to implement with distributed ThinLTO if we want in the future, from what I can tell. But we'll need a way to save the bitcode before codegen from a non-in-process backend (e.g. the thinBackend invoked from clang). Not asking you to do anything for this PR on that, but just thinking through it. Seems doable...
>
> I think technically it's doable, but initially, I believed we did not want to repeat the code generation with distributed ThinLTO unless we were willing to introduce additional synchronization and spawn distributed ThinLTO backends again with the merged codegen data. If we aim to leverage the saved optimized IR, I suppose we need to assign the same backend work to the same machine used in the first run. As commented above in the link, I wanted to confine repeating the codegen to a single machine (in-process backend). I mainly wanted to separate the writer/reader builds for the use of distributed ThinLTO builds to avoid this complication while allowing some stale codegen data. However, it's certainly worth experimenting with.

Agree it makes sense to do the in-process mode support first as that is more straightforward.

> If we aim to leverage the saved optimized IR, I suppose we need to assign the same backend work to the same machine used in the first run.

Not necessarily. The IR just needs to be saved in a path that the build system specifies and can therefore move around.

> > * My biggest concern about this patch as written is whether it will break down under LTO's caching mechanism - have you tested it with caching? It seems like it might just skip the backend completely since you aren't adding anything to the cache key.
>
> That's a good catch! Assuming this is not a common scenario (as mentioned in the RFC linked above), my initial design intentionally disables LTO's caching for correctness and simplicity by setting an empty `FileCache()` in the constructor for both the first and second round backends. To enable proper caching, we need two additional caches/streams, in addition to the one for the final object output: one for the scratch object files and another for the optimized IR files. I've managed to implement this with some refactoring, details of which I will follow.

Yeah, there are a few caching aspects. Adding caching of the scratch files and optimized IR being reused here is one aspect that will enable more caching and better build performance for this optimization. I was more concerned about the existing LTO caching and whether that would be broken, but missed that you were previously disabling caching. We use distributed ThinLTO, but my understanding is that LTO caching is frequently used for in-process ThinLTO. I'll look at your refactoring and updates as soon as I can.
https://github.com/llvm/llvm-project/pull/90933
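The cache-key concern discussed in the thread above can be illustrated with a small, self-contained sketch. This is not the LTO `FileCache` API; `combineKey`, the second-round flag, and the codegen-data hash are hypothetical stand-ins. The point is only that any input that changes the produced object must be folded into the cache key, or a second codegen round could be served the first round's cached object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: fold extra round-specific inputs into a base cache
// key so that the two codegen rounds (and different merged codegen data)
// never collide in the cache.
uint64_t combineKey(uint64_t BaseKey, bool SecondRound, uint64_t CGDataHash) {
  // Simple hash-combine mixer; any well-distributed mixer would do.
  auto Mix = [](uint64_t H, uint64_t V) {
    return H ^ (V + 0x9e3779b97f4a7c15ULL + (H << 6) + (H >> 2));
  };
  uint64_t K = Mix(BaseKey, SecondRound ? 1 : 0);
  return Mix(K, CGDataHash);
}
```

With this shape, the first and second rounds of the same module get distinct keys, and a change in the merged codegen data also changes the key.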
[clang] [llvm] [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (PR #90933)
teresajohnson wrote:

> > @teresajohnson Do you have any concern or comment on this direction?
>
> Just a quick reply to say that I am taking a look today (unfamiliar with this approach so need to read through the RFC etc) and will get back later today

Sorry I was obviously over optimistic on when I could do this! A couple of high level comments / questions:

- Looking at the NFC, this seems like it has very similar issues to Propeller, which wants to redo just the codegen with a new injected profile and BB ordering. It would be good to see if we can converge to similar approaches. I asked @rlavaee to take a look and he is reading through the background on this work. @rlavaee do you think Propeller could use a similar approach to this where it saves the pre-codegen bitcode and re-loads it instead of redoing opt? This isn't necessarily an action item for this PR, but I wanted Rahman to take a look since he is more familiar with codegen.
- I think this should be doable to implement with distributed ThinLTO if we want in the future, from what I can tell. But we'll need a way to save the bitcode before codegen from a non-in-process backend (e.g. the thinBackend invoked from clang). Not asking you to do anything for this PR on that, but just thinking through it. Seems doable...
- My biggest concern about this patch as written is whether it will break down under LTO's caching mechanism - have you tested it with caching? It seems like it might just skip the backend completely since you aren't adding anything to the cache key.

https://github.com/llvm/llvm-project/pull/90933
[clang] [llvm] [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (PR #90933)
teresajohnson wrote:

> @teresajohnson Do you have any concern or comment on this direction?

Just a quick reply to say that I am taking a look today (unfamiliar with this approach so need to read through the RFC etc) and will get back later today

https://github.com/llvm/llvm-project/pull/90933
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
https://github.com/teresajohnson approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/94264
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
     if (!NumCandidates ||
         (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
       continue;
+
     auto PromotionCandidates = getPromotionCandidatesForCallSite(
         *CB, ICallProfDataRef, TotalCount, NumCandidates);
-    Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-                                       ICallProfDataRef, NumCandidates);
+
+    VTableGUIDCountsMap VTableGUIDCounts;
+    Instruction *VPtr =
+        computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+    if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+      Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+                                           TotalCount, NumCandidates,
+                                           ICallProfDataRef, VTableGUIDCounts);
+    else
+      Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+                                         TotalCount, ICallProfDataRef,
+                                         NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }

+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+    const CallBase &CB, const std::vector<PromotionCandidate> &Candidates,
+    uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+    return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+    auto &Candidate = Candidates[I];
+    uint64_t CandidateVTableCount = 0;
+    for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+      CandidateVTableCount += Count;
+
+    if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
+      LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+                        << "-th candidate, function count " << Candidate.Count
+                        << " and its vtable count " << CandidateVTableCount
+                        << " have discrepancies\n");
+      return false;
+    }
+
+    RemainingVTableCount -= Candidate.Count;
+
+    // 'MaxNumVTable' limits the number of vtables to make vtable comparison
+    // profitable. Comparing multiple vtables for one function candidate will
+    // insert additional instructions on the hot path, and allowing more than
+    // one vtable for non last candidates may or may not elongates dependency

teresajohnson wrote:

"the dependency chain"

https://github.com/llvm/llvm-project/pull/81442
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt<bool>
     ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
                  cl::desc("Dump IR after transformation happens"));

+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt<double> ICPVTablePercentageThreshold(
+    "icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+    cl::desc("The percentage threshold of vtable-count / function-count for "
+             "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison

teresajohnson wrote:

The wording "earlier candidate of one icall can affect later function candidate in an undesired way" is a little confusing to me. I think what you mean is that doing so for an earlier candidate delays the comparisons for later candidates, but that for the last candidate, only the fallback path is affected? Do we expect to set this parameter above 1?

https://github.com/llvm/llvm-project/pull/81442
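The threshold check being discussed can be sketched standalone (the helper name and plain containers here are illustrative, not the pass's actual code; `0.99` mirrors the option's default): a candidate qualifies for vtable comparison only if the counts of the vtables it was reached through cover nearly all of its function count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-candidate test in
// isProfitableToCompareVTables: the summed vtable counts must cover at
// least Threshold (default 0.99) of the candidate's function count;
// otherwise the pass falls back to function-pointer comparison.
bool vtableCountsCoverFuncCount(uint64_t FuncCount,
                                const std::vector<uint64_t> &VTableCounts,
                                double Threshold = 0.99) {
  uint64_t Sum = 0;
  for (uint64_t C : VTableCounts)
    Sum += C;
  return static_cast<double>(Sum) >=
         static_cast<double>(FuncCount) * Threshold;
}
```

A large discrepancy between the two counters (e.g. profile merging artifacts) makes the vtable counts unreliable, which is why the ratio gate comes before any vtable-based rewrite.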
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/teresajohnson edited https://github.com/llvm/llvm-project/pull/81442
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
+    // 'MaxNumVTable' limits the number of vtables to make vtable comparison
+    // profitable. Comparing multiple vtables for one function candidate will
+    // insert additional instructions on the hot path, and allowing more than
+    // one vtable for non last candidates may or may not elongates dependency
+    // chain for the subsequent candidates. Set its value to 1 for non-last
+    // candidate and allow option to override it for the last candidate.
+    int MaxNumVTable = 1;
+    if (I == CandidateSize - 1)
+      MaxNumVTable = ICPMaxNumVTableLastCandidate;
+
+    if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {
+      return false;
+    }
+  }
+
+  // If the indirect fallback is not cold, don't compare vtables.
+  if (PSI && PSI->hasProfileSummary() &&
+      !PSI->isColdCount(RemainingVTableCount))
+    return false;

teresajohnson wrote:

Ditto

https://github.com/llvm/llvm-project/pull/81442
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -140,14 +348,56 @@ class IndirectCallPromoter {
   // indirect callee with functions. Returns true if there are IR
   // transformations and false otherwise.
   bool tryToPromoteWithFuncCmp(
-      CallBase &CB, const std::vector<PromotionCandidate> &Candidates,
-      uint64_t TotalCount, ArrayRef<InstrProfValueData> ICallProfDataRef,
-      uint32_t NumCandidates);
+      CallBase &CB, Instruction *VPtr,
+      const std::vector<PromotionCandidate> &Candidates, uint64_t TotalCount,
+      ArrayRef<InstrProfValueData> ICallProfDataRef, uint32_t NumCandidates,
+      VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Promote a list of targets for one indirect call by comparing vtables with
+  // functions. Returns true if there are IR transformations and false
+  // otherwise.
+  bool tryToPromoteWithVTableCmp(
+      CallBase &CB, Instruction *VPtr,
+      const std::vector<PromotionCandidate> &Candidates,
+      uint64_t TotalFuncCount, uint32_t NumCandidates,
+      MutableArrayRef<InstrProfValueData> ICallProfDataRef,
+      VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Returns true if it's profitable to compare vtables for the callsite.
+  bool isProfitableToCompareVTables(
+      const CallBase &CB, const std::vector<PromotionCandidate> &Candidates,
+      uint64_t TotalCount);
+
+  // Given an indirect callsite and the list of function candidates, compute
+  // the following vtable information in output parameters and returns vtable
+  // pointer if type profiles exist.
+  // - Populate `VTableGUIDCounts` with <vtable-guid, count> using !prof
+  //   metadata attached on the vtable pointer.
+  // - For each function candidate, finds out the vtables from which it get

teresajohnson wrote:

s/get/gets/

https://github.com/llvm/llvm-project/pull/81442
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
+    // 'MaxNumVTable' limits the number of vtables to make vtable comparison
+    // profitable. Comparing multiple vtables for one function candidate will
+    // insert additional instructions on the hot path, and allowing more than
+    // one vtable for non last candidates may or may not elongates dependency

teresajohnson wrote:

s/elongates/elongate/

https://github.com/llvm/llvm-project/pull/81442
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) {
   return F.getMetadata(getPGOFuncNameMetadataName());
 }

-void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) {
-  // Only for internal linkage functions.
-  if (PGOFuncName == F.getName())
-    return;
-  // Don't create duplicated meta-data.
-  if (getPGOFuncNameMetadata(F))
+static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
+                                  StringRef PGOName) {
+  // For internal linkage objects, its name is not the same as its PGO name.

teresajohnson wrote:

I think it would be useful to keep the old comment too about this only being for internal linkage functions.

https://github.com/llvm/llvm-project/pull/81442
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt<bool>
     ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
                  cl::desc("Dump IR after transformation happens"));

+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt<double> ICPVTablePercentageThreshold(
+    "icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+    cl::desc("The percentage threshold of vtable-count / function-count for "
+             "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt<int> ICPMaxNumVTableLastCandidate(
+    "icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+    cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+    SmallDenseMap<const GlobalVariable *, SmallDenseMap<uint32_t, Constant *>>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+    SmallDenseMap<const CallBase *, VirtualCallSiteInfo>;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap<uint64_t, uint64_t>;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer to

teresajohnson wrote:

s/container/contain/

https://github.com/llvm/llvm-project/pull/81442
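The metadata lookup described in the quoted comment can be modeled without the LLVM metadata API (the containers and helper name here are illustrative stand-ins): a vtable carries (address-point offset, compatible type id) pairs from its !type annotations, and the query returns the offset recorded for the requested type.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model of a vtable's !type metadata: a list of
// (address-point offset, compatible type id) pairs.
using TypeMetadata = std::vector<std::pair<uint64_t, std::string>>;

// Returns the address point offset recorded for CompatibleType,
// or nullopt if this vtable is not compatible with that type.
std::optional<uint64_t>
findAddressPointOffset(const TypeMetadata &Types,
                       const std::string &CompatibleType) {
  for (const auto &[Offset, TypeId] : Types)
    if (TypeId == CompatibleType)
      return Offset;
  return std::nullopt;
}
```

In the real pass the pairs come from the vtable's `!type` metadata; a vtable compatible with several base classes records one address point per type, which is exactly why a single virtual call site may have to compare against multiple address points.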
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee,

 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-    CallBase &CB, const std::vector<PromotionCandidate> &Candidates,
-    uint64_t TotalCount, ArrayRef<InstrProfValueData> ICallProfDataRef,
-    uint32_t NumCandidates) {
+    CallBase &CB, Instruction *VPtr,
+    const std::vector<PromotionCandidate> &Candidates, uint64_t TotalCount,
+    ArrayRef<InstrProfValueData> ICallProfDataRef, uint32_t NumCandidates,
+    VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;

   for (const auto &C : Candidates) {
-    uint64_t Count = C.Count;
-    pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO,
-                             &ORE);
-    assert(TotalCount >= Count);
-    TotalCount -= Count;
+    uint64_t FuncCount = C.Count;
+    pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+                             SamplePGO, &ORE);
+    assert(TotalCount >= FuncCount);
+    TotalCount -= FuncCount;
     NumOfPGOICallPromotion++;
     NumPromoted++;
-  }

+    if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+      continue;
+
+    // After a virtual call candidate gets promoted, update the vtable's counts
+    // proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+    // a vtable from which the virtual call is loaded. Compute the sum and use
+    // 128-bit APInt to improve accuracy.
+    uint64_t SumVTableCount = 0;
+    for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+      SumVTableCount += VTableCount;
+
+    for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+      APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+      APFuncCount *= VTableCount;
+      VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue();
+    }
+  }
   if (NumPromoted == 0)
     return false;

-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
          "Number of promoted functions should not be greater than the number "
          "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+                          NumCandidates);
+  updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+    CallBase &CB, ArrayRef<InstrProfValueData> CallVDs, uint64_t TotalCount,
+    uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-    annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-                      TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+    annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+                      MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+    Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  if (!EnableVTableProfileUse || VPtr == nullptr ||
+      !VPtr->getMetadata(LLVMContext::MD_prof))
+    return;
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector<InstrProfValueData> VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+    if (Count == 0)
+      continue;
+
+    VTableValueProfiles.push_back({GUID, Count});
+    TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+             [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+               return LHS.Count > RHS.Count;
+             });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+                    IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+    CallBase &CB, Instruction *VPtr,
+    const std::vector<PromotionCandidate> &Candidates, uint64_t TotalFuncCount,
+    uint32_t NumCandidates,
+    MutableArrayRef<InstrProfValueData> ICallProfDataRef,
+    VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector<uint64_t> PromotedFuncCount;
+
+  for (const auto &Candidate : Candidates) {
+    for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+      VTableGUIDCounts[GUID] -= Count;
+
+    // 'OriginalBB' is the basic block of indirect call. After each candidate
+    // is promoted, a new basic block is created for the indirect fallback basic
+    // block and indirect call `CB` is moved into this new BB.
+    BasicBlock *OriginalBB = CB.getParent();
+    promoteCallWithVTableCmp(
+        CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints,
+        createBranchWeights(CB.getContext(), Candidate.Count,
+                            TotalFuncCount - Candidate.Count));
+
+    int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent());
+
+    ORE.emi
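The proportional count update in the quoted hunk boils down to computing `FuncCount * VTableCount / SumVTableCount` without letting the 64-bit product overflow. The patch uses a 128-bit `APInt`; a standalone sketch can mimic that with `unsigned __int128` (a GCC/Clang extension; the helper name is made up here):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the APInt math above: a promoted candidate with FuncCount calls
// deducts from each contributing vtable a share proportional to that
// vtable's own count. The multiply is widened to 128 bits so
// FuncCount * VTableCount cannot overflow uint64_t; the division truncates,
// matching APInt::udiv.
uint64_t proportionalShare(uint64_t FuncCount, uint64_t VTableCount,
                           uint64_t SumVTableCount) {
  unsigned __int128 Product =
      static_cast<unsigned __int128>(FuncCount) * VTableCount;
  return static_cast<uint64_t>(Product / SumVTableCount);
}
```

Widening matters because both counters are raw profile counts: two large 64-bit counts multiplied together would silently wrap if the product were computed in `uint64_t`.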
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
https://github.com/teresajohnson approved this pull request. lgtm with some mostly comment related fixes/suggestions https://github.com/llvm/llvm-project/pull/81442
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. +static cl::opt ICPVTablePercentageThreshold( +"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden, +cl::desc("The percentage threshold of vtable-count / function-count for " + "cost-benefit analysis. ")); + +// Although comparing vtables can save a vtable load, we may need to compare +// vtable pointer with multiple vtable address points due to class inheritance. +// Comparing with multiple vtables inserts additional instructions on hot code +// path; and doing so for earlier candidate of one icall can affect later +// function candidate in an undesired way. We allow multiple vtable comparison +// for the last function candidate and use the option below to cap the number +// of vtables. +static cl::opt ICPMaxNumVTableLastCandidate( +"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, +cl::desc("The maximum number of vtable for the last candidate.")); + namespace { +// The key is a vtable global variable, and the value is a map. +// In the inner map, the key represents address point offsets and the value is a +// constant for this address point. +using VTableAddressPointOffsetValMap = +SmallDenseMap>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// The key is vtable GUID, and value is its value profile count. 
+using VTableGUIDCountsMap = SmallDenseMap; + +// Returns the address point offset of the given compatible type. +// +// Type metadata of a vtable specifies the types that can container a pointer to +// this vtable, for example, `Base*` can be a pointer to an instantiated type +// but not vice versa. See also https://llvm.org/docs/TypeMetadata.html +static std::optional +getAddressPointOffset(const GlobalVariable &VTableVar, + StringRef CompatibleType) { + SmallVector Types; + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` is used via its `UserInst`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. 
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Guaranteed by ICP transformation"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) teresajohnson wrote: It looks like tryToSinkInstruction is called repeatedly for insts all from the same BB, and for the same DestBB - consider hoisting this check out to the caller of tryToSinkInstruction (i.e. into tryToSinkInstructions). https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
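The hoisting the reviewer suggests can be sketched outside LLVM with toy stand-ins — the `BasicBlock`/`Instruction` structs and the `sinkOne`/`trySinkInstructions` names below are illustrative, not the pass's real types. The point is that the critical-edge check runs once per (source block, DestBB) pair instead of once per instruction:

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for llvm::Instruction / llvm::BasicBlock.
struct BasicBlock;
struct Instruction { BasicBlock *Parent; };
struct BasicBlock {
  BasicBlock *UniquePred = nullptr; // non-null only with exactly one predecessor
};

// Per-instruction work; the block-level invariant is already established.
static bool sinkOne(Instruction &I, BasicBlock *DestBB) {
  (void)I; (void)DestBB;
  // ... would move I into DestBB; always succeeds in this sketch.
  return true;
}

// The caller checks the critical-edge condition once, since every instruction
// being sunk shares the same source block and the same DestBB.
static int trySinkInstructions(const std::vector<Instruction *> &Insts,
                               BasicBlock *BB, BasicBlock *DestBB) {
  if (DestBB->UniquePred != BB)
    return 0; // hoisted check: sinking would cross a critical edge
  int Sunk = 0;
  for (Instruction *I : Insts)
    if (sinkOne(*I, DestBB))
      ++Sunk;
  return Sunk;
}
```

With this shape, the per-instruction helper never re-tests a condition that is invariant across the whole batch.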
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty()) + continue; + +// After a virtual call candidate gets promoted, update the vtable's counts +// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents +// a vtable from which the virtual call is loaded. Compute the sum and use +// 128-bit APInt to improve accuracy. +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); +} + } if (NumPromoted == 0) return false; - // Adjust the MD.prof metadata. First delete the old one. 
- CB.setMetadata(LLVMContext::MD_prof, nullptr); - assert(NumPromoted <= ICallProfDataRef.size() && "Number of promoted functions should not be greater than the number " "of values in profile metadata"); + + // Update value profiles on the indirect call. + updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount, + NumCandidates); + updateVPtrValueProfiles(VPtr, VTableGUIDCounts); + return true; +} + +void IndirectCallPromoter::updateFuncValueProfiles( +CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount, +uint32_t MaxMDCount) { + // First clear the existing !prof. + CB.setMetadata(LLVMContext::MD_prof, nullptr); // Annotate the remaining value profiles if counter is not zero. if (TotalCount != 0) -annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted), - TotalCount, IPVK_IndirectCallTarget, NumCandidates); +annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget, + MaxMDCount); +} + +void IndirectCallPromoter::updateVPtrValueProfiles( +Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) { + if (!EnableVTableProfileUse || VPtr == nullptr || + !VPtr->getMetadata(LLVMContext::MD_prof)) +return; + VPtr->setMetadata(LLVMContext::MD_prof, nullptr); + std::vector VTableValueProfiles; + uint64_t TotalVTableCount = 0; + for (auto [GUID, Count] : VTableGUIDCounts) { +if (Count == 0) + continue; + +VTableValueProfiles.push_back({GUID, Count}); +TotalVTableCount += Count; + } + llvm::sort(VTableValueProfiles, + [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) { + return LHS.Count > RHS.Count; + }); + + annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount, +IPVK_VTableTarget, VTableValueProfiles.size()); +} + +bool IndirectCallPromoter::tryToPromoteWithVTableCmp( +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalFuncCount, +uint32_t NumCandidates, +MutableArrayRef ICallProfDataRef, +VTableGUIDCountsMap &VTableGUIDCounts) { + SmallVector PromotedFuncCount; + + 
for (const auto &Candidate : Candidates) { +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableGUIDCounts[GUID] -= Count; + +// 'OriginalBB' is the basic block of indirect call. After each candidate +// is promoted, a new basic block is created for the indirect fallback basic +// block and indirect call `CB` is moved into this new BB. +BasicBlock *OriginalBB = CB.getParent(); +promoteCallWithVTableCmp( +CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints, +createBranchWeights(CB.getContext(), Candidate.Count, +TotalFuncCount - Candidate.Count)); + +int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent()); + +ORE.emi
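The proportional count update quoted above — scale the promoted function count across the vtables it was loaded from and subtract, widening the intermediate product so it cannot overflow 64 bits — can be modeled in a few lines. `unsigned __int128` (a GCC/Clang extension) stands in for the 128-bit `llvm::APInt` the patch uses, and plain maps are simplifications of `VTableGUIDCountsMap`:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Distributes FuncCount over each vtable in proportion to that vtable's own
// count, then subtracts the share from the remaining per-GUID counts. The
// 128-bit intermediate keeps FuncCount * VTableCount exact.
static void discountVTableCounts(uint64_t FuncCount,
                                 const std::map<uint64_t, uint64_t> &PerVTable,
                                 std::map<uint64_t, uint64_t> &Remaining) {
  uint64_t Sum = 0;
  for (const auto &[GUID, VTableCount] : PerVTable)
    Sum += VTableCount;
  for (const auto &[GUID, VTableCount] : PerVTable) {
    unsigned __int128 Scaled =
        (unsigned __int128)FuncCount * VTableCount / Sum;
    Remaining[GUID] -= (uint64_t)Scaled;
  }
}
```

For example, promoting a candidate with function count 20 whose loads split 30/10 across two vtables discounts 15 from the first GUID and 5 from the second.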
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const CallBase &CB, const std::vector &Candidates, +uint64_t TotalCount) { + if (!EnableVTableProfileUse || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + const size_t CandidateSize = Candidates.size(); + for (size_t I = 0; I < CandidateSize; I++) { +auto &Candidate = Candidates[I]; +uint64_t CandidateVTableCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + CandidateVTableCount += Count; + +if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) { + LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I +<< "-th candidate, function count " << Candidate.Count +<< " and its vtable count " << CandidateVTableCount +<< " have discrepancies\n"); + return false; +} + +RemainingVTableCount -= Candidate.Count; + +// 'MaxNumVTable' limits the number of vtables to make vtable comparison +// profitable. 
Comparing multiple vtables for one function candidate will +// insert additional instructions on the hot path, and allowing more than +// one vtable for non last candidates may or may not elongates dependency +// chain for the subsequent candidates. Set its value to 1 for non-last +// candidate and allow option to override it for the last candidate. +int MaxNumVTable = 1; +if (I == CandidateSize - 1) + MaxNumVTable = ICPMaxNumVTableLastCandidate; + +if ((int)Candidate.AddressPoints.size() > MaxNumVTable) { teresajohnson wrote: Might be useful to have a debug or missed optimization message for this case? https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
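The `MaxNumVTable` gating described in the quoted comment reduces to a small predicate over the per-candidate address-point counts: every candidate except the last is capped at one vtable, and the last candidate's cap comes from `icp-max-num-vtable-last-candidate`. A sketch with illustrative names (the real code also subtracts counts and checks the percentage threshold):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when every candidate's vtable address-point count fits under
// its cap; otherwise the pass falls back to function-pointer comparison.
static bool
vtableCountsWithinCaps(const std::vector<size_t> &AddressPointsPerCandidate,
                       int MaxVTableLastCandidate) {
  for (size_t I = 0; I < AddressPointsPerCandidate.size(); ++I) {
    bool IsLast = (I == AddressPointsPerCandidate.size() - 1);
    int Cap = IsLast ? MaxVTableLastCandidate : 1;
    if ((int)AddressPointsPerCandidate[I] > Cap)
      return false; // too many vtable comparisons on the hot path
  }
  return true;
}
```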
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) { return F.getMetadata(getPGOFuncNameMetadataName()); } -void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) { - // Only for internal linkage functions. - if (PGOFuncName == F.getName()) - return; - // Don't create duplicated meta-data. - if (getPGOFuncNameMetadata(F)) +static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName, + StringRef PGOName) { + // For internal linkage objects, its name is not the same as its PGO name. + if (GO.getName() == PGOName) return; - LLVMContext &C = F.getContext(); - MDNode *N = MDNode::get(C, MDString::get(C, PGOFuncName)); - F.setMetadata(getPGOFuncNameMetadataName(), N); + + // Don't created duplicated metadata. teresajohnson wrote: s/created/create/ https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -103,27 +112,226 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// Indirect call promotion pass will fall back to function-based comparison if +// vtable-count / function-count is smaller than this threshold. +static cl::opt ICPVTablePercentageThreshold( +"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden, +cl::desc("The percentage threshold of vtable-count / function-count for " + "cost-benefit analysis. ")); + +// Although comparing vtables can save a vtable load, we may need to compare +// vtable pointer with multiple vtable address points due to class inheritance. +// Comparing with multiple vtables inserts additional instructions on hot code +// path; and doing so for earlier candidate of one icall can affect later +// function candidate in an undesired way. We allow multiple vtable comparison +// for the last function candidate and use the option below to cap the number +// of vtables. +static cl::opt ICPMaxNumVTableLastCandidate( +"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden, +cl::desc("The maximum number of vtable for the last candidate.")); + namespace { +// The key is a vtable global variable, and the value is a map. +// In the inner map, the key represents address point offsets and the value is a +// constant for this address point. +using VTableAddressPointOffsetValMap = +SmallDenseMap>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// The key is vtable GUID, and value is its value profile count. 
+using VTableGUIDCountsMap = SmallDenseMap; + +// Returns the address point offset of the given compatible type. +// +// Type metadata of a vtable specifies the types that can container a pointer to +// this vtable, for example, `Base*` can be a pointer to an instantiated type teresajohnson wrote: I think "instantiated type" should be "derived type" https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
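The `getAddressPointOffset` walk in the quoted hunk boils down to a lookup from a compatible-type identifier to an address-point offset: each vtable's `!type` metadata is a list of (offset, type-id) pairs, and a miss means the type cannot legally point into that vtable. Modeled with a plain map in place of the `MDNode` operand walk (the mangled names below are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Looks up the address-point offset recorded for CompatibleType, or reports
// that this vtable has no address point for that type.
static std::optional<uint64_t>
getAddressPointOffset(const std::map<std::string, uint64_t> &TypeOffsets,
                      const std::string &CompatibleType) {
  auto It = TypeOffsets.find(CompatibleType);
  if (It == TypeOffsets.end())
    return std::nullopt;
  return It->second;
}
```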
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -20,25 +20,25 @@ CHECK-NEXT: - CHECK: Records: CHECK-NEXT: - -CHECK-NEXT:FunctionGUID: 15505678318020221912 +CHECK-NEXT:FunctionGUID: 3873612792189045660 CHECK-NEXT:AllocSites: CHECK-NEXT:- CHECK-NEXT: Callstack: CHECK-NEXT: - -CHECK-NEXT:Function: 15505678318020221912 -CHECK-NEXT:SymbolName: qux +CHECK-NEXT:Function: 3873612792189045660 +CHECK-NEXT:SymbolName: _Z3quxi teresajohnson wrote: Not a problem, just struck me as odd since your patch shouldn't change. But like you said, probably some prior change was made that didn't require updating this test to get it to pass, and now those changes are showing up. https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -96,19 +102,63 @@ llvm::SmallVector readSegmentEntries(const char *Ptr) { } llvm::SmallVector> -readMemInfoBlocks(const char *Ptr) { +readMemInfoBlocksV3(const char *Ptr) { using namespace support; const uint64_t NumItemsToRead = - endian::readNext(Ptr); + endian::readNext(Ptr); + llvm::SmallVector> Items; for (uint64_t I = 0; I < NumItemsToRead; I++) { const uint64_t Id = -endian::readNext(Ptr); -const MemInfoBlock MIB = *reinterpret_cast(Ptr); +endian::readNext(Ptr); + +// We cheat a bit here and remove the const from cast to set the +// Histogram Pointer to newly allocated buffer. We also cheat, since V3 and +// V4 do not have the same fields. V3 is missing AccessHistogramSize and +// AccessHistogram. This means we read "dirty" data in here, but it should +// not segfault, since there will be callstack data placed after this in the +// binary format. +MemInfoBlock MIB = *reinterpret_cast(Ptr); +// Overwrite dirty data. teresajohnson wrote: Oh yes you are right, I missed the first `*` which means it is making a copy of what was in the memory pointed to by Ptr. https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
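The V3-into-V4 read discussed above works because a V3 payload is a strict prefix of the V4 struct: the copied struct's trailing fields are "dirty" only if the reader copies the full V4 size. Copying just the V3-sized prefix and zero-initializing the new fields avoids reading past the V3 record entirely. The two-field struct here is a simplification of `MemInfoBlock`, not its real layout:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified V4 record: V3 wrote only the first two fields; V4 appended the
// histogram fields at the end.
struct MemInfoBlockV4 {
  uint64_t AllocCount;
  uint64_t TotalAccessCount;
  // --- fields added in V4 ---
  uint32_t AccessHistogramSize;
  uintptr_t AccessHistogram;
};

static MemInfoBlockV4 readV3Record(const char *Ptr) {
  MemInfoBlockV4 MIB;
  std::memset(&MIB, 0, sizeof(MIB)); // V4-only fields start out zeroed
  // Copy only the bytes a V3 writer actually produced: everything before the
  // first V4-only field.
  std::memcpy(&MIB, Ptr, offsetof(MemInfoBlockV4, AccessHistogramSize));
  return MIB;
}
```

This sidesteps the "read dirty bytes, then overwrite them" pattern the comment describes, at the cost of naming the version boundary explicitly.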
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -216,6 +228,35 @@ u64 GetShadowCount(uptr p, u32 size) { return count; } +// Accumulates the access count from the shadow for the given pointer and size. +// See memprof_mapping.h for an overview on histogram counters. +u64 GetShadowCountHistogram(uptr p, u32 size) { + u8 *shadow = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p); + u8 *shadow_end = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p + size); + u64 count = 0; + for (; shadow <= shadow_end; shadow++) +count += *shadow; + return count; +} + +// If we use the normal approach from clearCountersWithoutHistogram, the +// histogram will clear too much data and may overwrite shadow counters that are +// in use. Likely because of rounding up the shadow_end pointer. +// See memprof_mapping.h for an overview on histogram counters. +void clearCountersHistogram(uptr addr, uptr size) { + u8 *shadow_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr); + u8 *shadow_end_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr + size); + for (; shadow_8 < shadow_end_8; shadow_8++) { +*shadow_8 = 0; + } +} + +void clearCountersWithoutHistogram(uptr addr, uptr size) { + uptr shadow_beg = MEM_TO_SHADOW(addr); + uptr shadow_end = MEM_TO_SHADOW(addr + size - SHADOW_GRANULARITY) + 1; + REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg); +} + // Clears the shadow counters (when memory is allocated). void ClearShadow(uptr addr, uptr size) { teresajohnson wrote: I guess it works, because the shadow granularities are less than the page size, and because they are both scaling by the same 8 to 1 scale, and also I guess because the clear_shadow_mmap_threshold is an optimization and doesn't need to be exact. However, it still feels a little wonky to me (and also means that we have to do extra mapping operations here and again in `clearCounters*`). 
I do think I would prefer to have it be something like: ``` uptr shadow_beg; uptr shadow_end; if (__memprof_histogram) { shadow_beg = HISTOGRAM_MEM_TO_SHADOW(addr); shadow_end = HISTOGRAM_MEM_TO_SHADOW(addr + size); } else { shadow_beg = MEM_TO_SHADOW(addr); shadow_end = MEM_TO_SHADOW(addr + size - SHADOW_GRANULARITY) + 1; } if (shadow_end - shadow_beg < common_flags()->clear_shadow_mmap_threshold) { REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg); } else { ... ``` I.e. set shadow_beg/end based on whether we are doing the histogramming or not, leave the rest as-is. Would that work? https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
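The restructuring suggested above — pick `shadow_beg`/`shadow_end` once, based on whether histogramming is enabled, and leave the shared clearing logic untouched — looks roughly like this. The bases and the granularities are toy values, not memprof's actual shadow layout:

```cpp
#include <cassert>
#include <cstdint>

using uptr = uintptr_t;
constexpr uptr kGranularity = 64;        // toy SHADOW_GRANULARITY
constexpr uptr kShadowBase = 0x1000;     // toy MEM_TO_SHADOW base
constexpr uptr kHistShadowBase = 0x2000; // toy HISTOGRAM_MEM_TO_SHADOW base

static uptr memToShadow(uptr A) { return kShadowBase + A / kGranularity; }
static uptr histMemToShadow(uptr A) { return kHistShadowBase + A / 8; }

// Version only the mapping: the caller then clears [Beg, End) with memset or
// a page-sized mmap exactly as before, with no second clearCounters* variant.
static void shadowRange(uptr Addr, uptr Size, bool Histogram, uptr &Beg,
                        uptr &End) {
  if (Histogram) {
    Beg = histMemToShadow(Addr);
    End = histMemToShadow(Addr + Size);
  } else {
    Beg = memToShadow(Addr);
    End = memToShadow(Addr + Size - kGranularity) + 1;
  }
}
```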
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -0,0 +1,101 @@ +REQUIRES: x86_64-linux + +This is a copy of memprof-basict.test with slight changes to check that we can still read v3 of memprofraw. + +To update the inputs used below run Inputs/update_memprof_inputs.sh /path/to/updated/clang teresajohnson wrote: I think this comment is wrong - we can't update the v3 version, is that correct? Nor would we want to. https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -38,4 +38,5 @@ MEMPROF_FLAG(bool, allocator_frees_and_returns_null_on_realloc_zero, true, MEMPROF_FLAG(bool, print_text, false, "If set, prints the heap profile in text format. Else use the raw binary serialization format.") MEMPROF_FLAG(bool, print_terse, false, - "If set, prints memory profile in a terse format. Only applicable if print_text = true.") + "If set, prints memory profile in a terse format. Only applicable " teresajohnson wrote: Formatting change only, remove from patch https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -205,8 +205,14 @@ class RawMemProfReader final : public MemProfReader { object::SectionedAddress getModuleOffset(uint64_t VirtualAddress); + llvm::SmallVector> + readMemInfoBlocks(const char *Ptr); + // The profiled binary. object::OwningBinary Binary; + // Version of raw memprof binary currently being read. Defaults to most update teresajohnson wrote: typo: s/update to date/up to date/ https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -610,13 +670,33 @@ RawMemProfReader::peekBuildIds(MemoryBuffer *DataBuffer) { return BuildIds.takeVector(); } +// FIXME: Add a schema for serializing similiar to IndexedMemprofReader. This +// will help being able to deserialize different versions raw memprof versions +// more easily. +llvm::SmallVector> +RawMemProfReader::readMemInfoBlocks(const char *Ptr) { + if (MemprofRawVersion == 3ULL) { +errs() << "Reading V3\n"; teresajohnson wrote: Once you do that, the braces can be removed from all of the if else conditions here which have single statement bodies https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -610,13 +670,33 @@ RawMemProfReader::peekBuildIds(MemoryBuffer *DataBuffer) { return BuildIds.takeVector(); } +// FIXME: Add a schema for serializing similiar to IndexedMemprofReader. This +// will help being able to deserialize different versions raw memprof versions +// more easily. +llvm::SmallVector> +RawMemProfReader::readMemInfoBlocks(const char *Ptr) { + if (MemprofRawVersion == 3ULL) { +errs() << "Reading V3\n"; teresajohnson wrote: Leftover debugging message that should be removed? https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -0,0 +1,101 @@ +REQUIRES: x86_64-linux + +This is a copy of memprof-basict.test with slight changes to check that we can still read v3 of memprofraw. teresajohnson wrote: typo: basict https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -216,6 +228,35 @@ u64 GetShadowCount(uptr p, u32 size) { return count; } +// Accumulates the access count from the shadow for the given pointer and size. +// See memprof_mapping.h for an overview on histogram counters. +u64 GetShadowCountHistogram(uptr p, u32 size) { + u8 *shadow = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p); + u8 *shadow_end = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p + size); + u64 count = 0; + for (; shadow <= shadow_end; shadow++) +count += *shadow; + return count; +} + +// If we use the normal approach from clearCountersWithoutHistogram, the +// histogram will clear too much data and may overwrite shadow counters that are +// in use. Likely because of rounding up the shadow_end pointer. +// See memprof_mapping.h for an overview on histogram counters. +void clearCountersHistogram(uptr addr, uptr size) { + u8 *shadow_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr); + u8 *shadow_end_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr + size); + for (; shadow_8 < shadow_end_8; shadow_8++) { +*shadow_8 = 0; + } +} + +void clearCountersWithoutHistogram(uptr addr, uptr size) { + uptr shadow_beg = MEM_TO_SHADOW(addr); + uptr shadow_end = MEM_TO_SHADOW(addr + size - SHADOW_GRANULARITY) + 1; + REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg); +} + // Clears the shadow counters (when memory is allocated). void ClearShadow(uptr addr, uptr size) { teresajohnson wrote: Looking at this function more, there are a lot of uses of the MEM_TO_SHADOW computed values, which is not the right mapping function to use with the smaller granularity of the histogram case. I think you probably need to version the calls to MEM_TO_SHADOW here, then all of the rest of the code can work as is? I.e. you wouldn't need the 2 versions of `clearCounters*` https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -124,6 +124,13 @@ struct PortableMemInfoBlock { OS << "" << #Name << ": " << Name << "\n"; #include "llvm/ProfileData/MIBEntryDef.inc" #undef MIBEntryDef +if (AccessHistogramSize > 0) { + OS << "" << "AccessHistogramValues" << ":"; + for (uint32_t I = 0; I < AccessHistogramSize; ++I) { +OS << " -" << ((uint64_t *)AccessHistogram)[I]; teresajohnson wrote: Why the " -" in the outputs? I noticed when looking at the text that they look like negative values as a result. Any reason not to just delimit with the space? https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -508,7 +519,26 @@ void createProfileFileNameVar(Module &M) { } } +// Set MemprofHistogramFlag as a Global veriable in IR. This makes it accessible +// to teresajohnson wrote: nit: fix line wrapping https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -96,19 +102,63 @@ llvm::SmallVector readSegmentEntries(const char *Ptr) { } llvm::SmallVector> -readMemInfoBlocks(const char *Ptr) { +readMemInfoBlocksV3(const char *Ptr) { using namespace support; const uint64_t NumItemsToRead = - endian::readNext(Ptr); + endian::readNext(Ptr); + llvm::SmallVector> Items; for (uint64_t I = 0; I < NumItemsToRead; I++) { const uint64_t Id = -endian::readNext(Ptr); -const MemInfoBlock MIB = *reinterpret_cast(Ptr); +endian::readNext(Ptr); + +// We cheat a bit here and remove the const from cast to set the +// Histogram Pointer to newly allocated buffer. We also cheat, since V3 and +// V4 do not have the same fields. V3 is missing AccessHistogramSize and +// AccessHistogram. This means we read "dirty" data in here, but it should +// not segfault, since there will be callstack data placed after this in the +// binary format. +MemInfoBlock MIB = *reinterpret_cast(Ptr); +// Overwrite dirty data. teresajohnson wrote: Isn't this going to overwrite some callstack data? https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -20,25 +20,25 @@ CHECK-NEXT: - CHECK: Records: CHECK-NEXT: - -CHECK-NEXT:FunctionGUID: 15505678318020221912 +CHECK-NEXT:FunctionGUID: 3873612792189045660 CHECK-NEXT:AllocSites: CHECK-NEXT:- CHECK-NEXT: Callstack: CHECK-NEXT: - -CHECK-NEXT:Function: 15505678318020221912 -CHECK-NEXT:SymbolName: qux +CHECK-NEXT:Function: 3873612792189045660 +CHECK-NEXT:SymbolName: _Z3quxi teresajohnson wrote: Why did the function symbol names change with your patch? https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -508,7 +519,26 @@ void createProfileFileNameVar(Module &M) { } } +// Set MemprofHistogramFlag as a Global veriable in IR. This makes it accessible +// to +// the runtime, changing shadow count behavior. +void createMemprofHistogramFlagVar(Module &M) { + const StringRef VarName(MemProfHistogramFlagVar); + Type *IntTy1 = Type::getInt1Ty(M.getContext()); + auto MemprofHistogramFlag = new GlobalVariable( + M, IntTy1, true, GlobalValue::WeakAnyLinkage, + Constant::getIntegerValue(IntTy1, APInt(1, ClHistogram)), VarName); + // MemprofHistogramFlag->setVisibility(GlobalValue::HiddenVisibility); teresajohnson wrote: remove? https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)
@@ -216,6 +228,35 @@ u64 GetShadowCount(uptr p, u32 size) { return count; } +// Accumulates the access count from the shadow for the given pointer and size. +// See memprof_mapping.h for an overview on histogram counters. +u64 GetShadowCountHistogram(uptr p, u32 size) { + u8 *shadow = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p); + u8 *shadow_end = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p + size); + u64 count = 0; + for (; shadow <= shadow_end; shadow++) +count += *shadow; + return count; +} + +// If we use the normal approach from clearCountersWithoutHistogram, the +// histogram will clear too much data and may overwrite shadow counters that are +// in use. Likely because of rounding up the shadow_end pointer. +// See memprof_mapping.h for an overview on histogram counters. +void clearCountersHistogram(uptr addr, uptr size) { + u8 *shadow_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr); + u8 *shadow_end_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr + size); + for (; shadow_8 < shadow_end_8; shadow_8++) { teresajohnson wrote: Why not use REAL(memset) like the non-histogram version below? https://github.com/llvm/llvm-project/pull/94264 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -140,14 +337,51 @@ class IndirectCallPromoter { // indirect callee with functions. Returns true if there are IR // transformations and false otherwise. bool tryToPromoteWithFuncCmp( - CallBase &CB, const std::vector &Candidates, - uint64_t TotalCount, ArrayRef ICallProfDataRef, - uint32_t NumCandidates); + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, uint64_t TotalCount, + ArrayRef ICallProfDataRef, uint32_t NumCandidates, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Promote a list of targets for one indirect call by comparing vtables with + // functions. Returns true if there are IR transformations and false + // otherwise. + bool tryToPromoteWithVTableCmp( + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, + uint64_t TotalFuncCount, uint32_t NumCandidates, + MutableArrayRef ICallProfDataRef, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Returns true if it's profitable to compare vtables. + bool isProfitableToCompareVTables( + const std::vector &Candidates, uint64_t TotalCount); + + // Populate `VTableGUIDCounts` vtable GUIDs and their counts and each teresajohnson wrote: should that be "for each candidate with vtable information"? https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)
@@ -321,14 +746,127 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) { if (!NumCandidates || (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount))) continue; + auto PromotionCandidates = getPromotionCandidatesForCallSite( *CB, ICallProfDataRef, TotalCount, NumCandidates); -Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount, - ICallProfDataRef, NumCandidates); + +VTableGUIDCountsMap VTableGUIDCounts; +Instruction *VPtr = +computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates); + +if (isProfitableToCompareVTables(PromotionCandidates, TotalCount)) + Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates, + TotalCount, NumCandidates, + ICallProfDataRef, VTableGUIDCounts); +else + Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates, + TotalCount, ICallProfDataRef, + NumCandidates, VTableGUIDCounts); } return Changed; } +// TODO: Returns false if the function addressing and vtable load instructions +// cannot sink to indirect fallback. +bool IndirectCallPromoter::isProfitableToCompareVTables( +const std::vector &Candidates, uint64_t TotalCount) { + if (!ICPEnableVTableCmp || Candidates.empty()) +return false; + uint64_t RemainingVTableCount = TotalCount; + for (size_t I = 0; I < Candidates.size(); I++) { +auto &Candidate = Candidates[I]; +uint64_t VTableSumCount = 0; +for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts) + VTableSumCount += Count; + +if (VTableSumCount < Candidate.Count * ICPVTableCountPercentage) + return false; + +RemainingVTableCount -= Candidate.Count; + +int NumAdditionalVTable = 0; +if (I == Candidates.size() - 1) + NumAdditionalVTable = ICPNumAdditionalVTableLast; + +int ActualNumAdditionalInst = Candidate.AddressPoints.size() - 1; +if (ActualNumAdditionalInst > NumAdditionalVTable) { + return false; +} + } + + // If the indirect fallback is not cold, don't compare vtables. 
+ if (PSI && PSI->hasProfileSummary() && + !PSI->isColdCount(RemainingVTableCount)) +return false; + + return true; +} + +static void +computeVirtualCallSiteTypeInfoMap(Module &M, ModuleAnalysisManager &MAM, + VirtualCallSiteTypeInfoMap &VirtualCSInfo) { + auto &FAM = MAM.getResult(M).getManager(); + auto LookupDomTree = [&FAM](Function &F) -> DominatorTree & { +return FAM.getResult(F); + }; + + auto compute = [&](Function *Func) { +if (!Func || Func->use_empty()) + return; +// Iterate all type.test calls and find all indirect calls. +// TODO: Add llvm.public.type.test +for (Use &U : llvm::make_early_inc_range(Func->uses())) { + auto *CI = dyn_cast(U.getUser()); + if (!CI) +continue; + auto *TypeMDVal = cast(CI->getArgOperand(1)); + if (!TypeMDVal) +continue; + auto *CompatibleTypeId = dyn_cast(TypeMDVal->getMetadata()); + if (!CompatibleTypeId) +continue; + + // Find out all devirtualizable call sites given a llvm.type.test + // intrinsic call. + SmallVector DevirtCalls; + SmallVector Assumes; + auto &DT = LookupDomTree(*CI->getFunction()); + findDevirtualizableCallsForTypeTest(DevirtCalls, Assumes, CI, DT); + + // type-id, offset from the address point + // combined with type metadata to compute function offset + for (auto &DevirtCall : DevirtCalls) { +CallBase &CB = DevirtCall.CB; +// Given an indirect call, try find the instruction which loads a +// pointer to virtual table. +Instruction *VTablePtr = +PGOIndirectCallVisitor::tryGetVTableInstruction(&CB); +if (!VTablePtr) + continue; +VirtualCSInfo[&CB] = {DevirtCall.Offset, VTablePtr, + CompatibleTypeId->getString()}; + } +} + }; + + // Right now only llvm.type.test is used to find out virtual call sites. + // With ThinLTO and whole-program-devirtualization, llvm.type.test and + // llvm.public.type.test are emitted, and llvm.public.type.test is either + // refined to llvm.type.test or dropped before indirect-call-promotion pass. 
teresajohnson wrote: Do we then need to do the below analysis on llvm.public.type.test? ICP should be after it gets converted or dropped. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
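For context, the profitability check quoted in this hunk combines two conditions: each candidate's vtable counts must cover nearly all of its function count, and the count left over for the indirect fallback must be cold. A minimal standalone sketch of that shape follows; the names, the fixed cold threshold standing in for the ProfileSummaryInfo query, and the simplified candidate struct are all hypothetical, not the pass's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Candidate {
  uint64_t Count;                      // function-candidate count
  std::vector<uint64_t> VTableCounts;  // counts of vtables mapping to it
};

// Hypothetical stand-ins for the pass options and the PSI cold query.
constexpr double VTableCountPercentage = 0.99;
constexpr uint64_t ColdCountThreshold = 100;

bool isProfitableToCompareVTables(const std::vector<Candidate> &Candidates,
                                  uint64_t TotalCount) {
  if (Candidates.empty())
    return false;
  uint64_t Remaining = TotalCount;
  for (const Candidate &C : Candidates) {
    uint64_t VTableSum = 0;
    for (uint64_t VC : C.VTableCounts)
      VTableSum += VC;
    // A candidate's vtable counts must cover (almost) its whole
    // function count, or vtable comparison loses profile fidelity.
    if (VTableSum < C.Count * VTableCountPercentage)
      return false;
    Remaining -= C.Count;
  }
  // Only compare vtables when the indirect fallback is cold.
  return Remaining <= ColdCountThreshold;
}
```

This mirrors why the review asks about the fallback-coldness check: the remaining count after all candidates is exactly what falls through to the indirect call.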
@@ -103,30 +110,220 @@ static cl::opt<bool> ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt<bool> ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt<double> +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt<int> ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = teresajohnson wrote: Add comment describing the contents https://github.com/llvm/llvm-project/pull/81442
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. 
+ VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. 
+ BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Use, UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { + assert(!I->isTerminator()); + if (!isDestBBSuitableForSink(I, DestBlock)) +return false; + + assert(DestBlock->getUniquePre
@@ -140,14 +337,51 @@ class IndirectCallPromoter { // indirect callee with functions. Returns true if there are IR // transformations and false otherwise. bool tryToPromoteWithFuncCmp( - CallBase &CB, const std::vector &Candidates, - uint64_t TotalCount, ArrayRef ICallProfDataRef, - uint32_t NumCandidates); + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, uint64_t TotalCount, + ArrayRef ICallProfDataRef, uint32_t NumCandidates, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Promote a list of targets for one indirect call by comparing vtables with + // functions. Returns true if there are IR transformations and false + // otherwise. + bool tryToPromoteWithVTableCmp( + CallBase &CB, Instruction *VPtr, + const std::vector &Candidates, + uint64_t TotalFuncCount, uint32_t NumCandidates, + MutableArrayRef ICallProfDataRef, + VTableGUIDCountsMap &VTableGUIDCounts); + + // Returns true if it's profitable to compare vtables. + bool isProfitableToCompareVTables( + const std::vector &Candidates, uint64_t TotalCount); + + // Populate `VTableGUIDCounts` vtable GUIDs and their counts and each teresajohnson wrote: "with vtable GUIDs and..." ? (i.e. missing "and"?) https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -341,6 +879,17 @@ static bool promoteIndirectCalls(Module &M, ProfileSummaryInfo *PSI, bool InLTO, return false; } bool Changed = false; + VirtualCallSiteTypeInfoMap VirtualCSInfo; + + computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo); + + // This map records states across functions in an LLVM IR module. teresajohnson wrote: What do you mean by "records states"? https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -103,30 +110,222 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. + VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) teresajohnson wrote: What you are getting here is the address point offset of the given compatible type within the given vtable. 
See also https://llvm.org/docs/TypeMetadata.html#representing-type-information-using-type-metadata. It might be good to add a comment describing this more explicitly. https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
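As the comment above notes, the type metadata on a vtable global pairs each compatible type id with that type's address point offset; a virtual call then loads its callee at address point plus function offset. A plain-C++ sketch of just that arithmetic, on a hypothetical Itanium-style layout (this is an illustration of the offsets, not the LLVM metadata API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vtable layout: two words of offset-to-top/RTTI followed by
// the virtual function slots. On a 64-bit target the address point is
// slot 2, i.e. byte offset 16, as in the Itanium C++ ABI.
using Slot = uint64_t;
static Slot VTable[5] = {0, 0, 0x1000, 0x2000, 0x3000};

// Address point = vtable base + AddressPointOffset (what the type
// metadata records); the callee slot lives at address point +
// FunctionOffset (what VirtualCallSiteInfo::FunctionOffset records).
uint64_t loadVirtualFn(uint32_t AddressPointOffset, uint64_t FunctionOffset) {
  const char *AddressPoint =
      reinterpret_cast<const char *>(VTable) + AddressPointOffset;
  return *reinterpret_cast<const Slot *>(AddressPoint + FunctionOffset);
}
```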
@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty()) + continue; + +// Update VTableGUIDCounts +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { teresajohnson wrote: Can you add a comment describing the intuition behind the way the counts are being updated here https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
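One plausible reading of the update the review asks about: the promoted function count is attributed back to the candidate's vtables in proportion to each vtable's share of the sum, and subtracted from the per-GUID totals; the 128-bit APInt intermediate keeps FuncCount * VTableCount from overflowing 64 bits. A sketch of that arithmetic with hypothetical container types in place of VTableGUIDCountsMap:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch (not the pass's actual types): after promoting one
// candidate's FuncCount calls, subtract that count from the candidate's
// vtable GUID counts in proportion to each vtable's share of the sum.
// The quoted code performs the same math with a 128-bit APInt so the
// product FuncCount * VTableCount cannot overflow 64 bits.
void updateVTableCounts(std::map<uint64_t, uint64_t> &GUIDCounts,
                        const std::map<uint64_t, uint64_t> &CandidateVTables,
                        uint64_t FuncCount) {
  uint64_t Sum = 0;
  for (const auto &Entry : CandidateVTables)
    Sum += Entry.second;
  if (Sum == 0)
    return;
  for (const auto &Entry : CandidateVTables) {
    // Proportional share, computed in 128 bits to avoid overflow.
    unsigned __int128 Share =
        (unsigned __int128)FuncCount * Entry.second / Sum;
    GUIDCounts[Entry.first] -= (uint64_t)Share;
  }
}
```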
@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee, // Promote indirect-call to conditional direct-call for one callsite. bool IndirectCallPromoter::tryToPromoteWithFuncCmp( -CallBase &CB, const std::vector &Candidates, -uint64_t TotalCount, ArrayRef ICallProfDataRef, -uint32_t NumCandidates) { +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalCount, +ArrayRef ICallProfDataRef, uint32_t NumCandidates, +VTableGUIDCountsMap &VTableGUIDCounts) { uint32_t NumPromoted = 0; for (const auto &C : Candidates) { -uint64_t Count = C.Count; -pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO, - &ORE); -assert(TotalCount >= Count); -TotalCount -= Count; +uint64_t FuncCount = C.Count; +pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount, + SamplePGO, &ORE); +assert(TotalCount >= FuncCount); +TotalCount -= FuncCount; NumOfPGOICallPromotion++; NumPromoted++; - } +if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty()) + continue; + +// Update VTableGUIDCounts +uint64_t SumVTableCount = 0; +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) + SumVTableCount += VTableCount; + +for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { + APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/); + APFuncCount *= VTableCount; + VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue(); +} + } if (NumPromoted == 0) return false; - // Adjust the MD.prof metadata. First delete the old one. - CB.setMetadata(LLVMContext::MD_prof, nullptr); - assert(NumPromoted <= ICallProfDataRef.size() && "Number of promoted functions should not be greater than the number " "of values in profile metadata"); + + // Update value profiles on the indirect call. + // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers` + // is enabled and a vtable is used to load multiple virtual functions. 
+ updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount, + NumCandidates); + // Update value profiles on the vtable pointer if it exists. + if (VPtr) +updateVPtrValueProfiles(VPtr, VTableGUIDCounts); + return true; +} + +void IndirectCallPromoter::updateFuncValueProfiles( +CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount, +uint32_t MaxMDCount) { + // First clear the existing !prof. + CB.setMetadata(LLVMContext::MD_prof, nullptr); // Annotate the remaining value profiles if counter is not zero. if (TotalCount != 0) -annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted), - TotalCount, IPVK_IndirectCallTarget, NumCandidates); +annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget, + MaxMDCount); +} + +void IndirectCallPromoter::updateVPtrValueProfiles( +Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) { + VPtr->setMetadata(LLVMContext::MD_prof, nullptr); + std::vector VTableValueProfiles; + uint64_t TotalVTableCount = 0; + for (auto [GUID, Count] : VTableGUIDCounts) { +if (Count == 0) + continue; + +VTableValueProfiles.push_back({GUID, Count}); +TotalVTableCount += Count; + } + llvm::sort(VTableValueProfiles, + [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) { + return LHS.Count > RHS.Count; + }); + + annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount, +IPVK_VTableTarget, VTableValueProfiles.size()); +} + +bool IndirectCallPromoter::tryToPromoteWithVTableCmp( +CallBase &CB, Instruction *VPtr, +const std::vector &Candidates, uint64_t TotalFuncCount, +uint32_t NumCandidates, +MutableArrayRef ICallProfDataRef, +VTableGUIDCountsMap &VTableGUIDCounts) { + SmallVector PromotedFuncCount; + for (const auto &Candidate : Candidates) { +uint64_t IfCount = 0; teresajohnson wrote: The name IfCount isn't very intuitive to me, suggest using a different name. 
https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
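The updateVPtrValueProfiles path quoted in this hunk drops zero counts and re-annotates the remaining vtable value profiles most-frequent first before handing them to annotateValueSite. A small sketch of that filtering and ordering step, with a hypothetical ValueData struct standing in for InstrProfValueData:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ValueData {
  uint64_t Value;  // vtable GUID
  uint64_t Count;  // profiled count
};

// Hypothetical mirror of the re-annotation step: keep only nonzero
// counts and order them most-frequent first, the order in which value
// profile metadata is conventionally annotated.
std::vector<ValueData>
buildVTableProfile(const std::vector<ValueData> &GUIDCounts) {
  std::vector<ValueData> Out;
  for (const ValueData &VD : GUIDCounts)
    if (VD.Count != 0)
      Out.push_back(VD);
  std::sort(Out.begin(), Out.end(),
            [](const ValueData &L, const ValueData &R) {
              return L.Count > R.Count;
            });
  return Out;
}
```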
https://github.com/teresajohnson commented: I haven't looked at the tests yet, but here are some comments and questions from a first pass through most of the code. https://github.com/llvm/llvm-project/pull/81442
https://github.com/teresajohnson edited https://github.com/llvm/llvm-project/pull/81442
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. 
+ VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. teresajohnson wrote: I don't quite follow the comment, can you expand it a bit? https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt teresajohnson wrote: Add comment https://github.com/llvm/llvm-project/pull/81442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -103,30 +110,220 @@ static cl::opt ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden, cl::desc("Dump IR after transformation happens")); +// This option is meant to be used by LLVM regression test and test the +// transformation that compares vtables. +static cl::opt ICPEnableVTableCmp( +"icp-enable-vtable-cmp", cl::init(false), cl::Hidden, +cl::desc("If ThinLTO and WPD is enabled and this option is true, " + "indirect-call promotion pass will compare vtables rather than " + "functions for speculative devirtualization of virtual calls." + " If set to false, indirect-call promotion pass will always " + "compare functions.")); + +static cl::opt +ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99), + cl::Hidden, + cl::desc("Percentage of vtable count to compare")); + +static cl::opt ICPNumAdditionalVTableLast( +"icp-num-additional-vtable-last", cl::init(0), cl::Hidden, +cl::desc("The number of additional instruction for the last candidate")); + namespace { +using VTableAddressPointOffsetValMap = +SmallDenseMap, 8>; + +// A struct to collect type information for a virtual call site. +struct VirtualCallSiteInfo { + // The offset from the address point to virtual function in the vtable. + uint64_t FunctionOffset; + // The instruction that computes the address point of vtable. + Instruction *VPtr; + // The compatible type used in LLVM type intrinsics. + StringRef CompatibleTypeStr; +}; + +// The key is a virtual call, and value is its type information. +using VirtualCallSiteTypeInfoMap = +SmallDenseMap; + +// Find the offset where type string is `CompatibleType`. +static std::optional +getCompatibleTypeOffset(const GlobalVariable &VTableVar, +StringRef CompatibleType) { + SmallVector Types; // type metadata associated with a vtable. 
+ VTableVar.getMetadata(LLVMContext::MD_type, Types); + + for (MDNode *Type : Types) +if (auto *TypeId = dyn_cast(Type->getOperand(1).get()); +TypeId && TypeId->getString() == CompatibleType) + + return cast( + cast(Type->getOperand(0))->getValue()) + ->getZExtValue(); + + return std::nullopt; +} + +// Returns a constant representing the vtable's address point specified by the +// offset. +static Constant *getVTableAddressPointOffset(GlobalVariable *VTable, + uint32_t AddressPointOffset) { + Module &M = *VTable->getParent(); + LLVMContext &Context = M.getContext(); + assert(AddressPointOffset < + M.getDataLayout().getTypeAllocSize(VTable->getValueType()) && + "Out-of-bound access"); + + return ConstantExpr::getInBoundsGetElementPtr( + Type::getInt8Ty(Context), VTable, + llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); +} + +// Returns the basic block in which `Inst` by `Use`. +static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) { + if (PHINode *PN = dyn_cast(UserInst)) +return PN->getIncomingBlock(U); + + return UserInst->getParent(); +} + +// `DestBB` is a suitable basic block to sink `Inst` into when the following +// conditions are true: +// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB` +//is dominated by `Inst->getParent()` and we don't need to sink across a +//critical edge. +// 2) `Inst` have users and all users are in `DestBB`. +static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) { + BasicBlock *BB = Inst->getParent(); + assert(Inst->getParent() != DestBB && + BB->getTerminator()->getNumSuccessors() == 2 && + "Caller should guarantee"); + // Do not sink across a critical edge for simplicity. + if (DestBB->getUniquePredecessor() != BB) +return false; + + // Now we know BB dominates DestBB. 
+ BasicBlock *UserBB = nullptr; + for (Use &Use : Inst->uses()) { +User *User = Use.getUser(); +// Do checked cast since IR verifier guarantees that the user of an +// instruction must be an instruction. See `Verifier::visitInstruction`. +Instruction *UserInst = cast(User); +// We can sink debug or pseudo instructions together with Inst. +if (UserInst->isDebugOrPseudoInst()) + continue; +UserBB = getUserBasicBlock(Use, UserInst); +// Do not sink if Inst is used in a basic block that is not DestBB. +// TODO: Sink to the common dominator of all user blocks. +if (UserBB != DestBB) + return false; + } + return UserBB != nullptr; +} + +// For the virtual call dispatch sequence, try to sink vtable load instructions +// to the cold indirect call fallback. +static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { + assert(!I->isTerminator()); + if (!isDestBBSuitableForSink(I, DestBlock)) +return false; + + assert(DestBlock->getUniquePre
[clang] [clang-tools-extra] [compiler-rt] [lldb] [llvm] [mlir] [openmp] [polly] fix(python): fix comparison to None (PR #91857)
https://github.com/teresajohnson approved this pull request. compiler-rt changes lgtm https://github.com/llvm/llvm-project/pull/91857
[clang] [clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (PR #83159)
@@ -1036,7 +1041,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline( if (!actionRequiresCodeGen(Action) && CodeGenOpts.VerifyModule) MPM.addPass(VerifierPass()); - if (Action == Backend_EmitBC || Action == Backend_EmitLL) { + if (Action == Backend_EmitBC || Action == Backend_EmitLL || + CodeGenOpts.FatLTO) { teresajohnson wrote: I see, I guess then the check for the CodeGenOpt is needed here to catch this case. https://github.com/llvm/llvm-project/pull/83159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/teresajohnson approved this pull request. https://github.com/llvm/llvm-project/pull/83159
https://github.com/teresajohnson edited https://github.com/llvm/llvm-project/pull/83159
@@ -1036,7 +1041,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline( if (!actionRequiresCodeGen(Action) && CodeGenOpts.VerifyModule) MPM.addPass(VerifierPass()); - if (Action == Backend_EmitBC || Action == Backend_EmitLL) { + if (Action == Backend_EmitBC || Action == Backend_EmitLL || + CodeGenOpts.FatLTO) { teresajohnson wrote: What is the Action type with FatLTO? https://github.com/llvm/llvm-project/pull/83159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LTO] Fix fat-lto output for -c -emit-llvm. (PR #79404)
https://github.com/teresajohnson approved this pull request. https://github.com/llvm/llvm-project/pull/79404
[libcxx] [libc] [clang] [lld] [clang-tools-extra] [flang] [compiler-rt] [llvm] [ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names (PR #78835)
https://github.com/teresajohnson approved this pull request. https://github.com/llvm/llvm-project/pull/78835
@@ -352,32 +357,49 @@ std::vector BitcodeCompiler::compile() { pruneCache(config->thinLTOCacheDir, config->thinLTOCachePolicy, files); if (!config->ltoObjPath.empty()) { -saveBuffer(buf[0], config->ltoObjPath); +saveBuffer(buf[0].second, config->ltoObjPath); for (unsigned i = 1; i != maxTasks; ++i) - saveBuffer(buf[i], config->ltoObjPath + Twine(i)); - } - - if (config->saveTempsArgs.contains("prelink")) { -if (!buf[0].empty()) - saveBuffer(buf[0], config->outputFile + ".lto.o"); -for (unsigned i = 1; i != maxTasks; ++i) - saveBuffer(buf[i], config->outputFile + Twine(i) + ".lto.o"); - } - - if (config->ltoEmitAsm) { -saveBuffer(buf[0], config->outputFile); -for (unsigned i = 1; i != maxTasks; ++i) - saveBuffer(buf[i], config->outputFile + Twine(i)); -return {}; + saveBuffer(buf[i].second, config->ltoObjPath + Twine(i)); } + bool savePrelink = config->saveTempsArgs.contains("prelink"); std::vector ret; - for (unsigned i = 0; i != maxTasks; ++i) -if (!buf[i].empty()) - ret.push_back(createObjFile(MemoryBufferRef(buf[i], "lto.tmp"))); + const char *ext = config->ltoEmitAsm ? ".s" : ".o"; + for (unsigned i = 0; i != maxTasks; ++i) { +StringRef bitcodeFilePath; +StringRef objBuf; +if (files[i]) { teresajohnson wrote: Thanks, those are helpful. Can you also add a comment before 373 on meaning of files[i] being null vs not - reading through the comments you added it sounds like files[i] is non-null when object file was found in the cache? https://github.com/llvm/llvm-project/pull/78835 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [clang-tools-extra] [llvm] [clang] [ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names (PR #78835)
@@ -53,10 +53,10 @@ ; RUN: rm -fr cache && mkdir cache ; RUN: ld.lld --thinlto-cache-dir=cache --save-temps -o out b.bc a.bc -M | FileCheck %s --check-prefix=MAP -; RUN: ls out1.lto.o a.bc.0.preopt.bc b.bc.0.preopt.bc +; RUN: ls out.lto.a.o a.bc.0.preopt.bc b.bc.0.preopt.bc -; MAP: llvmcache-{{.*}}:(.text) -; MAP: llvmcache-{{.*}}:(.text) +; MAP: out.lto.b.o:(.text) teresajohnson wrote: Confused about what changed in the latest commit to cause these lines to need updating - was this just a missed test update in the prior commit? https://github.com/llvm/llvm-project/pull/78835
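For illustration, the updated names in this test (`out.lto.a.o`, `out.lto.b.o`) are derived from the input bitcode file's basename. A rough sketch of that derivation — the helper name and exact rules here are illustrative, not lld's actual code:

```cpp
#include <cassert>
#include <string>

// Illustrative only: build "<output>.lto.<bitcode stem><ext>" from a bitcode
// path, roughly matching the names seen in the updated test (out.lto.a.o).
std::string deriveLTOObjectName(const std::string &outputFile,
                                const std::string &bitcodePath,
                                bool emitAsm) {
  // Strip any directory components (npos + 1 wraps to 0 when no '/' exists).
  std::string base = bitcodePath.substr(bitcodePath.find_last_of('/') + 1);
  // Strip a trailing ".bc" extension if present.
  if (base.size() > 3 && base.compare(base.size() - 3, 3, ".bc") == 0)
    base.resize(base.size() - 3);
  return outputFile + ".lto." + base + (emitAsm ? ".s" : ".o");
}
```

With this sketch, `deriveLTOObjectName("out", "a.bc", false)` yields `"out.lto.a.o"`, matching the test expectation above.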
[llvm] [lld] [clang-tools-extra] [clang] [ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names (PR #78835)
@@ -46,8 +46,9 @@ class BitcodeCompiler { private: std::unique_ptr ltoObj; - std::vector> buf; + SmallVector>, 0> buf; teresajohnson wrote: ping on this comment. https://github.com/llvm/llvm-project/pull/78835
[clang] [clang-tools-extra] [compiler-rt] [PGO] Exposing PGO's Counter Reset and File Dumping APIs (PR #76471)
teresajohnson wrote: > I realized one problem during testing IRPGO (thanks again for the suggestion > @minglotus-6 !). > > A function's control flow may change between `-fprofile-generate` and > `-fprofile-use` when we make use of definitions in the new header. For > example, one may have the following code: > > ```c > #include "profile/instr_prof_interface.h" > > void main() { >... > if (__llvm_profile_dump()) > return error; > > cleanup(); > } > ``` > > During `-fprofile-generate`, `__llvm_profile_dump` is a declared name and > main's control flow includes a branch to `return error;`. During > `-fprofile-use`, `__llvm_profile_dump()` is replaced by `(0)` and the > frontend eliminates the `if` statement and the branch to `return error`. Such > control flow change can lead to PGO warnings (hash mismatch). > > I think it may be OK to keep the PR this way because the new macros can > potentially cause control flow changes directly as well. The documentation is > updated > (https://github.com/llvm/llvm-project/pull/76471/files#diff-7389be311daf0b9b476c876bef04245fa3c0ad9337ce865682174bd77d53b648R2908) > to advise against using these APIs in a program's hot regions and warn about > possible impact on control flow. > > Do you all think this is reasonable? That's probably ok, as it doesn't make sense to do dumping etc in a hot region of code anyway. An alternate suggestion is to make these functions real functions that simply return 0 instead of #defined to 0. But that may not avoid the issue described above because early inlining will likely inline and simplify the code before IR PGO matching anyway. https://github.com/llvm/llvm-project/pull/76471
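The hash-mismatch hazard described above can be shown with a self-contained stand-in: when profile generation is off, the header replaces the call with `(0)`, and the error branch disappears from the function's control flow. The fallback `#define` below is an illustrative stand-in for what the real compiler-rt header does, not the header itself:

```cpp
#include <cassert>

// Stand-in for the real compiler-rt interface header: when profile
// generation is disabled, the call folds to 0 and the error branch is
// deleted, changing the function's CFG between the two builds.
#ifdef __LLVM_INSTR_PROFILE_GENERATE
extern "C" int __llvm_profile_dump(void);
#else
#define __llvm_profile_dump() (0)
#endif

int run(void) {
  if (__llvm_profile_dump())  // branch exists only in the -generate build
    return 1;                 // error path
  return 0;                   // cleanup path
}
```

In a build without `__LLVM_INSTR_PROFILE_GENERATE` defined, `run()` always returns 0 and the compiler can remove the `if` entirely — which is exactly the CFG divergence that triggers the hash-mismatch warning.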
[clang-tools-extra] [compiler-rt] [clang] [PGO] Exposing PGO's Counter Reset and File Dumping APIs (PR #76471)
teresajohnson wrote: > @teresajohnson I mentioned the same thing on > [discourse](https://discourse.llvm.org/t/pgo-are-the-llvm-profile-functions-stable-c-apis-across-llvm-releases/75832/5) > but it seems like linking on AIX does not support this model. I see. Perhaps instead of defining these away completely, they should be defined to a dummy function with the same signature that always returns success (0 I think)? Alternatively, not define them at all when __LLVM_INSTR_PROFILE_GENERATE is not defined, and have the user guard calls by `#ifdef __LLVM_INSTR_PROFILE_GENERATE`. Otherwise, the user either cannot check the return values, or they have to guard their usage by a check of __LLVM_INSTR_PROFILE_GENERATE anyway. https://github.com/llvm/llvm-project/pull/76471
[compiler-rt] [clang] [clang-tools-extra] [PGO] Exposing PGO's Counter Reset and File Dumping APIs (PR #76471)
teresajohnson wrote: The way we have done this in the past is to declare these as weak symbols and check if they exist before calling. E.g.:
```
extern "C" __attribute__((weak)) int __llvm_profile_dump(void);

if (__llvm_profile_dump)
  if (__llvm_profile_dump() != 0) {
    ...
  }
```
Not necessarily opposed to adding the macros etc, but the advantage of the weak symbol approach vs what you have here is that the user code can check the return values. https://github.com/llvm/llvm-project/pull/76471
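A self-contained version of the weak-symbol pattern sketched above. This relies on ELF-style weak linkage — an undefined weak symbol resolves to a null address — which, as noted earlier in the thread, does not work on every platform (e.g. AIX). The wrapper function name is invented here for illustration:

```cpp
#include <cassert>

// Weak declaration: when the profile runtime (which defines
// __llvm_profile_dump) is not linked in, this resolves to a null address.
extern "C" __attribute__((weak)) int __llvm_profile_dump(void);

// Illustrative wrapper: call the dump function only if the runtime
// provided it, so the same source works with and without instrumentation.
int guardedProfileDump(void) {
  if (__llvm_profile_dump)
    return __llvm_profile_dump();  // user code can still check the result
  return 0;  // runtime absent: nothing to dump, report success
}
```

When this is compiled without the profile runtime, the weak symbol is null and the guarded call is skipped, so `guardedProfileDump()` returns 0.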
[llvm] [clang] DiagnosticHandler: refactor error checking (PR #75889)
https://github.com/teresajohnson approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/75889
[llvm] [clang] DiagnosticHandler: refactor error checking (PR #75889)
@@ -256,10 +256,13 @@ void LLVMContext::diagnose(const DiagnosticInfo &DI) { RS->emit(*OptDiagBase); // If there is a report handler, use it. - if (pImpl->DiagHandler && - (!pImpl->RespectDiagnosticFilters || isDiagnosticEnabled(DI)) && - pImpl->DiagHandler->handleDiagnostics(DI)) -return; + if (pImpl->DiagHandler) { +if (DI.getSeverity() == DS_Error) + pImpl->DiagHandler->HasErrors = true; teresajohnson wrote: I didn't mean calling handleDiagnostics in more places. I just meant rather than directly setting the HasErrors flag here, do that in a new non-virtual method (e.g. prepareToHandleDiagnostics() or whatever), and call it here just before calling handleDiagnostics. To abstract away what it is actually doing and leave the setting of the flag to the DiagnosticHandler class. https://github.com/llvm/llvm-project/pull/75889
[clang] [llvm] DiagnosticHandler: refactor error checking (PR #75889)
@@ -256,10 +256,13 @@ void LLVMContext::diagnose(const DiagnosticInfo &DI) { RS->emit(*OptDiagBase); // If there is a report handler, use it. - if (pImpl->DiagHandler && - (!pImpl->RespectDiagnosticFilters || isDiagnosticEnabled(DI)) && - pImpl->DiagHandler->handleDiagnostics(DI)) -return; + if (pImpl->DiagHandler) { +if (DI.getSeverity() == DS_Error) + pImpl->DiagHandler->HasErrors = true; teresajohnson wrote: Perhaps move the set of HasErrors into a new non-virtual method that we call below just before calling handleDiagnostics. Also, should this be set when handleDiagnostics is not called? https://github.com/llvm/llvm-project/pull/75889
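The refactor suggested in these review comments — keep the error flag inside the handler and have the context call a non-virtual hook instead of poking the flag directly — can be sketched as below. The names `noteDiagnostic` and the simplified `diagnose` are invented here for illustration (the comment itself leaves the method name open, e.g. `prepareToHandleDiagnostics`):

```cpp
#include <cassert>

enum Severity { DS_Error, DS_Warning, DS_Remark };

// Sketch of the suggested shape: HasErrors lives in the handler, and the
// context records errors through a non-virtual hook rather than writing
// the member directly.
struct DiagnosticHandler {
  bool HasErrors = false;
  // Non-virtual: records errors regardless of what handleDiagnostics does,
  // so the flag is set even when a subclass swallows the diagnostic.
  void noteDiagnostic(Severity S) {
    if (S == DS_Error)
      HasErrors = true;
  }
  virtual bool handleDiagnostics(Severity) { return false; }
  virtual ~DiagnosticHandler() = default;
};

// Simplified stand-in for what LLVMContext::diagnose would do.
void diagnose(DiagnosticHandler &H, Severity S) {
  H.noteDiagnostic(S);           // set the flag via the handler itself
  (void)H.handleDiagnostics(S);  // then run the (possibly overridden) handler
}
```

This also answers the second question in the comment above: because the hook runs unconditionally before `handleDiagnostics`, the flag is set even on paths where the virtual handler is not consulted.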
[lld] [llvm] [clang] [LTO] Improve diagnostics handling when parsing module-level inline assembly (PR #75726)
https://github.com/teresajohnson approved this pull request. https://github.com/llvm/llvm-project/pull/75726
[llvm] [clang] Disable PGO instrumentation on naked function (PR #75224)
https://github.com/teresajohnson approved this pull request. https://github.com/llvm/llvm-project/pull/75224
[llvm] [clang] Disable PGO instrumentation on naked function (PR #75224)
@@ -892,6 +892,10 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy, } } + if (FD->hasAttr()) { teresajohnson wrote: Is this change needed given the separate change in LLVM skipPGOGen? https://github.com/llvm/llvm-project/pull/75224
[compiler-rt] [clang] [clang-tools-extra] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
https://github.com/teresajohnson approved this pull request. LGTM other than a couple of minor comments and pending resolution of the LLVM IR test. Thanks! https://github.com/llvm/llvm-project/pull/74008
[compiler-rt] [clang] [clang-tools-extra] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -352,6 +366,8 @@ std::string getIRPGOFuncName(const Function &F, bool InLTO) { return getIRPGOObjectName(F, InLTO, getPGOFuncNameMetadata(F)); } +// DEPRECATED. Use `getIRPGOFuncName`for new code. See that function for teresajohnson wrote: In the header it is described as for FE instrumentation. Probably want the comments to be consistent? https://github.com/llvm/llvm-project/pull/74008
[compiler-rt] [clang-tools-extra] [clang] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -0,0 +1,115 @@ +// This is a regression test for ThinLTO indirect-call-promotion when candidate +// callees need to be imported from another IR module. In the C++ test case, +// `main` calls `global_func` which is defined in another module. `global_func` +// has two indirect callees, one has external linkage and one has local linkage. +// All three functions should be imported into the IR module of main. + +// What the test does: +// - Generate raw profiles from executables and convert it to indexed profiles. +// During the conversion, a profiled callee address in raw profiles will be +// converted to function hash in indexed profiles. +// - Run IRPGO profile use and ThinTLO prelink pipeline and get LLVM bitcodes +// for both cpp files in the C++ test case. +// - Generate ThinLTO summary file with LLVM bitcodes, and run `function-import` pass. +// - Run `pgo-icall-prom` pass for the IR module which needs to import callees. + +// Use lld as linker for more robust test. We need to REQUIRE LLVMgold.so for +// LTO if default linker is GNU ld or gold anyway. +// REQUIRES: lld-available + +// Test should fail where linkage-name and mangled-name diverges, see issue https://github.com/llvm/llvm-project/issues/74565). +// Currently, this name divergence happens on Mach-O object file format, or on +// many (but not all) 32-bit Windows systems. +// +// XFAIL: system-darwin +// +// Mark 32-bit Windows as UNSUPPORTED for now as opposed to XFAIL. This test +// should fail on many (but not all) 32-bit Windows systems and succeed on the +// rest. The flexibility in triple string parsing makes it tricky to capture +// both sets accurately. i[3-9]86 specifies arch as Triple::ArchType::x86, (win32|windows) +// specifies OS as Triple::OS::Win32 +// +// UNSUPPORTED: target={{i[3-9]86-.*-(win32|windows).*}} + +// RUN: rm -rf %t && split-file %s %t && cd %t + +// Do setup work for all below tests. +// Generate raw profiles from real programs and convert it into indexed profiles. 
+// Use clangxx_pgogen for IR level instrumentation for C++. +// RUN: %clangxx_pgogen -fuse-ld=lld -O2 lib.cpp main.cpp -o main +// RUN: env LLVM_PROFILE_FILE=main.profraw %run ./main +// RUN: llvm-profdata merge main.profraw -o main.profdata + +// Use profile on lib and get bitcode, test that local function callee0 has +// expected !PGOFuncName metadata and external function callee1 doesn't have +// !PGOFuncName metadata. Explicitly skip ICP pass to test ICP happens as +// expected in the IR module that imports functions from lib. +// RUN: %clang -mllvm -disable-icp -fprofile-use=main.profdata -flto=thin -O2 -c lib.cpp -o lib.bc +// RUN: llvm-dis lib.bc -o - | FileCheck %s --check-prefix=PGOName + +// Use profile on main and get bitcode. +// RUN: %clang -fprofile-use=main.profdata -flto=thin -O2 -c main.cpp -o main.bc + +// Run llvm-lto to get summary file. +// RUN: llvm-lto -thinlto -o summary main.bc lib.bc + +// Test the imports of functions. Default import thresholds would work but do +// explicit override to be more futureproof. Note all functions have one basic +// block with a function-entry-count of one, so they are actually hot functions +// per default profile summary hotness cutoff. +// RUN: opt -passes=function-import -import-instr-limit=100 -import-cold-multiplier=1 -summary-file summary.thinlto.bc main.bc -o main.import.bc -print-imports 2>&1 | FileCheck %s --check-prefix=IMPORTS +// Test that '_Z11global_funcv' has indirect calls annotated with value profiles. +// RUN: llvm-dis main.import.bc -o - | FileCheck %s --check-prefix=IR + +// Test that both candidates are ICP'ed and there is no `!VP` in the IR. 
+// RUN: opt main.import.bc -icp-lto -passes=pgo-icall-prom -S -pass-remarks=pgo-icall-prom 2>&1 | FileCheck %s --check-prefixes=ICP-IR,ICP-REMARK --implicit-check-not="!VP" + +// IMPORTS: main.cpp: Import _Z7callee1v +// IMPORTS: main.cpp: Import _ZL7callee0v.llvm.[[#]] +// IMPORTS: main.cpp: Import _Z11global_funcv + +// PGOName: define dso_local void @_Z7callee1v() {{.*}} !prof ![[#]] { +// PGOName: define internal void @_ZL7callee0v() {{.*}} !prof ![[#]] !PGOFuncName ![[#MD:]] { +// PGOName: ![[#MD]] = !{!"{{.*}}lib.cpp;_ZL7callee0v"} + +// IR-LABEL: define available_externally {{.*}} void @_Z11global_funcv() {{.*}} !prof ![[#]] { +// IR-NEXT: entry: +// IR-NEXT: %0 = load ptr, ptr @calleeAddrs +// IR-NEXT: tail call void %0(), !prof ![[#PROF1:]] +// IR-NEXT: %1 = load ptr, ptr getelementptr inbounds ([2 x ptr], ptr @calleeAddrs, +// IR-NEXT: tail call void %1(), !prof ![[#PROF2:]] + +// The GUID of indirect callee is the MD5 hash of `/path/to/lib.cpp:_ZL7callee0v` teresajohnson wrote: should this have the semicolon separator not a colon? https://github.com/llvm/llvm-project/pull/74008 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[compiler-rt] [clang-tools-extra] [clang] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -1,39 +0,0 @@ -; Do setup work for all below tests: generate bitcode and combined index teresajohnson wrote: Without use of the raw profile, this test would not have caught the regression. If we think the new compiler-rt test is enough to catch this case in the future, then perhaps we don't need this LLVM test at all (there should already be some ICP tests). But if the compiler-rt test is not enough, because some people don't run those, then this should stay and use a raw profile as input. https://github.com/llvm/llvm-project/pull/74008
[clang] [clang][fatlto] Don't set ThinLTO module flag with FatLTO (PR #75079)
teresajohnson wrote: Added a comment to that issue, I think it would be good to understand why unified LTO is not expected in that case (for the assertion). https://github.com/llvm/llvm-project/pull/75079
[compiler-rt] [clang-tools-extra] [llvm] [clang] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -1,39 +1,45 @@ -; Do setup work for all below tests: generate bitcode and combined index -; RUN: opt -module-summary %s -o %t.bc -; RUN: opt -module-summary %p/Inputs/thinlto_indirect_call_promotion.ll -o %t2.bc +; The raw profiles and reduced IR inputs are generated from Inputs/update_icall_promotion_inputs.sh teresajohnson wrote: The reason for having this in a single test this way is specifically to detect mismatches in the separator used in the PGO name and in the ThinLTO index. For this we need to convert from raw profile and feed it through ThinLTO+ICP. Having separate tests would not do this. I.e. with the change from ':' to ';', the compiler-rt test would presumably have failed and would be updated, but the test using the proftext input would not fail. The lack of testing these pieces together led to the prior separator rename not updating all the necessary compiler bits. https://github.com/llvm/llvm-project/pull/74008
[compiler-rt] [clang-tools-extra] [clang] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -1,39 +1,45 @@ -; Do setup work for all below tests: generate bitcode and combined index -; RUN: opt -module-summary %s -o %t.bc -; RUN: opt -module-summary %p/Inputs/thinlto_indirect_call_promotion.ll -o %t2.bc +; The raw profiles and reduced IR inputs are generated from Inputs/update_icall_promotion_inputs.sh teresajohnson wrote: No because the point of this test is to catch regressions if the separator character (which is hardcoded in the proftext) changes again in the future. https://github.com/llvm/llvm-project/pull/74008
[clang-tools-extra] [llvm] [clang] [compiler-rt] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
teresajohnson wrote: > David says the itanium remapper file was only used once during gcc to llvm > transition, so not relevant here. I believe it was actually for the libstdc++ to libc++ transition (see https://reviews.llvm.org/D51247 and https://reviews.llvm.org/D51240). If it is broken we'll at least want to add a FIXME there. https://github.com/llvm/llvm-project/pull/74008
[llvm] [clang] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
teresajohnson wrote: > Using the same added ICP test, profile matching on local-linkage > `_ZL7callee0v` using `clang++ -v -fuse-ld=lld -O2 > -fprofile-use=thinlto_icall_prom.profdata ` , as > [this](https://gist.github.com/minglotus-6/11817ba645c6b12cd7116f41bfb1185e) > pgo-instr-use output shows. Can you clarify what you are saying here - is it working or not working? https://github.com/llvm/llvm-project/pull/74008
[llvm] [clang] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO, GlobalValue::LinkageTypes Linkage, StringRef FileName) { SmallString<64> Name; - if (llvm::GlobalValue::isLocalLinkage(Linkage)) { -Name.append(FileName.empty() ? "" : FileName); -Name.append(";"); - } Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true); teresajohnson wrote: > It wasn't broken in general, but it's needed to get -order_file working > correctly. Unfortunately this change broke aspects of ThinLTO ICP, however. Is it possible to change the handling of -order_file in the linker to modify the symbol names appropriately? > To avoid subtle issues when linkage-name is different from mangled names, I'm > wondering if it warrants a change to use linkage-names (as opposed to mangled > name) in GlobalValue::getGlobalIdentifier in this PR. If the -order_file handling cannot be fixed as I suggested above, then yes, we need some solution to ensure that the hashed symbol names are consistent across PGO and ThinLTO. https://github.com/llvm/llvm-project/pull/74008
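For reference, the identifier scheme this thread keeps returning to prefixes local-linkage names with their source file path and a `;` delimiter (previously `:`), as the removed lines in the diff show. A minimal sketch — the helper name is invented here, and the real LLVM code additionally handles mangling escapes and the empty-filename case:

```cpp
#include <cassert>
#include <string>

// Illustrative sketch of the global-identifier scheme under discussion:
// local-linkage symbols get a "<source file>;" prefix so their names are
// unique across modules; other symbols use the (mangled) name alone.
std::string globalIdentifier(const std::string &name, bool isLocalLinkage,
                             const std::string &fileName) {
  if (isLocalLinkage)
    return fileName + ";" + name;  // ';' is the delimiter this PR settles on
  return name;
}
```

This matches the expectation in the compiler-rt test earlier in the digest, where the local `_ZL7callee0v` in `lib.cpp` is checked against `lib.cpp;_ZL7callee0v` while the external `_Z7callee1v` keeps its plain mangled name.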
[llvm] [clang] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO, GlobalValue::LinkageTypes Linkage, StringRef FileName) { SmallString<64> Name; - if (llvm::GlobalValue::isLocalLinkage(Linkage)) { -Name.append(FileName.empty() ? "" : FileName); -Name.append(";"); - } Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true); teresajohnson wrote: Was PGO broken in general before D156569? Or is this specific to `-symbol-ordering-file` and `-order_file`? Also, it looks like some of the main changes here by the mangler for objective C are to strip the "\01" prefix. Is this different than the support to remove the '\1' mangling escape in getGlobalIdentifier()? Trying to figure out how we keep these interfaces in sync to avoid more issues with ICP... https://github.com/llvm/llvm-project/pull/74008
[llvm] [clang] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO, GlobalValue::LinkageTypes Linkage, StringRef FileName) { SmallString<64> Name; - if (llvm::GlobalValue::isLocalLinkage(Linkage)) { -Name.append(FileName.empty() ? "" : FileName); -Name.append(";"); - } Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true); teresajohnson wrote: Can you give an example of where this is called such that there is a difference between the function name and linkage name (at least for c++ these are the same afaik), and what that difference looks like? Are there specific callsites used for the symbol ordering support that need the mangling? https://github.com/llvm/llvm-project/pull/74008
[clang] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -246,11 +246,27 @@ std::string InstrProfError::message() const { char InstrProfError::ID = 0; -std::string getPGOFuncName(StringRef RawFuncName, - GlobalValue::LinkageTypes Linkage, +std::string getPGOFuncName(StringRef Name, GlobalValue::LinkageTypes Linkage, teresajohnson wrote: I guess it depends on whether it is intentional that Clang (and Swift apparently?) use the old name. @ellishg was there a reason to leave that as is? If they must use the old one then maybe getFEPGOFuncName. If they should be transitioned eventually then using "Legacy" makes sense. https://github.com/llvm/llvm-project/pull/74008
[clang] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -246,11 +246,27 @@ std::string InstrProfError::message() const { char InstrProfError::ID = 0; -std::string getPGOFuncName(StringRef RawFuncName, - GlobalValue::LinkageTypes Linkage, +std::string getPGOFuncName(StringRef Name, GlobalValue::LinkageTypes Linkage, teresajohnson wrote: Oh I missed the fact that one getPGOFuncName interface was left. I am only seeing 2 invocations of that interface outside of the unittest test. I would be in favor of doing renames of both getPGOFuncName interfaces in one patch. A separate NFC patch is a fine option, and keeps this patch just about fixing the ICP breakage caused by the delimiter change. https://github.com/llvm/llvm-project/pull/74008
[clang] [llvm] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO, GlobalValue::LinkageTypes Linkage, StringRef FileName) { SmallString<64> Name; - if (llvm::GlobalValue::isLocalLinkage(Linkage)) { -Name.append(FileName.empty() ? "" : FileName); -Name.append(";"); - } Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true); teresajohnson wrote: What I'm confused about is that the function name is already the mangled name, at least using clang with c++. Is it not for objective C? https://github.com/llvm/llvm-project/pull/74008
[llvm] [clang] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -246,11 +246,27 @@ std::string InstrProfError::message() const { char InstrProfError::ID = 0; -std::string getPGOFuncName(StringRef RawFuncName, - GlobalValue::LinkageTypes Linkage, +std::string getPGOFuncName(StringRef Name, GlobalValue::LinkageTypes Linkage, teresajohnson wrote: Is this resolved? It looks like the most recent version of the patch changes getPGOFuncName to getLegacyPGOFuncName for the old format, and uses getIRPGOFuncName for the new format. https://github.com/llvm/llvm-project/pull/74008
[llvm] [clang] [PGO][GlobalValue][LTO] In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (PR #74008)
@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO, GlobalValue::LinkageTypes Linkage, StringRef FileName) { SmallString<64> Name; - if (llvm::GlobalValue::isLocalLinkage(Linkage)) { -Name.append(FileName.empty() ? "" : FileName); -Name.append(";"); - } Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true); teresajohnson wrote: @ellishg How much of D156569 relied on the invocation of Mangler? It is not mentioned in the patch description, only the rationale for changing ":" to ";". The problem is if these are out of sync, then cross-module importing of indirectly called local functions will continue to be broken in whatever cases Mangler().getNameWithPrefix affects. https://github.com/llvm/llvm-project/pull/74008
[clang] [llvm] [MemProf] Expand optimization scope to internal linkage function (PR #73236)
teresajohnson wrote: @snehasish and I chatted more about this offline. Using the dwarf info to figure out the right source name prefix during indexing is not straightforward. The source file name used as the prefix in the compiler for the IRPGOName is that of the TU. The source file of the line in the dwarf info may be a header. It could be possible to walk up the profiled inline frames to find a source file, but that is error prone in certain circumstances (e.g. if the profiled binary had thinlto cross module inlining, and if it isn't obvious which source file name is a non-included TU). I think your change to the matching is almost strictly better than the complete non-matching we get currently for local functions. It would even work in many cases without uniq suffixes, but uniq suffixes will make that work even better. So let's go ahead with that change. However, we'd like to request that you split the __cxx_global_var_init change into a separate patch, as it is somewhat orthogonal to the matching change, and in case there are unexpected issues with that change. Can you split that out and make the test case here for matching of a non __cxx_global_var_init function? Thanks! https://github.com/llvm/llvm-project/pull/73236
[clang] [llvm] [MemProf] Expand optimization scope to internal linkage function (PR #73236)
teresajohnson wrote: > > Yes, you're right. As an alternative can we use the symbol table and find > > Bind = LOCAL to add the prefix before hashing? > > If we choose this method. I think we can't deal with the situation which one > symbol is not local linkage type in thin compile, but will be converted to > local linkage after thin backend by thinlto's internalization. In this > situation function name in llvm-profdata will have prefix with file name. But > when llvm consumes memory profile, PGOFuncName won't return function name > with prefix. If I understand the issue you are describing, that would only occur if the instrumentation build also used ThinLTO. Is that a typical use case for you? https://github.com/llvm/llvm-project/pull/73236
[llvm] [clang] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)
@@ -1530,14 +1530,11 @@ PassBuilder::buildPerModuleDefaultPipeline(OptimizationLevel Level, } ModulePassManager -PassBuilder::buildFatLTODefaultPipeline(OptimizationLevel Level, bool ThinLTO, -bool EmitSummary) { +PassBuilder::buildFatLTODefaultPipeline(OptimizationLevel Level) { ModulePassManager MPM; - MPM.addPass(EmbedBitcodePass(ThinLTO, EmitSummary, - ThinLTO - ? buildThinLTOPreLinkDefaultPipeline(Level) - : buildLTOPreLinkDefaultPipeline(Level))); - MPM.addPass(buildPerModuleDefaultPipeline(Level)); + MPM.addPass(buildThinLTOPreLinkDefaultPipeline(Level)); + MPM.addPass(EmbedBitcodePass()); + MPM.addPass(buildThinLTODefaultPipeline(Level, /*ImportSummary=*/nullptr)); teresajohnson wrote: What was the downside of using ThinLTODefaultPipeline always? I guess it was essentially over-optimizing in the non-LTO case? I guess using the ThinLTODefaultPipeline only under SamplePGO is ok with me, although it seems like over time as the pipelines get modified it will be difficult to ensure that is the only case where the pipelines get out of sync. I think in either case we are essentially saying: if you use Fat LTO then don't expect the resulting non-LTO native code to be the same as that of a fully non-LTO compile. In one case there is more optimization, in the other there is the risk of future deoptimization if things don't stay in sync. https://github.com/llvm/llvm-project/pull/72180
[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)
https://github.com/teresajohnson approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/72180
[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)
@@ -1861,6 +1861,13 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, if (Args.hasArg(OPT_funified_lto)) Opts.PrepareForThinLTO = true; } + if (Arg *A = Args.getLastArg(options::OPT_ffat_lto_objects, + options::OPT_fno_fat_lto_objects)) { +if (!Args.hasArg(OPT_funified_lto)) + Diags.Report(diag::err_drv_incompatible_options) teresajohnson wrote: It might be less confusing to users if this error message is only given upon an explicit -fno-unified-lto, and diag::err_drv_argument_only_allowed_with is used for the lack of -funified-lto. Also can you add driver tests to check that we get the expected error(s) in the expected option combinations? https://github.com/llvm/llvm-project/pull/72180
[clang] [llvm] [MemProf] Expand optimization scope to internal linakge function (PR #73236)
teresajohnson wrote:

> @lifengxiang1025 thanks for flagging this issue. I think it's best to not rely on unique-internal-linkage-name here. Instead we should extend the logic in RawMemProfReader.cpp to include "filename;" if the function is internal linkage, as expected by IRPGOFuncName. Can you try this approach?

I don't think we want to change the name in the frames themselves, as the names in the debug info don't contain this prefix when matching. Although I suppose we could modify the matcher to add the prefixes when matching frames too, i.e. here: https://github.com/llvm/llvm-project/blob/c0fe0719df457b0992606b0f2a235719a5bbfd54/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp#L806

It sounds like the issue is that the names used to index the memprof records do not use the PGOFuncName scheme, and therefore we never find any memprof records for an internal-linkage function during matching, i.e. the GUID passed to InstrProfWriter::addMemProfRecord. Unfortunately, looking at RawMemProfReader.cpp, it does look like that eventually gets populated out of the Function (GUID) field from the same Frame setup code. So we'd either need some extra handling in that code so that the prefix is only used for the record entry, or change the matcher (the latter may be the easiest).

https://github.com/llvm/llvm-project/pull/73236
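The GUID mismatch described above can be made concrete with a small model. This is a sketch, assuming LLVM's MD5-based GUID (low 64 bits of the digest, read little-endian) and the "filename;name" keying used for internal-linkage functions; the file and function names are made up.

```python
import hashlib

def guid(name: str) -> int:
    """Model of GlobalValue::getGUID: low 64 bits of the MD5 of the
    name (first 8 digest bytes, little-endian) -- an assumption about
    the scheme, hedged per the review discussion."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")

# Profile side: an internal-linkage function is keyed with a
# "filename;" prefix, per the IRPGOFuncName scheme described above.
profile_key = guid("foo.cpp;local_fn")

# Matcher side: the GUID is computed from the bare name in the IR.
matcher_key = guid("local_fn")

# The two keys never agree, so no memprof record is found for the
# internal-linkage function -- the mismatch the comment describes.
```

Whichever side is changed (the record entry or the matcher), both must end up hashing the same string.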
[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)
@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Actually, I believe the cc1 parsing is handled by ParseCodeGenArgs in clang/lib/Frontend/CompilerInvocation.cpp. You could potentially add something there to ensure UnifiedLTO is set with FatLTO, or give your error there that UnifiedLTO must be specified with FatLTO.

https://github.com/llvm/llvm-project/pull/72180
[llvm] [clang] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)
@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Ah ok, I guess that would be expected. Whether it is test-friendly is another question, but in general the driver does a lot of option setup for the cc1 invocation.

https://github.com/llvm/llvm-project/pull/72180
[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)
@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Not sure I follow. Isn't the change in Driver/ToolChains/Clang.cpp going to ensure that -funified-lto is also passed to the cc1 invocation, so that both options are set in CodeGenOpts during the cc1 invocation?

https://github.com/llvm/llvm-project/pull/72180
[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)
@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Won't your change to lib/Driver/ToolChains/Clang.cpp below mean that CodeGenOpts.UnifiedLTO will always be set with FatLTO? Do we then need this change, or the one further below for the module flag? (Maybe add an assert that UnifiedLTO is set if FatLTO is set.)

https://github.com/llvm/llvm-project/pull/72180
[clang] [llvm] [MemProf] Expand optimization scope to internal linakge function (PR #73236)
teresajohnson wrote: @snehasish can you take a look at the issue described here? Should we be doing something different in llvm-profdata? https://github.com/llvm/llvm-project/pull/73236
[llvm] [clang] Support VFE in thinLTO (PR #69735)
@@ -338,12 +574,33 @@ PreservedAnalyses GlobalDCEPass::run(Module &M, ModuleAnalysisManager &MAM) {
 
   // The second pass drops the bodies of functions which are dead...
   std::vector<Function *> DeadFunctions;
-  for (Function &F : M)
+  std::set<Function *> DeadFunctionsSet;
+  auto funcRemovedInIndex = [&](Function *F) -> bool {
+    if (!ImportSummary)
+      return false;
+    auto &vfuncsRemoved = ImportSummary->funcsToBeRemoved();
+    // If function is internalized, its current GUID will be different
+    // from the GUID in funcsToBeRemoved. Treat it as a global and search
+    // again.
+    if (vfuncsRemoved.count(F->getGUID()))
+      return true;
+    if (vfuncsRemoved.count(GlobalValue::getGUID(F->getName(

teresajohnson wrote:

These lookups are a little dangerous, as I think there are cases where we might rename and it wouldn't find the right function. For internalization, we tag the function with an attribute before any ThinLTO renaming, linkage changes, or other optimization passes. See https://github.com/llvm/llvm-project/blob/0936a718492d248a726b8e8d4529e4aa194bc8e9/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp#L271.

https://github.com/llvm/llvm-project/pull/69735
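The renaming hazard flagged in this comment can be illustrated with a small model of the two-probe lookup from the diff. This is a sketch, not LLVM code: the MD5-based GUID and the "filename;name" local-symbol keying are assumptions, and the ".llvm.123" promotion suffix is a made-up hash value (ThinLTO promotion does append a ".llvm.<modhash>" suffix to promoted locals).

```python
import hashlib

def guid(name: str) -> int:
    # Low 64 bits of the MD5 of the name, modeling GlobalValue::getGUID
    # (an assumption about the scheme).
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")

# GUID recorded in funcsToBeRemoved before any renaming.
removed = {guid("foo")}

def double_lookup(current_guid: int, current_name: str) -> bool:
    """Models the quoted fallback: try the function's current GUID,
    then the GUID of its current bare name."""
    return current_guid in removed or guid(current_name) in removed

# Internalization changes the GUID (local symbols key on "file;name",
# assumed scheme), but the bare-name probe still succeeds.
internalized_found = double_lookup(guid("file.cpp;foo"), "foo")

# Promotion renames the symbol itself ("foo" -> "foo.llvm.<hash>"),
# so both probes hash the new name and miss -- the dangerous case.
promoted_found = double_lookup(guid("foo.llvm.123"), "foo.llvm.123")
```

This is why tagging the function (e.g. with an attribute) before any renaming is more robust than GUID lookups after the fact.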
[llvm] [clang] Support VFE in thinLTO (PR #69735)
@@ -34,12 +40,223 @@ static cl::opt<bool>
     ClEnableVFE("enable-vfe", cl::Hidden, cl::init(true),
                 cl::desc("Enable virtual function elimination"));
 
+static cl::opt<std::string> ClReadSummary(
+    "globaldce-read-summary",
+    cl::desc("Read summary from given bitcode before running pass"),
+    cl::Hidden);
+
 STATISTIC(NumAliases  , "Number of global aliases removed");
 STATISTIC(NumFunctions, "Number of functions removed");
 STATISTIC(NumIFuncs,    "Number of indirect functions removed");
 STATISTIC(NumVariables, "Number of global variables removed");
 STATISTIC(NumVFuncs,    "Number of virtual functions removed");
 
+namespace llvm {
+
+// Return a representative summary for the vtable, also setting isSafe.
+static const GlobalVarSummary *
+getVTableFuncsForTId(const TypeIdOffsetVtableInfo &P, bool &isSafe) {
+  // Find a representative copy of the vtable initializer.
+  const GlobalVarSummary *VS = nullptr;
+  bool LocalFound = false;
+  for (auto &S : P.VTableVI.getSummaryList()) {
+    if (GlobalValue::isLocalLinkage(S->linkage())) {
+      if (LocalFound) {
+        isSafe = false;
+        return nullptr;
+      }
+      LocalFound = true;
+    }
+    auto *CurVS = cast<GlobalVarSummary>(S->getBaseObject());
+    // Ignore if vTableFuncs is empty and vtable is available_externally.
+    if (!CurVS->vTableFuncs().empty() ||
+        !GlobalValue::isAvailableExternallyLinkage(S->linkage())) {
+      VS = CurVS;
+      if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic) {
+        isSafe = false;
+        return VS;
+      }
+    }
+  }
+
+  if (!VS) {
+    isSafe = false;
+    return nullptr;
+  }
+  if (!VS->isLive()) {
+    isSafe = true;
+    return nullptr;
+  }
+  isSafe = true;
+  return VS;
+}
+
+static void collectSafeVTables(
+    ModuleSummaryIndex &Summary,
+    DenseMap<GlobalValue::GUID, std::vector<StringRef>> &NameByGUID,
+    std::map<ValueInfo, std::vector<VirtFuncOffset>> &VFESafeVTablesAndFns) {
+  // Update VFESafeVTablesAndFns with information from summary.
+  for (auto &P : Summary.typeIdCompatibleVtableMap()) {
+    NameByGUID[GlobalValue::getGUID(P.first)].push_back(P.first);
+    LLVM_DEBUG(dbgs() << "TId " << GlobalValue::getGUID(P.first) << " "
+                      << P.first << "\n");
+  }
+  llvm::errs() << "VFEThinLTO number of TIds: " << NameByGUID.size() << "\n";
+
+  // VFESafeVTablesAndFns: map from VI for vTable to VI for vfunc
+  std::map<ValueInfo, std::set<GlobalValue::GUID>> vFuncSet;
+  unsigned numSafeVFuncs = 0;
+  // Collect stats for VTables (safe, public-visibility, other).
+  std::set<ValueInfo> vTablePublicVis;
+  std::set<ValueInfo> vTableOther;
+  for (auto &TidSummary : Summary.typeIdCompatibleVtableMap()) {
+    for (const TypeIdOffsetVtableInfo &P : TidSummary.second) {
+      LLVM_DEBUG(dbgs() << "TId-vTable " << TidSummary.first << " "
+                        << P.VTableVI.name() << " " << P.AddressPointOffset
+                        << "\n");
+      bool isSafe = false;
+      const GlobalVarSummary *VS = getVTableFuncsForTId(P, isSafe);
+      if (!isSafe && VS)
+        vTablePublicVis.insert(P.VTableVI);
+      if ((isSafe && !VS) || (!isSafe && !VS))
+        vTableOther.insert(P.VTableVI);
+      if (!isSafe || !VS) {
+        continue;
+      }
+
+      // Go through VS->vTableFuncs
+      for (auto VTP : VS->vTableFuncs()) {
+        if (vFuncSet.find(P.VTableVI) == vFuncSet.end() ||
+            !vFuncSet[P.VTableVI].count(VTP.FuncVI.getGUID())) {
+          VFESafeVTablesAndFns[P.VTableVI].push_back(VTP);
+          LLVM_DEBUG(dbgs()
+                     << "vTable " << P.VTableVI.name() << " "
+                     << VTP.FuncVI.name() << " " << VTP.VTableOffset << "\n");
+          ++numSafeVFuncs;
+        }
+        vFuncSet[P.VTableVI].insert(VTP.FuncVI.getGUID());
+      }
+    }
+  }
+  llvm::errs() << "VFEThinLTO number of vTables: " << vFuncSet.size() << " "
+               << vTablePublicVis.size() << " " << vTableOther.size() << "\n";
+  llvm::errs() << "VFEThinLTO numSafeVFuncs: " << numSafeVFuncs << "\n";
+}
+
+static void checkVTableLoadsIndex(
+    ModuleSummaryIndex &Summary,
+    DenseMap<GlobalValue::GUID, std::vector<StringRef>> &NameByGUID,
+    std::map<ValueInfo, std::vector<VirtFuncOffset>> &VFESafeVTablesAndFns,
+    std::map<ValueInfo, std::set<ValueInfo>> &VFuncsAndCallers) {
+  // Go through Function summaries for intrinsics, also funcHasNonVTableRef to
+  // erase entries from VFESafeVTablesAndFns.
teresajohnson wrote:

Why and when are entries being erased? More qualitative comments about what this algorithm is doing would be helpful.

https://github.com/llvm/llvm-project/pull/69735
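For readers following the thread, the representative-selection rules in the quoted getVTableFuncsForTId can be summarized with a small model. This is a Python sketch of the logic in the diff above, using plain dicts in place of summary objects; the field names (linkage, vis, vtable_funcs, live) are stand-ins for the corresponding summary accessors, not real LLVM API.

```python
def pick_representative(summaries):
    """Model of getVTableFuncsForTId: return (representative, is_safe).

    Unsafe when there are two local copies of the vtable (ambiguous) or
    when any usable copy has public vcall visibility; a dead vtable is
    safe but yields no representative.
    """
    rep, local_found = None, False
    for s in summaries:
        if s["linkage"] == "local":
            if local_found:
                return None, False  # two local copies: ambiguous, unsafe
            local_found = True
        # Skip copies that are available_externally with no vtable funcs.
        if s["vtable_funcs"] or s["linkage"] != "available_externally":
            rep = s
            if s["vis"] == "public":
                return rep, False  # public vcall visibility: unsafe
    if rep is None:
        return None, False
    if not rep["live"]:
        return None, True  # dead vtable: safe, nothing to keep
    return rep, True
```

A sketch like this (or equivalent comments in the pass) would address the request above for more qualitative description of the algorithm.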