[clang] [llvm] [CGData][ThinLTO][NFC] Prep for two-codegen rounds (PR #90934)

2024-10-02 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.


https://github.com/llvm/llvm-project/pull/90934


[clang] [llvm] [CGData][ThinLTO][NFC] Prep for two-codegen rounds (PR #90934)

2024-10-02 Thread Teresa Johnson via cfe-commits


@@ -586,7 +586,9 @@ Error lto::thinBackend(const Config &Conf, unsigned Task, AddStreamFn AddStream,
   Mod.setPartialSampleProfileRatio(CombinedIndex);
 
   LLVM_DEBUG(dbgs() << "Running ThinLTO\n");
-  if (Conf.CodeGenOnly) {
+  if (CodeGenOnly) {

teresajohnson wrote:

Can you add a comment noting that this value may be different from the one in the 
provided Config?
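
Something along these lines, for example (a rough sketch of the requested note, not the exact wording in the patch):

```
  // Note: CodeGenOnly is the value passed into thinBackend and may differ from
  // Conf.CodeGenOnly; the two-codegen-round flow can request a codegen-only
  // run even when the provided Config asked for the full pipeline.
  if (CodeGenOnly) {
```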

https://github.com/llvm/llvm-project/pull/90934


[clang] [llvm] [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (PR #90933)

2024-09-30 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> > * Looking at the NFC, this seems like it has very similar issues to 
> > Propeller, which wants to redo just the codegen with a new injected profile 
> > and BB ordering. It would be good to see if we can converge to similar 
> > approaches. I asked @rlavaee to take a look and he is reading through the 
> > background on this work. @rlavaee do you think Propeller could use a 
> > similar approach to this where it saves the pre-codegen bitcode and 
> > re-loads it instead of redoing opt? This isn't necessarily an action item 
> > for this PR, but I wanted Rahman to take a look since he is more familiar 
> > with codegen.
> 
> It's interesting to know that Propeller wants to redo the codegen. I'm happy 
> to align with this work. We've already started discussing this and have 
> shared some details from our side. Here's the link for more info: 
> https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/11?u=kyulee-com.

Great, I hadn't looked at the RFC again and didn't realize Rahman already 
responded there.

> 
> > * I think this should be doable to implement with distributed ThinLTO if we 
> > want in the future, from what I can tell. But we'll need a way to save the 
> > bitcode before codegen from a non-in-process backend (e.g. the thinBackend 
> > invoked from clang). Not asking you to do anything for this PR on that, but 
> > just thinking through it. Seems doable...
> 
> I think technically it's doable, but initially, I believed we did not want to 
> repeat the code generation with distributed ThinLTO unless we were willing to 
> introduce additional synchronization and spawn distributed ThinLTO backends 
> again with the merged codegen data. If we aim to leverage the saved optimized 
> IR, I suppose we need to assign the same backend work to the same machine 
> used in the first run. As commented above in the link, I wanted to confine 
> repeating the codegen to a single machine (in-process backend). I mainly 
> wanted to separate the writer/reader builds for the use of distributed 
> ThinLTO builds to avoid this complication while allowing some stale codegen 
> data. However, it's certainly worth experimenting with.

Agree it makes sense to do the in-process mode support first as that is more 
straightforward. 

> If we aim to leverage the saved optimized IR, I suppose we need to assign the 
> same backend work to the same machine used in the first run.

Not necessarily. The IR just needs to be saved in a path that the build system 
specifies and can therefore move around.
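
For reference, a minimal sketch (not from this PR; the helper name and path scheme are made up) of how a backend could persist the pre-codegen bitcode to a build-system-chosen directory using the existing lto::Config hooks:

```
#include "llvm/Bitcode/BitcodeWriter.h"
#include "llvm/LTO/Config.h"
#include "llvm/Support/raw_ostream.h"
#include <string>
#include <system_error>

// Hypothetical helper: OutDir is whatever location the build system hands us.
static void savePreCodeGenBitcode(llvm::lto::Config &Conf, std::string OutDir) {
  Conf.PreCodeGenModuleHook = [OutDir](unsigned Task, const llvm::Module &M) {
    std::error_code EC;
    llvm::raw_fd_ostream OS(OutDir + "/task" + std::to_string(Task) + ".bc", EC);
    if (!EC)
      llvm::WriteBitcodeToFile(M, OS);
    return true; // keep running the normal codegen pipeline
  };
}
```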

> 
> > * My biggest concern about this patch as written is whether it will break 
> > down under LTO's caching mechanism - have you tested it with caching? It 
> > seems like it might just skip the backend completely since you aren't 
> > adding anything to the cache key.
> 
> That's a good catch! Assuming this is not a common scenario (as mentioned in 
> the RFC linked above), my initial design intentionally disables LTO's caching 
> for correctness and simplicity by setting an empty `FileCache()` in the 
> constructor for both the first and second round backends. To enable proper 
> caching, we need two additional caches/streams, in addition to one for the 
> final object output: one for the scratch object files and another for the 
> optimized IR files. I've managed to implement this with some refactoring, 
> details of which will follow.

Yeah, there are a few caching aspects. Adding caching of the scratch files and 
optimized IR being reused here is one aspect that will enable more caching and 
better build performance for this optimization. I was more concerned about the 
existing LTO caching and whether that would be broken, but missed that you were 
previously disabling caching. We use distributed ThinLTO but my understanding 
is that LTO caching is frequently used for in-process ThinLTO. I'll look at 
your refactoring and updates as soon as I can.



https://github.com/llvm/llvm-project/pull/90933


[clang] [llvm] [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (PR #90933)

2024-09-27 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> > @teresajohnson Do you have any concern or comment on this direction?
> 
> Just a quick reply to say that I am taking a look today (unfamiliar with this 
> approach so need to read through the RFC etc) and will get back later today

Sorry, I was obviously over-optimistic about when I could do this! A couple of 
high-level comments / questions:

- Looking at the NFC, this seems like it has very similar issues to Propeller, 
which wants to redo just the codegen with a new injected profile and BB 
ordering. It would be good to see if we can converge to similar approaches. I 
asked @rlavaee to take a look and he is reading through the background on this 
work. @rlavaee do you think Propeller could use a similar approach to this 
where it saves the pre-codegen bitcode and re-loads it instead of redoing opt? 
This isn't necessarily an action item for this PR, but I wanted Rahman to take 
a look since he is more familiar with codegen.

- I think this should be doable to implement with distributed ThinLTO if we 
want in the future, from what I can tell. But we'll need a way to save the 
bitcode before codegen from a non-in-process backend (e.g. the thinBackend 
invoked from clang). Not asking you to do anything for this PR on that, but 
just thinking through it. Seems doable...

- My biggest concern about this patch as written is whether it will break down 
under LTO's caching mechanism - have you tested it with caching? It seems like 
it might just skip the backend completely since you aren't adding anything to 
the cache key.
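
For the caching point specifically: whatever the second-round backend consumes (e.g. the merged codegen data) has to feed into the cache key, otherwise a stale entry can satisfy the lookup and the backend is skipped. A hand-wavy sketch of the idea only, with `BaseKey` standing in for the existing ThinLTO cache key and `CGDataHash` for a hypothetical digest of the merged codegen data:

```
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/StringRef.h"
#include <string>

// Sketch: the extra input must perturb the key, or a hit from a build without
// the merged data could be reused incorrectly.
static std::string extendCacheKey(llvm::StringRef BaseKey,
                                  llvm::StringRef CGDataHash) {
  llvm::hash_code H = llvm::hash_combine(BaseKey, CGDataHash);
  std::string Key = BaseKey.str();
  Key += "-";
  Key += std::to_string(static_cast<size_t>(H));
  return Key;
}
```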



https://github.com/llvm/llvm-project/pull/90933


[clang] [llvm] [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (PR #90933)

2024-09-25 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> @teresajohnson Do you have any concern or comment on this direction?

Just a quick reply to say that I am taking a look today (unfamiliar with this 
approach so need to read through the RFC etc) and will get back later today

https://github.com/llvm/llvm-project/pull/90933


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-25 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.

lgtm

https://github.com/llvm/llvm-project/pull/94264


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const CallBase &CB, const std::vector &Candidates,
+uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+auto &Candidate = Candidates[I];
+uint64_t CandidateVTableCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  CandidateVTableCount += Count;
+
+if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
+  LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+<< "-th candidate, function count " << Candidate.Count
+<< " and its vtable count " << CandidateVTableCount
+<< " have discrepancies\n");
+  return false;
+}
+
+RemainingVTableCount -= Candidate.Count;
+
+// 'MaxNumVTable' limits the number of vtables to make vtable comparison
+// profitable. Comparing multiple vtables for one function candidate will
+// insert additional instructions on the hot path, and allowing more than
+// one vtable for non last candidates may or may not elongates dependency

teresajohnson wrote:

"the dependency chain"

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison

teresajohnson wrote:

The wording "earlier candidate of one icall can affect later function candidate 
in an undesired way" is a little confusing to me. I think what you mean is that 
doing so for an earlier candidate delays the comparisons for later candidates, 
but that for the last candidate, only the fallback path is affected? Do we 
expect to set this parameter above 1?
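
To make the threshold concrete with made-up numbers: at the default 0.99, a candidate whose function count is 1000 needs its vtable counts to sum to at least 990, otherwise the pass falls back to the function comparison:

```
// Illustrative values only; they do not come from the patch or a real profile.
uint64_t CandidateFuncCount = 1000;
uint64_t CandidateVTableCount = 985; // sum over Candidate.VTableGUIDAndCounts
double Threshold = 0.99;             // ICPVTablePercentageThreshold
// 985 < 1000 * 0.99, so this candidate would not use the vtable comparison.
bool Profitable = CandidateVTableCount >= CandidateFuncCount * Threshold;
```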

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson edited 
https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const CallBase &CB, const std::vector &Candidates,
+uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+auto &Candidate = Candidates[I];
+uint64_t CandidateVTableCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  CandidateVTableCount += Count;
+
+if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
+  LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+<< "-th candidate, function count " << Candidate.Count
+<< " and its vtable count " << CandidateVTableCount
+<< " have discrepancies\n");
+  return false;
+}
+
+RemainingVTableCount -= Candidate.Count;
+
+// 'MaxNumVTable' limits the number of vtables to make vtable comparison
+// profitable. Comparing multiple vtables for one function candidate will
+// insert additional instructions on the hot path, and allowing more than
+// one vtable for non last candidates may or may not elongates dependency
+// chain for the subsequent candidates. Set its value to 1 for non-last
+// candidate and allow option to override it for the last candidate.
+int MaxNumVTable = 1;
+if (I == CandidateSize - 1)
+  MaxNumVTable = ICPMaxNumVTableLastCandidate;
+
+if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {
+  return false;
+}
+  }
+
+  // If the indirect fallback is not cold, don't compare vtables.
+  if (PSI && PSI->hasProfileSummary() &&
+  !PSI->isColdCount(RemainingVTableCount))
+return false;

teresajohnson wrote:

Ditto

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -140,14 +348,56 @@ class IndirectCallPromoter {
   // indirect callee with functions. Returns true if there are IR
   // transformations and false otherwise.
   bool tryToPromoteWithFuncCmp(
-  CallBase &CB, const std::vector &Candidates,
-  uint64_t TotalCount, ArrayRef ICallProfDataRef,
-  uint32_t NumCandidates);
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates, uint64_t TotalCount,
+  ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Promote a list of targets for one indirect call by comparing vtables with
+  // functions. Returns true if there are IR transformations and false
+  // otherwise.
+  bool tryToPromoteWithVTableCmp(
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates,
+  uint64_t TotalFuncCount, uint32_t NumCandidates,
+  MutableArrayRef ICallProfDataRef,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Returns true if it's profitable to compare vtables for the callsite.
+  bool isProfitableToCompareVTables(
+  const CallBase &CB, const std::vector &Candidates,
+  uint64_t TotalCount);
+
+  // Given an indirect callsite and the list of function candidates, compute
+  // the following vtable information in output parameters and returns vtable
+  // pointer if type profiles exist.
+  // - Populate `VTableGUIDCounts` with  using !prof
+  // metadata attached on the vtable pointer.
+  // - For each function candidate, finds out the vtables from which it get

teresajohnson wrote:

s/get/gets/

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const CallBase &CB, const std::vector &Candidates,
+uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+auto &Candidate = Candidates[I];
+uint64_t CandidateVTableCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  CandidateVTableCount += Count;
+
+if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
+  LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+<< "-th candidate, function count " << Candidate.Count
+<< " and its vtable count " << CandidateVTableCount
+<< " have discrepancies\n");
+  return false;
+}
+
+RemainingVTableCount -= Candidate.Count;
+
+// 'MaxNumVTable' limits the number of vtables to make vtable comparison
+// profitable. Comparing multiple vtables for one function candidate will
+// insert additional instructions on the hot path, and allowing more than
+// one vtable for non last candidates may or may not elongates dependency

teresajohnson wrote:

s/elongates/elongate/

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) {
   return F.getMetadata(getPGOFuncNameMetadataName());
 }
 
-void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) {
-  // Only for internal linkage functions.
-  if (PGOFuncName == F.getName())
-  return;
-  // Don't create duplicated meta-data.
-  if (getPGOFuncNameMetadata(F))
+static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
+  StringRef PGOName) {
+  // For internal linkage objects, its name is not the same as its PGO name.

teresajohnson wrote:

I think it would be useful to keep the old comment too about this only being 
for internal linkage functions.
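
I.e. something along the lines of (sketch only, not the final wording):

```
static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
                                  StringRef PGOName) {
  // Only for internal linkage objects, whose PGO name is prefixed with the
  // file name and therefore differs from the IR name.
  if (GO.getName() == PGOName)
    return;
```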

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt ICPMaxNumVTableLastCandidate(
+"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
 
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+SmallDenseMap>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer to

teresajohnson wrote:

s/container/contain/

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// After a virtual call candidate gets promoted, update the vtable's counts
+// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+// a vtable from which the virtual call is loaded. Compute the sum and use
+// 128-bit APInt to improve accuracy.
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  if (!EnableVTableProfileUse || VPtr == nullptr ||
+  !VPtr->getMetadata(LLVMContext::MD_prof))
+return;
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+
+  for (const auto &Candidate : Candidates) {
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableGUIDCounts[GUID] -= Count;
+
+// 'OriginalBB' is the basic block of indirect call. After each candidate
+// is promoted, a new basic block is created for the indirect fallback basic
+// block and indirect call `CB` is moved into this new BB.
+BasicBlock *OriginalBB = CB.getParent();
+promoteCallWithVTableCmp(
+CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints,
+createBranchWeights(CB.getContext(), Candidate.Count,
+TotalFuncCount - Candidate.Count));
+
+int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent());
+
+ORE.emi

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.

lgtm with some mostly comment related fixes/suggestions

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt ICPMaxNumVTableLastCandidate(
+"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
 
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+SmallDenseMap>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer to
+// this vtable, for example, `Base*` can be a pointer to an instantiated type
+// but not vice versa. See also https://llvm.org/docs/TypeMetadata.html
+static std::optional
+getAddressPointOffset(const GlobalVariable &VTableVar,
+  StringRef CompatibleType) {
+  SmallVector Types;
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` is used via its `UserInst`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Guaranteed by ICP transformation");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)

teresajohnson wrote:

It looks like tryToSinkInstruction is called repeatedly for insts all from the 
same BB, and for the same DestBB - consider hoisting this check out to the 
caller of tryToSinkInstruction (i.e. into tryToSinkInstructions).
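
Roughly like the following (a sketch only; I'm assuming tryToSinkInstructions walks OriginalBB and calls tryToSinkInstruction per instruction, so the exact signatures may differ):

```
int IndirectCallPromoter::tryToSinkInstructions(BasicBlock *OriginalBB,
                                                BasicBlock *IndirectCallBB) {
  // Hoisted from isDestBBSuitableForSink: do not sink across a critical edge.
  // Checking once here avoids re-evaluating it for every instruction.
  if (IndirectCallBB->getUniquePredecessor() != OriginalBB)
    return 0;
  int SinkCount = 0;
  for (Instruction &I :
       llvm::make_early_inc_range(llvm::reverse(*OriginalBB)))
    if (tryToSinkInstruction(&I, IndirectCallBB))
      ++SinkCount;
  return SinkCount;
}
```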

https://github.com/llvm/llvm-project/pull/81442

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -277,35 +626,160 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!EnableVTableProfileUse || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// After a virtual call candidate gets promoted, update the vtable's counts
+// proportionally. Each vtable-guid in `C.VTableGUIDAndCounts` represents
+// a vtable from which the virtual call is loaded. Compute the sum and use
+// 128-bit APInt to improve accuracy.
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  if (!EnableVTableProfileUse || VPtr == nullptr ||
+  !VPtr->getMetadata(LLVMContext::MD_prof))
+return;
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+
+  for (const auto &Candidate : Candidates) {
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableGUIDCounts[GUID] -= Count;
+
+// 'OriginalBB' is the basic block of indirect call. After each candidate
+// is promoted, a new basic block is created for the indirect fallback basic
+// block and indirect call `CB` is moved into this new BB.
+BasicBlock *OriginalBB = CB.getParent();
+promoteCallWithVTableCmp(
+CB, VPtr, Candidate.TargetFunction, Candidate.AddressPoints,
+createBranchWeights(CB.getContext(), Candidate.Count,
+TotalFuncCount - Candidate.Count));
+
+int SinkCount = tryToSinkInstructions(OriginalBB, CB.getParent());
+
+ORE.emi

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const CallBase &CB, const std::vector &Candidates,
+uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+auto &Candidate = Candidates[I];
+uint64_t CandidateVTableCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  CandidateVTableCount += Count;
+
+if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
+  LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+<< "-th candidate, function count " << Candidate.Count
+<< " and its vtable count " << CandidateVTableCount
+<< " have discrepancies\n");
+  return false;
+}
+
+RemainingVTableCount -= Candidate.Count;
+
+// 'MaxNumVTable' limits the number of vtables to make vtable comparison
+// profitable. Comparing multiple vtables for one function candidate will
+// insert additional instructions on the hot path, and allowing more than
+// one vtable for non last candidates may or may not elongates dependency
+// chain for the subsequent candidates. Set its value to 1 for non-last
+// candidate and allow option to override it for the last candidate.
+int MaxNumVTable = 1;
+if (I == CandidateSize - 1)
+  MaxNumVTable = ICPMaxNumVTableLastCandidate;
+
+if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {

teresajohnson wrote:

Might be useful to have a debug or missed optimization message for this case? 
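
For example (a sketch reusing the ORE this pass already has; the remark name and wording are just placeholders):

```
if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {
  ORE.emit([&]() {
    return OptimizationRemarkMissed(DEBUG_TYPE, "UnprofitableVTableCmp", &CB)
           << "candidate " << ore::NV("Callee", Candidate.TargetFunction)
           << " requires comparing "
           << ore::NV("NumVTables", (unsigned)Candidate.AddressPoints.size())
           << " vtable address points, more than the allowed "
           << ore::NV("MaxNumVTable", MaxNumVTable);
  });
  return false;
}
```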

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -1425,16 +1430,27 @@ MDNode *getPGOFuncNameMetadata(const Function &F) {
   return F.getMetadata(getPGOFuncNameMetadataName());
 }
 
-void createPGOFuncNameMetadata(Function &F, StringRef PGOFuncName) {
-  // Only for internal linkage functions.
-  if (PGOFuncName == F.getName())
-  return;
-  // Don't create duplicated meta-data.
-  if (getPGOFuncNameMetadata(F))
+static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
+  StringRef PGOName) {
+  // For internal linkage objects, its name is not the same as its PGO name.
+  if (GO.getName() == PGOName)
 return;
-  LLVMContext &C = F.getContext();
-  MDNode *N = MDNode::get(C, MDString::get(C, PGOFuncName));
-  F.setMetadata(getPGOFuncNameMetadataName(), N);
+
+  // Don't created duplicated metadata.

teresajohnson wrote:

s/created/create/

https://github.com/llvm/llvm-project/pull/81442


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -103,27 +112,226 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt ICPVTablePercentageThreshold(
+"icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
+// for the last function candidate and use the option below to cap the number
+// of vtables.
+static cl::opt ICPMaxNumVTableLastCandidate(
+"icp-max-num-vtable-last-candidate", cl::init(1), cl::Hidden,
+cl::desc("The maximum number of vtable for the last candidate."));
+
 namespace {
 
+// The key is a vtable global variable, and the value is a map.
+// In the inner map, the key represents address point offsets and the value is a
+// constant for this address point.
+using VTableAddressPointOffsetValMap =
+SmallDenseMap>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// The key is vtable GUID, and value is its value profile count.
+using VTableGUIDCountsMap = SmallDenseMap;
+
+// Returns the address point offset of the given compatible type.
+//
+// Type metadata of a vtable specifies the types that can container a pointer to
+// this vtable, for example, `Base*` can be a pointer to an instantiated type

teresajohnson wrote:

I think "instantiated type" should be "derived type"

https://github.com/llvm/llvm-project/pull/81442


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -20,25 +20,25 @@ CHECK-NEXT:  -
 
 CHECK:  Records:
 CHECK-NEXT:  -
-CHECK-NEXT:FunctionGUID: 15505678318020221912
+CHECK-NEXT:FunctionGUID: 3873612792189045660
 CHECK-NEXT:AllocSites:
 CHECK-NEXT:-
 CHECK-NEXT:  Callstack:
 CHECK-NEXT:  -
-CHECK-NEXT:Function: 15505678318020221912
-CHECK-NEXT:SymbolName: qux
+CHECK-NEXT:Function: 3873612792189045660
+CHECK-NEXT:SymbolName: _Z3quxi

teresajohnson wrote:

Not a problem, it just struck me as odd since your patch shouldn't change these. But 
like you said, probably some prior change was made that didn't require updating 
this test to get it to pass, and now those changes are showing up.

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -96,19 +102,63 @@ llvm::SmallVector readSegmentEntries(const char *Ptr) {
 }
 
 llvm::SmallVector>
-readMemInfoBlocks(const char *Ptr) {
+readMemInfoBlocksV3(const char *Ptr) {
   using namespace support;
 
   const uint64_t NumItemsToRead =
-  endian::readNext(Ptr);
+  endian::readNext(Ptr);
+
   llvm::SmallVector> Items;
   for (uint64_t I = 0; I < NumItemsToRead; I++) {
 const uint64_t Id =
-endian::readNext(Ptr);
-const MemInfoBlock MIB = *reinterpret_cast<const MemInfoBlock *>(Ptr);
+endian::readNext(Ptr);
+
+// We cheat a bit here and remove the const from cast to set the
+// Histogram Pointer to newly allocated buffer. We also cheat, since V3 and
+// V4 do not have the same fields. V3 is missing AccessHistogramSize and
+// AccessHistogram. This means we read "dirty" data in here, but it should
+// not segfault, since there will be callstack data placed after this in the
+// binary format.
+MemInfoBlock MIB = *reinterpret_cast<MemInfoBlock *>(Ptr);
+// Overwrite dirty data.

teresajohnson wrote:

Oh yes, you are right, I missed the first `*`, which means it is making a copy of 
what was in the memory pointed to by Ptr.
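
A tiny standalone illustration (made-up struct, not the real MemInfoBlock) of why the leading `*` produces an independent copy of the bytes at Ptr:

```
#include <cassert>
#include <cstdint>

struct Block { uint64_t AllocCount = 0; uint64_t TotalSize = 0; };

int main() {
  alignas(Block) unsigned char Buf[sizeof(Block)] = {};
  Block Copy = *reinterpret_cast<const Block *>(Buf); // copies the bytes out
  Buf[0] = 42;                  // later writes to the buffer...
  assert(Copy.AllocCount == 0); // ...do not show up in the copy
  return 0;
}
```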

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-24 Thread Teresa Johnson via cfe-commits


@@ -216,6 +228,35 @@ u64 GetShadowCount(uptr p, u32 size) {
   return count;
 }
 
+// Accumulates the access count from the shadow for the given pointer and size.
+// See memprof_mapping.h for an overview on histogram counters.
+u64 GetShadowCountHistogram(uptr p, u32 size) {
+  u8 *shadow = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p);
+  u8 *shadow_end = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p + size);
+  u64 count = 0;
+  for (; shadow <= shadow_end; shadow++)
+count += *shadow;
+  return count;
+}
+
+// If we use the normal approach from clearCountersWithoutHistogram, the
+// histogram will clear too much data and may overwrite shadow counters that are
+// in use. Likely because of rounding up the shadow_end pointer.
+// See memprof_mapping.h for an overview on histogram counters.
+void clearCountersHistogram(uptr addr, uptr size) {
+  u8 *shadow_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr);
+  u8 *shadow_end_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr + size);
+  for (; shadow_8 < shadow_end_8; shadow_8++) {
+*shadow_8 = 0;
+  }
+}
+
+void clearCountersWithoutHistogram(uptr addr, uptr size) {
+  uptr shadow_beg = MEM_TO_SHADOW(addr);
+  uptr shadow_end = MEM_TO_SHADOW(addr + size - SHADOW_GRANULARITY) + 1;
+  REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg);
+}
+
 // Clears the shadow counters (when memory is allocated).
 void ClearShadow(uptr addr, uptr size) {

teresajohnson wrote:

I guess it works, because the shadow granularities are less than the page size, 
and because they are both scaling by the same 8 to 1 scale, and also I guess 
because the clear_shadow_mmap_threshold is an optimization and doesn't need to 
be exact. However, it still feels a little wonky to me (and also means that we 
have to do extra mapping operations here and again in `clearCounters*`). I do 
think I would prefer to have it be something like:

```
  uptr shadow_beg;
  uptr shadow_end;
if (__memprof_histogram) {
  shadow_beg = HISTOGRAM_MEM_TO_SHADOW(addr);
  shadow_end = HISTOGRAM_MEM_TO_SHADOW(addr + size);
} else {
  shadow_beg = MEM_TO_SHADOW(addr);
  shadow_end = MEM_TO_SHADOW(addr + size - SHADOW_GRANULARITY) + 1;
}
if (shadow_end - shadow_beg < common_flags()->clear_shadow_mmap_threshold) {
  REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg);
} else {
...
```

I.e. set shadow_beg/end based on whether we are doing the histogramming or not, 
leave the rest as-is. Would that work?

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -0,0 +1,101 @@
+REQUIRES: x86_64-linux
+
+This is a copy of memprof-basict.test with slight changes to check that we can still read v3 of memprofraw.
+
+To update the inputs used below run Inputs/update_memprof_inputs.sh /path/to/updated/clang

teresajohnson wrote:

I think this comment is wrong - we can't update the v3 version, is that 
correct? Nor would we want to.

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -38,4 +38,5 @@ MEMPROF_FLAG(bool, allocator_frees_and_returns_null_on_realloc_zero, true,
 MEMPROF_FLAG(bool, print_text, false,
   "If set, prints the heap profile in text format. Else use the raw binary 
serialization format.")
 MEMPROF_FLAG(bool, print_terse, false,
- "If set, prints memory profile in a terse format. Only applicable 
if print_text = true.")
+ "If set, prints memory profile in a terse format. Only applicable 
"

teresajohnson wrote:

Formatting change only, remove from patch

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -205,8 +205,14 @@ class RawMemProfReader final : public MemProfReader {
 
   object::SectionedAddress getModuleOffset(uint64_t VirtualAddress);
 
+  llvm::SmallVector>
+  readMemInfoBlocks(const char *Ptr);
+
   // The profiled binary.
   object::OwningBinary Binary;
+  // Version of raw memprof binary currently being read. Defaults to most update

teresajohnson wrote:

typo: s/update to date/up to date/

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -610,13 +670,33 @@ RawMemProfReader::peekBuildIds(MemoryBuffer *DataBuffer) {
   return BuildIds.takeVector();
 }
 
+// FIXME: Add a schema for serializing similiar to IndexedMemprofReader. This
+// will help being able to deserialize different versions raw memprof versions
+// more easily.
+llvm::SmallVector>
+RawMemProfReader::readMemInfoBlocks(const char *Ptr) {
+  if (MemprofRawVersion == 3ULL) {
+errs() << "Reading V3\n";

teresajohnson wrote:

Once you do that, the braces can be removed from all of the if/else conditions 
here that have single-statement bodies.
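
I.e. roughly (a sketch; the V4 reader name and the element type are my guesses from the surrounding diff, not the final code):

```
llvm::SmallVector<std::pair<uint64_t, MemInfoBlock>>
RawMemProfReader::readMemInfoBlocks(const char *Ptr) {
  if (MemprofRawVersion == 3ULL)
    return readMemInfoBlocksV3(Ptr);
  if (MemprofRawVersion == 4ULL)
    return readMemInfoBlocksV4(Ptr);
  llvm_unreachable("unsupported raw memprof version");
}
```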

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -610,13 +670,33 @@ RawMemProfReader::peekBuildIds(MemoryBuffer *DataBuffer) {
   return BuildIds.takeVector();
 }
 
+// FIXME: Add a schema for serializing similiar to IndexedMemprofReader. This
+// will help being able to deserialize different versions raw memprof versions
+// more easily.
+llvm::SmallVector>
+RawMemProfReader::readMemInfoBlocks(const char *Ptr) {
+  if (MemprofRawVersion == 3ULL) {
+errs() << "Reading V3\n";

teresajohnson wrote:

Leftover debugging message that should be removed?

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -0,0 +1,101 @@
+REQUIRES: x86_64-linux
+
+This is a copy of memprof-basict.test with slight changes to check that we can still read v3 of memprofraw.

teresajohnson wrote:

typo: basict

https://github.com/llvm/llvm-project/pull/94264


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -216,6 +228,35 @@ u64 GetShadowCount(uptr p, u32 size) {
   return count;
 }
 
+// Accumulates the access count from the shadow for the given pointer and size.
+// See memprof_mapping.h for an overview on histogram counters.
+u64 GetShadowCountHistogram(uptr p, u32 size) {
+  u8 *shadow = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p);
+  u8 *shadow_end = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p + size);
+  u64 count = 0;
+  for (; shadow <= shadow_end; shadow++)
+count += *shadow;
+  return count;
+}
+
+// If we use the normal approach from clearCountersWithoutHistogram, the
+// histogram will clear too much data and may overwrite shadow counters that are
+// in use. Likely because of rounding up the shadow_end pointer.
+// See memprof_mapping.h for an overview on histogram counters.
+void clearCountersHistogram(uptr addr, uptr size) {
+  u8 *shadow_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr);
+  u8 *shadow_end_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr + size);
+  for (; shadow_8 < shadow_end_8; shadow_8++) {
+*shadow_8 = 0;
+  }
+}
+
+void clearCountersWithoutHistogram(uptr addr, uptr size) {
+  uptr shadow_beg = MEM_TO_SHADOW(addr);
+  uptr shadow_end = MEM_TO_SHADOW(addr + size - SHADOW_GRANULARITY) + 1;
+  REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg);
+}
+
 // Clears the shadow counters (when memory is allocated).
 void ClearShadow(uptr addr, uptr size) {

teresajohnson wrote:

Looking at this function more, there are a lot of uses of the values computed 
via MEM_TO_SHADOW, which is not the right mapping to use with the smaller 
granularity of the histogram case. I think you probably need to version the 
calls to MEM_TO_SHADOW here, and then all of the rest of the code can work as 
is? I.e. you wouldn't need the 2 versions of `clearCounters*`.
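
Roughly the shape I have in mind for the clearing part (untested sketch; 
`__memprof_histogram` below is just a placeholder for however the runtime ends 
up reading the histogram flag, and I'm ignoring whatever else ClearShadow does):

```
void ClearShadow(uptr addr, uptr size) {
  // Sketch: version only the mapping computation, then the existing
  // memset-based clearing can stay shared between the two modes.
  uptr shadow_beg, shadow_end;
  if (__memprof_histogram) {
    // 8-bit counters at the finer histogram granularity.
    shadow_beg = HISTOGRAM_MEM_TO_SHADOW(addr);
    shadow_end = HISTOGRAM_MEM_TO_SHADOW(addr + size);
  } else {
    shadow_beg = MEM_TO_SHADOW(addr);
    shadow_end = MEM_TO_SHADOW(addr + size - SHADOW_GRANULARITY) + 1;
  }
  REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg);
}
```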

https://github.com/llvm/llvm-project/pull/94264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -124,6 +124,13 @@ struct PortableMemInfoBlock {
   OS << "" << #Name << ": " << Name << "\n";
 #include "llvm/ProfileData/MIBEntryDef.inc"
 #undef MIBEntryDef
+if (AccessHistogramSize > 0) {
+  OS << "" << "AccessHistogramValues" << ":";
+  for (uint32_t I = 0; I < AccessHistogramSize; ++I) {
+OS << " -" << ((uint64_t *)AccessHistogram)[I];

teresajohnson wrote:

Why the " -" in the outputs? I noticed when looking at the text that they look 
like negative values as a result. Any reason not to just delimit with a space?

https://github.com/llvm/llvm-project/pull/94264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -508,7 +519,26 @@ void createProfileFileNameVar(Module &M) {
   }
 }
 
+// Set MemprofHistogramFlag as a Global veriable in IR. This makes it 
accessible
+// to

teresajohnson wrote:

nit: fix line wrapping

https://github.com/llvm/llvm-project/pull/94264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -96,19 +102,63 @@ llvm::SmallVector readSegmentEntries(const 
char *Ptr) {
 }
 
 llvm::SmallVector>
-readMemInfoBlocks(const char *Ptr) {
+readMemInfoBlocksV3(const char *Ptr) {
   using namespace support;
 
   const uint64_t NumItemsToRead =
-  endian::readNext(Ptr);
+  endian::readNext(Ptr);
+
   llvm::SmallVector> Items;
   for (uint64_t I = 0; I < NumItemsToRead; I++) {
 const uint64_t Id =
-endian::readNext(Ptr);
-const MemInfoBlock MIB = *reinterpret_cast(Ptr);
+endian::readNext(Ptr);
+
+// We cheat a bit here and remove the const from cast to set the
+// Histogram Pointer to newly allocated buffer. We also cheat, since V3 and
+// V4 do not have the same fields. V3 is missing AccessHistogramSize and
+// AccessHistogram. This means we read "dirty" data in here, but it should
+// not segfault, since there will be callstack data placed after this in the
+// binary format.
+MemInfoBlock MIB = *reinterpret_cast(Ptr);
+// Overwrite dirty data.

teresajohnson wrote:

Isn't this going to overwrite some callstack data?

https://github.com/llvm/llvm-project/pull/94264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -20,25 +20,25 @@ CHECK-NEXT:  -
 
 CHECK:  Records:
 CHECK-NEXT:  -
-CHECK-NEXT:FunctionGUID: 15505678318020221912
+CHECK-NEXT:FunctionGUID: 3873612792189045660
 CHECK-NEXT:AllocSites:
 CHECK-NEXT:-
 CHECK-NEXT:  Callstack:
 CHECK-NEXT:  -
-CHECK-NEXT:Function: 15505678318020221912
-CHECK-NEXT:SymbolName: qux
+CHECK-NEXT:Function: 3873612792189045660
+CHECK-NEXT:SymbolName: _Z3quxi

teresajohnson wrote:

Why did the function symbol names change with your patch?

https://github.com/llvm/llvm-project/pull/94264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -508,7 +519,26 @@ void createProfileFileNameVar(Module &M) {
   }
 }
 
+// Set MemprofHistogramFlag as a Global veriable in IR. This makes it 
accessible
+// to
+// the runtime, changing shadow count behavior.
+void createMemprofHistogramFlagVar(Module &M) {
+  const StringRef VarName(MemProfHistogramFlagVar);
+  Type *IntTy1 = Type::getInt1Ty(M.getContext());
+  auto MemprofHistogramFlag = new GlobalVariable(
+  M, IntTy1, true, GlobalValue::WeakAnyLinkage,
+  Constant::getIntegerValue(IntTy1, APInt(1, ClHistogram)), VarName);
+  // MemprofHistogramFlag->setVisibility(GlobalValue::HiddenVisibility);

teresajohnson wrote:

remove?

https://github.com/llvm/llvm-project/pull/94264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [compiler-rt] [lldb] [llvm] [Memprof] Adds the option to collect AccessCountHistograms for memprof. (PR #94264)

2024-06-21 Thread Teresa Johnson via cfe-commits


@@ -216,6 +228,35 @@ u64 GetShadowCount(uptr p, u32 size) {
   return count;
 }
 
+// Accumulates the access count from the shadow for the given pointer and size.
+// See memprof_mapping.h for an overview on histogram counters.
+u64 GetShadowCountHistogram(uptr p, u32 size) {
+  u8 *shadow = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p);
+  u8 *shadow_end = (u8 *)HISTOGRAM_MEM_TO_SHADOW(p + size);
+  u64 count = 0;
+  for (; shadow <= shadow_end; shadow++)
+count += *shadow;
+  return count;
+}
+
+// If we use the normal approach from clearCountersWithoutHistogram, the
+// histogram will clear too much data and may overwrite shadow counters that are
+// in use. Likely because of rounding up the shadow_end pointer.
+// See memprof_mapping.h for an overview on histogram counters.
+void clearCountersHistogram(uptr addr, uptr size) {
+  u8 *shadow_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr);
+  u8 *shadow_end_8 = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr + size);
+  for (; shadow_8 < shadow_end_8; shadow_8++) {

teresajohnson wrote:

Why not use REAL(memset) like the non-histogram version below?
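
I.e., something like this should work, I think (untested sketch based on the 
non-histogram version quoted above):

```
void clearCountersHistogram(uptr addr, uptr size) {
  // Histogram counters are plain 8-bit counts, so a memset over the
  // [begin, end) shadow range should be equivalent to the byte loop.
  u8 *shadow_beg = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr);
  u8 *shadow_end = (u8 *)HISTOGRAM_MEM_TO_SHADOW(addr + size);
  REAL(memset)((void *)shadow_beg, 0, shadow_end - shadow_beg);
}
```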

https://github.com/llvm/llvm-project/pull/94264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -140,14 +337,51 @@ class IndirectCallPromoter {
   // indirect callee with functions. Returns true if there are IR
   // transformations and false otherwise.
   bool tryToPromoteWithFuncCmp(
-  CallBase &CB, const std::vector &Candidates,
-  uint64_t TotalCount, ArrayRef ICallProfDataRef,
-  uint32_t NumCandidates);
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates, uint64_t TotalCount,
+  ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Promote a list of targets for one indirect call by comparing vtables with
+  // functions. Returns true if there are IR transformations and false
+  // otherwise.
+  bool tryToPromoteWithVTableCmp(
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates,
+  uint64_t TotalFuncCount, uint32_t NumCandidates,
+  MutableArrayRef ICallProfDataRef,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Returns true if it's profitable to compare vtables.
+  bool isProfitableToCompareVTables(
+  const std::vector &Candidates, uint64_t TotalCount);
+
+  // Populate `VTableGUIDCounts` vtable GUIDs and their counts and each

teresajohnson wrote:

should that be "for each candidate with vtable information"?

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -321,14 +746,127 @@ bool 
IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
 if (!NumCandidates ||
 (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
   continue;
+
 auto PromotionCandidates = getPromotionCandidatesForCallSite(
 *CB, ICallProfDataRef, TotalCount, NumCandidates);
-Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-   ICallProfDataRef, NumCandidates);
+
+VTableGUIDCountsMap VTableGUIDCounts;
+Instruction *VPtr =
+computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+if (isProfitableToCompareVTables(PromotionCandidates, TotalCount))
+  Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+   TotalCount, NumCandidates,
+   ICallProfDataRef, VTableGUIDCounts);
+else
+  Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+ TotalCount, ICallProfDataRef,
+ NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+const std::vector &Candidates, uint64_t TotalCount) {
+  if (!ICPEnableVTableCmp || Candidates.empty())
+return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  for (size_t I = 0; I < Candidates.size(); I++) {
+auto &Candidate = Candidates[I];
+uint64_t VTableSumCount = 0;
+for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+  VTableSumCount += Count;
+
+if (VTableSumCount < Candidate.Count * ICPVTableCountPercentage)
+  return false;
+
+RemainingVTableCount -= Candidate.Count;
+
+int NumAdditionalVTable = 0;
+if (I == Candidates.size() - 1)
+  NumAdditionalVTable = ICPNumAdditionalVTableLast;
+
+int ActualNumAdditionalInst = Candidate.AddressPoints.size() - 1;
+if (ActualNumAdditionalInst > NumAdditionalVTable) {
+  return false;
+}
+  }
+
+  // If the indirect fallback is not cold, don't compare vtables.
+  if (PSI && PSI->hasProfileSummary() &&
+  !PSI->isColdCount(RemainingVTableCount))
+return false;
+
+  return true;
+}
+
+static void
+computeVirtualCallSiteTypeInfoMap(Module &M, ModuleAnalysisManager &MAM,
+  VirtualCallSiteTypeInfoMap &VirtualCSInfo) {
+  auto &FAM = 
MAM.getResult(M).getManager();
+  auto LookupDomTree = [&FAM](Function &F) -> DominatorTree & {
+return FAM.getResult(F);
+  };
+
+  auto compute = [&](Function *Func) {
+if (!Func || Func->use_empty())
+  return;
+// Iterate all type.test calls and find all indirect calls.
+// TODO: Add llvm.public.type.test
+for (Use &U : llvm::make_early_inc_range(Func->uses())) {
+  auto *CI = dyn_cast(U.getUser());
+  if (!CI)
+continue;
+  auto *TypeMDVal = cast(CI->getArgOperand(1));
+  if (!TypeMDVal)
+continue;
+  auto *CompatibleTypeId = dyn_cast(TypeMDVal->getMetadata());
+  if (!CompatibleTypeId)
+continue;
+
+  // Find out all devirtualizable call sites given a llvm.type.test
+  // intrinsic call.
+  SmallVector DevirtCalls;
+  SmallVector Assumes;
+  auto &DT = LookupDomTree(*CI->getFunction());
+  findDevirtualizableCallsForTypeTest(DevirtCalls, Assumes, CI, DT);
+
+  // type-id, offset from the address point
+  // combined with type metadata to compute function offset
+  for (auto &DevirtCall : DevirtCalls) {
+CallBase &CB = DevirtCall.CB;
+// Given an indirect call, try find the instruction which loads a
+// pointer to virtual table.
+Instruction *VTablePtr =
+PGOIndirectCallVisitor::tryGetVTableInstruction(&CB);
+if (!VTablePtr)
+  continue;
+VirtualCSInfo[&CB] = {DevirtCall.Offset, VTablePtr,
+  CompatibleTypeId->getString()};
+  }
+}
+  };
+
+  // Right now only llvm.type.test is used to find out virtual call sites.
+  // With ThinLTO and whole-program-devirtualization, llvm.type.test and
+  // llvm.public.type.test are emitted, and llvm.public.type.test is either
+  // refined to llvm.type.test or dropped before indirect-call-promotion pass.

teresajohnson wrote:

Do we then need to do the below analysis on llvm.public.type.test? ICP should 
be after it gets converted or dropped.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =

teresajohnson wrote:

Add comment describing the contents

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Use, UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) {
+  assert(!I->isTerminator());
+  if (!isDestBBSuitableForSink(I, DestBlock))
+return false;
+
+  assert(DestBlock->getUniquePre

[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -140,14 +337,51 @@ class IndirectCallPromoter {
   // indirect callee with functions. Returns true if there are IR
   // transformations and false otherwise.
   bool tryToPromoteWithFuncCmp(
-  CallBase &CB, const std::vector &Candidates,
-  uint64_t TotalCount, ArrayRef ICallProfDataRef,
-  uint32_t NumCandidates);
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates, uint64_t TotalCount,
+  ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Promote a list of targets for one indirect call by comparing vtables with
+  // functions. Returns true if there are IR transformations and false
+  // otherwise.
+  bool tryToPromoteWithVTableCmp(
+  CallBase &CB, Instruction *VPtr,
+  const std::vector &Candidates,
+  uint64_t TotalFuncCount, uint32_t NumCandidates,
+  MutableArrayRef ICallProfDataRef,
+  VTableGUIDCountsMap &VTableGUIDCounts);
+
+  // Returns true if it's profitable to compare vtables.
+  bool isProfitableToCompareVTables(
+  const std::vector &Candidates, uint64_t TotalCount);
+
+  // Populate `VTableGUIDCounts` vtable GUIDs and their counts and each

teresajohnson wrote:

"with vtable GUIDs and..." ? (i.e. missing "and"?)

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -341,6 +879,17 @@ static bool promoteIndirectCalls(Module &M, 
ProfileSummaryInfo *PSI, bool InLTO,
 return false;
   }
   bool Changed = false;
+  VirtualCallSiteTypeInfoMap VirtualCSInfo;
+
+  computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo);
+
+  // This map records states across functions in an LLVM IR module.

teresajohnson wrote:

What do you mean by "records states"?

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -103,30 +110,222 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)

teresajohnson wrote:

What you are getting here is the address point offset of the given compatible 
type within the given vtable. See also 
https://llvm.org/docs/TypeMetadata.html#representing-type-information-using-type-metadata.
 It might be good to add a comment describing this more explicitly.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// Update VTableGUIDCounts
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {

teresajohnson wrote:

Can you add a comment describing the intuition behind the way the counts are 
being updated here

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -276,35 +585,151 @@ CallBase &llvm::pgo::promoteIndirectCall(CallBase &CB, 
Function *DirectCallee,
 
 // Promote indirect-call to conditional direct-call for one callsite.
 bool IndirectCallPromoter::tryToPromoteWithFuncCmp(
-CallBase &CB, const std::vector &Candidates,
-uint64_t TotalCount, ArrayRef ICallProfDataRef,
-uint32_t NumCandidates) {
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalCount,
+ArrayRef ICallProfDataRef, uint32_t NumCandidates,
+VTableGUIDCountsMap &VTableGUIDCounts) {
   uint32_t NumPromoted = 0;
 
   for (const auto &C : Candidates) {
-uint64_t Count = C.Count;
-pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, 
SamplePGO,
- &ORE);
-assert(TotalCount >= Count);
-TotalCount -= Count;
+uint64_t FuncCount = C.Count;
+pgo::promoteIndirectCall(CB, C.TargetFunction, FuncCount, TotalCount,
+ SamplePGO, &ORE);
+assert(TotalCount >= FuncCount);
+TotalCount -= FuncCount;
 NumOfPGOICallPromotion++;
 NumPromoted++;
-  }
 
+if (!ICPEnableVTableCmp || C.VTableGUIDAndCounts.empty())
+  continue;
+
+// Update VTableGUIDCounts
+uint64_t SumVTableCount = 0;
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
+  SumVTableCount += VTableCount;
+
+for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
+  APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
+  APFuncCount *= VTableCount;
+  VTableGUIDCounts[GUID] -= 
APFuncCount.udiv(SumVTableCount).getZExtValue();
+}
+  }
   if (NumPromoted == 0)
 return false;
 
-  // Adjust the MD.prof metadata. First delete the old one.
-  CB.setMetadata(LLVMContext::MD_prof, nullptr);
-
   assert(NumPromoted <= ICallProfDataRef.size() &&
  "Number of promoted functions should not be greater than the number "
  "of values in profile metadata");
+
+  // Update value profiles on the indirect call.
+  // TODO: Handle profile update properly when Clang `-fstrict-vtable-pointers`
+  // is enabled and a vtable is used to load multiple virtual functions.
+  updateFuncValueProfiles(CB, ICallProfDataRef.slice(NumPromoted), TotalCount,
+  NumCandidates);
+  // Update value profiles on the vtable pointer if it exists.
+  if (VPtr)
+updateVPtrValueProfiles(VPtr, VTableGUIDCounts);
+  return true;
+}
+
+void IndirectCallPromoter::updateFuncValueProfiles(
+CallBase &CB, ArrayRef CallVDs, uint64_t TotalCount,
+uint32_t MaxMDCount) {
+  // First clear the existing !prof.
+  CB.setMetadata(LLVMContext::MD_prof, nullptr);
   // Annotate the remaining value profiles if counter is not zero.
   if (TotalCount != 0)
-annotateValueSite(*F.getParent(), CB, ICallProfDataRef.slice(NumPromoted),
-  TotalCount, IPVK_IndirectCallTarget, NumCandidates);
+annotateValueSite(M, CB, CallVDs, TotalCount, IPVK_IndirectCallTarget,
+  MaxMDCount);
+}
+
+void IndirectCallPromoter::updateVPtrValueProfiles(
+Instruction *VPtr, VTableGUIDCountsMap &VTableGUIDCounts) {
+  VPtr->setMetadata(LLVMContext::MD_prof, nullptr);
+  std::vector VTableValueProfiles;
+  uint64_t TotalVTableCount = 0;
+  for (auto [GUID, Count] : VTableGUIDCounts) {
+if (Count == 0)
+  continue;
+
+VTableValueProfiles.push_back({GUID, Count});
+TotalVTableCount += Count;
+  }
+  llvm::sort(VTableValueProfiles,
+ [](const InstrProfValueData &LHS, const InstrProfValueData &RHS) {
+   return LHS.Count > RHS.Count;
+ });
+
+  annotateValueSite(M, *VPtr, VTableValueProfiles, TotalVTableCount,
+IPVK_VTableTarget, VTableValueProfiles.size());
+}
+
+bool IndirectCallPromoter::tryToPromoteWithVTableCmp(
+CallBase &CB, Instruction *VPtr,
+const std::vector &Candidates, uint64_t TotalFuncCount,
+uint32_t NumCandidates,
+MutableArrayRef ICallProfDataRef,
+VTableGUIDCountsMap &VTableGUIDCounts) {
+  SmallVector PromotedFuncCount;
+  for (const auto &Candidate : Candidates) {
+uint64_t IfCount = 0;

teresajohnson wrote:

The name IfCount isn't very intuitive to me, suggest using a different name.

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson commented:

I haven't looked at the tests yet, but some comments/questions from a first 
pass through most of the code

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson edited 
https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.

teresajohnson wrote:

I don't quite follow the comment, can you expand it a bit?

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt

teresajohnson wrote:

Add comment

https://github.com/llvm/llvm-project/pull/81442
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

2024-06-04 Thread Teresa Johnson via cfe-commits


@@ -103,30 +110,220 @@ static cl::opt
 ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt ICPEnableVTableCmp(
+"icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt
+ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to 
compare"));
+
+static cl::opt ICPNumAdditionalVTableLast(
+"icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+SmallDenseMap, 
8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+SmallDenseMap;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+StringRef CompatibleType) {
+  SmallVector Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+if (auto *TypeId = dyn_cast(Type->getOperand(1).get());
+TypeId && TypeId->getString() == CompatibleType)
+
+  return cast(
+ cast(Type->getOperand(0))->getValue())
+  ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+  Type::getInt8Ty(Context), VTable,
+  llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Use &U, Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast(UserInst))
+return PN->getIncomingBlock(U);
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way 
`DestBB`
+//is dominated by `Inst->getParent()` and we don't need to sink across a
+//critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+User *User = Use.getUser();
+// Do checked cast since IR verifier guarantees that the user of an
+// instruction must be an instruction. See `Verifier::visitInstruction`.
+Instruction *UserInst = cast(User);
+// We can sink debug or pseudo instructions together with Inst.
+if (UserInst->isDebugOrPseudoInst())
+  continue;
+UserBB = getUserBasicBlock(Use, UserInst);
+// Do not sink if Inst is used in a basic block that is not DestBB.
+// TODO: Sink to the common dominator of all user blocks.
+if (UserBB != DestBB)
+  return false;
+  }
+  return UserBB != nullptr;
+}
+
+// For the virtual call dispatch sequence, try to sink vtable load instructions
+// to the cold indirect call fallback.
+static bool tryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) {
+  assert(!I->isTerminator());
+  if (!isDestBBSuitableForSink(I, DestBlock))
+return false;
+
+  assert(DestBlock->getUniquePre

[clang] [clang-tools-extra] [compiler-rt] [lldb] [llvm] [mlir] [openmp] [polly] fix(python): fix comparison to None (PR #91857)

2024-05-15 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.

compiler-rt changes lgtm

https://github.com/llvm/llvm-project/pull/91857
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (PR #83159)

2024-02-28 Thread Teresa Johnson via cfe-commits


@@ -1036,7 +1041,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   if (!actionRequiresCodeGen(Action) && CodeGenOpts.VerifyModule)
 MPM.addPass(VerifierPass());
 
-  if (Action == Backend_EmitBC || Action == Backend_EmitLL) {
+  if (Action == Backend_EmitBC || Action == Backend_EmitLL ||
+  CodeGenOpts.FatLTO) {

teresajohnson wrote:

I see, I guess then the check for the CodeGenOpt is needed here to catch this 
case.

https://github.com/llvm/llvm-project/pull/83159
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (PR #83159)

2024-02-28 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.


https://github.com/llvm/llvm-project/pull/83159
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (PR #83159)

2024-02-28 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson edited 
https://github.com/llvm/llvm-project/pull/83159
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (PR #83159)

2024-02-27 Thread Teresa Johnson via cfe-commits


@@ -1036,7 +1041,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   if (!actionRequiresCodeGen(Action) && CodeGenOpts.VerifyModule)
 MPM.addPass(VerifierPass());
 
-  if (Action == Backend_EmitBC || Action == Backend_EmitLL) {
+  if (Action == Backend_EmitBC || Action == Backend_EmitLL ||
+  CodeGenOpts.FatLTO) {

teresajohnson wrote:

What is the Action type with FatLTO?

https://github.com/llvm/llvm-project/pull/83159
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LTO] Fix fat-lto output for -c -emit-llvm. (PR #79404)

2024-01-25 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.


https://github.com/llvm/llvm-project/pull/79404
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[libcxx] [libc] [clang] [lld] [clang-tools-extra] [flang] [compiler-rt] [llvm] [ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names (PR #78835)

2024-01-23 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.


https://github.com/llvm/llvm-project/pull/78835
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[lld] [clang-tools-extra] [llvm] [clang] [ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names (PR #78835)

2024-01-23 Thread Teresa Johnson via cfe-commits


@@ -352,32 +357,49 @@ std::vector BitcodeCompiler::compile() {
 pruneCache(config->thinLTOCacheDir, config->thinLTOCachePolicy, files);
 
   if (!config->ltoObjPath.empty()) {
-saveBuffer(buf[0], config->ltoObjPath);
+saveBuffer(buf[0].second, config->ltoObjPath);
 for (unsigned i = 1; i != maxTasks; ++i)
-  saveBuffer(buf[i], config->ltoObjPath + Twine(i));
-  }
-
-  if (config->saveTempsArgs.contains("prelink")) {
-if (!buf[0].empty())
-  saveBuffer(buf[0], config->outputFile + ".lto.o");
-for (unsigned i = 1; i != maxTasks; ++i)
-  saveBuffer(buf[i], config->outputFile + Twine(i) + ".lto.o");
-  }
-
-  if (config->ltoEmitAsm) {
-saveBuffer(buf[0], config->outputFile);
-for (unsigned i = 1; i != maxTasks; ++i)
-  saveBuffer(buf[i], config->outputFile + Twine(i));
-return {};
+  saveBuffer(buf[i].second, config->ltoObjPath + Twine(i));
   }
 
+  bool savePrelink = config->saveTempsArgs.contains("prelink");
   std::vector ret;
-  for (unsigned i = 0; i != maxTasks; ++i)
-if (!buf[i].empty())
-  ret.push_back(createObjFile(MemoryBufferRef(buf[i], "lto.tmp")));
+  const char *ext = config->ltoEmitAsm ? ".s" : ".o";
+  for (unsigned i = 0; i != maxTasks; ++i) {
+StringRef bitcodeFilePath;
+StringRef objBuf;
+if (files[i]) {

teresajohnson wrote:

Thanks, those are helpful. Can you also add a comment before line 373 on the 
meaning of files[i] being null vs not? Reading through the comments you added, 
it sounds like files[i] is non-null when the object file was found in the cache?

https://github.com/llvm/llvm-project/pull/78835
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[lld] [clang-tools-extra] [llvm] [clang] [ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names (PR #78835)

2024-01-23 Thread Teresa Johnson via cfe-commits


@@ -53,10 +53,10 @@
 
 ; RUN: rm -fr cache && mkdir cache
 ; RUN: ld.lld --thinlto-cache-dir=cache --save-temps -o out b.bc a.bc -M | 
FileCheck %s --check-prefix=MAP
-; RUN: ls out1.lto.o a.bc.0.preopt.bc b.bc.0.preopt.bc
+; RUN: ls out.lto.a.o a.bc.0.preopt.bc b.bc.0.preopt.bc
 
-; MAP: llvmcache-{{.*}}:(.text)
-; MAP: llvmcache-{{.*}}:(.text)
+; MAP: out.lto.b.o:(.text)

teresajohnson wrote:

Confused about what changed in the latest commit to cause these lines to need 
updating - was this just a missed test update in the prior commit?

https://github.com/llvm/llvm-project/pull/78835
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [lld] [clang-tools-extra] [clang] [ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names (PR #78835)

2024-01-23 Thread Teresa Johnson via cfe-commits


@@ -46,8 +46,9 @@ class BitcodeCompiler {
 
 private:
   std::unique_ptr ltoObj;
-  std::vector> buf;
+  SmallVector>, 0> buf;

teresajohnson wrote:

ping on this comment.

https://github.com/llvm/llvm-project/pull/78835
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-tools-extra] [compiler-rt] [PGO] Exposing PGO's Counter Reset and File Dumping APIs (PR #76471)

2024-01-03 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> I realized one problem during testing IRPGO (thanks again for the suggestion 
> @minglotus-6 !).
> 
> A function's control flow may change between `-fprofile-generate` and 
> `-fprofile-use` when we make use of definitions in the new header. For 
> example, one may have the following code:
> 
> ```c
> #include "profile/instr_prof_interface.h"
> 
> void main() {
>...
>   if (__llvm_profile_dump())
> return error;
>   
>   cleanup();
> }
> ```
> 
> During `-fprofile-generate`, `__llvm_profile_dump` is a declared name and 
> main's control flow includes a branch to `return error;`. During 
> `-fprofile-use`, `__llvm_profile_dump()` is replaced by `(0)` and the 
> frontend eliminates the `if` statement and the branch to `return error`. Such 
> control flow change can lead to PGO warnings (hash mismatch).
> 
> I think it may be OK to keep the PR this way because the new macros can 
> potentially cause control flow changes directly as well. The documentation is 
> updated 
> (https://github.com/llvm/llvm-project/pull/76471/files#diff-7389be311daf0b9b476c876bef04245fa3c0ad9337ce865682174bd77d53b648R2908)
>  to advise against using these APIs in a program's hot regions and warn about 
> possible impact on control flow.
> 
> Do you all think this is reasonable?

That's probably ok, as it doesn't make sense to do dumping etc in a hot region 
of code anyway. An alternate suggestion is to make these functions real 
functions that simply return 0 instead of #defined to 0. But that may not avoid 
the issue described above because early inlining will likely inline and 
simplify the code before IR PGO matching anyway.
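
For reference, the stub alternative would look roughly like this in the header 
(sketch only; assumes the `__LLVM_INSTR_PROFILE_GENERATE` guard mentioned 
elsewhere in this thread):

```
/* When not building with instrumentation, provide no-op stubs instead of
 * #define'ing the calls away to 0, so call sites keep the same shape. */
#ifndef __LLVM_INSTR_PROFILE_GENERATE
static inline int __llvm_profile_dump(void) { return 0; }
static inline void __llvm_profile_reset_counters(void) {}
#endif
```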


https://github.com/llvm/llvm-project/pull/76471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [compiler-rt] [clang] [PGO] Exposing PGO's Counter Reset and File Dumping APIs (PR #76471)

2023-12-27 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> @teresajohnson I mentioned the same thing on 
> [discourse](https://discourse.llvm.org/t/pgo-are-the-llvm-profile-functions-stable-c-apis-across-llvm-releases/75832/5)
>  but it seems like linking on AIX does not support this model.

I see. Perhaps instead of defining these away completely, they should be 
defined to a dummy function with the same signature that always returns success 
(0 I think)? Alternatively, not define them at all when 
__LLVM_INSTR_PROFILE_GENERATE not defined and have the user guard calls by 
`#ifdef __LLVM_INSTR_PROFILE_GENERATE`. Otherwise, the user either cannot check 
the return values, or they have to guard their usage by a check of 
__LLVM_INSTR_PROFILE_GENERATE anyway.
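
For the latter option, the user code would guard both the call and the use of 
its return value, e.g. (sketch; `handle_dump_failure` is just a stand-in for 
whatever the user's error path is):

```
#ifdef __LLVM_INSTR_PROFILE_GENERATE
  if (__llvm_profile_dump() != 0)
    handle_dump_failure();
#endif
```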

https://github.com/llvm/llvm-project/pull/76471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[compiler-rt] [clang] [clang-tools-extra] [PGO] Exposing PGO's Counter Reset and File Dumping APIs (PR #76471)

2023-12-27 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

The way we have done this in the past is to declare these as weak symbols and 
check if they exist before calling. E.g.:

```
extern "C" __attribute__((weak)) int __llvm_profile_dump(void);

if (__llvm_profile_dump)
  if (__llvm_profile_dump() != 0) { ...
```

Not necessarily opposed to adding the macros etc, but the advantage of the weak 
symbol approach vs what you have here is that the user code can check the 
return values.

https://github.com/llvm/llvm-project/pull/76471
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] DiagnosticHandler: refactor error checking (PR #75889)

2023-12-19 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.

lgtm

https://github.com/llvm/llvm-project/pull/75889
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] DiagnosticHandler: refactor error checking (PR #75889)

2023-12-19 Thread Teresa Johnson via cfe-commits


@@ -256,10 +256,13 @@ void LLVMContext::diagnose(const DiagnosticInfo &DI) {
   RS->emit(*OptDiagBase);
 
   // If there is a report handler, use it.
-  if (pImpl->DiagHandler &&
-  (!pImpl->RespectDiagnosticFilters || isDiagnosticEnabled(DI)) &&
-  pImpl->DiagHandler->handleDiagnostics(DI))
-return;
+  if (pImpl->DiagHandler) {
+if (DI.getSeverity() == DS_Error)
+  pImpl->DiagHandler->HasErrors = true;

teresajohnson wrote:

I didn't mean calling handleDiagnostics in more places. I just meant rather 
than directly setting the HasErrors flag here, do that in a new non-virtual 
method (e.g. prepareToHandleDiagnostics() or whatever), and call it here just 
before calling handleDiagnostics, to abstract away what it is actually doing 
and leave the setting of the flag to the DiagnosticHandler class.
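
I.e. something along these lines (sketch only, the name is bikesheddable):

```
// In DiagnosticHandler, non-virtual, so the base class owns the flag update:
void prepareToHandleDiagnostics(const DiagnosticInfo &DI) {
  if (DI.getSeverity() == DS_Error)
    HasErrors = true;
}

// In LLVMContext::diagnose():
if (pImpl->DiagHandler) {
  pImpl->DiagHandler->prepareToHandleDiagnostics(DI);
  if ((!pImpl->RespectDiagnosticFilters || isDiagnosticEnabled(DI)) &&
      pImpl->DiagHandler->handleDiagnostics(DI))
    return;
}
```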

https://github.com/llvm/llvm-project/pull/75889
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] DiagnosticHandler: refactor error checking (PR #75889)

2023-12-19 Thread Teresa Johnson via cfe-commits


@@ -256,10 +256,13 @@ void LLVMContext::diagnose(const DiagnosticInfo &DI) {
   RS->emit(*OptDiagBase);
 
   // If there is a report handler, use it.
-  if (pImpl->DiagHandler &&
-  (!pImpl->RespectDiagnosticFilters || isDiagnosticEnabled(DI)) &&
-  pImpl->DiagHandler->handleDiagnostics(DI))
-return;
+  if (pImpl->DiagHandler) {
+if (DI.getSeverity() == DS_Error)
+  pImpl->DiagHandler->HasErrors = true;

teresajohnson wrote:

Perhaps move the setting of HasErrors into a new non-virtual method that we 
call below just before calling handleDiagnostics.

Also, should this be set when handleDiagnostics is not called?

https://github.com/llvm/llvm-project/pull/75889
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[lld] [llvm] [clang] [LTO] Improve diagnostics handling when parsing module-level inline assembly (PR #75726)

2023-12-18 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.


https://github.com/llvm/llvm-project/pull/75726
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] Disable PGO instrumentation on naked function (PR #75224)

2023-12-12 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.


https://github.com/llvm/llvm-project/pull/75224
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] Disable PGO instrumentation on naked function (PR #75224)

2023-12-12 Thread Teresa Johnson via cfe-commits


@@ -892,6 +892,10 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, 
QualType RetTy,
 }
   }
 
+  if (FD->hasAttr()) {

teresajohnson wrote:

Is this change needed given the separate change in LLVM skipPGOGen?

https://github.com/llvm/llvm-project/pull/75224
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[compiler-rt] [clang] [clang-tools-extra] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-11 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.

LGTM other than a couple of minor comments and pending resolution of the LLVM 
IR test. Thanks!

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[compiler-rt] [clang] [clang-tools-extra] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-11 Thread Teresa Johnson via cfe-commits


@@ -352,6 +366,8 @@ std::string getIRPGOFuncName(const Function &F, bool InLTO) 
{
   return getIRPGOObjectName(F, InLTO, getPGOFuncNameMetadata(F));
 }
 
+// DEPRECATED. Use `getIRPGOFuncName`for new code. See that function for

teresajohnson wrote:

In the header it is described as for FE instrumentation. Probably want the 
comments to be consistent?

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[compiler-rt] [clang-tools-extra] [clang] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-11 Thread Teresa Johnson via cfe-commits


@@ -0,0 +1,115 @@
+// This is a regression test for ThinLTO indirect-call-promotion when candidate
+// callees need to be imported from another IR module.  In the C++ test case,
+// `main` calls `global_func` which is defined in another module. `global_func`
+// has two indirect callees, one has external linkage and one has local 
linkage.
+// All three functions should be imported into the IR module of main.
+
+// What the test does:
+// - Generate raw profiles from executables and convert it to indexed profiles.
+//   During the conversion, a profiled callee address in raw profiles will be
+//   converted to function hash in indexed profiles.
+// - Run IRPGO profile use and ThinTLO prelink pipeline and get LLVM bitcodes
+//   for both cpp files in the C++ test case.
+// - Generate ThinLTO summary file with LLVM bitcodes, and run 
`function-import` pass.
+// - Run `pgo-icall-prom` pass for the IR module which needs to import callees.
+
+// Use lld as linker for more robust test. We need to REQUIRE LLVMgold.so for
+// LTO if default linker is GNU ld or gold anyway.
+// REQUIRES: lld-available
+
+// Test should fail where linkage-name and mangled-name diverge (see issue https://github.com/llvm/llvm-project/issues/74565).
+// Currently, this name divergence happens on Mach-O object file format, or on
+// many (but not all) 32-bit Windows systems.
+//
+// XFAIL: system-darwin
+//
+// Mark 32-bit Windows as UNSUPPORTED for now as opposed to XFAIL. This test
+// should fail on many (but not all) 32-bit Windows systems and succeed on the
+// rest. The flexibility in triple string parsing makes it tricky to capture
+// both sets accurately. i[3-9]86 specifies arch as Triple::ArchType::x86, 
(win32|windows)
+// specifies OS as Triple::OS::Win32
+//
+// UNSUPPORTED: target={{i[3-9]86-.*-(win32|windows).*}}
+
+// RUN: rm -rf %t && split-file %s %t && cd %t
+
+// Do setup work for all below tests.
+// Generate raw profiles from real programs and convert them into indexed 
profiles.
+// Use clangxx_pgogen for IR level instrumentation for C++.
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 lib.cpp main.cpp -o main
+// RUN: env LLVM_PROFILE_FILE=main.profraw %run ./main
+// RUN: llvm-profdata merge main.profraw -o main.profdata
+
+// Use profile on lib and get bitcode, test that local function callee0 has
+// expected !PGOFuncName metadata and external function callee1 doesn't have
+// !PGOFuncName metadata. Explicitly skip ICP pass to test ICP happens as
+// expected in the IR module that imports functions from lib.
+// RUN: %clang -mllvm -disable-icp -fprofile-use=main.profdata -flto=thin -O2 
-c lib.cpp -o lib.bc
+// RUN: llvm-dis lib.bc -o - | FileCheck %s --check-prefix=PGOName
+
+// Use profile on main and get bitcode.
+// RUN: %clang -fprofile-use=main.profdata -flto=thin -O2 -c main.cpp -o 
main.bc
+
+// Run llvm-lto to get summary file.
+// RUN: llvm-lto -thinlto -o summary main.bc lib.bc
+
+// Test the imports of functions. Default import thresholds would work but do
+// explicit override to be more futureproof. Note all functions have one basic
+// block with a function-entry-count of one, so they are actually hot functions
+// per default profile summary hotness cutoff.
+// RUN: opt -passes=function-import -import-instr-limit=100 
-import-cold-multiplier=1 -summary-file summary.thinlto.bc main.bc -o 
main.import.bc -print-imports 2>&1 | FileCheck %s --check-prefix=IMPORTS
+// Test that '_Z11global_funcv' has indirect calls annotated with value 
profiles.
+// RUN: llvm-dis main.import.bc -o - | FileCheck %s --check-prefix=IR
+
+// Test that both candidates are ICP'ed and there is no `!VP` in the IR.
+// RUN: opt main.import.bc -icp-lto -passes=pgo-icall-prom -S 
-pass-remarks=pgo-icall-prom 2>&1 | FileCheck %s 
--check-prefixes=ICP-IR,ICP-REMARK --implicit-check-not="!VP"
+
+// IMPORTS: main.cpp: Import _Z7callee1v
+// IMPORTS: main.cpp: Import _ZL7callee0v.llvm.[[#]]
+// IMPORTS: main.cpp: Import _Z11global_funcv
+
+// PGOName: define dso_local void @_Z7callee1v() {{.*}} !prof ![[#]] {
+// PGOName: define internal void @_ZL7callee0v() {{.*}} !prof ![[#]] 
!PGOFuncName ![[#MD:]] {
+// PGOName: ![[#MD]] = !{!"{{.*}}lib.cpp;_ZL7callee0v"}
+
+// IR-LABEL: define available_externally {{.*}} void @_Z11global_funcv() 
{{.*}} !prof ![[#]] {
+// IR-NEXT: entry:
+// IR-NEXT:  %0 = load ptr, ptr @calleeAddrs
+// IR-NEXT:  tail call void %0(), !prof ![[#PROF1:]]
+// IR-NEXT:  %1 = load ptr, ptr getelementptr inbounds ([2 x ptr], ptr 
@calleeAddrs,
+// IR-NEXT:  tail call void %1(), !prof ![[#PROF2:]]
+
+// The GUID of indirect callee is the MD5 hash of 
`/path/to/lib.cpp:_ZL7callee0v`

teresajohnson wrote:

should this have the semicolon separator not a colon?
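For context, the GUID recorded in the ThinLTO index is the MD5 of that prefixed string, so the separator in the comment should match what the compiler actually hashes. A sketch (the path and helper name are illustrative):

```cpp
#include "llvm/IR/GlobalValue.h"
#include <cstdint>

uint64_t calleeGUID() {
  // Assumes the ';' separator; "/path/to/lib.cpp" stands in for the real TU path.
  return llvm::GlobalValue::getGUID("/path/to/lib.cpp;_ZL7callee0v");
}
```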

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[compiler-rt] [clang-tools-extra] [clang] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-11 Thread Teresa Johnson via cfe-commits


@@ -1,39 +0,0 @@
-; Do setup work for all below tests: generate bitcode and combined index

teresajohnson wrote:

Without use of the raw profile, this test would not have caught the regression. 
If we think the new compiler-rt test is enough to catch this case in the 
future, then perhaps we don't need this LLVM test at all (there should already 
be some ICP tests). But if the compiler-rt test is not enough, because some 
people don't run those, then this should stay and use a raw profile as input.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][fatlto] Don't set ThinLTO module flag with FatLTO (PR #75079)

2023-12-11 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

Added a comment to that issue, I think it would be good to understand why 
unified LTO is not expected in that case (for the assertion).

https://github.com/llvm/llvm-project/pull/75079
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[compiler-rt] [clang-tools-extra] [llvm] [clang] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-08 Thread Teresa Johnson via cfe-commits


@@ -1,39 +1,45 @@
-; Do setup work for all below tests: generate bitcode and combined index
-; RUN: opt -module-summary %s -o %t.bc
-; RUN: opt -module-summary %p/Inputs/thinlto_indirect_call_promotion.ll -o 
%t2.bc
+; The raw profiles and reduced IR inputs are generated from 
Inputs/update_icall_promotion_inputs.sh

teresajohnson wrote:

The reason for having this in a single test this way is specifically to detect 
mismatches in the separator used in the PGO name and in the ThinLTO index. For 
this we need to convert from raw profile and feed it through ThinLTO+ICP. 

Having separate tests would not do this. I.e. with the change from ':' to ';', 
the compiler-rt test would presumably have failed and would be updated, but the 
test using the proftext input would not fail. The lack of testing these pieces 
together led to the prior separator rename not updating all the necessary 
compiler bits.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[compiler-rt] [clang-tools-extra] [clang] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-07 Thread Teresa Johnson via cfe-commits


@@ -1,39 +1,45 @@
-; Do setup work for all below tests: generate bitcode and combined index
-; RUN: opt -module-summary %s -o %t.bc
-; RUN: opt -module-summary %p/Inputs/thinlto_indirect_call_promotion.ll -o 
%t2.bc
+; The raw profiles and reduced IR inputs are generated from 
Inputs/update_icall_promotion_inputs.sh

teresajohnson wrote:

No because the point of this test is to catch regressions if the separator 
character (which is hardcoded in the proftext) changes again in the future.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [llvm] [clang] [compiler-rt] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-06 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> David says the itanium remapper file was only used once during gcc to llvm 
> transition, so not relevant here.

I believe it was actually for the libstdc++ to libc++ transition (see 
https://reviews.llvm.org/D51247 and https://reviews.llvm.org/D51240).

If it is broken we'll at least want to add a FIXME there. 

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-06 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> Using the same added ICP test, profile matching on local-linkage 
> `_ZL7callee0v` using `clang++ -v -fuse-ld=lld -O2 
> -fprofile-use=thinlto_icall_prom.profdata ` , as 
> [this](https://gist.github.com/minglotus-6/11817ba645c6b12cd7116f41bfb1185e) 
> pgo-instr-use output shows.

Can you clarify what you are saying here - is it working or not working?

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-05 Thread Teresa Johnson via cfe-commits


@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO,
 GlobalValue::LinkageTypes Linkage,
 StringRef FileName) {
   SmallString<64> Name;
-  if (llvm::GlobalValue::isLocalLinkage(Linkage)) {
-Name.append(FileName.empty() ? "" : FileName);
-Name.append(";");
-  }
   Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true);

teresajohnson wrote:

> It wasn't broken in general, but it's needed to get -order_file working 
> correctly.

Unfortunately, this change broke aspects of ThinLTO ICP. Is it possible 
to change the handling of -order_file in the linker to modify the symbol names 
appropriately?

> To avoid subtle issues when linkage-name is different from mangled names,,I'm 
> wondering if it warrants a change to use linkage-names (as opposed to mangled 
> name) in GlobalValue::getGlobalIdentifier in this PR.

If the -order_file handling cannot be fixed as I suggested above, then yes, we 
need some solution to ensure that the hashed symbol names are consistent across 
PGO and ThinLTO.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-01 Thread Teresa Johnson via cfe-commits


@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO,
 GlobalValue::LinkageTypes Linkage,
 StringRef FileName) {
   SmallString<64> Name;
-  if (llvm::GlobalValue::isLocalLinkage(Linkage)) {
-Name.append(FileName.empty() ? "" : FileName);
-Name.append(";");
-  }
   Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true);

teresajohnson wrote:

Was PGO broken in general before D156569? Or is this specific to 
`-symbol-ordering-file` and `-order_file`?

Also, it looks like some of the main changes here by the mangler for objective 
C are to strip the "\01" prefix. Is this different from the support to remove 
the '\1' mangling escape in getGlobalIdentifier()?

Trying to figure out how we keep these interfaces in sync to avoid more issues 
with ICP...

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-01 Thread Teresa Johnson via cfe-commits


@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO,
 GlobalValue::LinkageTypes Linkage,
 StringRef FileName) {
   SmallString<64> Name;
-  if (llvm::GlobalValue::isLocalLinkage(Linkage)) {
-Name.append(FileName.empty() ? "" : FileName);
-Name.append(";");
-  }
   Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true);

teresajohnson wrote:

Can you give an example of where this is called such that there is a difference 
between the function name and linkage name (at least for C++ these are the same 
afaik), and what that difference looks like? Are there specific callsites used 
for the symbol ordering support that need the mangling?

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-01 Thread Teresa Johnson via cfe-commits


@@ -246,11 +246,27 @@ std::string InstrProfError::message() const {
 
 char InstrProfError::ID = 0;
 
-std::string getPGOFuncName(StringRef RawFuncName,
-   GlobalValue::LinkageTypes Linkage,
+std::string getPGOFuncName(StringRef Name, GlobalValue::LinkageTypes Linkage,

teresajohnson wrote:

I guess it depends on whether it is intentional that Clang (and Swift 
apparently?) use the old name.

@ellishg was there a reason to leave that as is?

If they must use the old one then maybe getFEPGOFuncName. If they should be 
transitioned eventually then using "Legacy" makes sense.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-01 Thread Teresa Johnson via cfe-commits


@@ -246,11 +246,27 @@ std::string InstrProfError::message() const {
 
 char InstrProfError::ID = 0;
 
-std::string getPGOFuncName(StringRef RawFuncName,
-   GlobalValue::LinkageTypes Linkage,
+std::string getPGOFuncName(StringRef Name, GlobalValue::LinkageTypes Linkage,

teresajohnson wrote:

Oh I missed the fact that one getPGOFuncName interface was left. I am only 
seeing 2 invocations of that interface outside of the unittest test. I would be 
in favor of doing renames of both getPGOFuncName interfaces in one patch. A 
separate NFC patch is a fine option, and keeps this patch just about fixing the 
ICP breakage caused by the delimiter change.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-01 Thread Teresa Johnson via cfe-commits


@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO,
 GlobalValue::LinkageTypes Linkage,
 StringRef FileName) {
   SmallString<64> Name;
-  if (llvm::GlobalValue::isLocalLinkage(Linkage)) {
-Name.append(FileName.empty() ? "" : FileName);
-Name.append(";");
-  }
   Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true);

teresajohnson wrote:

What I'm confused about is that the function name is already the mangled name, 
at least using clang with C++. Is it not for Objective-C?
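
To make the question concrete, a small standalone illustration (a sketch, assuming a MachO-style data layout where the global prefix is '_'; on ELF the two names would match):

```cpp
#include "llvm/ADT/SmallString.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Mangler.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("m", Ctx);
  // "m:o" selects MachO-style mangling, i.e. a '_' global prefix.
  M.setDataLayout("e-m:o-i64:64-f80:128-n8:16:32:64-S128");
  auto *FT = FunctionType::get(Type::getVoidTy(Ctx), /*isVarArg=*/false);
  Function *F = Function::Create(FT, GlobalValue::ExternalLinkage, "foo", M);
  SmallString<64> Mangled;
  Mangler().getNameWithPrefix(Mangled, F, /*CannotUsePrivateLabel=*/true);
  // Prints: IR name "foo" vs linkage name "_foo"
  outs() << "IR name \"" << F->getName() << "\" vs linkage name \""
         << Mangled.str() << "\"\n";
}
```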

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-01 Thread Teresa Johnson via cfe-commits


@@ -246,11 +246,27 @@ std::string InstrProfError::message() const {
 
 char InstrProfError::ID = 0;
 
-std::string getPGOFuncName(StringRef RawFuncName,
-   GlobalValue::LinkageTypes Linkage,
+std::string getPGOFuncName(StringRef Name, GlobalValue::LinkageTypes Linkage,

teresajohnson wrote:

Is this resolved? It looks like the most recent version of the patch changes 
getPGOFuncName to getLegacyPGOFuncName for the old format, and uses 
getIRPGOFuncName for the new format.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (PR #74008)

2023-12-01 Thread Teresa Johnson via cfe-commits


@@ -300,12 +316,8 @@ getIRPGONameForGlobalObject(const GlobalObject &GO,
 GlobalValue::LinkageTypes Linkage,
 StringRef FileName) {
   SmallString<64> Name;
-  if (llvm::GlobalValue::isLocalLinkage(Linkage)) {
-Name.append(FileName.empty() ? "" : FileName);
-Name.append(";");
-  }
   Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true);

teresajohnson wrote:

@ellishg How much of D156569 relied on the invocation of Mangler? It is not 
mentioned in the patch description, only the rationale for changing ":" to ";". 
The problem is that if these are out of sync, then cross-module importing of 
indirectly called local functions will continue to be broken in whatever cases 
Mangler().getNameWithPrefix affects.

https://github.com/llvm/llvm-project/pull/74008
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [MemProf] Expand optimization scope to internal linakge function (PR #73236)

2023-11-29 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

@snehasish and I chatted more about this offline. Using the dwarf info to 
figure out the right source name prefix during indexing is not straightforward. 
The source file name used as the prefix in the compiler for the IRPGOName is 
that of the TU. The source file of the line in the dwarf info may be a header. 
It could be possible to walk up the profiled inline frames to find a source 
file, but that is error prone in certain circumstances (e.g. if the profiled 
binary had thinlto cross module inlining, and if it isn't obvious which source 
file name is a non-included TU).

I think your change to the matching is almost strictly better than the complete 
non-matching we get currently for local functions. It would even work in many 
cases without uniq suffixes, but uniq suffixes will make that work even better. 
So let's go ahead with that change.

However, we'd like to request that you split the __cxx_global_var_init change 
into a separate patch, as it is somewhat orthogonal to the matching change, and 
in case there are unexpected issues with that change. Can you split that out 
and make the test case here for matching of a non __cxx_global_var_init 
function?

Thanks!

https://github.com/llvm/llvm-project/pull/73236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [MemProf] Expand optimization scope to internal linakge function (PR #73236)

2023-11-29 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> > Yes, you're right. As an alternative can we use the symbol table and find 
> > Bind = LOCAL to add the prefix before hashing?
> 
> If we choose this method. I think we can't deal with the situation which one 
> symbol is not local linkage type in thin compile, but will be converted to 
> local linkage after thin backend by thinlto's internalization. In this 
> situation function name in llvm-profdata will have prefix with file name. But 
> when llvm consumes memory profile, PGOFuncName won't return function name 
> with prefix.

If I understand the issue you are describing, that would only occur if the 
instrumentation build also used ThinLTO. Is that a typical use case for you?

https://github.com/llvm/llvm-project/pull/73236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)

2023-11-28 Thread Teresa Johnson via cfe-commits


@@ -1530,14 +1530,11 @@ 
PassBuilder::buildPerModuleDefaultPipeline(OptimizationLevel Level,
 }
 
 ModulePassManager
-PassBuilder::buildFatLTODefaultPipeline(OptimizationLevel Level, bool ThinLTO,
-bool EmitSummary) {
+PassBuilder::buildFatLTODefaultPipeline(OptimizationLevel Level) {
   ModulePassManager MPM;
-  MPM.addPass(EmbedBitcodePass(ThinLTO, EmitSummary,
-   ThinLTO
-   ? buildThinLTOPreLinkDefaultPipeline(Level)
-   : buildLTOPreLinkDefaultPipeline(Level)));
-  MPM.addPass(buildPerModuleDefaultPipeline(Level));
+  MPM.addPass(buildThinLTOPreLinkDefaultPipeline(Level));
+  MPM.addPass(EmbedBitcodePass());
+  MPM.addPass(buildThinLTODefaultPipeline(Level, /*ImportSummary=*/nullptr));

teresajohnson wrote:

What was the downside of using ThinLTODefaultPipeline always? I guess it was 
essentially over-optimizing in the non-LTO case? I guess using the 
ThinLTODefaultPipeline only under SamplePGO is ok with me, although it seems 
like over time as the pipelines get modified it will be difficult to ensure 
that is the only case where the pipelines get out of sync. I think in either 
case we are essentially saying: if you use Fat LTO then don't expect the 
resulting non-LTO native code to be the same as that of a fully non-LTO 
compile. In one case there is more optimization, in the other there is the risk 
of future deoptimization if things don't stay in sync.

https://github.com/llvm/llvm-project/pull/72180
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)

2023-11-28 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.

lgtm

https://github.com/llvm/llvm-project/pull/72180
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)

2023-11-27 Thread Teresa Johnson via cfe-commits


@@ -1861,6 +1861,13 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions 
&Opts, ArgList &Args,
 if (Args.hasArg(OPT_funified_lto))
   Opts.PrepareForThinLTO = true;
   }
+  if (Arg *A = Args.getLastArg(options::OPT_ffat_lto_objects,
+   options::OPT_fno_fat_lto_objects)) {
+if (!Args.hasArg(OPT_funified_lto))
+  Diags.Report(diag::err_drv_incompatible_options)

teresajohnson wrote:

It might be less confusing to users if this error message is only given upon an 
explicit -fno-unified-lto, and diag::err_drv_argument_only_allowed_with is used 
for the lack of -funified-lto.

Also can you add driver tests to check that we get the expected error(s) in the 
expected option combinations?
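
Roughly what I have in mind, as an untested sketch in the spirit of the hunk above (exact structure is an assumption):

```cpp
if (Arg *A = Args.getLastArg(options::OPT_ffat_lto_objects,
                             options::OPT_fno_fat_lto_objects)) {
  if (A->getOption().matches(options::OPT_ffat_lto_objects)) {
    if (Arg *U = Args.getLastArg(options::OPT_funified_lto,
                                 options::OPT_fno_unified_lto)) {
      // Explicit -fno-unified-lto: report the conflict.
      if (U->getOption().matches(options::OPT_fno_unified_lto))
        Diags.Report(diag::err_drv_incompatible_options)
            << A->getSpelling() << U->getSpelling();
    } else {
      // -funified-lto simply missing: say it is required.
      Diags.Report(diag::err_drv_argument_only_allowed_with)
          << A->getSpelling() << "-funified-lto";
    }
  }
}
```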

https://github.com/llvm/llvm-project/pull/72180
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [MemProf] Expand optimization scope to internal linakge function (PR #73236)

2023-11-27 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> @lifengxiang1025 thanks for flagging this issue. I think it's best to not 
> rely on unique-internal-linkage-name here. Instead we should extend the logic 
> in RawMemProfReader.cpp to include "filename;" if the function is internal 
> linkage as expected by IRPGOFuncName. Can you try this approach?

I don't think we want to change the name in the frames themselves, as the names 
in the debug info when matching don't contain this prefix. Although I suppose 
we could modify the matcher to add the prefixes when matching frames too. I.e. 
here: 
https://github.com/llvm/llvm-project/blob/c0fe0719df457b0992606b0f2a235719a5bbfd54/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp#L806

It sounds like the issue is that the names used to index the memprof records do 
not use the PGOFuncName scheme and therefore we never find any memprof records 
for an internal linkage function during matching. I.e. the GUID passed to 
InstrProfWriter::addMemProfRecord. Unfortunately, looking at 
RawMemProfReader.cpp it does look like that eventually gets populated out of 
the Function (GUID) field from same Frame setup code. So we'd either need to do 
some extra handling in that code so that the prefix is only used for the record 
entry, or change the matcher (the latter may be the easiest).
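
Concretely, the matcher option might look something like this (a sketch under the assumption we reuse getIRPGOFuncName; the helper name is illustrative, not the actual patch):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/ProfileData/InstrProf.h"

// Compute the GUID used to look up memprof records so that local-linkage
// functions get the same "<source-file>;" prefix the profile writer used.
static uint64_t memprofLookupGUID(const llvm::Function &F) {
  return llvm::GlobalValue::getGUID(llvm::getIRPGOFuncName(F));
}
```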

https://github.com/llvm/llvm-project/pull/73236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)

2023-11-27 Thread Teresa Johnson via cfe-commits


@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Actually, I believe the cc1 parsing is handled by ParseCodeGenArgs in 
clang/lib/Frontend/CompilerInvocation.cpp. You can potentially add something in 
there to ensure UnifiedLTO is set with FatLTO? Or give your error there that 
UnifiedLTO must be specified with Fat LTO.

https://github.com/llvm/llvm-project/pull/72180
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)

2023-11-27 Thread Teresa Johnson via cfe-commits


@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Ah ok. I guess that would be expected. Whether it is test friendly is another 
question, but in general the driver does a lot of option set up for the cc1 
invocation.

https://github.com/llvm/llvm-project/pull/72180
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)

2023-11-27 Thread Teresa Johnson via cfe-commits


@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Not sure I follow. Isn't the change in Driver/Toolchains/Clang.cpp going to 
ensure that -funified-lto is also passed to the cc1 invocation, and thus both 
options should be set in CodeGenOpts during the cc1 invocation?

https://github.com/llvm/llvm-project/pull/72180
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][llvm][fatlto] Avoid cloning modules in FatLTO (PR #72180)

2023-11-27 Thread Teresa Johnson via cfe-commits


@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;

teresajohnson wrote:

Won't your change to lib/Driver/ToolChains/Clang.cpp below mean that 
CodeGenOpts.UnifiedLTO will always be set with FatLTO? Do we thus need this 
change or the one further below for the module flag (and maybe add an assert 
that UnifiedLTO is set if FatLTO is set)?

https://github.com/llvm/llvm-project/pull/72180
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [MemProf] Expand optimization scope to internal linakge function (PR #73236)

2023-11-27 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

@snehasish can you take a look at the issue described here? Should we be doing 
something different in llvm-profdata?

https://github.com/llvm/llvm-project/pull/73236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] Support VFE in thinLTO (PR #69735)

2023-11-06 Thread Teresa Johnson via cfe-commits


@@ -338,12 +574,33 @@ PreservedAnalyses GlobalDCEPass::run(Module &M, 
ModuleAnalysisManager &MAM) {
 
   // The second pass drops the bodies of functions which are dead...
  std::vector<Function *> DeadFunctions;
-  for (Function &F : M)
+  std::set DeadFunctionsSet;
+  auto funcRemovedInIndex = [&](Function *F) -> bool {
+if (!ImportSummary)
+  return false;
+auto &vfuncsRemoved = ImportSummary->funcsToBeRemoved();
+// If function is internalized, its current GUID will be different
+// from the GUID in funcsToBeRemoved. Treat it as a global and search
+// again.
+if (vfuncsRemoved.count(F->getGUID()))
+  return true;
+if (vfuncsRemoved.count(GlobalValue::getGUID(F->getName(

teresajohnson wrote:

These lookups are a little dangerous as I think there are cases where we might 
rename and it wouldn't find the right function. For internalization we tag the 
function with an attribute before any thinlto renaming, linkage changes, or 
other optimization passes. See 
https://github.com/llvm/llvm-project/blob/0936a718492d248a726b8e8d4529e4aa194bc8e9/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp#L271.

https://github.com/llvm/llvm-project/pull/69735
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] Support VFE in thinLTO (PR #69735)

2023-11-06 Thread Teresa Johnson via cfe-commits


@@ -34,12 +40,223 @@ static cl::opt<bool>
 ClEnableVFE("enable-vfe", cl::Hidden, cl::init(true),
 cl::desc("Enable virtual function elimination"));
 
+static cl::opt<std::string> ClReadSummary(
+"globaldce-read-summary",
+cl::desc("Read summary from given bitcode before running pass"),
+cl::Hidden);
+
 STATISTIC(NumAliases  , "Number of global aliases removed");
 STATISTIC(NumFunctions, "Number of functions removed");
 STATISTIC(NumIFuncs,"Number of indirect functions removed");
 STATISTIC(NumVariables, "Number of global variables removed");
 STATISTIC(NumVFuncs,"Number of virtual functions removed");
 
+namespace llvm {
+
+// Returning a representative summary for the vtable, also set isSafe.
+static const GlobalVarSummary *
+getVTableFuncsForTId(const TypeIdOffsetVtableInfo &P, bool &isSafe) {
+  // Find a representative copy of the vtable initializer.
+  const GlobalVarSummary *VS = nullptr;
+  bool LocalFound = false;
+  for (auto &S : P.VTableVI.getSummaryList()) {
+if (GlobalValue::isLocalLinkage(S->linkage())) {
+  if (LocalFound) {
+isSafe = false;
+return nullptr;
+  }
+  LocalFound = true;
+}
+auto *CurVS = cast<GlobalVarSummary>(S->getBaseObject());
+// Ignore if vTableFuncs is empty and vtable is available_externally.
+if (!CurVS->vTableFuncs().empty() ||
+!GlobalValue::isAvailableExternallyLinkage(S->linkage())) {
+  VS = CurVS;
+  if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic) {
+isSafe = false;
+return VS;
+  }
+}
+  }
+
+  if (!VS) {
+isSafe = false;
+return nullptr;
+  }
+  if (!VS->isLive()) {
+isSafe = true;
+return nullptr;
+  }
+  isSafe = true;
+  return VS;
+}
+
+static void collectSafeVTables(
+ModuleSummaryIndex &Summary,
+DenseMap> &NameByGUID,
+std::map> &VFESafeVTablesAndFns) {
+  // Update VFESafeVTablesAndFns with information from summary.
+  for (auto &P : Summary.typeIdCompatibleVtableMap()) {
+NameByGUID[GlobalValue::getGUID(P.first)].push_back(P.first);
+LLVM_DEBUG(dbgs() << "TId " << GlobalValue::getGUID(P.first) << " "
+  << P.first << "\n");
+  }
+  llvm::errs() << "VFEThinLTO number of TIds: " << NameByGUID.size() << "\n";
+
+  // VFESafeVTablesAndFns: map from VI for vTable to VI for vfunc
+  std::map> vFuncSet;
+  unsigned numSafeVFuncs = 0;
+  // Collect stats for VTables (safe, public-visibility, other).
+  std::set vTablePublicVis;
+  std::set vTableOther;
+  for (auto &TidSummary : Summary.typeIdCompatibleVtableMap()) {
+for (const TypeIdOffsetVtableInfo &P : TidSummary.second) {
+  LLVM_DEBUG(dbgs() << "TId-vTable " << TidSummary.first << " "
+<< P.VTableVI.name() << " " << P.AddressPointOffset
+<< "\n");
+  bool isSafe = false;
+  const GlobalVarSummary *VS = getVTableFuncsForTId(P, isSafe);
+  if (!isSafe && VS)
+vTablePublicVis.insert(P.VTableVI);
+  if ((isSafe && !VS) || (!isSafe && !VS))
+vTableOther.insert(P.VTableVI);
+  if (!isSafe || !VS) {
+continue;
+  }
+
+  // Go through VS->vTableFuncs
+  for (auto VTP : VS->vTableFuncs()) {
+if (vFuncSet.find(P.VTableVI) == vFuncSet.end() ||
+!vFuncSet[P.VTableVI].count(VTP.FuncVI.getGUID())) {
+  VFESafeVTablesAndFns[P.VTableVI].push_back(VTP);
+  LLVM_DEBUG(dbgs()
+ << "vTable " << P.VTableVI.name() << " "
+ << VTP.FuncVI.name() << " " << VTP.VTableOffset << "\n");
+  ++numSafeVFuncs;
+}
+vFuncSet[P.VTableVI].insert(VTP.FuncVI.getGUID());
+  }
+}
+  }
+  llvm::errs() << "VFEThinLTO number of vTables: " << vFuncSet.size() << " "
+   << vTablePublicVis.size() << " " << vTableOther.size() << "\n";
+  llvm::errs() << "VFEThinLTO numSafeVFuncs: " << numSafeVFuncs << "\n";
+}
+
+static void checkVTableLoadsIndex(
+ModuleSummaryIndex &Summary,
+DenseMap> &NameByGUID,
+std::map> &VFESafeVTablesAndFns,
+std::map> &VFuncsAndCallers) {
+  // Go through Function summarys for intrinsics, also funcHasNonVTableRef to
+  // erase entries from VFESafeVTableAndFns.

teresajohnson wrote:

Why and when are entries being erased? More qualitative comments about what 
this algorithm is doing would be helpful.

https://github.com/llvm/llvm-project/pull/69735
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

