[PATCH] D93264: [CSSPGO] Introducing distribution factor for pseudo probe.

Wei Mi via Phabricator via cfe-commits Thu, 14 Jan 2021 11:12:29 -0800

wmi added inline comments.


================
Comment at: llvm/include/llvm/IR/PseudoProbe.h:41
   //  [18:3] - probe id
-  //  [25:19] - reserved
+  //  [25:19] - probe distribution factor
   //  [28:26] - probe type, see PseudoProbeType
----------------
hoy wrote:
> wmi wrote:
> > hoy wrote:
> > > hoy wrote:
> > > > wmi wrote:
> > > > > The bits in discriminator is a scare resource. Have you considered 
> > > > > using less bits to represent probe distribution factor? I guess it is 
> > > > > possible that using a little more coarse grain distribution factor 
> > > > > won't affect performance.
> > > > That's a good point. We are using seven bits to represent [0, 100] so 
> > > > that integral numbers can be distinguished. Yes, we could use fewer 
> > > > bits to represent, say 4 bits to represent only even numbers. We could 
> > > > also not use any bits here but instead use the distribution factor of 
> > > > the outer block probes when the competition of those bits are high. I 
> > > > can do an experiment to see how well that works.
> > > On a second thought, using the distribution factor of block probes for 
> > > call probe may not work well since a callsite may be surrounded by more 
> > > than one block probes. 
> > > 
> > > We could use also fewer bits like 6 bits to encode even numbers in the 
> > > range [0, 100], or 5 bits to encoding multiples of 3 in [0, 100]. I did a 
> > > profile quality measurement with the even number encoding. It's OK 
> > > overall except for two SPEC benchmarks. I guess it's a trade-off we'll 
> > > have to take when there's a competition on those bits. 
> > Could you elaborate a little bit about the case that a callsite is 
> > surrounded by more than one block probe? Is it because bb merge like in cfg 
> > simplification?
> Yes, block merge in cfg simplification is a good example. Inlining can also 
> end up with callee code and caller code in one block. Jump threading or other 
> cfg optimizations that convert a conditional jump into an unconditional jump 
> can result in block merge too.
> 
> So far our way to track block weight for blocks with multiple probes is to 
> take the maximum count out of those probes. When it comes to tracking 
> callsite count, it is handy and accurate to attach a dedicated distribution 
> factor for each individual call. For example, when a call is inlined, the 
> inlinee's probes will be cloned into the caller, and they will be prorated 
> based on the callsite's dedicated distribution factor.
Actually, I think we may be able to extend Discriminator and 
PseudoProbeDwarfDiscriminator. To emit Discriminator into Dwarf, we need to 
follow Dwarf standard about how many bits Discrminator is going to occupy. But 
inside compiler, Discriminator is represented as MetaData so it hasn't to be 
32bits. For example, we can extend Discriminator MetaData to be 64bits or even 
larger and specify only lower 32bits will be actually emitted into Dwarf 
section. For intermediate information like distribution factors, we can put it 
into the higher bits. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93264/new/

https://reviews.llvm.org/D93264

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93264: [CSSPGO] Introducing distribution factor for pseudo probe.

Reply via email to