[clang-tools-extra] Bfi precision (PR #66285)

Matthias Braun via cfe-commits Fri, 27 Oct 2023 14:54:32 -0700

MatzeB wrote:

And digging even deeper:
- FWIW I noticed that I only used `clang -c` as benchmark previously, should 
have used `clang -c -O3` resulting in this:
```
Old-BFI:          insn: 37,821,687,947  (baseline)
New-BFI:          insn: 38,133,312,923  +0.82%
Old-BFI, no-cold: insn: 37,423,365,184  -1.05%
New-BFI, no-cold: insn: 37,438,736,713  -1.01%
```
- The problematic part of the code that is inlined/not-inlined is actually 
marked as `LLVM_UNLIKELY` here: 
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/DenseMap.h#L607
  and intuitively one would think that it is probably best to not inline `grow` 
 (as will happen with the new BFI version)
- In practice not-inlining `grow` mostly ends up being worse because of 
knock-on effects as far as I can tell. These are the inlining decisions I 
noticed for the most prominent situation:
    - `grow` inlined (Old-BFI):
         - Instruction::getMetadataImpl
             -> Value::getMetadata           not inlined
             -> DenseMap::operator[]         inlined
             -> DenseMap::FindAndConstruct   inlined
             -> DenseMap::InsertIntoBucket   not inlined, size
                   likely too big with `grow` inlined here
             -> DenseMap::grow               inlined
    - `grow` inlined (New-BFI):
         - Instruction::getMadataImpl
             -> Value::getMetadata          inlined
             -> DenseMap::operator[]        inlined
             -> DenseMap::FindAndConstruct  not inlined, size
             -> DenseMap::InsertIntoBucket  inlined
             -> DenseMap::grow              not inlined
             
Though if you look into this then I would state that the code got better for 
the wrong reasons! Not inlining `grow` is a sensible decision in this context 
and the `LLVM_UNLIKELY` annotation makes sense (I actually added some counters 
and see the unlikely branch taken somewhere between 1% and 10% of the cases 
depending on inputs, so seems sensible).


Unfortunately the particular code here `getMetadataImpl` never inserts new 
things into the map, but unfortunately `operator[]` gives you that behavior by 
default so nobody noticed. So not inlining `InsertIntoBucket` happened to be a 
great decision previously that the compiler did by accident without having good 
data to support this. Now with better but still insufficient data (as this is 
PGO) we happen to make the decisions that ends up being worse.

Long story short: This ended up another of the many stories where the compiler 
cannot make the right inlining decisions without having actual runtime data. It 
tends to make a better decision based on the data that ends up being wrong 
anyway here. I'm gonna leave things as is and rather put up some improvements 
to LLVM code instead!

https://github.com/llvm/llvm-project/pull/66285
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] Bfi precision (PR #66285)

Reply via email to