david-xl wrote: > And digging even deeper: > > * FWIW I noticed that I only used `clang -c` as benchmark previously, should > have used `clang -c -O3` resulting in this: > > ``` > Old-BFI: insn: 37,821,687,947 (baseline) > New-BFI: insn: 38,133,312,923 +0.82% > Old-BFI, no-cold: insn: 37,423,365,184 -1.05% > New-BFI, no-cold: insn: 37,438,736,713 -1.01% > ``` > > * The problematic part of the code that is inlined/not-inlined is actually > marked as `LLVM_UNLIKELY` here: > https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/DenseMap.h#L607 > and intuitively one would think that it is probably best to not inline > `grow` (as will happen with the new BFI version) > * In practice not-inlining `grow` mostly ends up being worse because of > knock-on effects as far as I can tell. These are the inlining decisions I > noticed for the most prominent situation: > > ``` > - `grow` inlined (Old-BFI): > - Instruction::getMetadataImpl > -> Value::getMetadata not inlined > -> DenseMap::operator[] inlined > -> DenseMap::FindAndConstruct inlined > -> DenseMap::InsertIntoBucket not inlined, size > likely too big with `grow` inlined here > -> DenseMap::grow inlined > - `grow` inlined (New-BFI): > - Instruction::getMadataImpl > -> Value::getMetadata inlined > -> DenseMap::operator[] inlined > -> DenseMap::FindAndConstruct not inlined, size > -> DenseMap::InsertIntoBucket inlined > -> DenseMap::grow not inlined > ``` > > Though if you look into this then I would state that the code got better for > the wrong reasons! Not inlining `grow` is a sensible decision in this context > and the `LLVM_UNLIKELY` annotation makes sense (I actually added some > counters and see the unlikely branch taken somewhere between 1% and 10% of > the cases depending on inputs, so seems sensible). > > Unfortunately the particular code here `getMetadataImpl` never inserts new > things into the map, but unfortunately `operator[]` gives you that behavior > by default so nobody noticed. So not inlining `InsertIntoBucket` happened to > be a great decision previously that the compiler did by accident without > having good data to support this. Now with better but still insufficient data > (as this is PGO) we happen to end up inlining `InsertIntoBucket` wasting code > size leading to worse inlining decisions further up the stack... > > Long story short: This ended up another of the many stories where the > compiler cannot make the right inlining decisions without having actual > runtime data. It tends to make a better decision based on the data that ends > up being wrong anyway here. I'm gonna leave things as is and rather put up > some improvements to LLVM code instead!
Good analysis. Your plan sounds good but lowering the cold threshold is probably a good thing to do with the new BFI. The FindAndConstruct method in DenseMap can probably be annotated with buitin_expect to indicate the insertion is a unlikely path. https://github.com/llvm/llvm-project/pull/66285 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits