MatzeB wrote: And digging even deeper: - FWIW I noticed that I only used `clang -c` as benchmark previously, should have used `clang -c -O3` resulting in this: ``` Old-BFI: insn: 37,821,687,947 (baseline) New-BFI: insn: 38,133,312,923 +0.82% Old-BFI, no-cold: insn: 37,423,365,184 -1.05% New-BFI, no-cold: insn: 37,438,736,713 -1.01% ``` - The problematic part of the code that is inlined/not-inlined is actually marked as `LLVM_UNLIKELY` here: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/DenseMap.h#L607 and intuitively one would think that it is probably best to not inline `grow` (as will happen with the new BFI version) - In practice not-inlining `grow` mostly ends up being worse because of knock-on effects as far as I can tell. These are the inlining decisions I noticed for the most prominent situation: - `grow` inlined (Old-BFI): - Instruction::getMetadataImpl -> Value::getMetadata not inlined -> DenseMap::operator[] inlined -> DenseMap::FindAndConstruct inlined -> DenseMap::InsertIntoBucket not inlined, size likely too big with `grow` inlined here -> DenseMap::grow inlined - `grow` inlined (New-BFI): - Instruction::getMadataImpl -> Value::getMetadata inlined -> DenseMap::operator[] inlined -> DenseMap::FindAndConstruct not inlined, size -> DenseMap::InsertIntoBucket inlined -> DenseMap::grow not inlined Though if you look into this then I would state that the code got better for the wrong reasons! Not inlining `grow` is a sensible decision in this context and the `LLVM_UNLIKELY` annotation makes sense (I actually added some counters and see the unlikely branch taken somewhere between 1% and 10% of the cases depending on inputs, so seems sensible).
Unfortunately the particular code here `getMetadataImpl` never inserts new things into the map, but unfortunately `operator[]` gives you that behavior by default so nobody noticed. So not inlining `InsertIntoBucket` happened to be a great decision previously that the compiler did by accident without having good data to support this. Now with better but still insufficient data (as this is PGO) we happen to make the decisions that ends up being worse. Long story short: This ended up another of the many stories where the compiler cannot make the right inlining decisions without having actual runtime data. It tends to make a better decision based on the data that ends up being wrong anyway here. I'm gonna leave things as is and rather put up some improvements to LLVM code instead! https://github.com/llvm/llvm-project/pull/66285 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits