[clang] Bfi precision (PR #66285)

2023-10-30 Thread via cfe-commits

WenleiHe wrote:

Thanks for investigating, Matthias. So it looks like the regression is indeed 
just clang non-PGO build being unlucky after the BFI improvement.. Sounds good 
to leave it as is. 

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-28 Thread David Li via cfe-commits

david-xl wrote:

> And digging even deeper:
> 
> * FWIW I noticed that I only used `clang -c` as benchmark previously, should 
> have used `clang -c -O3` resulting in this:
> 
> ```
> Old-BFI:  insn: 37,821,687,947  (baseline)
> New-BFI:  insn: 38,133,312,923  +0.82%
> Old-BFI, no-cold: insn: 37,423,365,184  -1.05%
> New-BFI, no-cold: insn: 37,438,736,713  -1.01%
> ```
> 
> * The problematic part of the code that is inlined/not-inlined is actually 
> marked as `LLVM_UNLIKELY` here: 
> https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/DenseMap.h#L607
>   and intuitively one would think that it is probably best to not inline 
> `grow`  (as will happen with the new BFI version)
> * In practice not-inlining `grow` mostly ends up being worse because of 
> knock-on effects as far as I can tell. These are the inlining decisions I 
> noticed for the most prominent situation:
> 
> ```
> - `grow` inlined (Old-BFI):
>  - Instruction::getMetadataImpl
>  -> Value::getMetadata   not inlined
>  -> DenseMap::operator[] inlined
>  -> DenseMap::FindAndConstruct   inlined
>  -> DenseMap::InsertIntoBucket   not inlined, size
>likely too big with `grow` inlined here
>  -> DenseMap::grow   inlined
> - `grow` inlined (New-BFI):
>  - Instruction::getMadataImpl
>  -> Value::getMetadata  inlined
>  -> DenseMap::operator[]inlined
>  -> DenseMap::FindAndConstruct  not inlined, size
>  -> DenseMap::InsertIntoBucket  inlined
>  -> DenseMap::grow  not inlined
> ```
> 
> Though if you look into this then I would state that the code got better for 
> the wrong reasons! Not inlining `grow` is a sensible decision in this context 
> and the `LLVM_UNLIKELY` annotation makes sense (I actually added some 
> counters and see the unlikely branch taken somewhere between 1% and 10% of 
> the cases depending on inputs, so seems sensible).
> 
> Unfortunately the particular code here `getMetadataImpl` never inserts new 
> things into the map, but unfortunately `operator[]` gives you that behavior 
> by default so nobody noticed. So not inlining `InsertIntoBucket` happened to 
> be a great decision previously that the compiler did by accident without 
> having good data to support this. Now with better but still insufficient data 
> (as this is PGO) we happen to end up inlining `InsertIntoBucket` wasting code 
> size leading to worse inlining decisions further up the stack...
> 
> Long story short: This ended up another of the many stories where the 
> compiler cannot make the right inlining decisions without having actual 
> runtime data. It tends to make a better decision based on the data that ends 
> up being wrong anyway here. I'm gonna leave things as is and rather put up 
> some improvements to LLVM code instead!

Good analysis.

Your plan sounds good but lowering the cold threshold is probably a good thing 
to do with the new BFI. The FindAndConstruct method in DenseMap can probably be 
annotated with buitin_expect to indicate the insertion is a unlikely path.

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-28 Thread Duncan P . N . Exon Smith via cfe-commits

dexonsmith wrote:

Interesting. Probably `Value::getMetadata()` could/should call 
`DenseMap::find()` instead of `operator[]()` and assert that it's found before 
dereferencing, because `Value::hasMetadata()` (which, IIRC, consults a bit 
stored in `Value`) has already promised something will be there. Probably 
you'll find me on the git-blame for this...

Your plan SGTM!

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-27 Thread Matthias Braun via cfe-commits

MatzeB wrote:

Some very ad-hoc benchmarking. Of clang compilation speed (measured in 
instructions as reported by valgrind/callgrind which I think somewhat matches 
the setup of nikic) compiling `sqlite3` of CTMark:

Old-BFI (this PR reverted), New-BFI (this PR applied), no-cold 
(cold-callsite-rel-freq set to 0 to disable 
`InlineCostCallAnalyzer::isColdCallSite` behavior for non-PGO builds:

```
Old-BFI:  size:  59,802,772  (baseline)  insn: 2,812,833,144  (baseline)
New-BFI:  size:  60,247,076  +0.74%  insn: 2,818,639,641  +0.21%
Old-BFI, no-cold: size:  60,773,988  +1.62%  insn: 2,806,521,932  -0.22%
New-BFI, no-cold: size:  60,741,700  +1.57%  insn: 2,803,489,298  -0.33%
```

I could benchmark more with llvm-test-suite and put up a PR to disable 
cold-callsite-rel-freq by default if people support this direction...

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-26 Thread Vitaly Buka via cfe-commits

vitalybuka wrote:

FYI @ldionne 

This patch causing test_vector2.pass.cpp trigger memory leak 
https://lab.llvm.org/buildbot/#/builders/168/builds/16422
Not sure if this was leaking but lsan didn't see, it's possible. Or some 
regression in lsan.

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-26 Thread David Li via cfe-commits

david-xl wrote:

Contracting my past self (one of the reviewers of the patch),  I think coldness 
check should be based on a global threshold, which exists when synthetic entry 
count propagation is enabled (without PGO). 



https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-26 Thread Duncan P . N . Exon Smith via cfe-commits

dexonsmith wrote:

> Seems this got introduced in https://reviews.llvm.org/D34312 with the rough 
> idea that we shouldn't inline into parts of the code that 
> `_builtin_expect(...)` deems unlikely. Which makes sense when you express it 
> like this, but I guess numeric thresholds can go wrong...

Heh, yeah, the premise seems correct, but a percentage-based numeric threshold 
doesn't seem right. You kind of want a flag for the block. Or a special value, 
like "freq=0", which indicates "annotation says this is cold".

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-26 Thread Matthias Braun via cfe-commits

MatzeB wrote:

Seems this got introduced in https://reviews.llvm.org/D34312 with the rough 
idea that we shouldn't inline into parts of the code that 
`_builtin_expect(...)` deems unlikely. Which makes sense when you say it 
express it like this, but I guess numeric thresholds can go wrong...

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-26 Thread David Li via cfe-commits

david-xl wrote:

Agree that deciding coldness based on local entry frequency (2%) is a bad idea 
though.

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-26 Thread Duncan P . N . Exon Smith via cfe-commits

dexonsmith wrote:

Not entirely accidental. When BPI/BFI first landed it was heavily profiled to 
be sure it didn't pessimize non-PGO code. I don't see why we'd suddenly be okay 
with pessimizing it.

Under 2% isn't hard to hit for hot path code. Lots of functions will have 
strings of early exit conditions.

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-25 Thread David Li via cfe-commits

david-xl wrote:

Yes -- the revert can wait until more data is available. I agree that it should 
help performance in theory.

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-25 Thread via cfe-commits

WenleiHe wrote:

> This is concerning. Can this be reverted for now and we can help with some 
> internal performance testing. @xur-llvm

Internally, with this change, on a few large workloads, we saw ~0.2-0.5% 
performance improvements, all with 2-3% .text size reduction at the same time. 
Our testing covered both sampling pgo and instrumentation pgo.

The change makes profile more accurate, which is a good improvement by itself. 
But PGO performance isn't just about profile accuracy, and it depends on 
training quality and representativeness too. Hypothetically, if some of the 
training isn't that representative relative to perf testing scenario, by making 
profile more accurate, we could also make profile deviate more from what's 
needed for best perf on testing scenario. 

That said, I think it'd be fair to revert if this is proven to be a general 
regression. For that, can we wait for your results before reverting? 

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-25 Thread Matthias Braun via cfe-commits

MatzeB wrote:

Oh so this isn't even a PGO enabled build, interesting...

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-25 Thread Nikita Popov via cfe-commits

nikic wrote:

This is a standard release build. The exact cmake config is 
https://gist.github.com/nikic/1bf97b5b6aade298b87f41cd13bf37d5 where the stage1 
build has the same configuration but is built with gcc. I'm testing CTMark, but 
the impact is quite consistent across the board, so anything will probably do.



https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Bfi precision (PR #66285)

2023-10-25 Thread Nikita Popov via cfe-commits

nikic wrote:

It looks like this has made clang 0.7% slower (in the sense that the clang 
stage2 binary is optimized worse -- there is no negative impact on stage1). The 
binary is also 0.7% larger. It looks like this change has significant impact on 
inlining heuristics.

Not really sure what to make of that, I assume you determined that this ends up 
being faster on average, so maybe clang is just unlucky.

https://github.com/llvm/llvm-project/pull/66285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits