On Tue, Aug 01, 2023 at 10:44:02AM +0200, Jan Hubicka wrote:
> > > If I comment it out as above patch, then O3/PGO can get 16% and 12%
> > > performance improvement compared to O2 on x86.
> > >
> > >                 O2              O3              PGO
> > > cycles          2,497,674,824   2,104,993,224   2,199,753,593
> > > instructions    10,457,508,646  9,723,056,131   10,457,216,225
> > > branches        2,303,029,380   2,250,522,323   2,302,994,942
> > > branch-misses   0.00%           0.01%           0.01%
> > >
> > > The main difference in the compilation output for the code around
> > > the mispredicted branch is:
> > >   o In O2: a predicated instruction (cmov here) is selected to
> > >     eliminate the above branch. cmov is truly better than a branch
> > >     here.
> > >   o In O3/PGO: bitout() is inlined into encode_file(), and a branch
> > >     instruction is selected. But this branch is obviously
> > >     *unpredictable* and the compiler doesn't know it. This is why
> > >     O3/PGO are so bad for this program.
> > >
> > > GCC doesn't support __builtin_unpredictable(), which has been
> > > introduced by LLVM. So I tried to see if
> > > __builtin_expect_with_probability(e, x, 0.5) can serve the same
> > > purpose. The result is negative.
> >
> > But does it appear to be predictable with your profiling data?
>
> Also one thing is that __builtin_expect and
> __builtin_expect_with_probability only affect the static branch
> prediction algorithm, so with profile feedback they are ignored on every
> branch executed at least once during the train run.
>
> Setting probability 0.5 is really not exactly the same as a hint that the
> branch will be mispredicted, since modern CPUs handle regularly
> behaving branches well (such as a branch firing every even iteration of
> a loop).
>
Yeah. Setting probability 0.5 was just an experimental attempt. I don't know
how the heuristic works internally.
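To make the two forms under discussion concrete, here is a minimal sketch in C. It is not the code from the benchmark: UNPREDICTABLE(), select_branchy(), select_branchless() and check_forms_agree() are hypothetical names made up for illustration. The macro uses clang's __builtin_unpredictable() where it exists and otherwise falls back to the probability-0.5 experiment described above, which, as noted in this thread, is not an equivalent hint. The mask-based version only removes the branch at source level; the compiler is still free to pick cmov or a jump for either form.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback so this still compiles on compilers without __has_builtin. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical helper macro (not a GCC feature): use the best hint available. */
#if __has_builtin(__builtin_unpredictable)
#define UNPREDICTABLE(c) __builtin_unpredictable(!!(c))
#elif __has_builtin(__builtin_expect_with_probability)
/* The 0.5 experiment from this thread; only feeds static prediction. */
#define UNPREDICTABLE(c) __builtin_expect_with_probability(!!(c), 1, 0.5)
#else
#define UNPREDICTABLE(c) (!!(c))
#endif

/* Branchy form: the compiler may emit a conditional jump or a cmov. */
uint32_t select_branchy(int bit, uint32_t a, uint32_t b)
{
    if (UNPREDICTABLE(bit != 0))
        return a;
    return b;
}

/* Branchless form: mask is all ones when bit != 0, all zeros otherwise. */
uint32_t select_branchless(int bit, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(bit != 0);
    return (a & mask) | (b & ~mask);
}

/* Feed both forms the same pseudo-random bits and check that they agree. */
void check_forms_agree(void)
{
    uint32_t x = 0x9E3779B9u;   /* arbitrary non-zero xorshift32 seed */
    for (int i = 0; i < 1000; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        int bit = (int)(x & 1u);
        assert(select_branchy(bit, 3u, 5u) == select_branchless(bit, 3u, 5u));
    }
}
```

Comparing the assembly of the two select functions at -O2 and -O3 (for example with -S) is one way to see which instruction the compiler actually chose on a given target.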
> So I think having the builtin is not a bad idea. I was thinking about
> whether it makes sense to represent it within the profile_probability
> type, and I am not convinced, since "unpredictable probability" sounds
> conceptually odd and we would need to keep the flag intact over all
> probability updates we do. For things like loop exits we recompute
> probabilities from frequencies after unrolling/vectorization and other
> things, and we would need to invent a new API to propagate the flag from
> the previous probability (which is not even part of the computation
> right now).
>
> So I guess the challenge is how to pass this info down through the
> optimization pipeline, since we would need to annotate gimple
> conds/switches and carry it down to the RTL level. On gimple we have
> flags and on the RTL level notes, so there is space for it, but we would
> need to maintain the info through CFG changes.
>
> Auto-FDO may be an interesting way to detect such branches.
>
So I suppose PGO could as well. But in my test PGO still selects a branch
instruction, just as O3 does, and the data shows that cmov works better
than a branch here.

> Honza
> >
> > > I think we can conclude that there must be something that can be
> > > improved in GCC's heuristic strategy for predicated instructions
> > > and branches, at least for O3 and PGO.
> > >
> > > And can we add __builtin_unpredictable() support to GCC? It is
> > > usually hard for the compiler to detect unpredictable branches.
> > >
> > > --
> > > Cheers,
> > > Changbin Du

-- 
Cheers,
Changbin Du