Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

Jan Hubicka via Gcc-bugs Tue, 01 Aug 2023 01:44:13 -0700

> > If I comment it out as above patch, then O3/PGO can get 16% and 12% 
> > performance
> > improvement compared to O2 on x86.
> >
> >                         O2              O3              PGO
> > cycles                  2,497,674,824   2,104,993,224   2,199,753,593
> > instructions            10,457,508,646  9,723,056,131   10,457,216,225
> > branches                2,303,029,380   2,250,522,323   2,302,994,942
> > branch-misses           0.00%           0.01%           0.01%
> >
> > The main difference in the compilation output about code around the 
> > miss-prediction
> > branch is:
> >   o In O2: predicated instruction (cmov here) is selected to eliminate above
> >     branch. cmov is true better than branch here.
> >   o In O3/PGO: bitout() is inlined into encode_file(), and branch 
> > instruction
> >     is selected. But this branch is obviously *unpredictable* and the 
> > compiler
> >     doesn't know it. This why O3/PGO are are so bad for this program.
> >
> > Gcc doesn't support __builtin_unpredictable() which has been introduced by 
> > llvm.
> > Then I tried to see if __builtin_expect_with_probability(e,x, 0.5) can 
> > serve the
> > same purpose. The result is negative.
> 
> But does it appear to be predictable with your profiling data?


Also one thing is that __builtin_expect and
__builtin_expect_with_probability only affects the static branch
prediciton algorithm, so with profile feedback they are ignored on every
branch executed at least once during the train run.

setting probability 0.5 is really not exactly the same as hint that the
branch will be mispredicted, since modern CPUs handle well regularly
behaving branchs (such as a branch firing every even iteration of loop).

So I think having the builting is not a bad idea.  I was thinking if it
makes sense to represent it withing profile_probability type and I am
not convinced, since "unpredictable probability" sounds counceptually
odd and we would need to keep the flag intact over all probability
updates we do.  For things like loop exits we recompute probabilities
from frequencies after unrolling/vectorizaiton and other things and we
would need to invent new API to propagate the flag from previous
probability (which is not even part of the computation right now)

So I guess the challenge is how to pass this info down through the
optimization pipeline, since we would need to annotate gimple
conds/switches and manage it to RTL level.  On gimple we have flags and
on rtl level notes so there is space for it, but we would need to
maintain the info through CFG changes.

Auto-FDO may be interesting way to detect such branches.

Honza
> 
> > I think we could come to a conclusion that there must be something can 
> > improve in
> > Gcc's heuristic strategy about Predicated Instructions and branches, at 
> > least
> > for O3 and PGO.
> >
> > And can we add __builtin_unpredictable() support for Gcc? As usually it's 
> > hard
> > for the compiler to detect unpredictable branches.
> >
> > --
> > Cheers,
> > Changbin Du

Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

Reply via email to