https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

--- Comment #9 from Joshua Green <jvg1981 at aim dot com> ---
(In reply to Segher Boessenkool from comment #8)
> GCC does some fairly involved prediction (in predict.c).  It isn't
> "a priori".
> 
> > (It's also not clear HOW this could be "faster
> > on essentially all processors"
> 
> Fall-through is faster than branching in most cases.  Most CPUs have
> some kind of pipelining on instruction fetch.
> 

This is the point that confuses me. I understand that fall-through is faster
than branching and that it's good to keep the pipeline running smoothly. In
this case, though, the compiler seems to have complete freedom to decide which
call (bar1() or bar2()) goes on the fall-through path and which goes on the
taken branch. Why not make the same choice that other compilers make (and that
documentation recommends, and that -O0 makes [, and that -O1 used to make?]) by
replacing the -O2/-O3 code above with

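foo(bool):
        testb   %dil, %dil      # ZF=1 when b is false
        je      .L4             # b == false: take the branch to the bar2() path
        jmp     bar1()          # b == true: fall through and tail-call bar1()
.L4:
        jmp     bar2()          # tail-call bar2()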

?
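For reference, here is a guess at the C++ source behind this assembly. The comment shows only the generated code, so the body of foo() and the bar1()/bar2() definitions below are assumptions, and last_called is a hypothetical global used only to observe which callee ran. The sketch also uses GCC's __builtin_expect, which lets the programmer state which outcome is likely instead of leaving the layout entirely to the heuristics in predict.c. It is only a hint, so whether the bar1() call actually lands on the fall-through path depends on the GCC version and flags.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical global, used only to observe which callee ran.
int last_called = 0;

// Hypothetical stand-ins for bar1()/bar2(); the real definitions are not shown.
void bar1() { last_called = 1; std::puts("bar1"); }
void bar2() { last_called = 2; std::puts("bar2"); }

// Assumed shape of the function under discussion: one test, two tail calls.
void foo(bool b) {
    // __builtin_expect (a GCC/Clang builtin) marks b == true as the likely
    // outcome, hinting that the bar1() call should be the fall-through path.
    if (__builtin_expect(b, 1))
        bar1();
    else
        bar2();
}
```

Compiling this with `g++ -O2 -S`, with and without the hint, is one way to see whether it changes the block order. C++20's `[[likely]]` attribute on the `if` branch is a standard alternative to the builtin.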
