https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93165

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Alexander Monakov from comment #3)
> > So perhaps an unpopular opinion, but I'd say a
> > __builtin_branchless_select(c, a, b) (guaranteed to live throughout
> > optimization pipeline as a non-branchy COND_EXPR) is badly missing.
> 
> I am going to say otherwise.  Much of the time a conditional move is faster
> than using a branch, even if the branch is predictable (there are a few
> exceptions), on most non-Intel/AMD targets.  This is because the conditional
> move is just one cycle and only a "predictable" branch is one cycle too.

The issue with a conditional move is that it adds a data dependence, while
branches are usually speculated and thus have zero overhead in the execution
stage.  The extra dependence can easily slow things down, depending on the
(three!) instructions feeding the conditional move (condition, first and
second source): the result cannot be produced until all three are available.
This is why well-predicted branches are often so much faster.
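
To illustrate the dependence, here is a minimal sketch of the two forms in C.
The function names are hypothetical and exist only for this example.  The
mask-based variant stands in for a conditional move: its result depends on
c, a and b together.  Note that the compiler remains free to turn either
form into the other, which is exactly what the proposed builtin would
prevent.

```c
#include <stdint.h>

/* Branchy form: the compiler may emit a conditional jump.  When the
   branch is predicted, later code can start speculatively with only
   the chosen operand available.  */
static uint64_t select_branch(int c, uint64_t a, uint64_t b)
{
    if (c)
        return a;
    return b;
}

/* Branch-free form: the result depends on all three inputs
   (c, a and b), like a conditional move.  */
static uint64_t select_mask(int c, uint64_t a, uint64_t b)
{
    /* All-ones when c is nonzero, all-zeros otherwise.  */
    uint64_t mask = -(uint64_t)(c != 0);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for every input; they differ only in
which dependences the generated code is likely to carry.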

> It is even worse when doing things like:
> if (a && b)
> where on aarch64, this can be done using only one cmp followed by one ccmp.
> NOTE on PowerPC, you could use in theory crand/cror (though this is not done
> currently and I don't know if they are profitable in any recent design).
> 
> Plus aarch64 has conditional add and a few other things which improve the
> idea of a conditional move.

I can see conditional moves are almost always a win on less
pipelined/speculative implementations.
