https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93165
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Alexander Monakov from comment #3)
> > So perhaps an unpopular opinion, but I'd say a
> > __builtin_branchless_select(c, a, b) (guaranteed to live throughout
> > optimization pipeline as a non-branchy COND_EXPR) is badly missing.
>
> I am going to say otherwise. Many of the time conditional move is faster
> than using a branch; even if the branch is predictable (there are a few
> exceptions) on most non-Intel/AMD targets. This is because the conditional
> move is just one cycle and only a "predictable" branch is one cycle too.

The issue with a conditional move is that it adds a data dependence, while
branches are usually speculated and thus have zero overhead in the execution
stage. The extra dependence can easily slow things down, depending on the
(three!) instructions feeding the conditional move (condition, first source
and second source). This is why well-predicted branches are often so much
faster.

> It is even worse when doing things like:
> if (a && b)
> where on aarch64, this can be done using only one cmp followed by one ccmp.
> NOTE on PowerPC, you could use in theory crand/cror (though this is not done
> currently and I don't know if they are profitable in any recent design).
>
> Plus aarch64 has conditional add and a few other things which improve the
> idea of a conditional move.

I can see that conditional moves are almost always a win on less
pipelined/speculative implementations.
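
To make the dependence argument concrete, here is a minimal C sketch of the two source-level shapes under discussion. The function names (select_branchy, select_masked, self_check) are made up for illustration and are not taken from the bug report. The masked form only mimics the three-input dependence of a conditional move. It does not guarantee branch-free code, since the optimizer may turn either function into the other form. That lack of a guarantee is what the proposed __builtin_branchless_select in comment #3 was meant to address.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy shape: the compiler may emit a compare plus a conditional jump.
   With a well-predicted branch, the consumer of the result can proceed
   speculatively without waiting for `c` to be computed. */
int32_t select_branchy(int c, int32_t a, int32_t b)
{
    if (c)
        return a;
    return b;
}

/* Masked shape: the result is computed from c, a and b together, which is
   the same three-input data dependence a conditional move carries.
   Assumption: conversion of an out-of-range uint32_t to int32_t wraps
   modulo 2^32, which is how GCC defines it. */
int32_t select_masked(int c, int32_t a, int32_t b)
{
    uint32_t mask = -(uint32_t)(c != 0);   /* all ones if c is nonzero, else 0 */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Both shapes must return the same value; only the dependence chain the
   hardware sees can differ. */
void self_check(void)
{
    assert(select_branchy(1, 3, 4) == select_masked(1, 3, 4));
    assert(select_branchy(0, 3, 4) == select_masked(0, 3, 4));
}
```

Comparing the output of gcc -O2 -S for both functions on the target of interest shows which form the compiler actually picked. The source-level shape alone does not decide it.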