https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92335
Bug ID: 92335
Summary: missed transformation to branchless
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Target Milestone: ---

In the following code (compiled with -O2 or -O3, even with -march=haswell), GCC uses a branchless construct in foo but not in bar. Changing the type from float to int does not change this behavior (see https://godbolt.org/z/0ZWKb5 ).

With -Ofast, both functions compile to the same vectorized branchless loop, so I do not see why the branch should be retained in bar at -O2. For random "x", the branchless version is 6 times faster on any out-of-order CPU.

    float foo(float const * __restrict__ x, float const * __restrict__ y) {
      float ret = 0.f;
      for (int i = 0; i < 1024; ++i) {
        auto k = y[i];
        ret += x[i] > 0.f ? k : 0.f;
      }
      return ret;
    }

    float bar(float const * __restrict__ x, float const * __restrict__ y) {
      float ret = 0.f;
      for (int i = 0; i < 1024; ++i) {
        auto k = y[i];
        if (x[i] > 0.f) ret += k;
      }
      return ret;
    }
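A minimal harness for reproducing the foo/bar comparison might look like the sketch below. It is not part of the original report: the helpers make_inputs and ns_per_call, the input ranges, and the seeds are hypothetical choices made for illustration. Under strict IEEE semantics (no -ffast-math), foo and bar should return identical sums. When x[i] <= 0, foo adds +0.0f, and that addition only changes a value that is -0.0f, which a sum starting at +0.0f cannot reach in the default rounding mode. The timing helper gives only rough numbers (no warm-up, no CPU pinning).

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <vector>

// foo and bar as given in the report (1024 elements per call).
float foo(float const * __restrict__ x, float const * __restrict__ y) {
  float ret = 0.f;
  for (int i = 0; i < 1024; ++i) {
    auto k = y[i];
    ret += x[i] > 0.f ? k : 0.f;  // select the addend, then add unconditionally
  }
  return ret;
}

float bar(float const * __restrict__ x, float const * __restrict__ y) {
  float ret = 0.f;
  for (int i = 0; i < 1024; ++i) {
    auto k = y[i];
    if (x[i] > 0.f) ret += k;  // add only when the condition holds
  }
  return ret;
}

// Hypothetical helper: x is uniform in [-1, 1), so the sign test is
// unpredictable for the branch predictor; y is uniform in [0, 1).
void make_inputs(std::vector<float>& x, std::vector<float>& y, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> dist_x(-1.f, 1.f);
  std::uniform_real_distribution<float> dist_y(0.f, 1.f);
  x.resize(1024);
  y.resize(1024);
  for (int i = 0; i < 1024; ++i) {
    x[i] = dist_x(gen);
    y[i] = dist_y(gen);
  }
}

// Hypothetical helper: average wall-clock nanoseconds per call of f.
// Reading the input pointers through volatile variables on every call stops
// the compiler from treating the call as loop-invariant and hoisting it, and
// the volatile sink stops the result from being discarded.
template <class F>
double ns_per_call(F f, const std::vector<float>& x,
                   const std::vector<float>& y, int reps) {
  float const* volatile px = x.data();
  float const* volatile py = y.data();
  volatile float sink = 0.f;
  auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) sink = f(px, py);
  auto t1 = std::chrono::steady_clock::now();
  (void)sink;
  return std::chrono::duration<double, std::nano>(t1 - t0).count() / reps;
}
```

Building the same file with, for example, -O2 and with -Ofast, then comparing ns_per_call(foo, x, y, 100000) against ns_per_call(bar, x, y, 100000) on the same inputs, yields the kind of ratio the report quotes. The actual number will depend on the CPU and compiler version.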