https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950
Bug ID: 104950
Summary: GCC does not emit branchless code
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Target Milestone: ---

In this example GCC fails to emit branchless code, while Clang does. In the actual application, measurements show a slowdown of up to a factor of 2. I managed to force branchless code (-DBL), but the resulting source is pretty unfriendly.

Godbolt link (GCC, Clang, GCC -DBL): https://godbolt.org/z/KWY1rjhhY

And here is the code inlined:

#include <vector>

const float defaultBaseResponse = 0.5;

class DForest {
public:
  //based on FastForest::evaluate() and BDTree::parseTree()
  DForest() { }
  float evaluate(const float* features) const;
  std::vector<int> rootIndices_;
  //"node" layout: cut, index, left, right
  struct Node{
    float v;
    int i,l,r;
    constexpr int eval(float const * f) const {
#ifdef BL
      // branchless: m is 0 or 1, selecting l or the member right after it (r)
      auto m = f[i] > v;
      return *((&l) + int(m));
#else
      return f[i] > v ? r : l;
#endif
    }
  };
  std::vector<Node> nodes_;
  std::vector<float> responses_;
  std::vector<float> baseResponses_;
};

float DForest::evaluate(const float* features) const{
  float sum{defaultBaseResponse + baseResponses_[0]};
  for(int index : rootIndices_){
    // walk the tree until a non-positive index (a leaf) is reached
    do {
      index = nodes_[index].eval(features);
    } while (index>0);
    sum += responses_[-index];
  }
  return sum;
}
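For comparison, the -DBL trick reads r through pointer arithmetic on &l, which relies on l and r being adjacent members and is not something the language guarantees. A minimal sketch of a well-defined alternative, assuming the child indices can be stored as a two-element array, indexes that array with the comparison result directly. The names Node2 and evaluate_forest are hypothetical and not part of the report; whether GCC actually emits branchless code for this form has not been checked here, so the generated assembly should be inspected (e.g. on godbolt) before relying on it.

```cpp
#include <vector>

// Hypothetical variant of the report's Node: the two child indices live in an
// array, so the comparison result (false -> 0, true -> 1) can index it directly.
struct Node2 {
    float v;       // cut value
    int i;         // feature index
    int child[2];  // child[0] = left (f[i] <= v), child[1] = right (f[i] > v)

    int eval(const float* f) const {
        // No conditional is written in source: the bool is used as an index.
        return child[f[i] > v];
    }
};

// Same traversal as DForest::evaluate(): non-positive indices mark leaves,
// and -index selects the leaf response. 'base' stands in for
// defaultBaseResponse + baseResponses_[0].
float evaluate_forest(const std::vector<int>& roots,
                      const std::vector<Node2>& nodes,
                      const std::vector<float>& responses,
                      float base, const float* features) {
    float sum = base;
    for (int index : roots) {
        do {
            index = nodes[index].eval(features);
        } while (index > 0);
        sum += responses[-index];
    }
    return sum;
}
```

The layout change keeps the same left/right semantics as `f[i] > v ? r : l`, so a forest built with the original encoding maps onto Node2 by setting child[0] = l and child[1] = r.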