https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950

            Bug ID: 104950
           Summary: GCC does not emit branchless code
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In this example GCC fails to emit branchless code, while clang does.
In the actual application, measurements show a slowdown of up to a factor of 2.
I managed to force branchless code (-DBL), but the resulting source is rather unfriendly.
Godbolt link (GCC, clang, GCC -DBL):

https://godbolt.org/z/KWY1rjhhY



and here inline:

#include <vector>
const float defaultBaseResponse = 0.5;
class DForest {
public:
    //based on FastForest::evaluate() and BDTree::parseTree()
    DForest() {
    }
    float evaluate(const float* features) const;

    std::vector<int> rootIndices_;
    //"node" layout: cut, index, left, right
    struct Node{
        float v; int i,l,r;
        constexpr int eval(float const * f) const {
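#ifdef BL
          // Relies on r immediately following l in memory; reading r through
          // (&l) + 1 is formally undefined behaviour.
          auto m = f[i] > v;
          return *((&l) + int(m));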
#else
          return f[i] > v ? r : l;
#endif
        }
    };
    std::vector<Node> nodes_;
    std::vector<float> responses_;
    std::vector<float> baseResponses_;
};

float DForest::evaluate(const float* features) const{
    float sum{defaultBaseResponse + baseResponses_[0]};
    for(int index : rootIndices_){
        do {
            index = nodes_[index].eval(features);
        } while (index>0);
        sum += responses_[-index];
    }
    return sum;
}
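The -DBL variant selects the child by reading past `l` into `r`. Assuming the node layout may be changed, the same branchless selection can be written without that trick by storing both children in an array and indexing it with the comparison result. Below is a hypothetical sketch: `ArrayNode` and `evaluateForest` are illustrative names, not part of the report, and whether GCC 12 then emits branchless code is not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node layout (not from the report): same fields as Node, but
// the two children live in one array so the comparison result can index it.
struct ArrayNode {
    float v;       // cut value
    int i;         // feature index
    int child[2];  // child[0] taken when f[i] <= v, child[1] when f[i] > v
    int eval(const float* f) const {
        // The bool from the comparison converts to 0 or 1 and selects the
        // child without an explicit branch in the source; the generated code
        // is still up to the compiler.
        return child[f[i] > v];
    }
};

// Hypothetical free-function version of DForest::evaluate using ArrayNode.
// Same encoding as the report: each walk starts at a root node; afterwards an
// index > 0 names the next internal node, and an index <= 0 ends the walk at
// the leaf whose response is responses[-index].
float evaluateForest(const float* features,
                     const std::vector<int>& roots,
                     const std::vector<ArrayNode>& nodes,
                     const std::vector<float>& responses,
                     float base) {
    float sum = base;
    for (int index : roots) {
        do {
            index = nodes[index].eval(features);
        } while (index > 0);
        // Debug-only bounds check on the leaf lookup.
        assert(static_cast<std::size_t>(-index) < responses.size());
        sum += responses[-index];
    }
    return sum;
}
```

For example, with nodes `{{0.5f, 0, {0, 1}}, {2.0f, 1, {-1, -2}}}`, responses `{10, 20, 30}`, base `0.5f` and a single root `0`, the features `{0.7f, 1.0f}` walk node 0, then node 1, then leaf 1, giving `20.5f`.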
