Hi all,

I thought some people on this mailing list would be interested in the
compiled decision tree library for scikit-learn I put together two weeks
ago.  This library speeds up evaluation of regression trees/ensembles by
compiling custom prediction code representing the model.

It has been tested on OS X and Linux, has ~100% unit test coverage, and comes
with a fairly extensive benchmark suite.  Results show ~4x-8x speedup for
random forests across a range of datasets and model parameters.

The code is available at
https://github.com/ajtulloch/sklearn-compiledtrees/ and
has picked up ~30 followers over the last two weeks, so there seems to be
some interest from users of scikit-learn.

## Idea

The idea is to speed up the prediction part of ensemble/tree methods by
generating C code representing a given ensemble/tree, compiling this
representation to optimised object code, and loading the shared library
representing the tree via `dlopen`.

For example, a trivial decision tree with depth one could be represented in
C as:

```
float evaluate(float* f) {
  if (f[9] <= 0.175931170583) {
    return 0.0;
  }
  else {
    return 1.0;
  }
}
```
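To make the generate/compile/load steps concrete, here is a minimal sketch in Python. It is not the library's actual implementation: the nested-tuple tree encoding, the helper names (`emit_node`, `generate_c`, `compile_and_load`), and the compiler flags are all assumptions made for illustration. It also assumes a POSIX system with a C compiler available as `cc`; `ctypes.CDLL` is used for loading because it calls `dlopen` on such systems.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile

# Hypothetical tree encoding (an assumption, not the library's format):
# a leaf is a float; a split is (feature_index, threshold, left, right),
# where `left` is taken when f[feature_index] <= threshold.
STUMP = (9, 0.175931170583, 0.0, 1.0)


def emit_node(node, depth=1):
    """Recursively emit the C statements for one node of the tree."""
    pad = "  " * depth
    if not isinstance(node, tuple):
        # Leaf: return the stored prediction.
        return f"{pad}return {float(node)!r};\n"
    feature, threshold, left, right = node
    return (
        f"{pad}if (f[{feature}] <= {threshold!r}) {{\n"
        f"{emit_node(left, depth + 1)}"
        f"{pad}}}\n"
        f"{pad}else {{\n"
        f"{emit_node(right, depth + 1)}"
        f"{pad}}}\n"
    )


def generate_c(tree):
    """Wrap the emitted statements in an `evaluate` function."""
    return "float evaluate(float* f) {\n" + emit_node(tree) + "}\n"


def compile_and_load(source, compiler="cc"):
    """Compile C source to a shared library and load it.

    The compiler name and flags are assumptions for a typical
    Linux or OS X toolchain.
    """
    workdir = tempfile.mkdtemp()
    src_path = os.path.join(workdir, "tree.c")
    lib_path = os.path.join(workdir, "tree.so")
    with open(src_path, "w") as fh:
        fh.write(source)
    subprocess.check_call(
        [compiler, "-O3", "-shared", "-fPIC", "-o", lib_path, src_path]
    )
    lib = ctypes.CDLL(lib_path)  # dlopen under the hood on POSIX
    lib.evaluate.argtypes = [ctypes.POINTER(ctypes.c_float)]
    lib.evaluate.restype = ctypes.c_float
    return lib
```

With a compiler installed, `compile_and_load(generate_c(STUMP))` returns a handle whose `evaluate` takes a `ctypes` float array, for example `(ctypes.c_float * 10)(*row)`. For the stump above, `generate_c` emits the same function as the C example.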

## Usage

To use it in your own projects, run `pip install sklearn-compiledtrees`,
then:

```
from compiledtrees import CompiledRegressionPredictor

clf = ...

# Code-gens, compiles, and loads the optimised model
compiled_clf = CompiledRegressionPredictor(clf)

y_pred = compiled_clf.predict(X)
```

See https://gist.github.com/9911766 for a full example.

## Benchmarks

Benchmarks are given in the README on [GitHub]. Across a range of
ensemble sizes, tree depths, and datasets, they show ~4x-8x speedup for
random forests and ~1x-3x speedup for gradient boosted ensembles. All
of these benchmarks were run on a 2.4 GHz Intel i7 with OS X.
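If you want to check the speedup on your own models and hardware, a rough timing helper like the one below is enough to get a ratio. This is only a sketch and not necessarily how the README benchmarks were produced; the `speedup` function and its `repeat`/`number` defaults are hypothetical.

```python
import timeit


def speedup(baseline, candidate, repeat=5, number=10):
    """Return best baseline time divided by best candidate time.

    Both arguments are zero-argument callables. Taking the minimum over
    several repeats reduces noise from other processes.
    """
    t_base = min(timeit.repeat(baseline, repeat=repeat, number=number))
    t_cand = min(timeit.repeat(candidate, repeat=repeat, number=number))
    return t_base / t_cand
```

Using the names from the Usage section, a call would look like `speedup(lambda: clf.predict(X), lambda: compiled_clf.predict(X))`; a value above 1 means the compiled predictor was faster.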

## More

It was initially proposed for inclusion in the mainline scikit-learn
distribution, but the consensus was that the increase in project complexity
was not worth the performance improvement. See [PR] for more discussion.

[GitHub]: https://github.com/ajtulloch/sklearn-compiledtrees/
[PyPI]: https://pypi.python.org/pypi/sklearn-compiledtrees/
[PR]: https://github.com/scikit-learn/scikit-learn/pull/2975/
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
