Hi all, I thought some people on this mailing list would be interested in the compiled decision tree library for scikit-learn I put together two weeks ago. This library speeds up evaluation of regression trees/ensembles by compiling custom prediction code representing the model.
It has been tested on OS X and Linux, has ~100% unit test coverage, and comes with a fairly extensive benchmark suite. Results show a ~4x-8x speedup for random forests across a range of datasets and model parameters. The code is available at https://github.com/ajtulloch/sklearn-compiledtrees/ and has gained ~30 followers over the last two weeks, so there seems to be some interest from users of scikit-learn.

## Idea

The idea is to speed up the prediction part of ensemble/tree methods by generating C code representing a given ensemble/tree, compiling this representation to optimised object code, and loading the resulting shared library via `dlopen`.

For example, a trivial decision tree with depth one could be represented in C as:

```
float evaluate(float* f) {
  if (f[9] <= 0.175931170583) {
    return 0.0;
  } else {
    return 1.0;
  }
}
```

## Usage

To use it in your own projects, simply run `pip install sklearn-compiledtrees`, and then:

```
from compiledtrees import CompiledRegressionPredictor

clf = ...
# Code-gens, compiles, and loads the optimised model
compiled_clf = CompiledRegressionPredictor(clf)
y_pred = compiled_clf.predict(X)
```

See https://gist.github.com/9911766 for a full example.

## Benchmarks

Benchmarks are given in the README on [GitHub]. Across a range of ensemble sizes, tree depths, and datasets, they show a ~4x-8x speedup for random forests and a ~1x-3x speedup for gradient boosted ensembles. All of these benchmarks were run on a 2.4 GHz Intel i7 running OS X.

## More

It was initially proposed for inclusion in the mainline scikit-learn distribution, but the consensus was that the increase in project complexity was not worth the performance improvement. See [PR] for more discussion.

[GitHub]: https://github.com/ajtulloch/sklearn-compiledtrees/
[PyPI]: https://pypi.python.org/pypi/sklearn-compiledtrees/
[PR]: https://github.com/scikit-learn/scikit-learn/pull/2975/
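
For anyone curious what the code generation step described under "Idea" might look like, here is a rough sketch in Python. It is not the library's actual generator. The function name `tree_to_c` is made up for illustration, and it assumes the tree is given as flat parallel arrays in the same spirit as scikit-learn's `tree_` attributes (`children_left`, `children_right`, `feature`, `threshold`), with `-1` marking a leaf and `value` simplified to one scalar per node. Compiling the emitted source and loading it is left out.

```python
def tree_to_c(children_left, children_right, feature, threshold, value,
              name="evaluate"):
    """Emit C source for one regression tree stored as parallel arrays.

    Assumption for this sketch: node i is a leaf when children_left[i] == -1;
    otherwise it tests f[feature[i]] <= threshold[i] and goes left on success.
    """
    lines = [f"float {name}(float* f) {{"]

    def emit(node, depth):
        pad = "  " * depth
        if children_left[node] == -1:
            # Leaf: return the stored prediction.
            lines.append(f"{pad}return {value[node]!r};")
            return
        # Internal node: one if/else per split, recursing into both children.
        lines.append(f"{pad}if (f[{feature[node]}] <= {threshold[node]!r}) {{")
        emit(children_left[node], depth + 1)
        lines.append(f"{pad}}} else {{")
        emit(children_right[node], depth + 1)
        lines.append(f"{pad}}}")

    emit(0, 1)
    lines.append("}")
    return "\n".join(lines)


# The depth-one tree from the Idea section: node 0 splits on feature 9,
# nodes 1 and 2 are leaves (their feature/threshold entries are unused).
source = tree_to_c(
    children_left=[1, -1, -1],
    children_right=[2, -1, -1],
    feature=[9, -2, -2],
    threshold=[0.175931170583, -2.0, -2.0],
    value=[0.5, 0.0, 1.0],
)
print(source)
```

The recursion mirrors the tree shape, so every split becomes a nested `if`/`else` in the generated function. A very deep tree could hit Python's recursion limit in a sketch like this, so a real generator would probably want an explicit stack or a higher limit.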
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
