I use a Python-based ecosystem (scikit-learn, …) for prototyping and have a C++-based production system. A scikit-learn compatible interface lets me take advantage of scikit-learn’s ecosystem, and implementing the algorithms in C++ lets me develop and test the production algorithms already during prototyping.

I started with scikit-learn’s project template to roll my own decision tree and forest classifier and implemented the algorithms in a C++ library, using Cython to create the Python bindings.
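For readers who have not seen the project template, here is a minimal sketch of what "scikit-learn compatible" roughly means in practice. It is not koho's code: `MajorityClassifier` and its `majority_` attribute are hypothetical names for a toy estimator, and the comment in `fit` only marks where a call into a C++ binding might go.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy estimator: always predicts the most frequent class."""

    def __init__(self, random_state=None):
        # The constructor only stores hyperparameters, without validation,
        # so that get_params(), set_params() and clone() work.
        self.random_state = random_state

    def fit(self, X, y):
        X, y = check_X_y(X, y)            # validate and convert the inputs
        self.classes_ = unique_labels(y)  # learned attributes end with "_"
        self.n_features_in_ = X.shape[1]
        # Assumption: a C++-backed estimator would call its binding here.
        counts = np.array([np.sum(y == c) for c in self.classes_])
        self.majority_ = self.classes_[np.argmax(counts)]
        return self                       # fit returns the estimator itself

    def predict(self, X):
        check_is_fitted(self, "majority_")
        X = check_array(X)
        return np.full(X.shape[0], self.majority_)


clf = MajorityClassifier().fit([[0.0], [1.0], [2.0]], [1, 1, 0])
print(clf.predict([[5.0], [6.0]]))
```

Because the class follows these conventions, it should work with utilities such as `Pipeline`, `GridSearchCV`, and `cross_val_score` without further glue code.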

I started out with a Python implementation and experimented a little with implementing the algorithms in Cython. But I found that, if you are proficient in both Python and C++, implementing the algorithms directly in C++ was much faster than writing them in Cython.

I made this project available to everybody because I think it could serve as an example or template for anybody who would like to roll their own scikit-learn compatible classifier, with a C++-based implementation of the algorithms that can be re-used in a production system. At least version 1.0.0 should be useful; after that, it might become too complex to be used as an example.

Check it out:

Read the Docs: https://koho.readthedocs.io

GitHub: https://github.com/AIWerkstatt/koho

I tried to be consistent with scikit-learn’s decision tree and ensemble modules. The basic concepts behind the implementation of the classifiers, including the stack, the samples LUT (lookup table) with in-place partitioning, and incremental histogram updates, are based on: G. Louppe, Understanding Random Forests, PhD Thesis, 2014. Thanks a lot, Gilles, for that comprehensive work on random forests!
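To make those three concepts a bit more concrete, here is a rough pure-Python sketch. It is not taken from koho or from scikit-learn: the function names are hypothetical, the Gini criterion is an assumption, and the tree builder only returns the sample indices that end up in each leaf.

```python
def gini(hist, n):
    """Gini impurity of a class histogram covering n samples."""
    return 1.0 - sum((c / n) ** 2 for c in hist) if n else 0.0


def best_split(X, y, samples, start, end, n_classes):
    """Return (feature, threshold) of the best split, or None if none helps."""
    n = end - start
    node_hist = [0] * n_classes
    for i in samples[start:end]:
        node_hist[y[i]] += 1
    best, best_impurity = None, gini(node_hist, n)
    for feature in range(len(X[0])):
        # Sort only this node's slice of the lookup table.
        samples[start:end] = sorted(samples[start:end],
                                    key=lambda i: X[i][feature])
        left, right = [0] * n_classes, list(node_hist)
        for pos in range(start, end - 1):
            i = samples[pos]
            left[y[i]] += 1    # incremental update: move one sample
            right[y[i]] -= 1   # from the right histogram to the left
            lo, hi = X[i][feature], X[samples[pos + 1]][feature]
            if lo == hi:
                continue       # cannot split between equal values
            n_left = pos - start + 1
            impurity = (n_left * gini(left, n_left)
                        + (n - n_left) * gini(right, n - n_left)) / n
            if impurity < best_impurity:
                threshold = (lo + hi) / 2.0
                if threshold == hi:  # guard against float rounding
                    threshold = lo
                best, best_impurity = (feature, threshold), impurity
    return best


def partition(X, samples, start, end, feature, threshold):
    """Reorder samples[start:end] in place; return start of the right child."""
    p, q = start, end
    while p < q:
        if X[samples[p]][feature] <= threshold:
            p += 1
        else:
            q -= 1
            samples[p], samples[q] = samples[q], samples[p]
    return p


def build_tree(X, y, n_classes):
    """Grow a tree depth-first and return the sample indices of each leaf."""
    samples = list(range(len(X)))  # samples LUT shared by all nodes
    stack = [(0, len(X))]          # nodes waiting to be split
    leaves = []
    while stack:
        start, end = stack.pop()
        split = best_split(X, y, samples, start, end, n_classes)
        if split is None:
            leaves.append(sorted(samples[start:end]))
            continue
        mid = partition(X, samples, start, end, *split)
        stack.append((mid, end))    # right child, processed later
        stack.append((start, mid))  # left child, processed next
    return leaves


print(build_tree([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1], n_classes=2))
```

Each node is just a `(start, end)` range into the one shared `samples` list, so no per-node copies of the data are made, and each candidate threshold costs a constant-time histogram update rather than a recount.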