I use a Python-based ecosystem (scikit-learn, …) for prototyping, and
I have a C++-based production system. A scikit-learn compatible
interface allows me to take advantage of scikit-learn’s ecosystem.
Implementing the algorithms in C++ allows me to develop and test my
production algorithms already during prototyping.
I started with scikit-learn’s project template to roll my own decision
tree and decision forest classifiers, and implemented the algorithms in
a C++ library, using Cython to create the Python bindings.
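For readers new to the interface, here is a minimal sketch of what "scikit-learn compatible" means in practice. It is not koho's code: the class name `CentroidClassifier`, its `metric_power` parameter and the nearest-centroid rule are hypothetical placeholders for the point where a real project would call into its Cython-wrapped C++ library. The conventions it illustrates are the standard ones: `__init__` only stores hyperparameters, `fit` validates input, stores learned state in trailing-underscore attributes and returns `self`, and `predict` checks that the model has been fitted.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator that follows scikit-learn's API conventions.

    The nearest-centroid rule is only a placeholder for the call into a
    compiled (e.g. Cython-wrapped C++) backend.
    """

    def __init__(self, metric_power=2):
        # __init__ only stores hyperparameters; get_params/set_params rely on this.
        self.metric_power = metric_power

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Learned state goes into attributes with a trailing underscore.
        self.classes_ = unique_labels(y)
        self.n_features_in_ = X.shape[1]
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # Sum of |x - centroid| ** metric_power for every (sample, centroid) pair.
        diff = np.abs(X[:, None, :] - self.centroids_[None, :, :])
        dist = (diff ** self.metric_power).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


# The estimator plugs straight into scikit-learn's model-selection tools.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(CentroidClassifier(), X, y, cv=5)
print(scores.mean())
```

Because the class follows these conventions, utilities such as `cross_val_score`, `clone` and `GridSearchCV` can work with it without any extra glue code.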
Starting out with a Python implementation, I experimented a little with
implementing the algorithms in Cython. But I found that, if you are
proficient in both Python and C++, implementing the algorithms directly
in C++ is much faster than writing them in Cython.
I made this project available to everybody because I think it could
serve as an example or template for anybody who would like to roll
their own scikit-learn compatible classifier with a C++-based
implementation of the algorithms that can be reused in a production
system. At least version 1.0.0 should be useful as such; later versions
might become too complex to serve as an example.
Check it out:
Read the Docs: https://koho.readthedocs.io
GitHub: https://github.com/AIWerkstatt/koho
I tried to be consistent with scikit-learn’s decision tree and
ensemble modules. The basic concepts behind the implementation of the
classifiers, including the stack, the samples LUT with in-place
partitioning, and incremental histogram updates, are based on:
G. Louppe, Understanding Random Forests, PhD Thesis, 2014. Thanks a lot,
Gilles, for that comprehensive work on random forests!
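The three concepts mentioned above can be illustrated with a simplified pure-Python sketch. It is neither koho's C++ implementation nor a faithful transcription of the thesis. The samples LUT is an array of sample indices in which every node owns a contiguous slice `[start, end)`. The split search sorts that slice by each feature and sweeps the candidate thresholds, moving one sample at a time from the right class histogram to the left one, so each candidate's Gini impurity costs only O(n_classes). The chosen split then partitions the slice in place, and an explicit stack replaces recursion for depth-first growth. The function names, the dict-based node layout, the `max_depth`/`min_samples_split` parameters and the assumption of integer labels `0..n_classes-1` are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris  # only used for the usage example


def gini(counts, n):
    # Gini impurity of a class histogram holding n samples.
    p = counts / n
    return 1.0 - np.dot(p, p)


def best_split(X, y, samples, start, end, n_classes):
    """Return (impurity, feature, threshold) of the best split, or None.
    samples[start:end] is reordered in place (sorted by each feature in turn)."""
    n = end - start
    total = np.bincount(y[samples[start:end]], minlength=n_classes).astype(float)
    best = None
    for f in range(X.shape[1]):
        # Sort this node's slice of the samples LUT by feature f.
        order = np.argsort(X[samples[start:end], f], kind="stable")
        samples[start:end] = samples[start:end][order]
        left, right = np.zeros(n_classes), total.copy()
        for i in range(start, end - 1):
            # Incremental histogram update: move one sample from right to left.
            c = y[samples[i]]
            left[c] += 1
            right[c] -= 1
            x_i, x_next = X[samples[i], f], X[samples[i + 1], f]
            if x_i == x_next:
                continue  # no threshold fits between equal values
            n_left = i - start + 1
            n_right = n - n_left
            imp = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
            if best is None or imp < best[0]:
                best = (imp, f, (x_i + x_next) / 2.0)
    return best


def partition(X, samples, start, end, feature, threshold):
    """Reorder samples[start:end] in place so that samples with
    X[s, feature] <= threshold come first; return the boundary index."""
    i, j = start, end - 1
    while i <= j:
        if X[samples[i], feature] <= threshold:
            i += 1
        else:
            samples[i], samples[j] = samples[j], samples[i]
            j -= 1
    return i


def build_tree(X, y, max_depth=3, min_samples_split=2):
    """Grow a tree depth first with an explicit stack (labels 0..k-1 assumed)."""
    n_classes = int(y.max()) + 1
    samples = np.arange(X.shape[0])  # the samples LUT
    nodes = []
    stack = [(0, X.shape[0], 0, -1, "")]  # (start, end, depth, parent, side)
    while stack:
        start, end, depth, parent, side = stack.pop()
        node_id = len(nodes)
        counts = np.bincount(y[samples[start:end]], minlength=n_classes)
        nodes.append({"value": counts, "feature": -1, "threshold": 0.0,
                      "left": -1, "right": -1})
        if parent >= 0:
            nodes[parent][side] = node_id  # link the child to its parent
        split = None
        if (depth < max_depth and end - start >= min_samples_split
                and np.count_nonzero(counts) > 1):
            split = best_split(X, y, samples, start, end, n_classes)
        if split is None:
            continue  # this node stays a leaf
        _, f, t = split
        mid = partition(X, samples, start, end, f, t)
        nodes[node_id].update(feature=f, threshold=t)
        # Push the right child first so the left child is popped next.
        stack.append((mid, end, depth + 1, node_id, "right"))
        stack.append((start, mid, depth + 1, node_id, "left"))
    return nodes


def predict(nodes, X):
    out = np.empty(X.shape[0], dtype=int)
    for k, x in enumerate(X):
        i = 0
        while nodes[i]["feature"] >= 0:  # descend until a leaf is reached
            node = nodes[i]
            i = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        out[k] = int(np.argmax(nodes[i]["value"]))
    return out


X, y = load_iris(return_X_y=True)
tree = build_tree(X, y, max_depth=3)
train_accuracy = np.mean(predict(tree, X) == y)
```

Since each node only ever touches its own slice of `samples`, no per-node copies of the data or index arrays are needed, which is the main reason this layout suits a C++ implementation.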
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn