Hi Ken, I share and understand your concerns about the rigidity of the current implementation.

> I like using Extremely Randomized Trees, but I'm looking for more flexibility
> in generating them. In particular, I'd like to be able to specify my own
> criterion and split finding algorithm. I'm curious why these are passed in as
> strings instead of functions/objects. Part of me thinks it has something to
> do with Cython. Otherwise, I could imagine wanting to be more abstract and
> leave decisions to the code; for example, best_split and random_split would
> use different implementations to have an efficient MAE criterion.

Those are passed as strings because we don't want the user to have to instantiate other objects in order to instantiate and build a forest. Under the hood, however, those strings are converted into the appropriate Criterion instances (see _tree.pyx), which are then used within a common construction procedure.

> So I'd like to contribute a simple MAE criterion that would be efficient for
> random splits (i.e. O(n) given a single batch update.) Is the direction
> forward for something like this to hard-code more criteria in _tree.pyx, or
> would it be better to approach some modularity and allow a Criterion object
> to be passed in?

At the moment, adding a criterion requires writing a new class implementing the Criterion interface defined in _tree.pyx. It should then be pluggable as is, without any other change to the code.

Hope this helps,

Gilles

> Ken Geis
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
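
P.S. To make the string-to-Criterion conversion and the "write a new class" route a bit more concrete, here is a minimal pure-Python sketch. It is not the code in _tree.pyx (which is Cython and organised differently): the Criterion base class, its node_impurity and split_impurity methods, the CRITERIA table and resolve_criterion are all hypothetical names chosen for illustration. The MAE class simply measures the mean absolute deviation from the median, which is one plausible way to score a single random split in roughly linear time.

```python
import numpy as np


class Criterion:
    """Hypothetical stand-in for the Criterion interface (not the real one)."""

    def node_impurity(self, y):
        raise NotImplementedError

    def split_impurity(self, y, left_mask):
        """Size-weighted impurity of the two children induced by left_mask."""
        left, right = y[left_mask], y[~left_mask]
        if len(left) == 0 or len(right) == 0:
            # Degenerate split: nothing was separated.
            return self.node_impurity(y)
        total = len(left) * self.node_impurity(left)
        total += len(right) * self.node_impurity(right)
        return total / len(y)


class MSE(Criterion):
    def node_impurity(self, y):
        return float(np.mean((y - np.mean(y)) ** 2))


class MAE(Criterion):
    """Assumed definition: mean absolute deviation from the node median."""

    def node_impurity(self, y):
        return float(np.mean(np.abs(y - np.median(y))))


# Hypothetical lookup table playing the role of the string conversion.
CRITERIA = {"mse": MSE, "mae": MAE}


def resolve_criterion(criterion):
    """Accept either a known string or an already built Criterion object."""
    if isinstance(criterion, Criterion):
        return criterion
    if isinstance(criterion, str) and criterion in CRITERIA:
        return CRITERIA[criterion]()
    raise ValueError("Unknown criterion: %r" % (criterion,))


if __name__ == "__main__":
    y = np.array([1.0, 2.0, 3.0, 10.0])
    mask = np.array([True, True, False, False])
    crit = resolve_criterion("mae")
    print(crit.node_impurity(y), crit.split_impurity(y, mask))
```

With a resolver like this, users can keep passing plain strings, while anyone who wants a custom criterion could subclass the interface and pass the object directly. Whether the real construction procedure should accept objects in this way is exactly the open design question raised above.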
