Hi Ken,

I share and understand your concerns about the rigidity of the current
implementation.

> I like using Extremely Randomized Trees, but I'm looking for more flexibility 
> in generating them. In particular, I'd like to be able to specify my own 
> criterion and split finding algorithm. I'm curious why these are passed in as 
> strings instead of functions/objects. Part of me thinks it has something to 
> do with Cython. Otherwise, I could imagine wanting to be more abstract and 
> leave decisions to the code; for example, best_split and random_split would 
> use different implementations to have an efficient MAE criterion.

Those are passed as strings because we don't want the user to have to
instantiate other objects in order to instantiate and build a forest.
Under the hood, however, those strings are converted into the
appropriate Criterion instances (see _tree.pyx), which are then used
within a common construction procedure.
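
To illustrate the idea only, here is a rough pure-Python sketch of that
kind of string-to-instance lookup. This is not the actual code: the real
classes are Cython classes in _tree.pyx, and the names CRITERIA,
resolve_criterion and impurity below are made up for the example.

```python
class Criterion:
    """Hypothetical base class standing in for the Cython Criterion."""

    def impurity(self, y):
        raise NotImplementedError


class MSE(Criterion):
    """Mean squared error around the mean of y."""

    def impurity(self, y):
        mean = sum(y) / len(y)
        return sum((v - mean) ** 2 for v in y) / len(y)


class Gini(Criterion):
    """Gini impurity of the class labels in y."""

    def impurity(self, y):
        n = len(y)
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical registry: the user-facing string maps to a class.
CRITERIA = {"mse": MSE, "gini": Gini}


def resolve_criterion(name):
    """Turn the string given by the user into a Criterion instance."""
    if name not in CRITERIA:
        raise ValueError("Unknown criterion: %r" % (name,))
    return CRITERIA[name]()
```

With something like this, the estimator only ever sees a string such as
"gini", and the construction code works against the Criterion instance.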

>
> So I'd like to contribute a simple MAE criterion that would be efficient for 
> random splits (i.e. O(n) given a single batch update.) Is the direction 
> forward for something like this to hard-code more criteria in _tree.pyx, or 
> would it be better to approach some modularity and allow a Criterion object 
> to be passed in?

At the moment, adding a criterion requires writing a new class
implementing the Criterion interface defined in _tree.pyx. It should
then be pluggable as is, without any other change to the code.
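
As a rough idea of what an MAE criterion has to compute for one given
split, here is a pure-Python sketch. It does not follow the actual
Criterion interface from _tree.pyx; the class and method names are
hypothetical. It also relies on statistics.median, which sorts, so it
is O(n log n) rather than the O(n) you mention (a selection algorithm
would be needed for that).

```python
from statistics import median


class MAE:
    """Hypothetical criterion: mean absolute deviation around the median."""

    def impurity(self, y):
        m = median(y)
        return sum(abs(v - m) for v in y) / len(y)

    def split_score(self, y, mask):
        """Size-weighted impurity of the two children induced by mask.

        mask[i] is True if sample i goes to the left child.
        Lower is better; a split with an empty child scores inf.
        """
        left = [v for v, go_left in zip(y, mask) if go_left]
        right = [v for v, go_left in zip(y, mask) if not go_left]
        if not left or not right:
            return float("inf")
        total = len(left) * self.impurity(left) + len(right) * self.impurity(right)
        return total / len(y)
```

For a random split, a single call to split_score per candidate
threshold is enough, since there is no need to scan all thresholds
incrementally.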

Hope this helps,

Gilles

>
>
> Ken Geis
>
>
> ------------------------------------------------------------------------------
> Try New Relic Now & We'll Send You this Cool Shirt
> New Relic is the only SaaS-based application performance monitoring service
> that delivers powerful full stack analytics. Optimize and monitor your
> browser, app, & servers with just a few lines of code. Try New Relic
> and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_may
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
