On Mon, Sep 08, 2014 at 10:05:58AM +0100, Luca Puggini wrote:
> for personal reasons I am writing a function to compute the outlier
> measure from random forest
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#outliers

> with a little more work I can include the function in the sklearn
> random forest class.

Do you have a guesstimate of the amount of code it would add to the
codebase?

Also, is there a canonical paper on this approach that we could read?

> Is the community interested? Should I do it?

As always, it's very hard to judge whether a method should be included. I
personally think that outlier detection is very important, and I'd like
to see more of it in scikit-learn. However, we need to choose the methods
that bring the most benefit to users in solving that problem. Thus we
need to be convinced that the situations in which the method works well
are reasonably common. This requires understanding these situations, and
that's usually a bit hard.

Thanks a lot for that proposal!

Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
