Hi everybody,

I'm currently working on a Pull Request for Gradient Boosted
Regression Trees [1] (aka Gradient Boosting, MART, TreeNet) and I'm
looking for collaborators.

GBRTs have been advertised as one of the best off-the-shelf
data-mining procedures; they share many properties with random forests
including little need for tuning and data preprocessing as well as
high predictive accuracy. GBRTs have been used very successfully in
areas such as learning to rank and ecology [4].

The main goal is to come up with an alternative to R's 'gbm' package
[2]; it should feature a number of loss functions (both classification
and regression) as well as a number of enhancements (stochastic
gradient boosting, partial dependence plots).
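To give a feel for the core idea, here is a minimal least-squares sketch of gradient boosting with the stochastic subsampling enhancement from Friedman's work. This is illustrative only, not the PR's actual code: all names, data, and hyperparameter values are made up, and it uses scikit-learn's existing DecisionTreeRegressor as the base learner.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)

# Toy regression data (shapes and noise level are arbitrary).
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.1, size=200)

n_estimators, learning_rate, subsample = 100, 0.1, 0.5

# Start from the constant mean prediction; each tree is then fit to the
# residuals, i.e. the negative gradient of the squared-error loss.
F = np.full(y.shape[0], y.mean())
trees = []
for _ in range(n_estimators):
    # Stochastic gradient boosting: train each tree on a random subsample.
    idx = rng.permutation(y.shape[0])[: int(subsample * y.shape[0])]
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X[idx], (y - F)[idx])
    F += learning_rate * tree.predict(X)
    trees.append(tree)

def predict(X_new):
    return y.mean() + learning_rate * sum(t.predict(X_new) for t in trees)

print("training MSE:", np.mean((y - predict(X)) ** 2))
```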

I already have a working solution for regression (least squares and
least absolute deviation) and binary classification. But there is
still much work to do (e.g., debugging, performance tweaks, numerical
stability, testing, documentation). The current codebase can be found
here [3].
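For anyone curious how the two regression losses differ, the tiny example below contrasts their pseudo-residuals (the negative gradients each boosting stage fits a tree to, following Friedman [1]). The numbers are made up purely for illustration.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 10.0])   # note the outlier at 10.0
F = np.array([1.5, 1.5, 1.5, 1.5])    # current model predictions

# Least squares: the negative gradient is the plain residual,
# so the outlier dominates the next tree's targets.
ls_residuals = y - F

# Least absolute deviation: the negative gradient is only the sign
# of the residual, which makes the fit robust to outliers.
lad_residuals = np.sign(y - F)

print(ls_residuals)   # [-0.5  0.5  1.5  8.5]
print(lad_residuals)  # [-1.  1.  1.  1.]
```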

Please contact me in case you're interested!

best,
 Peter

[1] J. H. Friedman (2001). `Greedy function approximation: a gradient
boosting machine'. The Annals of Statistics 29(5).
[2] http://cran.r-project.org/web/packages/gbm/index.html
[3] https://github.com/pprett/scikit-learn/tree/gradient_boosting-rebased
[4] http://www.stanford.edu/~hastie/Papers/leathwick%20et%20al%202006%20MEPS%20.pdf


-- 
Peter Prettenhofer

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general