Hi everybody, I'm currently working on a Pull Request for Gradient Boosted Regression Trees [1] (aka Gradient Boosting, MART, TreeNet) and I'm looking for collaborators.
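For anyone unfamiliar with the method: gradient boosting builds an additive model stage-wise, where each stage fits a small regression tree to the negative gradient of the loss; for squared loss the negative gradient is simply the current residuals. Below is a minimal, self-contained sketch of that idea, using decision stumps in place of the deeper regression trees the PR uses; all function names here are illustrative, not the PR's API:

```python
# Sketch of least-squares gradient boosting with decision stumps.
# Illustrative only -- the actual PR uses regression trees and supports
# several loss functions, not just least squares.

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r over 1-d inputs x."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lm = sum(left) / len(left)    # mean response on the left leaf
        rm = sum(right) / len(right)  # mean response on the right leaf
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gbrt_fit(x, y, n_stages=50, learning_rate=0.1):
    """Stage-wise boosting: each stage fits the residuals (the negative
    gradient of the squared loss) and is added with shrinkage."""
    f0 = sum(y) / len(y)              # initial constant model
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_stages):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: f0 + learning_rate * sum(s(xi) for s in stumps)

# Toy 1-d regression problem
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.5, 1.5, 3.0, 4.5, 5.0]
model = gbrt_fit(x, y)
```

The shrinkage factor (learning_rate) is what distinguishes boosting from simply fitting trees to residuals once: small steps with many stages generalize better, at the cost of more iterations.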
GBRTs have been advertised as one of the best off-the-shelf data-mining procedures; they share many properties with random forests, including little need for tuning or data preprocessing as well as high predictive accuracy. GBRTs have been used very successfully in areas such as learning of ranking functions and ecology [4].

The main goal is to provide an alternative to R's 'gbm' package [2]; it should feature a number of loss functions (for both classification and regression) as well as a number of enhancements (stochastic gradient boosting, partial dependence plots). I already have a working implementation for regression (least squares and least absolute deviation) and binary classification, but there is still much work to do (e.g., debugging, performance tweaks, numerical stability, testing, documentation). The current codebase can be found here [3].

Please contact me if you're interested!

best,
Peter

[1] J. H. Friedman (2001). "Greedy function approximation: a gradient boosting machine". The Annals of Statistics 29(5).
[2] http://cran.r-project.org/web/packages/gbm/index.html
[3] https://github.com/pprett/scikit-learn/tree/gradient_boosting-rebased
[4] http://www.stanford.edu/~hastie/Papers/leathwick%20et%20al%202006%20MEPS%20.pdf

--
Peter Prettenhofer
