[ https://issues.apache.org/jira/browse/SPARK-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357868#comment-15357868 ]
Vladimir Feinberg commented on SPARK-4240:
------------------------------------------

[~sethah] Hi Seth, it seems like your comment is outdated now that GBT is indeed in ML. Are you currently working on this?

> Refine Tree Predictions in Gradient Boosting to Improve Prediction Accuracy.
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4240
>                 URL: https://issues.apache.org/jira/browse/SPARK-4240
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Sung Chung
>
> The gradient boosting as currently implemented estimates the loss-gradient in
> each iteration using regression trees. At every iteration, the regression
> trees are trained/split to minimize predicted gradient variance.
> Additionally, the terminal node predictions are computed to minimize the
> prediction variance.
> However, such predictions won't be optimal for loss functions other than the
> mean-squared error. The TreeBoosting refinement can help mitigate this issue
> by modifying terminal node prediction values so that those predictions would
> directly minimize the actual loss function. Although this still doesn't
> change the fact that the tree splits were done through variance reduction, it
> should still lead to improvement in gradient estimations, and thus better
> performance.
> The details of this can be found in the R vignette. This paper also shows how
> to refine the terminal node predictions.
> http://www.saedsayad.com/docs/gbm2.pdf

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
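
For readers of this thread, the terminal-node refinement described in the issue can be illustrated with a small sketch. This is not Spark's implementation. It assumes absolute-error loss, a learning rate of 1, a current model that predicts 0 everywhere, and a hypothetical tree whose leaf assignments are given in `leaf_ids`. All function names and data are made up for illustration. Without refinement, each leaf predicts the mean of the pseudo-residuals (the signs of the residuals) that fall into it. With refinement, each leaf predicts the median of the actual residuals, which is the value that minimizes absolute error within that leaf.

```python
from statistics import mean, median


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def absolute_loss(labels, predictions):
    """Total absolute-error loss over all examples."""
    return sum(abs(y - p) for y, p in zip(labels, predictions))


def leaf_values(labels, current, leaf_ids, refine):
    """Compute one prediction value per leaf (hypothetical helper).

    refine=False: mean of the pseudo-residuals sign(y - f) in the leaf,
                  i.e. the variance-minimizing fit to the loss gradient.
    refine=True:  median of the residuals y - f in the leaf, which
                  directly minimizes absolute error for that leaf.
    """
    groups = {}
    for y, f, leaf in zip(labels, current, leaf_ids):
        residual = y - f
        value = residual if refine else float(sign(residual))
        groups.setdefault(leaf, []).append(value)
    aggregate = median if refine else mean
    return {leaf: aggregate(values) for leaf, values in groups.items()}


def apply_update(current, leaf_ids, values):
    """Add each example's leaf value to its current prediction (learning rate 1)."""
    return [f + values[leaf] for f, leaf in zip(current, leaf_ids)]


# Illustrative data only: six examples, current model predicts 0 everywhere,
# and an assumed tree that sends the first three examples to leaf 0.
labels = [1.0, 2.0, 9.0, 20.0, 21.0, 40.0]
current = [0.0] * len(labels)
leaf_ids = [0, 0, 0, 1, 1, 1]

unrefined = leaf_values(labels, current, leaf_ids, refine=False)
refined = leaf_values(labels, current, leaf_ids, refine=True)

loss_unrefined = absolute_loss(labels, apply_update(current, leaf_ids, unrefined))
loss_refined = absolute_loss(labels, apply_update(current, leaf_ids, refined))

print("unrefined leaf values:", unrefined, "loss:", loss_unrefined)
print("refined leaf values:  ", refined, "loss:", loss_refined)
```

The tree structure is identical in both cases, and only the leaf values differ, which matches the issue's point that the splits still come from variance reduction while the predictions are chosen against the actual loss.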