[ https://issues.apache.org/jira/browse/SPARK-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley closed SPARK-5870.
------------------------------------
    Resolution: Duplicate
 Fix Version/s: 1.4.0

> GradientBoostedTrees should cache residuals from partial model
> --------------------------------------------------------------
>
>                 Key: SPARK-5870
>                 URL: https://issues.apache.org/jira/browse/SPARK-5870
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>             Fix For: 1.4.0
>
>
> On each iteration, GradientBoostedTrees computes predictions for each
> training instance using the partial model. This means it recomputes the
> prediction of every earlier tree on each subsequent iteration, for a total of
> O(numIterations^2) tree evaluations per instance instead of O(numIterations).
> It should instead cache the current residuals and update them with the
> predictions from the newest tree on each iteration.
> This will likely speed things up when using small trees (where training each
> tree is cheapest). For large trees, training may be costly enough that the
> cost of recomputing predictions on each iteration is comparatively minor.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
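A minimal Python sketch of the proposed change, assuming squared-error loss, one-feature decision stumps as the base learner, and a fixed learning rate. The function names (fit_stump, boost_naive, boost_cached, predict) and the learning_rate parameter are hypothetical illustrations, not part of the Spark MLlib API. The cached variant keeps the partial model's prediction for each instance; for squared error the residual is simply label minus prediction, so this carries the same information as caching the residuals. Both loops count tree evaluations so the O(numIterations^2) versus O(numIterations) difference is visible.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump minimizing squared error (illustrative base learner)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)  # never empty: t is one of the xs
        right_mean = sum(right) / len(right) if right else 0.0
        err = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, threshold, lo_val, hi_val = best

    def stump(x):
        return lo_val if x <= threshold else hi_val

    return stump


def predict(trees, x, learning_rate=0.5):
    """Sum the weighted predictions of all trees, in order."""
    p = 0.0
    for tree in trees:
        p += learning_rate * tree(x)
    return p


def boost_naive(xs, ys, num_iterations, learning_rate=0.5):
    """Recompute the partial model's prediction from scratch every iteration."""
    trees, evals = [], 0
    for _ in range(num_iterations):
        preds = []
        for x in xs:
            p = 0.0
            for tree in trees:  # every earlier tree is evaluated again
                p += learning_rate * tree(x)
                evals += 1
            preds.append(p)
        residuals = [y - p for y, p in zip(ys, preds)]
        trees.append(fit_stump(xs, residuals))
    return trees, evals


def boost_cached(xs, ys, num_iterations, learning_rate=0.5):
    """Keep cached predictions and update them with only the newest tree."""
    trees, evals = [], 0
    preds = [0.0] * len(xs)  # cached partial-model prediction per instance
    for _ in range(num_iterations):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        for i, x in enumerate(xs):  # one evaluation per instance per iteration
            preds[i] += learning_rate * tree(x)
            evals += 1
    return trees, evals


xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]
naive_trees, naive_evals = boost_naive(xs, ys, 10)
cached_trees, cached_evals = boost_cached(xs, ys, 10)
print(naive_evals, cached_evals)
```

With 20 instances and 10 iterations, the naive loop performs 20 * (0 + 1 + ... + 9) = 900 tree evaluations while the cached loop performs 20 * 10 = 200. Both build the same model, because the cached predictions are accumulated in the same order as the from-scratch sums. The per-iteration saving matters most when fitting each tree is cheap, which matches the small-tree case described in the issue.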