Joseph K. Bradley created SPARK-5870:
----------------------------------------
Summary: GradientBoostedTrees should cache residuals from partial model
Key: SPARK-5870
URL: https://issues.apache.org/jira/browse/SPARK-5870
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley

On each iteration, GradientBoostedTrees computes predictions for every training instance using the partial model built so far. This means the prediction of each tree is re-computed on every subsequent iteration, so the total prediction work grows as O(numIterations^2) rather than O(numIterations).

It should instead cache the current residuals and, on each iteration, update them with the predictions from the newest tree only.

This will likely speed up training when using small trees, where training each tree is fastest and prediction cost is a larger share of each iteration. For large trees, training may be costly enough to amortize the cost of re-computing predictions on each iteration.
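The difference between the two strategies can be sketched in plain Python, without Spark. This is a minimal illustration under stated assumptions, not MLlib's implementation: the trees are hypothetical callables, each tree has a fixed weight (e.g. a learning rate), and tree training is omitted, so only the cost of computing per-instance predictions is counted. The function names are hypothetical. Caching the running predictions stands in for caching residuals, on the assumption of squared-error loss, where the residual is simply label minus prediction.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a trained regression tree: feature vector -> prediction.
Tree = Callable[[Sequence[float]], float]


def predictions_recomputed(
    data: Sequence[Sequence[float]], trees: Sequence[Tree], weights: Sequence[float]
) -> Tuple[List[float], int]:
    """Behavior described in the issue: on iteration m, evaluate all m trees again."""
    evals = 0
    preds = [0.0] * len(data)
    for m in range(1, len(trees) + 1):
        preds = []
        for x in data:
            total = 0.0
            # Re-evaluate every tree in the partial model from scratch.
            for w, tree in zip(weights[:m], trees[:m]):
                total += w * tree(x)
                evals += 1
            preds.append(total)
    return preds, evals


def predictions_cached(
    data: Sequence[Sequence[float]], trees: Sequence[Tree], weights: Sequence[float]
) -> Tuple[List[float], int]:
    """Proposed behavior: keep per-instance predictions and add only the newest tree."""
    evals = 0
    preds = [0.0] * len(data)
    for w, tree in zip(weights, trees):
        # One evaluation of the newest tree per instance, added to the cached value.
        preds = [p + w * tree(x) for p, x in zip(preds, data)]
        evals += len(data)
    return preds, evals


def residuals(labels: Sequence[float], preds: Sequence[float]) -> List[float]:
    # Assumes squared-error loss: the negative gradient is label - prediction.
    return [y - p for y, p in zip(labels, preds)]


if __name__ == "__main__":
    data = [[1.0], [2.0], [3.0], [4.0]]
    # Ten hypothetical "trees", each weighted by a learning rate of 0.1.
    trees = [lambda x, k=k: k * x[0] for k in range(1, 11)]
    weights = [0.1] * len(trees)
    _, naive_evals = predictions_recomputed(data, trees, weights)
    _, cached_evals = predictions_cached(data, trees, weights)
    print(naive_evals, cached_evals)  # 220 40
```

With N instances and M trees, the recompute version performs N·M(M+1)/2 tree evaluations against N·M for the cached version; for the 4 instances and 10 trees above, that is 220 versus 40. Both produce the same predictions, which is why the cache can replace recomputation.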