Joseph K. Bradley created SPARK-5870:
----------------------------------------

             Summary: GradientBoostedTrees should cache residuals from partial model
                 Key: SPARK-5870
                 URL: https://issues.apache.org/jira/browse/SPARK-5870
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.3.0
            Reporter: Joseph K. Bradley


On each iteration, GradientBoostedTrees computes predictions for each training
instance using the partial model. This means it re-computes the prediction of
each tree on every following iteration, so over numIterations iterations each
instance costs O(numIterations^2) tree evaluations instead of O(numIterations).

It should instead cache the current residuals and update them with the 
predictions from the newest tree on each iteration.
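A minimal Python sketch of the proposed change, not Spark's implementation. It
assumes squared-error loss (so the pseudo-residual is y - F(x)), one learning
rate for every tree, and a hypothetical 1-D regression stump, fit_stump, as the
weak learner. boost_naive mirrors the current behavior by re-evaluating the
partial model on each iteration; boost_cached keeps the partial-model
predictions (from which the residuals follow) and adds only the newest tree's
contribution.

```python
def fit_stump(xs, targets):
    """Hypothetical weak learner: a depth-1 regression tree on 1-D inputs."""
    pairs = sorted(zip(xs, targets))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x equal: predict the mean target
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def boost_naive(xs, ys, num_iterations, learning_rate=0.1):
    """Current behavior: re-evaluate every existing tree on every iteration."""
    trees, evals = [], 0
    for _ in range(num_iterations):
        preds = []
        for x in xs:
            p = 0.0
            for tree in trees:  # O(#trees so far) per instance
                p += learning_rate * tree(x)
                evals += 1
            preds.append(p)
        residuals = [y - p for y, p in zip(ys, preds)]
        trees.append(fit_stump(xs, residuals))
    return trees, evals


def boost_cached(xs, ys, num_iterations, learning_rate=0.1):
    """Proposed behavior: cache predictions, add only the newest tree."""
    trees, evals = [], 0
    preds = [0.0] * len(xs)  # cached partial-model predictions
    for _ in range(num_iterations):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        for i, x in enumerate(xs):  # O(1) tree evaluations per instance
            preds[i] += learning_rate * tree(x)
            evals += 1
    return trees, evals


def predict(trees, x, learning_rate=0.1):
    """Sum the scaled tree outputs of a fitted ensemble."""
    return sum(learning_rate * tree(x) for tree in trees)
```

Both loops add tree outputs in the same order, so they fit identical trees; only
the number of tree evaluations differs (n * M(M-1)/2 versus n * M for n
instances and M iterations).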

This will likely speed things up most when using small trees, where training
each tree is cheap and re-computing predictions dominates. For large trees,
training may be costly enough to amortize the cost of re-computing predictions
on each iteration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)