Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4906#discussion_r26056478 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -69,6 +74,42 @@ class GradientBoostedTrees(private val boostingStrategy: BoostingStrategy) case _ => throw new IllegalArgumentException(s"$algo is not supported by the gradient boosting.") } + baseLearners = fitGradientBoostingModel.trees + baseLearnerWeights = fitGradientBoostingModel.treeWeights + fitGradientBoostingModel + } + + /** + * Method to compute error or loss for every iteration of gradient boosting. + * @param data: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]] + * @param loss: evaluation metric that defaults to boostingStrategy.loss + * @return an array with index i having the losses or errors for the ensemble + * containing trees 1 to i + 1 + */ + def evaluateEachIteration( --- End diff -- This method should be implemented in the model, not in the estimator. There's no need to make a duplicate of the model in the estimator class. (We try to keep estimator classes stateless except for parameter values so that they remain lightweight types.) This change will require a bit of refactoring, so I'll hold off on more comments until then.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org