[ https://issues.apache.org/jira/browse/SPARK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323317#comment-14323317 ]
Chris T commented on SPARK-5436:
--------------------------------

There is already a predict() method on the model object, so in principle this can already be achieved. Currently we iteratively extract sub-models (with one additional tree in the model per iteration), call predict() on each sub-model, and calculate the error (in our case MSE, for a regression model). I think the helper function you're proposing does exactly this, right?

It seemed to me that, since the error is calculated internally while the model is being built, it is essentially "free" to store this number as the model builds. But fair enough if you don't want to add complexity to the API, or confusion over differing use cases. I don't have a good sense of how small the cost of doing the error calculation after the fact is, but for large datasets it may be non-trivial. In any case, I think some of this discussion is fairly academic. :)

> Validate GradientBoostedTrees during training
> ---------------------------------------------
>
>                 Key: SPARK-5436
>                 URL: https://issues.apache.org/jira/browse/SPARK-5436
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> For Gradient Boosting, it would be valuable to compute test error on a
> separate validation set during training. That way, training could stop early
> based on the test error (or some other metric specified by the user).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
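The bookkeeping the comment describes, tracking validation error as each tree is added rather than re-running predict() on every sub-model afterwards, can be sketched with a toy boosting loop. This is not Spark code; the "weak learner" here is just the mean of the residuals, a hypothetical stand-in for a regression tree, and all names are illustrative.

```python
def boost(train, valid, num_iters=10, lr=0.5):
    """Toy gradient boosting for regression, recording validation MSE
    after every iteration (the 'free' per-iteration error tracking)."""
    xs, ys = zip(*train)
    vx, vy = zip(*valid)
    pred_t = [0.0] * len(ys)   # running predictions on the training set
    pred_v = [0.0] * len(vy)   # running predictions on the validation set
    errors = []                # validation MSE after each added "tree"
    for _ in range(num_iters):
        residuals = [y - p for y, p in zip(ys, pred_t)]
        # "Fit" one weak learner: here just the mean residual, standing in
        # for a real regression tree.
        stump = sum(residuals) / len(residuals)
        # Update running predictions incrementally -- no need to re-apply
        # the whole ensemble, which is what makes the tracking cheap.
        pred_t = [p + lr * stump for p in pred_t]
        pred_v = [p + lr * stump for p in pred_v]
        mse = sum((y - p) ** 2 for y, p in zip(vy, pred_v)) / len(vy)
        errors.append(mse)
    return errors

train = [(0, 1.0), (1, 2.0), (2, 3.0)]
valid = [(0, 1.1), (1, 2.0), (2, 2.9)]
errs = boost(train, valid)
# Early stopping would halt once errs stops improving.
```

Extracting sub-models and calling predict() on each, by contrast, costs one full pass over the validation data per iteration, which is the after-the-fact expense the comment speculates may be non-trivial for large datasets.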