[ https://issues.apache.org/jira/browse/SPARK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323317#comment-14323317 ]

Chris T commented on SPARK-5436:
--------------------------------

There is already a predict method on the model object, so in principle this can 
already be achieved. Currently we iteratively extract sub-models (with one 
additional tree in the model per iteration), call predict() on each sub-model, 
and calculate the error (in our case, MSE for a regression model). I think the 
helper function you're proposing does just this, right?
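The per-iteration evaluation described above can be sketched in plain Python (this is not the MLlib API; `staged_mse`, the toy constant "trees", and all other names are hypothetical, purely for illustration). The key point is that each prefix of the ensemble can be scored incrementally, without re-predicting with the earlier trees:

```python
# Illustrative sketch: validation MSE for each prefix of an additive
# ensemble, i.e. "one additional tree in the model per iteration".

def mse(preds, labels):
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

def staged_mse(ensemble, weights, xs, ys):
    """For each (tree, weight) prefix, update predictions and record MSE."""
    errors = []
    preds = [0.0] * len(xs)
    for tree, w in zip(ensemble, weights):
        # Add this iteration's weighted tree output to the running
        # prediction, so earlier trees are never re-scored.
        preds = [p + w * tree(x) for p, x in zip(preds, xs)]
        errors.append(mse(preds, ys))
    return errors

# Toy "trees": constant functions standing in for regression trees.
ensemble = [lambda x: 1.0, lambda x: 0.5, lambda x: 0.25]
weights = [1.0, 1.0, 1.0]
xs = [0, 1, 2]
ys = [1.75, 1.75, 1.75]
staged_mse(ensemble, weights, xs, ys)  # -> [0.5625, 0.0625, 0.0]
```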

It seemed to me that, since the error is already calculated internally while 
the model is being built, it would be essentially "free" to store that number 
at each iteration. But fair enough if you don't want to add complexity to the 
API, or confusion over differing use cases. I don't have a good sense of how 
small the cost of calculating the error after the fact is, but for large 
datasets it may be non-trivial.
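As a hedged sketch of that idea (pure Python, not Spark; the constant-fit weak learner and all names are illustrative assumptions): if validation predictions are updated incrementally inside the boosting loop, recording the per-iteration error costs almost nothing, and it also enables stopping early when validation error worsens.

```python
# Toy boosting loop that records validation MSE as the model builds and
# stops early once it stops improving. The "weak learner" is just the
# mean of the current training residuals, shrunk by a learning rate.

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def boost_with_validation(train_y, val_y, max_iters=20, shrinkage=0.5):
    train_pred = [0.0] * len(train_y)
    val_pred = [0.0] * len(val_y)
    history, best = [], float("inf")
    for _ in range(max_iters):
        # Weak learner: constant fit to the current training residuals.
        residuals = [y - p for y, p in zip(train_y, train_pred)]
        step = shrinkage * sum(residuals) / len(residuals)
        train_pred = [p + step for p in train_pred]
        # Validation predictions are updated in place each iteration,
        # so recording the error here is essentially free.
        val_pred = [p + step for p in val_pred]
        err = mse(val_pred, val_y)
        history.append(err)
        if err >= best:  # stop when validation error stops improving
            break
        best = err
    return history
```

With a validation set that disagrees with the training set, e.g. `boost_with_validation([2.0], [0.0])`, the loop stops after the second iteration, returning `[1.0, 2.25]`.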

In any case, I think some of this discussion is fairly academic. :)

> Validate GradientBoostedTrees during training
> ---------------------------------------------
>
>                 Key: SPARK-5436
>                 URL: https://issues.apache.org/jira/browse/SPARK-5436
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> For Gradient Boosting, it would be valuable to compute test error on a 
> separate validation set during training.  That way, training could stop early 
> based on the test error (or some other metric specified by the user).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
