[ 
https://issues.apache.org/jira/browse/SPARK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323293#comment-14323293
 ] 

Joseph K. Bradley edited comment on SPARK-5436 at 2/16/15 9:08 PM:
-------------------------------------------------------------------

The cost of computing the error after training, rather than caching it during 
training, seems negligible (since tree training takes much longer than 
prediction).  I'd vote for keeping the API simple (allowing users to compute 
the error using the helper function mentioned in my previous comment), rather 
than adding options which could be handled using the existing API.  If users 
find that prediction takes as long as training, then we should investigate.
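
As a rough illustration of computing the error after training, here is a minimal Python sketch that accumulates the ensemble's prediction one tree at a time and records the validation error after each boosting iteration. It is not the MLlib API and not the helper function mentioned above. `errors_per_iteration`, `best_num_trees`, and the representation of a tree as a plain callable are hypothetical names chosen for this example, and mean squared error is assumed as the metric.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a trained tree: any function from features to a prediction.
Tree = Callable[[Sequence[float]], float]
Example = Tuple[Sequence[float], float]  # (features, label)


def errors_per_iteration(
    trees: Sequence[Tree],
    weights: Sequence[float],
    data: Sequence[Example],
) -> List[float]:
    """Return the mean squared error on `data` after each boosting iteration."""
    if not data:
        raise ValueError("data must not be empty")
    # Running weighted sum of tree predictions, one entry per example.
    partial = [0.0] * len(data)
    errors = []
    for tree, weight in zip(trees, weights):
        # Add this tree's contribution, then score the ensemble so far.
        for i, (features, _) in enumerate(data):
            partial[i] += weight * tree(features)
        mse = sum((p - label) ** 2 for p, (_, label) in zip(partial, data)) / len(data)
        errors.append(mse)
    return errors


def best_num_trees(errors: Sequence[float]) -> int:
    """Return the number of trees that gives the lowest recorded error."""
    return min(range(len(errors)), key=lambda i: errors[i]) + 1


# Toy ensemble of three "trees" evaluated on a two-point validation set.
toy_trees = [lambda x: 2.0, lambda x: 2.0 * x[0] - 1.0, lambda x: 1.0]
toy_weights = [1.0, 1.0, 0.5]
validation = [([0.0], 1.0), ([1.0], 3.0)]

toy_errors = errors_per_iteration(toy_trees, toy_weights, validation)
print(toy_errors, best_num_trees(toy_errors))
```

Each tree is evaluated once per validation example, so the whole pass costs about as much as one prediction with the full ensemble. The resulting error curve can also be used after the fact to decide how many trees to keep.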


was (Author: josephkb):
The cost of computing the error after training, rather than caching it during 
training, seems negligible (since tree training takes much longer than 
prediction).  I'd vote for keeping the API simple, rather than adding options 
which could be handled using the existing API.  If users find that prediction 
takes as long as training, then we should investigate.

> Validate GradientBoostedTrees during training
> ---------------------------------------------
>
>                 Key: SPARK-5436
>                 URL: https://issues.apache.org/jira/browse/SPARK-5436
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> For Gradient Boosting, it would be valuable to compute test error on a 
> separate validation set during training.  That way, training could stop early 
> based on the test error (or some other metric specified by the user).



