[ https://issues.apache.org/jira/browse/SPARK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337742#comment-14337742 ]
Liang-Chi Hsieh commented on SPARK-6004:
----------------------------------------

Stopping training early makes sense for convergence problems. For choosing the number of iterations, it is more common to tune it by monitoring the error/performance curve as a function of the iteration number. It would be great if we could stop early and get the best model without wasting further compute time. But we know that the validation error does not change monotonically, so if you stop at 20 iterations, how do you know performance will not improve again at the next iteration? It is too crude to stop training just because the validation error did not improve over the previous iteration.

I think keeping validationTol is good for letting users know where the best model lies among the training iterations, so they do not need to plot the error/performance curve on the validation dataset themselves. My only concern is the default behavior of stopping training early.

> Pick the best model when training GradientBoostedTrees with validation
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6004
>                 URL: https://issues.apache.org/jira/browse/SPARK-6004
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Liang-Chi Hsieh
>            Priority: Minor
>
> Since the validation error does not change monotonically, in practice it
> should be proper to pick the best model when training GradientBoostedTrees
> with validation instead of stopping it early.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
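The idea discussed above can be sketched as follows. This is a hypothetical illustration, not Spark's actual implementation: `best_num_iterations` is an invented helper that, given the per-iteration validation errors of a boosted ensemble, picks the globally best truncation point rather than stopping at the first non-improving iteration, with a `tol` parameter playing the role of validationTol (improvements smaller than `tol` are ignored).

```python
def best_num_iterations(validation_errors, tol=1e-6):
    """Return (best_iter, best_error): the 1-based iteration with the
    lowest validation error, ties broken by the earliest iteration.

    Hypothetical sketch; `tol` mimics validationTol, so a later model
    only wins if it improves on the best error by more than `tol`.
    """
    best_iter, best_err = 1, validation_errors[0]
    for i, err in enumerate(validation_errors[1:], start=2):
        if best_err - err > tol:  # require a meaningful improvement
            best_iter, best_err = i, err
    return best_iter, best_err

# Non-monotonic validation curve: naive early stopping at the first
# uptick (iteration 3) would miss the true minimum at iteration 5.
errors = [0.30, 0.25, 0.27, 0.26, 0.20, 0.22, 0.21]
print(best_num_iterations(errors))  # -> (5, 0.2)
```

With this approach the trainer can still report where the best model occurred, which is exactly the information the comment argues users should get without having to draw the validation curve themselves.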