[ https://issues.apache.org/jira/browse/SPARK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337742#comment-14337742 ]
Liang-Chi Hsieh commented on SPARK-6004:
----------------------------------------

Stopping training early makes sense for convergence problems. For choosing the number of iterations, it is more common to tune it by monitoring the error/performance curve as a function of the iteration number. It would be great if we could stop early and get the best model without wasting further compute time. But we know that the validation error does not change monotonically, so if you stop at 20 iterations, how do you know performance will not improve again at the next iteration? It is too crude to stop training just because the validation error did not improve over the previous iteration.

I think keeping validationTol is good for letting users know where the best model lies among the training iterations, so they do not need to plot the error/performance curve on the validation dataset themselves. My only concern is the default behavior of stopping training early.

> Pick the best model when training GradientBoostedTrees with validation
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6004
>                 URL: https://issues.apache.org/jira/browse/SPARK-6004
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Liang-Chi Hsieh
>            Priority: Minor
>
> Since the validation error does not change monotonically, in practice it
> should be proper to pick the best model when training GradientBoostedTrees
> with validation instead of stopping it early.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
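The idea discussed above can be sketched as follows. This is a hypothetical illustration, not Spark's actual implementation: `best_num_iterations` is an invented helper that, given the per-iteration validation errors of a boosted ensemble, picks the globally best truncation point rather than stopping at the first non-improving iteration, with a `tol` parameter playing the role of validationTol (improvements smaller than `tol` are ignored).

```python
def best_num_iterations(validation_errors, tol=1e-6):
    """Return (best_iter, best_error): the 1-based iteration with the
    lowest validation error, ties broken by the earliest iteration.

    Hypothetical sketch; `tol` mimics validationTol, so a later model
    only wins if it improves on the best error by more than `tol`.
    """
    best_iter, best_err = 1, validation_errors[0]
    for i, err in enumerate(validation_errors[1:], start=2):
        if best_err - err > tol:  # require a meaningful improvement
            best_iter, best_err = i, err
    return best_iter, best_err

# Non-monotonic validation curve: naive early stopping at the first
# uptick (iteration 3) would miss the true minimum at iteration 5.
errors = [0.30, 0.25, 0.27, 0.26, 0.20, 0.22, 0.21]
print(best_num_iterations(errors))  # -> (5, 0.2)
```

With this approach the trainer can still report where the best model occurred, which is exactly the information the comment argues users should get without having to draw the validation curve themselves.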