GitHub user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21129#discussion_r187112582
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -460,18 +461,29 @@ private[ml] trait RandomForestRegressorParams
      *
      * Note: Marked as private and DeveloperApi since this may be made public in the future.
      */
    -private[ml] trait GBTParams extends TreeEnsembleParams with HasMaxIter with HasStepSize {
    +private[ml] trait GBTParams extends TreeEnsembleParams with HasMaxIter with HasStepSize
    +  with HasValidationIndicatorCol {
     
    -  /* TODO: Add this doc when we add this param.  SPARK-7132
    -   * Threshold for stopping early when runWithValidation is used.
    +  /**
    +   * Threshold for stopping early when fit with validation is used.
        * If the error rate on the validation input changes by less than the validationTol,
    -   * then learning will stop early (before [[numIterations]]).
    -   * This parameter is ignored when run is used.
    +   * then learning will stop early (before [[maxIter]]).
    +   * This parameter is ignored when fit without validation is used.
        * (default = 1e-5)
    --- End diff --
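Here is a minimal Scala sketch of the stopping rule described in the doc comment above. It assumes the validation error after each iteration has already been computed. It reads "changes by less than the validationTol" as "improves on the best error so far by less than validationTol", so an iteration that makes the error worse also ends training. That reading is an assumption, and `ToleranceEarlyStopping.iterationsTrained` is a hypothetical helper, not Spark's implementation.

```scala
// Hypothetical, simplified sketch of tolerance-based early stopping.
// Not Spark's implementation; validation errors are assumed to be precomputed.
object ToleranceEarlyStopping {

  /** Returns how many boosting iterations are trained before stopping.
    *
    * validationErrors(i) is the validation error after iteration i + 1.
    * Learning stops at the first iteration whose error improves on the best
    * error seen so far by less than validationTol (a worse error counts too).
    * If no iteration triggers the stop, training runs to maxIter
    * (or to the end of the supplied errors, whichever comes first).
    */
  def iterationsTrained(validationErrors: Seq[Double], validationTol: Double, maxIter: Int): Int = {
    require(validationTol >= 0.0, "validationTol must be non-negative")
    require(maxIter >= 1, "maxIter must be at least 1")
    val errors = validationErrors.take(maxIter)
    var bestError = Double.PositiveInfinity
    var i = 0
    while (i < errors.length) {
      // Improvement of this iteration over the best error seen so far.
      val improvement = bestError - errors(i)
      if (improvement < validationTol) return i + 1
      // improvement >= validationTol >= 0, so this error is the new best.
      bestError = errors(i)
      i += 1
    }
    errors.length
  }
}
```

For example, on the error curve 0.50, 0.30, 0.295, 0.2949, 0.20 with maxIter = 10, validationTol = 0.01 stops after 3 iterations, while validationTol = 1e-5 trains all 5.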
    
    I forget why we chose 1e-5 (which is different from spark.mllib). What do you think about using 0.01 to match the sklearn docs here?
    http://scikit-learn.org/dev/auto_examples/ensemble/plot_gradient_boosting_early_stopping.html
    (I also checked xgboost, but they use a different approach: stopping after a given number of steps without improvement. We may want to add that at some point, since it sounds more robust.)
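The patience-based alternative mentioned above is similar in spirit to XGBoost's `early_stopping_rounds`. It can be sketched as follows. `PatienceEarlyStopping.run` and its `patience` parameter are hypothetical names. The sketch assumes precomputed per-iteration validation errors and reports two values: how many iterations were trained and which iteration had the lowest error.

```scala
// Hypothetical sketch of "stop after N iterations without improvement".
// Names and signature are illustrative only; validation errors are precomputed.
object PatienceEarlyStopping {

  /** Returns (iterationsTrained, bestIteration), both 1-based.
    *
    * Training stops once `patience` consecutive iterations fail to beat the
    * best validation error seen so far, or when maxIter is reached.
    */
  def run(validationErrors: Seq[Double], patience: Int, maxIter: Int): (Int, Int) = {
    require(patience >= 1, "patience must be at least 1")
    require(maxIter >= 1, "maxIter must be at least 1")
    val errors = validationErrors.take(maxIter)
    var bestError = Double.PositiveInfinity
    var bestIteration = 0
    var sinceImprovement = 0
    var i = 0
    while (i < errors.length) {
      if (errors(i) < bestError) {
        // New best: remember it and reset the patience counter.
        bestError = errors(i)
        bestIteration = i + 1
        sinceImprovement = 0
      } else {
        // No improvement: stop once the patience budget is used up.
        sinceImprovement += 1
        if (sinceImprovement >= patience) return (i + 1, bestIteration)
      }
      i += 1
    }
    (errors.length, bestIteration)
  }
}
```

Take the curve 0.50, 0.40, 0.41, 0.39, 0.395, 0.396, 0.397, 0.10. With a patience of 3, training stops after iteration 7 and iteration 4 is the best. With a patience of 5, training survives the plateau and reaches the drop to 0.10 at iteration 8. The single uptick at iteration 3 does not end training on its own.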

