[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953084#comment-15953084 ]
sudipto pal commented on SPARK-20199:
-------------------------------------

[~srowen]

GBM tuning parameters unavailable in Spark 2.1.0:
1. Column sampling rate: present in H2O and XGBoost; an important feature
2. Regularization on leaf node weights: present in XGBoost
3. Learning rate annealing: present in H2O

Other features missing (compared to H2O and/or XGBoost):
4. Multiclass classification is not supported
5. Offset: present in H2O
6. The choice of distributions does not include Gamma, Tweedie, or Poisson
7. Generates classes, not probabilities (they said a later version will take care of this)

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/ml/classification/GBTClassifier.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.regression.GBTRegressor

> GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter
> -----------------------------------------------------------------------
>
>                 Key: SPARK-20199
>                 URL: https://issues.apache.org/jira/browse/SPARK-20199
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.1.0
>            Reporter: pralabhkumar
>            Priority: Minor
>
> Spark GradientBoostedTreesModel doesn't have a column sampling rate parameter.
> This parameter is available in H2O and XGBoost.
> Sample from H2O.ai:
> gbmParams._col_sample_rate
> Please provide the parameter.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)