[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954796#comment-15954796 ]
pralabhkumar commented on SPARK-20199:
--------------------------------------

Hi,

GBM internally uses RandomForest. GradientBoostedTrees has a method, boost, which calls the train method of DecisionTreeRegressor to build the trees:

    private[ml] def train(data: RDD[LabeledPoint],
        oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
      val instr = Instrumentation.create(this, data)
      instr.logParams(params: _*)

      val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
        featureSubsetStrategy = "all", seed = $(seed), instr = Some(instr),
        parentUID = Some(uid))

      val m = trees.head.asInstanceOf[DecisionTreeRegressionModel]
      instr.logSuccess(m)
      m
    }

Here featureSubsetStrategy is hardcoded to "all". Is there a specific reason for that? Shouldn't this property be exposed so that the user can choose featureSubsetStrategy from "auto", "all", "sqrt", "log2", and "onethird"?

> GradientBoostedTreesModel doesn't have Column Sampling Rate Parameter
> ----------------------------------------------------------------------
>
>                 Key: SPARK-20199
>                 URL: https://issues.apache.org/jira/browse/SPARK-20199
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.1.0
>            Reporter: pralabhkumar
>            Priority: Minor
>
> Spark GradientBoostedTreesModel doesn't have a column sampling rate parameter.
> This parameter is available in H2O and XGBoost.
> Sample from H2O.ai:
> gbmParams._col_sample_rate
> Please provide the parameter.
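
To make the featureSubsetStrategy options named in the comment concrete, here is a minimal standalone sketch of how each value could translate into the number of features considered at each split. This is not Spark's code: the object FeatureSubsetSketch and the method numFeaturesPerNode are hypothetical names, and the treatment of "auto" (all features for a single tree, one third for a regression ensemble) is an assumption about the intended behaviour.

```scala
// Hypothetical helper for illustration only; not part of Spark's API.
object FeatureSubsetSketch {

  /** Number of features to consider per split for a given strategy name. */
  def numFeaturesPerNode(strategy: String, numFeatures: Int, numTrees: Int = 1): Int = {
    require(numFeatures > 0, "numFeatures must be positive")

    // Assumption: "auto" resolves to "all" for one tree and to "onethird"
    // for a regression ensemble with several trees.
    val resolved = strategy.toLowerCase match {
      case "auto" => if (numTrees == 1) "all" else "onethird"
      case other  => other
    }

    resolved match {
      case "all"      => numFeatures
      case "sqrt"     => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2"     =>
        // Never go below one feature, even for very small feature counts.
        math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
      case unknown    =>
        throw new IllegalArgumentException(s"Unsupported featureSubsetStrategy: $unknown")
    }
  }
}
```

With the hardcoded "all", every split sees every feature; any of the other values would give GBT a form of column sampling similar in spirit to the parameter requested in this issue.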