When I set featureSubsetStrategy to "sqrt" for my RandomForestModel, I expected each decision tree to be limited to roughly the square root of the total number of features. I have 900 features, so I thought each tree would use ~30 features or fewer. But when I inspect the 100 trees I built, many of them turn out to use 60-90 features. I have tried looking through the MLlib code to see why this happens, but I haven't found an explanation. Does anyone have any idea why this would be?
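For what it's worth, here is a back-of-the-envelope simulation I ran in plain Python (not Spark) to sanity-check one hypothesis: that the sqrt-sized subset is re-sampled at every tree *node* rather than once per tree, which is how the MLlib docs describe featureSubsetStrategy. The node count of 80 is a made-up guess for a moderately deep tree, and "pick an arbitrary candidate as the split" stands in for the real best-split search:

```python
import random

NUM_FEATURES = 900       # total features in my dataset
SUBSET_SIZE = 30         # ~sqrt(900), candidates sampled per node
NUM_INTERNAL_NODES = 80  # hypothetical internal-node count for one tree

def simulate_tree_features(num_nodes, num_features, subset_size, rng):
    """Return the set of split features for one simulated tree, assuming
    the candidate subset is re-sampled independently at every node."""
    used = set()
    for _ in range(num_nodes):
        candidates = rng.sample(range(num_features), subset_size)
        # stand-in for the best-split search: any candidate may win
        used.add(rng.choice(candidates))
    return used

rng = random.Random(42)
distinct = len(simulate_tree_features(NUM_INTERNAL_NODES, NUM_FEATURES,
                                      SUBSET_SIZE, rng))
print(distinct)  # typically in the 70s: far more than 30 distinct features
```

Under that per-node assumption, a tree with ~80 internal nodes ends up using 70-odd distinct features, which would line up with the 60-90 I'm seeing, even though any single split only ever considers 30 candidates.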
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-PySpark-RandomForest-too-many-features-per-tree-tp26821.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.