When I set featureSubsetStrategy to "sqrt" for my RandomForestModel, I expected each decision tree to be limited to roughly the square root of the total number of features. I have 900 features, so I thought each tree would use ~30 features or fewer. But when I inspect the 100 trees I built, many of them turn out to use 60-90 features. I have tried looking through the MLlib code to see why this happens, but I haven't found an explanation. Does anyone have any idea why this would be?
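For what it's worth, here is a back-of-the-envelope simulation I ran in plain Python (not Spark) to sanity-check one hypothesis: that the sqrt-sized subset is re-sampled at every tree *node* rather than once per tree, which is how the MLlib docs describe featureSubsetStrategy. The node count of 80 is a made-up guess for a moderately deep tree, and "pick an arbitrary candidate as the split" stands in for the real best-split search:

```python
import random

NUM_FEATURES = 900       # total features in my dataset
SUBSET_SIZE = 30         # ~sqrt(900), candidates sampled per node
NUM_INTERNAL_NODES = 80  # hypothetical internal-node count for one tree

def simulate_tree_features(num_nodes, num_features, subset_size, rng):
    """Return the set of split features for one simulated tree, assuming
    the candidate subset is re-sampled independently at every node."""
    used = set()
    for _ in range(num_nodes):
        candidates = rng.sample(range(num_features), subset_size)
        # stand-in for the best-split search: any candidate may win
        used.add(rng.choice(candidates))
    return used

rng = random.Random(42)
distinct = len(simulate_tree_features(NUM_INTERNAL_NODES, NUM_FEATURES,
                                      SUBSET_SIZE, rng))
print(distinct)  # typically in the 70s: far more than 30 distinct features
```

Under that per-node assumption, a tree with ~80 internal nodes ends up using 70-odd distinct features, which would line up with the 60-90 I'm seeing, even though any single split only ever considers 30 candidates.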
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-PySpark-RandomForest-too-many-features-per-tree-tp26821.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.