We are using |RandomForestRegressor| from Spark 2.1.1 to train a model.

To make sure we choose appropriate parameters, we start with a very small dataset of 6024 rows. The regressor is created with this code:

    val rf = new RandomForestRegressor()
      .setLabelCol("MyLabel")
      .setFeaturesCol("MyFeatures")
      .setImpurity("variance")
      .setMaxDepth(3)
      .setMinInstancesPerNode(1)
      .setMinInfoGain(0.0)
      .setNumTrees(2)
      .setFeatureSubsetStrategy("onethird")
      .setMaxBins(32)
      .setSubsamplingRate(1.0)

    val model = rf.fit(train)
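
(For context, |train| is simply a DataFrame with a numeric "MyLabel" column and a "MyFeatures" vector column. It is not shown above; a minimal sketch of how it could be assembled follows, with a made-up file name and the assumption that all non-label columns are numeric:)

    import org.apache.spark.ml.feature.VectorAssembler

    // Hypothetical setup, not part of the original pipeline: read a small CSV
    // and assemble every non-label column into the "MyFeatures" vector.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("small_dataset.csv")          // made-up file name

    val train = new VectorAssembler()
      .setInputCols(raw.columns.filter(_ != "MyLabel"))
      .setOutputCol("MyFeatures")
      .transform(raw)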

Using the debugger, I can observe the |ImpurityStats| for each |rootNode| of each |DecisionTreeModel| in the |trees| array. The stat I am interested in is the first one in the |stats| array, since it holds the number of rows the node was trained on.
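
(For reference, this is roughly how I walk the forest programmatically. It is only a sketch using the public API; the |stats| array itself I can only see in the debugger, since |impurityStats| does not seem to be publicly exposed in 2.1.1:)

    // Sketch: inspect each tree's root node through the public API.
    // The per-node row count (stats(0)) mentioned above is what I read
    // in the debugger; here I only print what is publicly available.
    model.trees.zipWithIndex.foreach { case (tree, i) =>
      println(s"tree $i: nodes = ${tree.numNodes}, " +
        s"root impurity = ${tree.rootNode.impurity}")
    }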

What I find strange is that this value for each |rootNode| is not always 6024: it is sometimes more and sometimes less. From my understanding of the method, I was under the impression that each tree would be trained on exactly the same number of rows as the original training set.

Looking at the source code, I could not fully figure out where this happens, nor why it was designed this way.

Are there any resources discussing this behavior?

