[ML] Stop conditions for RandomForest

OBones Tue, 27 Jun 2017 08:07:53 -0700

Hello,

Reading around on the theory behind tree-based regression, I concluded that there are various reasons to stop exploring the tree when a given node has been reached. Among these, I have the following two:

1. When starting to process a node, if its size (row count) is less than X, then consider it a leaf.
2. When a split for a node is considered, if any side of the split has its size less than Y, then ignore it when selecting the best split.

As an example, let's consider a node with 45 rows, that for a given split creates two children, containing 5 and 35 rows respectively.


If I set X to 50, then the node is a leaf and no split is attempted.

If I set X to 10 and Y to 15, then the splits are computed, but because one of the children has fewer than 15 rows, that split is ignored.
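
To make the two rules concrete, here is a minimal Python sketch of how I picture them. The function names and the x / y parameters are hypothetical and chosen only for illustration; this is not Spark's API, nor a claim about how Spark implements its trees.

```python
def is_leaf_by_size(node_rows: int, x: int) -> bool:
    """Rule 1: a node with fewer than x rows is treated as a leaf."""
    return node_rows < x


def split_is_admissible(left_rows: int, right_rows: int, y: int) -> bool:
    """Rule 2: a split is only a candidate if both children have at least y rows."""
    return left_rows >= y and right_rows >= y


def decide(node_rows: int, left_rows: int, right_rows: int, x: int, y: int) -> str:
    """Apply rule 1 first, then rule 2, to a single candidate split (hypothetical helper)."""
    if is_leaf_by_size(node_rows, x):
        return "leaf"            # no split is attempted at all
    if not split_is_admissible(left_rows, right_rows, y):
        return "split ignored"   # split is computed but not eligible as best split
    return "split kept"


# The example from this message: a 45-row node, children of 5 and 35 rows.
print(decide(45, 5, 35, x=50, y=15))  # leaf
print(decide(45, 5, 35, x=10, y=15))  # split ignored
```

The point of keeping the two checks separate is that rule 1 avoids evaluating any split for a small node, while rule 2 only filters candidate splits after they have been computed.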

I'm using DecisionTreeRegressor and RandomForestRegressor on our data, and because the former is implemented using the latter, they both share the same parameters. Going through those parameters, I found minInstancesPerNode, which to me is the Y value, but I could not find any parameter for the X value.

Have I missed something?
If not, would there be a way to implement this?

Regards



