I'm building a random forest (RF) 50 trees at a time because of memory
limitations (I have roughly 0.5 million observations and around 20 variables).
I thought I could combine some or all of my forests later and look at global
importance.

Say I have 2 forests, tree1 and tree2. Within each forest the Gini and Raw
importances are similar, and the two forests are also similar to one another.
However, after combining them into one forest (using the combine command), the
rank order of the combined forest's Raw importances has changed rather
dramatically (e.g. the most important variable becomes the least important,
although it is not a completely reversed ordering). In addition, the scale of
both the Raw and Gini importances is orders of magnitude smaller for the
combined forest.

Note that the Gini importance of the combined forest looks roughly similar to
the Gini (and Raw) importance of the individual forests, at least in terms of
rank ordering.

I'm using the non-formula randomForest specification along with
norm.votes=FALSE to facilitate large-sample estimation and combining forests.
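
A minimal sketch of the workflow described above, run on small simulated data
rather than the real data set. The simulated x and y, the sample size, and the
object names both, raw and gini are assumptions for illustration only. It also
assumes that importance() accepts the type and scale arguments as in the
current randomForest documentation, which may differ from version 4.5-18.

```r
library(randomForest)

set.seed(1)

# Simulated stand-in for the real data (assumption: 500 rows, 5 predictors)
n <- 500
p <- 5
x <- as.data.frame(matrix(rnorm(n * p), n, p))   # columns V1..V5
y <- factor(ifelse(x$V1 + 0.5 * x$V2 + rnorm(n) > 0, "a", "b"))

# Two forests of 50 trees each, non-formula interface, unnormalized votes
tree1 <- randomForest(x, y, ntree = 50, importance = TRUE, norm.votes = FALSE)
tree2 <- randomForest(x, y, ntree = 50, importance = TRUE, norm.votes = FALSE)

# Combine the two forests into a single 100-tree forest
both <- combine(tree1, tree2)

# Unscaled permutation ("Raw") importance, one column per forest
raw <- cbind(tree1    = importance(tree1, type = 1, scale = FALSE)[, 1],
             tree2    = importance(tree2, type = 1, scale = FALSE)[, 1],
             combined = importance(both,  type = 1, scale = FALSE)[, 1])

# Gini importance, one column per forest
gini <- cbind(tree1    = importance(tree1, type = 2)[, 1],
              tree2    = importance(tree2, type = 2)[, 1],
              combined = importance(both,  type = 2)[, 1])

print(raw)
print(gini)

# Rank order within each forest (1 = most important)
print(apply(-raw, 2, rank))
print(apply(-gini, 2, rank))
```

Comparing the columns of raw and gini side by side shows whether the scale or
the rank order of the combined forest differs from that of the individual
forests.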

I'm using R 2.5.0 on a Windows XP machine with 2 GB of RAM. I'm also using
randomForest 4.5-18.

Any advice is appreciated,
Many thanks,
Joe 



______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
