Try the following:

set.seed(100)
rf1 <- randomForest(Species ~ ., data=iris)
set.seed(100)
rf2 <- randomForest(iris[1:4], iris$Species)
object.size(rf1)
object.size(rf2)
str(rf1)
str(rf2)

You can try it on your own data. That should give you some hints about why the formula interface should be avoided with large datasets.

Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Foreman
Sent: Monday, December 03, 2012 3:43 PM
To: r-help@r-project.org
Subject: [R] How do I make R randomForest model size smaller?

I've been training randomForest models on 7 million rows of data (41 features). Here's an example call:

myModel <- randomForest(RESPONSE~., data=mydata, ntree=50, maxnodes=30)

I thought surely with only 50 trees and 30 terminal nodes that the memory footprint of "myModel" would be small. But it's 65 megs in a dump file. The object seems to be holding all sorts of predicted, actual, and vote data from the training process.

What if I just want the forest and that's it? I want a tiny dump file that I can load later to make predictions off of quickly. I feel like the forest by itself shouldn't be all that large...

Anyone know how to strip this sucker down to just something I can make predictions off of going forward?
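
For the "just the forest" question in the original message, here is a minimal sketch of one way to strip a fitted object, using iris as a stand-in for the real data. The helper name strip_rf is made up for illustration and is not part of the randomForest package. It assumes that predicted, votes, oob.times and y are the components that grow with the number of training rows, and that predict() with newdata does not need them. Check both assumptions with str() on your own model before relying on this.

```r
library(randomForest)

set.seed(100)
rf <- randomForest(iris[1:4], iris$Species, ntree = 50, maxnodes = 30)

# Hypothetical helper (not part of randomForest): drop the components
# that hold one entry (or one row) per training observation.
strip_rf <- function(fit) {
  for (nm in c("predicted", "votes", "oob.times", "y")) {
    fit[[nm]] <- NULL
  }
  fit
}

small <- strip_rf(rf)

# Compare the in-memory footprint before and after stripping
object.size(rf)
object.size(small)

# Predictions on new data should be unaffected; the seed is reset
# because tied votes may be broken at random.
set.seed(1); p_full  <- predict(rf, iris[1:4])
set.seed(1); p_small <- predict(small, iris[1:4])
identical(p_full, p_small)
```

The stripped object can then be written with saveRDS() and read back with readRDS(). Anything that relies on the training-time results, such as calling predict() without newdata to get the out-of-bag predictions, should not be expected to work on the stripped object.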