Re: [R] Running randomForests on large datasets

2008-02-27 Thread Liaw, Andy
There are a couple of things you may want to try, if you can load the data into R and still have enough memory to spare:
- Run randomForest() with fewer trees, say 10 to start with.
- Run randomForest() with nodesize set to something larger than the default (5 for classification). This puts a limit on
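A minimal sketch of those two suggestions, assuming the predictors are already in a matrix or data frame x and the class labels in a factor y (the names and values here are placeholders, not from the original message):

    library(randomForest)

    ## start with a small forest to check that it fits in memory at all
    fit <- randomForest(x, y, ntree = 10)

    ## then make the terminal nodes larger than the classification default
    ## of 5, so each tree grows fewer nodes and uses less memory
    fit <- randomForest(x, y, ntree = 10, nodesize = 50)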

Re: [R] Running randomForests on large datasets

2008-02-27 Thread Nagu
Thank you Andy. It is throwing a memory allocation error for me for numerous combinations of ntree and nodesize values. I tried memory.limit() and memory.size() to use the maximum memory, but the error was consistent. One thing I noticed was that I had a tough time even just loading the dataset
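For reference, the memory checks mentioned here look roughly like this on Windows builds of R (the 4000 MB figure is only an illustration, not a value from the thread):

    ## current memory in use and the current limit, in MB (Windows only)
    memory.size()
    memory.limit()

    ## raise the limit, e.g. to ~4 GB, if the machine actually has that much RAM
    memory.limit(size = 4000)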

Re: [R] Running randomForests on large datasets

2008-02-27 Thread Max Kuhn
Also, use the non-formula interface to the function:

    # saves some space
    randomForest(x, y)

instead of the formula interface:

    # avoid:
    randomForest(y ~ ., data = something)

The second method saves a terms object that is very sparse but takes up a lot of space. Max

On Wed, Feb 27, 2008 at 12:31
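A self-contained version of that comparison, using the built-in iris data purely as a small stand-in for a large dataset:

    library(randomForest)
    data(iris)

    ## non-formula interface: pass predictors and response directly
    fit1 <- randomForest(x = iris[, 1:4], y = iris$Species)

    ## formula interface: convenient, but stores a terms object with the fit
    fit2 <- randomForest(Species ~ ., data = iris)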

[R] Running randomForests on large datasets

2008-02-25 Thread Nagu
Hi, I am trying to run randomForests on a dataset of size 50X650 and R pops up a memory allocation error. Are there any better ways to deal with large datasets in R? For example, S-PLUS had something like the bigData library. Thank you, Nagu
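Not part of the original post, but one quick check before fitting is to see how much memory the loaded data frame itself already occupies (the object name mydata is a placeholder):

    ## size in bytes of the data frame in memory
    object.size(mydata)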