On Thu, Aug 13, 2009 at 11:11 PM, Mary Putt <mp...@mail.med.upenn.edu> wrote:
Hi Mary,

> I would like to use a random Forest model to get an idea about which
> variables from a dataset may have some prognostic significance in a smallish
> study. The default for the number of trees seems to be 500. I tried changing
> the default to ntree=2000 or ntree=200 and the results appear identical. Have
> changed mtry from mtry=5 to mtry=6 successfully. Have seen same problem on
> both a Windows machine and our linux system running 2.8 and 2.9.

I don't think it's correct to call it a problem; it's more likely a feature!

Take a look at Breiman's paper in the journal "Machine Learning", where he
introduces random forests. I read it recently, and somewhere he explicitly
mentions that ntree can often be set quite low without hurting performance.

The random forest algorithm is very robust: the error levels off as trees are
added, and apparently 500 trees are usually more than enough. That is why you
don't get better results with 2000 trees, and why using fewer trees (e.g. 200)
often doesn't affect performance either.

Best,
Michael

-- 
Michael Knudsen
micknud...@gmail.com
http://lifeofknudsen.blogspot.com/
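
P.S. A minimal sketch of how one might check this, assuming the randomForest
package is installed. The built-in iris data is purely a stand-in, and the
seed, ntree and mtry values are illustrative; none of them come from the
study discussed above. It grows 2000 trees once and reads off the cumulative
out-of-bag (OOB) error at several forest sizes:

```r
# Assumption: the randomForest package is installed.
# iris is only a placeholder dataset, not the data from the original question.
library(randomForest)

set.seed(42)  # fix the RNG so that reruns are comparable

fit <- randomForest(Species ~ ., data = iris,
                    ntree = 2000, mtry = 2, importance = TRUE)

# Confirm that the ntree argument was actually picked up by the fit.
print(fit$ntree)

# For classification, err.rate has one row per tree; the "OOB" column is
# the cumulative out-of-bag error after that many trees have been grown.
oob <- fit$err.rate[, "OOB"]
print(oob[c(50, 200, 500, 2000)])

# Plot the error rates against the number of trees.
plot(fit)

# Variable importance measures for each predictor.
print(importance(fit))
```

If the OOB error is already flat well before 500 trees, then 200, 500 and
2000 trees should give very similar (though not necessarily bit-for-bit
identical) results. If the results really are identical in every digit,
fit$ntree is worth a look to confirm that the setting took effect.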