On Thu, Aug 13, 2009 at 11:11 PM, Mary Putt<mp...@mail.med.upenn.edu> wrote:

Hi Mary,

> I would like to use a random forest model to get an idea about which
> variables from a dataset may have some prognostic significance in a smallish
> study. The default for the number of trees seems to be 500. I tried changing
> the default to ntree=2000 or ntree=200 and the results appear identical. I
> have changed mtry from mtry=5 to mtry=6 successfully. I have seen the same
> problem on both a Windows machine and our Linux system running R 2.8 and 2.9.

I don't think it's correct to call it a problem; it's more likely a
feature! Try taking a look at Breiman's paper "Random Forests" (Machine
Learning, 2001), where he introduces random forests. I read it
recently, and somewhere he explicitly mentions that ntree can often be
set quite low without lowering the performance.

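The random forest algorithm is very robust, and apparently 500 trees
are usually more than enough. Breiman shows that the generalization
error converges as the number of trees grows, so adding trees does not
cause overfitting; once the error has settled, extra trees mainly cost
computing time. Therefore you shouldn't expect better results from
2000 trees, and using fewer trees (e.g. 200) often doesn't affect the
performance either.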
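To illustrate why the output barely moves, here is a minimal Python sketch (not R, and not the randomForest package itself) that looks only at the Monte Carlo part of the variation. It assumes, purely for illustration, that each tree votes for the correct class on a given case independently with a hypothetical probability of 0.6. Real random-forest trees are correlated, so this is a simplification, but it shows how the vote fraction, and hence the majority-vote prediction, settles down as ntree grows.

```python
import math
import random


def vote_fraction(n_trees, p_correct, rng):
    """Fraction of n_trees simulated trees voting for the correct class.

    Simplifying assumption: each tree votes correctly independently with
    probability p_correct. Trees in a real random forest are correlated.
    """
    votes = sum(1 for _ in range(n_trees) if rng.random() < p_correct)
    return votes / n_trees


def monte_carlo_sd(n_trees, p_correct):
    """Theoretical (binomial) standard deviation of the vote fraction."""
    return math.sqrt(p_correct * (1 - p_correct) / n_trees)


if __name__ == "__main__":
    rng = random.Random(2009)  # fixed seed so the run is reproducible
    p = 0.6  # hypothetical per-tree accuracy on one case
    for n in (200, 500, 2000):
        # Repeat the "forest" 200 times to see how much the vote fraction varies.
        fractions = [vote_fraction(n, p, rng) for _ in range(200)]
        mean = sum(fractions) / len(fractions)
        sd = math.sqrt(sum((f - mean) ** 2 for f in fractions) / (len(fractions) - 1))
        print(f"ntree={n:5d}  mean vote={mean:.3f}  "
              f"sd={sd:.4f}  theory sd={monte_carlo_sd(n, p):.4f}")
```

Under these assumptions the theoretical standard deviation is about 0.0346 for 200 trees, 0.0219 for 500 and 0.0110 for 2000, all small compared with the 0.1 gap between 0.6 and a 50/50 vote, so the majority class rarely changes.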

Best,
Michael

-- 
Michael Knudsen
micknud...@gmail.com
http://lifeofknudsen.blogspot.com/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
