Hi all,

When I fit a random forest using the following commands:

    forestplas <- randomForest(Prev ~ ., data = plas, ntree = 200000)
    print(forestplas)

I get the following result:

    Call:
     randomForest(formula = Prev ~ ., data = plas, ntree = 2e+05, importance = TRUE)
                   Type of random forest: regression
                         Number of trees: 2e+05
    No. of variables tried at each split: 5

              Mean of squared residuals: 0.0431127
                        % Var explained: -22.1

Here's my concern: what explains a negative percent variance explained? My understanding is that this value is calculated using the formula 1 - MSE(OOB)/Var(Prev), where MSE(OOB) is the out-of-bag mean squared error and Var(Prev) is the variance of the response (from Liaw & Wiener's description). Is this analogous to an R-squared that has not been run through a stepwise procedure? Should I be removing variables that do not contribute to the model before running randomForest? This negative value seems to contradict my standard multiple regression results, which indicate up to 58% of the variation explained.

Thanks for your help on this; any comments are welcome!

--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
ilu...@ucla.edu

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
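
For reference, here is a minimal sketch of how a negative "% Var explained" can arise. It assumes the pseudo R-squared definition 1 - MSE(OOB)/Var(y), with the variance computed using n as the divisor, as described by Liaw & Wiener and the randomForest help page. The helper name pct_var_explained and the small vector y are made up for illustration and are not part of the package or of the plas data. Only base R is used.

```r
# Hypothetical helper (not part of randomForest): percent variance
# explained from observed values y and predictions y_hat.
pct_var_explained <- function(y, y_hat) {
  mse   <- mean((y - y_hat)^2)        # mean of squared residuals
  var_y <- mean((y - mean(y))^2)      # variance of y with divisor n
  100 * (1 - mse / var_y)
}

# Made-up response values, for illustration only.
y <- c(0.1, 0.3, 0.2, 0.6, 0.4)

# Predicting the overall mean for every case gives exactly 0 percent.
pct_mean_only <- pct_var_explained(y, rep(mean(y), length(y)))

# Predictions that are worse than the mean give a negative value.
pct_bad <- pct_var_explained(y, rev(y))

# Back-calculating from the numbers printed in the post: under the
# assumed formula, the implied variance of Prev is smaller than the MSE.
mse_reported <- 0.0431127
pct_reported <- -22.1
var_implied  <- mse_reported / (1 - pct_reported / 100)

# On a fitted regression forest, the same quantities should be available
# as components of the object (per the randomForest help page), e.g.:
#   tail(forestplas$mse, 1); tail(forestplas$rsq, 1)
```

Under this definition, a negative value simply means the out-of-bag predictions have a larger mean squared error than predicting the mean of Prev for every observation. The 58% from multiple regression is an in-sample figure, so the two numbers are not directly comparable.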