[R] Random Forest % Variation vs Pseudo-R^2?
Hi all (and Andy!),

When I run randomForest in R with do.trace=T, the last part of the output looks like this:

    1993 | 0.04606 130.43 |
    1994 | 0.04605 130.40 |
    1995 | 0.04605 130.43 |
    1996 | 0.04605 130.43 |
    1997 | 0.04606 130.44 |
    1998 | 0.04607 130.47 |
    1999 | 0.04606 130.46 |
    2000 | 0.04605 130.42 |

The first column is the iteration, the second column is the OOB MSE, and the last column is the %Var(y). If I calculate a "Pseudo-R^2" from these numbers, I get:

    1-(.04605/1.3042) = 0.965

Here's the question: if I look at the summary of forest.rf (this same run), I get the following:

    randomForest(formula = Prev ~ ., data = plas, ntree = 2000, importance = TRUE, do.trace = T)
    Type of random forest: regression
    Number of trees: 2000
    No. of variables tried at each split: 5
    Mean of squared residuals: 0.04605177
    % Var explained: -30.42

What does that -30.42 % Var explained relate to? I find it interesting that the %Var(y) is 130.42 and that the % Var explained is a very similar number, but I have no idea how they are related. From my calculations, it seems like I have a good predictor set (Pseudo-R^2 over 95%), but am I missing something?

Cheers,
Ryan

--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
ilu...@ucla.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
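One possible reading of the two numbers, sketched with the figures quoted in the post. It assumes that the %Var(y) column in the trace is the OOB MSE expressed as a percentage of Var(y), and that the pseudo R-squared is 1 - MSE / Var(y), as the randomForest documentation describes for its rsq component. The variable names are only for illustration.

```r
# Figures copied from the last trace line (tree 2000) in the post.
oob_mse   <- 0.04605   # OOB MSE column
pct_var_y <- 130.42    # %Var(y) column

# Assumption: pct_var_y = 100 * oob_mse / var_y, so Var(y) can be recovered.
var_y <- oob_mse / (pct_var_y / 100)

# Pseudo R-squared under that reading: 1 - MSE / Var(y), shown as a percentage.
pseudo_r2         <- 1 - oob_mse / var_y
pct_var_explained <- 100 * pseudo_r2

print(round(var_y, 4))
print(round(pct_var_explained, 2))  # -30.42
```

Under that assumption, 1.3042 is already the ratio MSE / Var(y) rather than Var(y) itself, so % Var explained would simply be 100 - 130.42 = -30.42, which matches the summary. The 0.965 figure would then come from dividing the MSE by the ratio a second time.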
[R] Concern with randomForest
Hi all,

When running randomForest using the following commands:

    forestplas=randomForest(Prev~.,data=plas,ntree=20)
    print(forestplas)

I get the following result:

    Call: randomForest(formula = Prev ~ ., data = plas, ntree = 2e+05, importance = TRUE)
    Type of random forest: regression
    Number of trees: 2e+05
    No. of variables tried at each split: 5
    Mean of squared residuals: 0.0431127
    % Var explained: -22.1

Here's my concern: what is the explanation for a negative percent variation explained? My understanding is that this value is calculated using the formula

    1-MSE(OOB)/nodesize

(from Liaw & Wiener's description). Is this analogous to an r-squared that has not been run through a stepwise procedure? Should I be removing variables that do not contribute to the model before running randomForest? This negative value seems to contradict my standard multiple regression results, which indicate up to 58% of the variation explained.

Thanks for your help on this; any comments are welcome!

--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
ilu...@ucla.edu
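A minimal sketch of how a negative value can arise, assuming the quantity is 1 - MSE(OOB) / Var(y), as the randomForest documentation describes for its rsq component, rather than a ratio involving nodesize. The helper pseudo_r2 and the data are made up for illustration, and the n denominator for Var(y) is an assumption.

```r
# Hypothetical helper: pseudo R-squared as 1 - MSE / Var(y).
# Var(y) uses the n denominator here; that detail is an assumption.
pseudo_r2 <- function(y, pred) {
  mse   <- mean((y - pred)^2)
  var_y <- mean((y - mean(y))^2)
  1 - mse / var_y
}

y <- c(0.1, 0.4, 0.2, 0.5, 0.3)          # made-up response values

# Predicting the mean of y for every case gives a value of zero.
r2_mean <- pseudo_r2(y, rep(mean(y), length(y)))

# Predictions that track y worse than the mean does give a negative value.
bad_pred <- c(0.5, 0.1, 0.4, 0.2, 0.3)   # made-up predictions
r2_bad   <- pseudo_r2(y, bad_pred)

print(r2_mean)
print(r2_bad)
```

Read this way, a negative % Var explained would mean the out-of-bag predictions have a larger squared error than simply predicting the mean of Prev. Note also that the out-of-bag error is an out-of-sample measure, whereas the R-squared from a standard multiple regression is computed on the same data used for fitting, so the two are not directly comparable.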
[R] Plotting a Quadratic...
I have an equation describing the best-fit model for a set of points (just 2 axes) that is in the form:

    y=b+mx+px^2

where b is the intercept, m is the coefficient of the linear term, and p is the coefficient of the quadratic term. I would like to plot this equation as a curve on the original scatterplot (I know the equation is y=(.1766x^2)+(.171x)+.101). Is there an easy way to plot this equation, preferably with a prediction interval around the line? I have tried the lines() and predict() commands, using the linear model to plot, but get very wacky results. abline works great but does not include the quadratic term.

Any help you could provide would be much appreciated,
Ryan Harrigan

--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
[EMAIL PROTECTED]
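One way this might be approached, as a sketch rather than a definitive answer: fit the quadratic with lm(), predict over a sorted grid of x values, and draw the fit and interval with lines(). Predicting at the original, unsorted x values is a common cause of zig-zag output from lines(), though that may not be what happened here. The data below are simulated stand-ins built from the coefficients in the post; the names x, y, fit, grid, and pred are assumptions.

```r
# Simulated stand-in data (hypothetical); replace with the real x and y.
set.seed(1)
x <- runif(50, 0, 5)
y <- 0.101 + 0.171 * x + 0.1766 * x^2 + rnorm(50, sd = 0.3)

# Fit the quadratic; I() keeps ^ as arithmetic inside the formula.
fit <- lm(y ~ x + I(x^2))

# Predict on a sorted grid so lines() draws a smooth left-to-right curve.
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
pred <- predict(fit, newdata = grid, interval = "prediction")

plot(x, y)
lines(grid$x, pred[, "fit"])            # fitted quadratic
lines(grid$x, pred[, "lwr"], lty = 2)   # lower prediction limit
lines(grid$x, pred[, "upr"], lty = 2)   # upper prediction limit

# The known equation can also be drawn directly, without a fitted model.
curve(0.1766 * x^2 + 0.171 * x + 0.101, add = TRUE, col = "red")
```

The curve() call alone is enough if only the known equation is wanted; the prediction interval needs the fitted model, since it depends on the residual variance and on the data.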