[R] Random Forest % Variation vs Pseudo-R^2?

2009-06-07 Thread Ryan Harrigan
Hi all (and Andy!),
When I fit a randomForest model in R (with do.trace=T), the last part of the
output looks like this:

1993 |  0.04606   130.43 |
1994 |  0.04605   130.40 |
1995 |  0.04605   130.43 |
1996 |  0.04605   130.43 |
1997 |  0.04606   130.44 |
1998 |  0.04607   130.47 |
1999 |  0.04606   130.46 |
2000 |  0.04605   130.42 |

The first column is the iteration (tree number), the second is the OOB MSE,
and the last is %Var(y). If I calculate a "pseudo-R^2" from these numbers, I
get:

1-(.04605/1.3042) = 0.965

Here's the question: if I look at the summary of forest.rf (this same run),
I get the following:

randomForest(formula = Prev ~ ., data = plas, ntree = 2000, importance =
TRUE, do.trace = T)
   Type of random forest: regression
 Number of trees: 2000
No. of variables tried at each split: 5

  Mean of squared residuals: 0.04605177
% Var explained: -30.42

What does that -30.42 % Var explained refer to? I find it interesting that
%Var(y) is 130.42 and that % Var explained is a very similar number, but I
have no idea how they are related. From my calculation, it seems like I have
a good predictor set (pseudo-R^2 over 95%), but am I missing something?

Cheers,

Ryan


--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
ilu...@ucla.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Concern with randomForest

2009-04-06 Thread Ryan Harrigan
Hi all,
When I fit a randomForest model using the following command:

forestplas=randomForest(Prev~.,data=plas,ntree=20)
print(forestplas)

I get the following result:

Call:
 randomForest(formula = Prev ~ ., data = plas, ntree = 2e+05,
importance = TRUE) 
   Type of random forest: regression
 Number of trees: 2e+05
No. of variables tried at each split: 5

  Mean of squared residuals: 0.0431127
% Var explained: -22.1



Here's my concern: what is the explanation for a negative percent variation
explained? My understanding is that this value is calculated using the
formula:

1-MSE(OOB)/nodesize (from Liaw & Wiener's description)
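For comparison, the ?randomForest help page describes the regression rsq component as the pseudo R-squared 1 - mse / Var(y), so the denominator appears to be the variance of the response rather than nodesize. A minimal base-R sketch of that quantity (the helper name pseudo_r2 is hypothetical, not part of randomForest) shows why it can go negative: it compares the prediction error against the error of always predicting the mean of y.

```r
# Hypothetical helper: pseudo-R^2 = 1 - MSE / var_n(y), where var_n(y) is
# the mean squared deviation of y from its mean (divisor n)
pseudo_r2 <- function(y, pred) {
  mse   <- mean((y - pred)^2)
  var_n <- mean((y - mean(y))^2)
  1 - mse / var_n
}

y <- c(1, 2, 3, 4)

pseudo_r2(y, y)                 # perfect predictions: 1
pseudo_r2(y, rep(mean(y), 4))   # always predicting the mean: 0
pseudo_r2(y, c(4, 3, 2, 1))     # worse than predicting the mean: -3
```

Any set of predictions whose MSE exceeds var_n(y) gives a negative value, and out-of-bag predictions are allowed to do that.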

Is this analogous to an r-squared that has not been run through a stepwise
procedure? Should I be removing variables that do not contribute to the model
before running randomForest? This negative value seems to contradict my
standard multiple regression results, which indicate up to 58% of the
variation explained.
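One possible reason for the apparent contradiction, assuming the 58% is the usual in-sample R^2 from lm(): that figure is computed on the same data used to fit the model, so it can never drop below zero, whereas the forest's figure uses out-of-bag predictions. A base-R sketch on made-up noise data (not the plas data) contrasts lm's in-sample R^2 with a leave-one-out version built from the hat values, which is closer in spirit to an out-of-bag estimate:

```r
set.seed(1)
n <- 30; p <- 10
# Hypothetical data: 10 predictors with no real relation to y
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
fit <- lm(y ~ X)

# In-sample R^2, as reported by summary(lm)
r2_in <- summary(fit)$r.squared

# Leave-one-out residuals without refitting: e_i / (1 - h_ii)
e_loo  <- residuals(fit) / (1 - hatvalues(fit))
r2_loo <- 1 - sum(e_loo^2) / sum((y - mean(y))^2)

round(c(in_sample = r2_in, leave_one_out = r2_loo), 3)
```

Because each leave-one-out residual is at least as large as the ordinary residual, r2_loo can never exceed r2_in, and with many weak predictors it often falls below zero while r2_in stays positive.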

Thanks for your help on this; any comments are welcome!


[R] Plotting a Quadratic...

2008-05-22 Thread Ryan Harrigan
I have an equation describing the best-fit model for a set of points (just two
axes) in the form:

y=b+mx+px^2

where b is the intercept, m is the coefficient of the linear term, and p is
the coefficient of the quadratic term.

I would like to draw this curve (the fitted equation is
y=(.1766x^2)+(.171x)+.101) on the original scatterplot. Is there an easy way
to plot this equation, preferably with a prediction interval around the line?
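If only the curve itself is needed, base R's curve() can draw the known equation straight onto an existing plot; it evaluates the expression over a fine grid of x values across the current axis range, so the line comes out smooth regardless of how the data are ordered. The data below are simulated stand-ins, and f is a hypothetical name for the posted equation. Note that curve() alone cannot add a prediction interval, since that needs the residual variance of a fitted model.

```r
# Hypothetical scatterplot data; replace x and y with the real points
set.seed(42)
x <- runif(50, -2, 2)
y <- 0.101 + 0.171 * x + 0.1766 * x^2 + rnorm(50, sd = 0.2)

# The posted best-fit equation y = .101 + .171x + .1766x^2
f <- function(x) 0.101 + 0.171 * x + 0.1766 * x^2

plot(x, y)
curve(f, add = TRUE, col = "red")   # overlay the quadratic on the scatterplot
```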

I have tried the lines() and predict() commands, using the linear model to
plot, but get very wacky results. abline() works great but does not include
the quadratic term.
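A likely cause of the odd lines() output is that lines() joins points in the order they are given, so predictions at unsorted x values zig-zag across the plot. Another common pitfall is writing y ~ x + x^2, where ^ is formula crossing and the quadratic term silently drops out; I() protects it. A sketch on simulated stand-in data that refits the quadratic, predicts on a sorted grid with interval = "prediction", and draws the fit plus 95% limits:

```r
# Hypothetical data standing in for the original scatterplot
set.seed(42)
x <- runif(50, -2, 2)
y <- 0.101 + 0.171 * x + 0.1766 * x^2 + rnorm(50, sd = 0.2)

# Fit the quadratic; I() keeps ^ as arithmetic inside the formula
fit <- lm(y ~ x + I(x^2))

# Predict on a sorted grid so the drawn lines run left to right
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
pred <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

plot(x, y)
# Columns of pred: fitted curve, lower and upper prediction limits
matlines(grid$x, pred, lty = c(1, 2, 2), col = c("black", "blue", "blue"))
```

Using interval = "confidence" instead would draw the narrower band for the mean response rather than for new observations.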

Any help you could provide would be much appreciated,

Ryan Harrigan


--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
[EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.