It is all a matter of what you are comparing to, that is, what the null model 
is.  In most cases (standard regression) we compare a model with slope and 
intercept to an intercept-only model, which isolates the effect of the slope.  
The intercept-only model fits a horizontal line through the mean of the y's, 
hence the subtraction of the mean.  If we don't subtract the mean, R-squared 
can easily become meaningless.  Here is an example where we compute the 
r-squared using the no-intercept formula:

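x <- rnorm(100, 1000, 20)   # two independent samples, both with mean 1000
y <- rnorm(100, 1000, 20)
cor(x, y)                   # should be close to 0

# rep(1,100) is a hand-made intercept column; "+ 0" tells lm() there is no
# intercept, so summary() uses the no-intercept formula for r-squared
summary( lm( y ~ rep(1,100) + x + 0 ) )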


Notice how big the r-squared value is (and that it is not anywhere near the 
square of the correlation) for data that is pretty independent.
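
To make the two denominators explicit, here is a minimal sketch that redoes 
the example by hand.  The seed and the variable names (ones, fit, r2_*) are 
my own choices for illustration and are not part of the original example.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- rnorm(100, 1000, 20)
y <- rnorm(100, 1000, 20)

# Same fit as above: the column of ones acts as an intercept, but "+ 0"
# makes lm() treat the model as having no intercept.
ones <- rep(1, 100)
fit  <- lm(y ~ ones + x + 0)
rss  <- sum(resid(fit)^2)         # residual sum of squares

r2_uncentered <- 1 - rss / sum(y^2)              # mean NOT subtracted
r2_centered   <- 1 - rss / sum((y - mean(y))^2)  # mean subtracted

r2_reported <- summary(fit)$r.squared

# Compare: the reported value should agree with the uncentered version,
# and the centered version with the squared correlation.
c(reported   = r2_reported,
  uncentered = r2_uncentered,
  centered   = r2_centered,
  cor_sq     = cor(x, y)^2)
```

Because the y's sit near 1000, sum(y^2) is enormous compared to the residual 
sum of squares, which is why the uncentered value lands so close to 1.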

When you force the intercept to 0, you are using a different null model 
(mean 0).  Part of Thomas's point was that if we still subtract the mean in 
this case, the calculation of r-squared can give a negative number, which, as 
you pointed out, is meaningless.  The gist is that subtracting the mean is the 
incorrect formula for this case, so R instead uses the formula without 
subtracting the mean when you don't fit an intercept.
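
A small sketch of that negative value, assuming the same kind of simulated 
data as in the example above (the seed and the names fit0, r2_wrong and 
r2_zero_null are mine, chosen only for illustration):

```r
set.seed(2)                        # arbitrary seed, only for reproducibility
x <- rnorm(100, 1000, 20)
y <- rnorm(100, 1000, 20)

fit0 <- lm(y ~ x + 0)              # regression through the origin
rss0 <- sum(resid(fit0)^2)

# Subtracting the mean although no intercept was fitted.  For data like
# this, the line through the origin fits worse than a horizontal line at
# mean(y), so the result drops below 0.
r2_wrong <- 1 - rss0 / sum((y - mean(y))^2)

# Null model "mean 0": the denominator is the plain sum of squares of y.
r2_zero_null <- 1 - rss0 / sum(y^2)

c(mean_subtracted = r2_wrong,
  zero_null       = r2_zero_null,
  reported_by_R   = summary(fit0)$r.squared)
```

The value reported by summary() should match the zero_null entry, not the 
mean_subtracted one.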

The r-squared values are different because they use different denominators, 
and they are therefore not comparable.

R uses 2 different formulas/denominators because no single 
formula/denominator makes general sense in both cases.
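
As a rough illustration of why the two values cannot be compared, the sketch 
below fits both models to the same simulated data and lists each residual sum 
of squares next to the denominator it is measured against.  The seed and the 
object names (fit_int, fit_zero, rss, denom, r2) are assumptions of mine.

```r
set.seed(3)                 # arbitrary seed, only for reproducibility
x <- rnorm(100, 1000, 20)
y <- rnorm(100, 1000, 20)

fit_int  <- lm(y ~ x)       # slope and intercept
fit_zero <- lm(y ~ x + 0)   # slope only, intercept forced to 0

rss <- c(with_intercept = sum(resid(fit_int)^2),
         through_origin = sum(resid(fit_zero)^2))

# Denominator that each reported r-squared is measured against
denom <- c(with_intercept = sum((y - mean(y))^2),  # null model: mean of y
           through_origin = sum(y^2))              # null model: mean 0

r2 <- c(with_intercept = summary(fit_int)$r.squared,
        through_origin = summary(fit_zero)$r.squared)

rbind(rss, denom, r2_by_hand = 1 - rss / denom, r2_reported = r2)
```

The through-origin fit cannot have a smaller residual sum of squares than the 
fit with an intercept, yet its r-squared is larger here, purely because its 
denominator is so much bigger.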

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of derek
> Sent: Thursday, March 17, 2011 9:29 AM
> To: r-help@r-project.org
> Subject: Re: [R] Strange R squared, possible error
> 
> That's exactly what I would like to do. Any idea of a good text? I've
> consulted several texts, but none defined R^2 as
> R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2), still less explained why
> different formulas are used for similar models, or why R^2 should be
> closer to 1 for y=a*x+0 than for the general model y=a*x+b.
> 
> from manual:
> r.squared R^2, the ‘fraction of variance explained by the model’,
> R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
> where y* is the mean of y[i] "if there is an intercept" and zero
> otherwise.
> 
> I don't need an explanation of what R^2 does or how to interpret it,
> because I know what it means and how it is derived. I don't need to be
> told which model I should apply. So the answers from Thomas weren't
> helpful.
> 
> I don't claim it is wrong, otherwise it wouldn't be employed, but I want
> to see the reason behind using two formulas.
> 
> Control questions:
> 1) Does the statement "if there is an intercept" include a zero
> intercept?
> 
> 2) If I use the model y = a*x+0, which formula for R^2 is used: the one
> with y* or the one without?
> 
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384844.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.