Hi everybody,

I have 3 questions about R-squared:

---------(1)-----------
Does R2 always increase as variables are added?

---------(2)-----------
Can R2 be greater than 1?

---------(3)-----------
How is R2 in summary(lm(y~x-1))$r.squared calculated? It is different from
(r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
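A minimal sketch for question (3), under the assumption that summary.lm switches to an uncentered total sum of squares when the model has no intercept, while the ratio in this post always centers on mean(y). The seed and the names r2.centered, r2.uncentered and r2.reported are illustrative only and do not appear elsewhere in this post.

```r
set.seed(1)                       # hypothetical seed, only for reproducibility
x <- matrix(rnorm(20), ncol = 2)
y <- as.vector(x %*% c(1, 2) + rnorm(10))

fit   <- lm(y ~ x - 1)            # model without an intercept
y.hat <- fitted(fit)
rss   <- sum(residuals(fit)^2)    # residual sum of squares

# Centered ratio used in this post: explained variation around mean(y)
r2.centered <- sum((y.hat - mean(y))^2) / sum((y - mean(y))^2)

# Uncentered version (assumed to be what summary.lm uses without an intercept):
# model sum of squares around 0, divided by model + residual sum of squares
r2.uncentered <- sum(y.hat^2) / (sum(y.hat^2) + rss)

r2.reported <- summary(fit)$r.squared

print(c(centered = r2.centered,
        uncentered = r2.uncentered,
        reported = r2.reported))
```

If the assumption holds, r2.uncentered and r2.reported agree, and r2.centered generally differs from both. Without an intercept, the identity sum((y-mean(y))^2) = sum((y.hat-mean(y))^2) + rss need not hold, which would explain why the centered ratio is not bounded by 1 and need not grow as columns are added.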
I will illustrate these questions with the following code:

---------(1)-----------
R2 doesn't always increase as variables are added

> x=matrix(rnorm(20),ncol=2)
> y=rnorm(10)
>
> lm=lm(y~1)
> y.hat=rep(1*lm$coefficients,length(y))
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 2.646815e-33
>
> lm=lm(y~x-1)
> y.hat=x%*%lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 0.4443356
>
> ################ This is the biggest model, but its R2 is not the biggest, why?
> lm=lm(y~x)
> y.hat=cbind(rep(1,length(y)),x)%*%lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 0.2704789

---------(2)-----------
R2 can be greater than 1

> x=rnorm(10)
> y=runif(10)
> lm=lm(y~x-1)
> y.hat=x*lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 3.513865

---------(3)-----------
How is R2 in summary(lm(y~x-1))$r.squared calculated? It is different from
(r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))

> x=matrix(rnorm(20),ncol=2)
> xx=cbind(rep(1,10),x)
> y=x%*%c(1,2)+rnorm(10)
> ### r2 calculated by lm(y~x)
> lm=lm(y~x)
> summary(lm)$r.squared
[1] 0.9231062
> ### r2 calculated by lm(y~xx-1)
> lm=lm(y~xx-1)
> summary(lm)$r.squared
[1] 0.9365253
> ### r2 calculated by me
> y.hat=xx%*%lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 0.9231062

Thanks a lot for any clue :)

--
Junjie Li, [EMAIL PROTECTED]
Undergraduate in DEP of Tsinghua University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.