I am writing some software to do multiple regression and am using R to benchmark the results. The results square up nicely for the "with-intercept" case but not for the "no-intercept" case, and I am not sure what R is doing to get the summary statistics when the intercept is zero. For example, I would expect the Multiple R-squared to equal the square of the correlation between the actual values "y" and the fitted values "yprime". For the with-intercept case they do, but not for the "no-intercept" case. My sample file and R session output are below.
> dataset = read.table("/Users/jdhunter/tmp/sample1.csv", header=TRUE, sep=",")

The "with-intercept" fit: the "Multiple R-squared" is equal to cor(yprime, y)**2:

> fit <- lm(y ~ x1 + x2, data=dataset)
> summary(fit)

Call:
lm(formula = y ~ x1 + x2, data = dataset)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8026 -0.4651  0.1778  0.5241  1.0222

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.10358    1.26103  -3.254  0.00467 **
x1           0.08641    0.03144   2.748  0.01372 *
x2           0.08760    0.04548   1.926  0.07100 .
---

Residual standard error: 0.7589 on 17 degrees of freedom
Multiple R-squared: 0.6709, Adjusted R-squared: 0.6322
F-statistic: 17.33 on 2 and 17 DF, p-value: 7.888e-05

> yp = fitted.values(fit)
> cor(yp, dataset$y)**2
[1] 0.6709279

The "no-intercept" fit: the "Multiple R-squared" is not equal to cor(yprime, y)**2:

> fitno <- lm(y ~ 0 + x1 + x2, data=dataset)
> summary(fitno)

Call:
lm(formula = y ~ 0 + x1 + x2, data = dataset)

Residuals:
     Min       1Q   Median       3Q      Max
-1.69640 -0.58134  0.03650  0.53673  1.33358

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  0.03655    0.03399   1.075    0.296
x2  0.04358    0.05376   0.811    0.428

Residual standard error: 0.9395 on 18 degrees of freedom
Multiple R-squared: 0.9341, Adjusted R-squared: 0.9267
F-statistic: 127.5 on 2 and 18 DF, p-value: 2.352e-11

> ypno = fitted.values(fitno)
> cor(ypno, dataset$y)
[1] 0.6701336

If anyone has suggestions about how R computes these summary stats for the no-intercept case, or references to literature or docs, that would be helpful. It seems odd to me that dropping the intercept would cause the R^2 and F statistics to rise so dramatically, and the p-value consequently to drop so much. In my implementation I get the same beta1 and beta2, and the R^2 I compute as variance_regression / variance_total agrees with cor(ypno, dataset$y) but not with the value R reports in the summary; my F and p values are similarly off for the no-intercept case.
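For anyone comparing implementations: the discrepancy would be consistent with summary.lm switching to an uncentered total sum of squares (taken about 0 rather than about mean(y)) when the intercept is dropped. A minimal sketch of the two definitions, in Python/NumPy with made-up data standing in for sample1.csv (which is not shown here):

```python
import numpy as np

# Hypothetical stand-in data: positive-valued predictors and a response
# with a nonzero mean, roughly in the spirit of the posted fit.
rng = np.random.default_rng(42)
x = rng.uniform(30.0, 50.0, size=(20, 2))
y = -4.0 + x @ np.array([0.09, 0.09]) + rng.normal(scale=0.8, size=20)

# No-intercept least-squares fit (the betas match either definition of R^2).
beta, *_ = np.linalg.lstsq(x, y, rcond=None)
yhat = x @ beta
rss = np.sum((y - yhat) ** 2)

# "Centered" R^2: total sum of squares about mean(y).  This is what
# variance_regression / variance_total computes, and for a fit WITH an
# intercept it equals cor(y, yhat)^2.
r2_centered = 1.0 - rss / np.sum((y - y.mean()) ** 2)

# "Uncentered" R^2: total sum of squares about 0.  If summary.lm uses this
# definition for no-intercept fits, the reported R^2 (and the F statistic
# built from it) would jump the way the session above shows.
r2_uncentered = 1.0 - rss / np.sum(y ** 2)

# Equivalent form: model SS over total SS, both about zero (the residuals
# are orthogonal to the fitted values, so the decomposition is exact).
assert np.isclose(r2_uncentered, np.sum(yhat ** 2) / np.sum(y ** 2))

# F statistic on (p, n - p) degrees of freedom, i.e. "on 2 and 18 DF"
# for 20 observations and 2 predictors with no intercept.
n, p = x.shape
f_stat = (np.sum(yhat ** 2) / p) / (rss / (n - p))

print(r2_centered, r2_uncentered, f_stat)
```

Because the uncentered total sum of squares exceeds the centered one by n*mean(y)^2, the uncentered R^2 is always at least as large whenever mean(y) is nonzero, which would explain the dramatic rise.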
Thanks,
JDH

R version 2.9.1 (2009-06-26)
home:~/tmp> uname -a
Darwin Macintosh-7.local 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386