Re: [R] OLS standard errors

2008-02-26 Thread Viechtbauer Wolfgang (STAT)
Try multiplying var(e) by n-1 and dividing by n-3 (n - 3 = 97 is the
residual degrees of freedom for the error sum of squares in your example).
Then it all matches up nicely.
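
For example, a quick sketch reusing the z, e, and reg objects from your code:

  n <- length(e)                      ## 100 observations
  p <- ncol(z)                        ## 3 estimated coefficients
  s2 <- var(e) * (n - 1)/(n - p)      ## rescale var(e)'s n-1 divisor to n-p = n-3
  sqrt(diag(s2 * solve(t(z) %*% z)))  ## now matches the lm() standard errors
  summary(reg)$coefficients[, "Std. Error"]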

Best,

-- 
Wolfgang Viechtbauer
 Department of Methodology and Statistics
 University of Maastricht, The Netherlands
 http://www.wvbauer.com/



Original Message
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Malter
Sent: Tuesday, February 26, 2008 08:25
To: 'r-help'
Subject: [R] OLS standard errors

> [original message trimmed; see the full post at the end of this thread]



Re: [R] OLS standard errors

2008-02-26 Thread Prof Brian Ripley
Please check your statistical methods lecture notes. var(e) has divisor
n-1, and that is not an unbiased estimator of the residual variance when
'e' are residuals. From summary.lm (and you are allowed to read the code):

    rdf <- n - p
    if (is.na(z$df.residual) || rdf != z$df.residual)
        warning("residual degrees of freedom in object suggest this is not an \"lm\" fit")
    p1 <- 1:p
    r <- z$residuals
    f <- z$fitted.values
    w <- z$weights
    if (is.null(w)) {
        mss <- if (attr(z$terms, "intercept"))
            sum((f - mean(f))^2)
        else sum(f^2)
        rss <- sum(r^2)
    }
    else {
        mss <- if (attr(z$terms, "intercept")) {
            m <- sum(w * f/sum(w))
            sum(w * (f - m)^2)
        }
        else sum(w * f^2)
        rss <- sum(w * r^2)
        r <- sqrt(w) * r
    }
    resvar <- rss/rdf

The correct divisor is n - p. Since p = 3 and n = 100 in your example, the
variances differ by a factor of (n-1)/(n-p) = 99/97, roughly 2%, and hence
the standard errors differ by roughly 1%.
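
For instance, a minimal sketch using Daniel's fitted model reg (note that
'z' in the excerpt above is summary.lm's internal name for the fitted
model object, not the design matrix of the original post):

  r <- residuals(reg)
  n <- length(r)              ## 100
  p <- length(coef(reg))      ## 3, so rdf = 97
  resvar <- sum(r^2)/(n - p)  ## rss/rdf, as in summary.lm
  c(resvar, summary(reg)$sigma^2, var(r) * (n - 1)/(n - p))  ## essentially equal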

On Tue, 26 Feb 2008, Daniel Malter wrote:

> [original message trimmed; see the full post at the end of this thread]

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] OLS standard errors

2008-02-25 Thread Peter Dalgaard
Daniel Malter wrote:
> [code trimmed; see the original post at the end of this thread]
>
> The diagonal elements of these matrices should be the standard errors of
> the coefficients. Both hand-computed versions are identical, but they
> differ from the standard errors reported in summary(reg), and a difference
> of about 1% seems too large to be rounding error. Should I have multiplied
> var(e)*solve(t(z)%*%z) by n and divided by n-1? Even when I do this, I
> still observe a difference. Can anybody help me identify the source of
> this difference?
>
The degrees of freedom in a regression analysis are n minus the number of
parameters, three in your case. The variance var(e) does not know about
this and divides by n-1 where it should divide by n-3.
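
For instance, using the e and reg objects from Daniel's code:

  sum(e^2)/(length(e) - 3)  ## residual variance with the n-3 divisor
  summary(reg)$sigma^2      ## the estimate lm() uses; the two agree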



-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907



[R] OLS standard errors

2008-02-25 Thread Daniel Malter
Hi,

the standard errors of the coefficients in the same regression, computed
once by hand and once with lm(), differ by about 1%. Can somebody help me
identify the source of this difference? The coefficient estimates are the
same, but the standard errors differ.

Simulate data

  happiness=0
  income=0
  gender=rep(c(0,1,1,0),25)
  for(i in 1:100){
    happiness[i]=1000+i+rnorm(1,0,40)
    income[i]=2*i+rnorm(1,0,40)
  }

Run lm()

  reg=lm(happiness~income+factor(gender))
  summary(reg)

Compute coefficient estimates "by hand"

  x=cbind(income,gender)
  y=happiness

  z=as.matrix(cbind(rep(1,100),x))
  beta=solve(t(z)%*%z)%*%t(z)%*%y

Compare estimates

  cbind(reg$coef,beta)  ##fine so far, they both look the same

  reg$coef[1]-beta[1]
  reg$coef[2]-beta[2]
  reg$coef[3]-beta[3]   ##differences are too small to cause a 1% difference

Check predicted values

  estimates=c(beta[2],beta[3])

  predicted=estimates%*%t(x)
  predicted=as.vector(t(as.double(predicted+beta[1])))

  cbind(reg$fitted,predicted)      ##inspect fitted values
  as.vector(reg$fitted-predicted)  ##differences are marginal

Compute errors

  e=NA
  e2=NA
  for(i in 1:length(happiness)){
    e[i]=y[i]-predicted[i]    ##for "hand-computed" regression
    e2[i]=y[i]-reg$fitted[i]  ##for lm() regression
  }

Compute standard errors of the coefficients

  sqrt(abs(var(e)*solve(t(z)%*%z)))   ##for "hand-computed" regression
  sqrt(abs(var(e2)*solve(t(z)%*%z)))  ##for lm() regression, using e2 from above

  ##Both are the same

Compare to lm() standard errors of the coefficients again

  summary(reg)


The diagonal elements of these matrices should be the standard errors of
the coefficients. Both hand-computed versions are identical, but they
differ from the standard errors reported in summary(reg), and a difference
of about 1% seems too large to be rounding error. Should I have multiplied
var(e)*solve(t(z)%*%z) by n and divided by n-1? Even when I do this, I
still observe a difference. Can anybody help me identify the source of
this difference?

Cheers,
Daniel


-
cuncta stricte discussurus

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.