[R] hatvalues?

2009-03-05 Thread rkevinburton
I am struiggling a bit with this function 'hatvalues'.  I would like a little 
more undrestanding than taking the black-box and using the values. I looked at 
the Fortran source and it is quite opaque to me. So I am asking for some help 
in understanding the theory. First, I take the simplest case of a single 
variant. For this I turn o John Fox's book, Applied Regression Analysis and 
Generalized Linear Models, p 245 and generate this 'R' code:

 library(car)
 attach(Davis)
# remove the NA's
 narepwt - repwt[!is.na(repwt)]
 meanrw - mean(narepwt)
 drw - narepwt - meanrw
 ssrw - sum(drw * drw)
 h - 1/length(narepwt) + (drw * drw)/ssrw
 h

This gives me a array of values the largest of which is

 order(h, decreasing=TRUE)
  [1]  21  52  17  93  30  62 158 113 175 131 182  29 106 125 123 146  91  99

So the largest hatvalue is 

 h[21]
[1] 0.1041207

Which doesn't match the 0.714 value that is reported in the book but I will 
probably take that up with the author later.

Then I use more of 'R' and I get

fit - lm(weight ~ repwt)
hr - hatvalues(fit)
hr[21]
   21 
0.1041207 

So this matches which is reasusing. My question is this, given the QR 
transformation and the residuals derived from that transformation what is a 
simple matrix formula for the hatvalues?

From http://en.wikipedia.org/wiki/Linear_regression I get

residuals = y - Hy = y(I - H)
or
H = -(residuals/y - I)

 fit - lm(weight ~ repwt)
 h - -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] - 
 diag(1,length(residuals(fit)), length(residuals(fit

This generates a matrix but I cannot see any coerrelation between this 
hat-matrix and the return from hatvalues.

Comments?

Thank you.

Kevin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] hatvalues?

2009-03-05 Thread John Fox
Dear Kevin,

If you do the same regression as in the text then you'll get the same
hat-values; the regression is the one on the top of p. 244:

 mod - lm(repwt ~ weight*sex, data=Davis)
 max(hatvalues(mod))
[1] 0.7141856

As to making sense of the computations:

 X - model.matrix(mod)
 head(X)
  (Intercept) weight sexM weight:sexM
1   1 771  77
2   1 580   0
3   1 530   0
4   1 681  68
5   1 590   0
6   1 761  76
 H - X %*% solve( t(X) %*% X ) %*% t(X)
 h - diag(H)
 max(h)
[1] 0.7141856

I hope this helps,
 John


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
 Behalf Of rkevinbur...@charter.net
 Sent: March-05-09 11:40 AM
 To: r-h...@stat.math.ethz.ch
 Subject: [R] hatvalues?
 
 I am struiggling a bit with this function 'hatvalues'.  I would like a
little
 more undrestanding than taking the black-box and using the values. I
looked
 at the Fortran source and it is quite opaque to me. So I am asking for
some
 help in understanding the theory. First, I take the simplest case of a
single
 variant. For this I turn o John Fox's book, Applied Regression Analysis
and
 Generalized Linear Models, p 245 and generate this 'R' code:
 
  library(car)
  attach(Davis)
 # remove the NA's
  narepwt - repwt[!is.na(repwt)]
  meanrw - mean(narepwt)
  drw - narepwt - meanrw
  ssrw - sum(drw * drw)
  h - 1/length(narepwt) + (drw * drw)/ssrw
  h
 
 This gives me a array of values the largest of which is
 
  order(h, decreasing=TRUE)
   [1]  21  52  17  93  30  62 158 113 175 131 182  29 106 125 123 146  91
99
 
 So the largest hatvalue is
 
  h[21]
 [1] 0.1041207
 
 Which doesn't match the 0.714 value that is reported in the book but I
will
 probably take that up with the author later.
 
 Then I use more of 'R' and I get
 
 fit - lm(weight ~ repwt)
 hr - hatvalues(fit)
 hr[21]
21
 0.1041207
 
 So this matches which is reasusing. My question is this, given the QR
 transformation and the residuals derived from that transformation what is
a
 simple matrix formula for the hatvalues?
 
 From http://en.wikipedia.org/wiki/Linear_regression I get
 
 residuals = y - Hy = y(I - H)
 or
 H = -(residuals/y - I)
 
  fit - lm(weight ~ repwt)
  h - -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] -
 diag(1,length(residuals(fit)), length(residuals(fit
 
 This generates a matrix but I cannot see any coerrelation between this
hat-
 matrix and the return from hatvalues.
 
 Comments?
 
 Thank you.
 
 Kevin
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.