Paul & Ted:

> > I can get the estimated RRs from
> >
> > RRs <- exp(summary(GLM)$coef[,1])
> >
> > but do not see how to implement confidence intervals based
> > on "robust error variances" using the output in GLM.
>
> Thanks for the link to the data. Here's my best guess. If you use
> the following approach, with the HC0 type of robust standard errors in
> the "sandwich" package (thanks to Achim Zeileis), you get "almost" the
> same numbers as that Stata output gives. The estimated b's from the
> glm match exactly, but the robust standard errors are a bit off.
Thanks for posting the code reproducing this example!

> ### Paul Johnson 2008-05-08
> ### sandwichGLM.R
> system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
>
> library(foreign)
>
> dat <- read.dta("eyestudy.dta")
>
> ### Ach, stata codes factor contrasts backwards
> dat$carrot0 <- ifelse(dat$carrot==0, 1, 0)
> dat$gender1 <- ifelse(dat$gender==1, 1, 0)
>
> glm1 <- glm(lenses~carrot0, data=dat, family=poisson(link=log))
>
> library(sandwich)
> sqrt(diag(vcovHC(glm1, type="HC0")))
> # (Intercept)     carrot0
> #   0.1673655   0.1971117
>
> ### Compare against
> ## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
> ## robust standard errors are:
> ##   .1682086   .1981048

I have the solution. Note that the ratio of both standard errors to those
from sandwich is almost constant, which suggests a scaling difference. In
"sandwich" I have implemented two scaling strategies: divide by "n" (number
of observations) or by "n-k" (residual degrees of freedom). This leads to

R> sqrt(diag(sandwich(glm1)))
(Intercept)     carrot0
  0.1673655   0.1971117
R> sqrt(diag(sandwich(glm1, adjust = TRUE)))
(Intercept)     carrot0
  0.1690647   0.1991129

(Equivalently, you could use vcovHC() with types "HC0" and "HC1",
respectively.)

Their standard errors are between those, so I thought that they used a
different scaling. Here, n = 100 and n-k = 98, which left only n-1 = 99 to
try:

R> sqrt(diag(sandwich(glm1) * 100/99))
(Intercept)     carrot0
  0.1682086   0.1981048

Bingo! So they scaled with n-1 rather than n or n-k. This is, of course,
also consistent (if the other estimators are), but I wouldn't know of a good
theoretical underpinning for it.

> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
>     family=poisson(link=log))
>
> ### UCLA site has:
> ##   .4929193   .1963616   .1857414   .0128142

R> round(sqrt(diag(sandwich(glm2) * 100/99)), digits = 7)
(Intercept)     carrot0     gender1    latitude
  0.4929204   0.1963617   0.1857417   0.0128142

Only the constant is a bit off, but that is not unusual in replication
exercises.

Some further advertising: if you want the full table of coefficients, you
can use coeftest() from "lmtest":

R> library("lmtest")
R> coeftest(glm2, sandwich(glm2) * 100/99)

z test of coefficients:

             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.652122   0.492920 -1.3230  0.18584
carrot0      0.483220   0.196362  2.4609  0.01386 *
gender1      0.205201   0.185742  1.1048  0.26926
latitude    -0.010009   0.012814 -0.7811  0.43474
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Best,
Z
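
PS: Coming back to the original question about confidence intervals for the
RRs based on the robust error variances: a minimal sketch of one way to get
them (using the glm2 fit and the n-1 scaled sandwich covariance from above;
the object names below are just illustrative, not from any package) is to
form Wald intervals for the coefficients on the log scale with the robust
standard errors and then exponentiate:

R> est <- coef(glm2)                          ## estimated log relative risks
R> se  <- sqrt(diag(sandwich(glm2) * 100/99)) ## robust SEs, n-1 scaling as above
R> RR  <- exp(est)                            ## estimated relative risks
R> lo  <- exp(est - qnorm(0.975) * se)        ## lower 95% Wald limit, RR scale
R> up  <- exp(est + qnorm(0.975) * se)        ## upper 95% Wald limit, RR scale
R> cbind(RR, lo, up)

The last line prints the relative risks together with their 95% limits; the
same idea works for glm1 or for any other robust covariance estimate.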