Dear Peter, Thanks for the input. The zero rates in some strata occurs because sampling depended on case status: In Finland only 50% of the non-cases were sampled, while all others were sampled with 100% probability.
Best Laust On Sat, Oct 10, 2009 at 11:02 AM, Peter Dalgaard <p.dalga...@biostat.ku.dk> wrote: > Sorry, forgot to "reply all"... > > Laust wrote: >> >> Dear list, >> >> I am trying to set up a propensity-weighted regression using the >> survey package. Most of my population is sampled with a sampling >> probability of one (that is, I have the full population). However, for >> a subset of the data I have only a 50% sample of the full population. >> In previous work on the data, I analyzed these data using SAS and >> STATA. In those packages I used a propensity weight of 1/[sampling >> probability] in various generalized linear regression-procedures, but >> I am having trouble setting this up. I bet the solution is simple, but >> I’m a R newbie. Code to illustrate my problem below. > > Hi Laust, > > You probably need the package author to explain fully, but as far as I > can see, the crux is that a dispersion parameter is being used, based on > Pearson residuals, even in the Poisson case (i.e. you effectively get > the same result as with quasipoisson()). > > I don't know what the rationale is for this, but it is clear that with > your data, an estimated dispersion parameter is going to be large. E.g. > the data has both 0 cases in 750000 person-years and 1000 cases in 5000 > person-years for Denmark, and in your model they are supposed to have > the same Poisson rate. > > summary.svyglm starts off with > > est.disp <- TRUE > > and AFAICS there is no way it can get set to FALSE. Knowing Thomas, > there is probably a perfectly good reason not to just set the dispersion > to 1, but I don't get it either... > >> >> Thanks >> Laust >> >> # loading survey >> library(survey) >> >> # creating data >> listc <- >> c("Denmark","Finland","Norway","Sweden","Denmark","Finland","Norway","Sweden") >> listw <- c(1,2,1,1,1,1,1,1) >> listd <- c(0,0,0,0,1000,1000,1000,2000) >> listt <- c(750000,500000,900000,1900000,5000,5000,5000,10000) >> list.cwdt <- c(listc, listw, listd, listt) >> country <- >> data.frame(country=listc,weight=listw,deaths=listd,yrs_at_risk=listt) >> >> # running a frequency weighted regression to get the correct point >> estimates for comparison >> glm <- glm(deaths ~ country + offset(log(yrs_at_risk)), >> weights=weight, data=country, family=poisson()) >> summary(glm) >> regTermTest(glm, ~ country) >> >> # running survey weighted regression >> svy <- svydesign(~0,,data=country, weight=~weight) >> svyglm <- svyglm(deaths ~ country + offset(log(yrs_at_risk)), >> design=svy, data=country, family=poisson()) >> summary(svyglm) >> # point estimates are correct, but standard error is way too large >> regTermTest(svyglm, ~ country) >> # test indicates no country differences >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > -- > O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 > ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907 > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.