Re: [R] crosstable and regression for survey data (weighted)

Pablo Domínguez Vaselli Fri, 22 Jun 2012 09:13:20 -0700

Regarding regression models, there is some debate about whether it is
necessary to take the sample design into account (SPSS's standard
procedures, for instance, don't), so you can fit them the usual way without
much remorse. Or you can complicate your life (see below).


Your xtabs call seems OK to me. However, regarding tables and totals, you
can expand cases the way SPSS and most other software do (frequency
weights) with this code. rep() needs whole-number counts, hence the round():

mydata.x <- mydata[rep(seq_len(nrow(mydata)), times=round(mydata$sweight)),]

Once your dataframe is expanded this way, any totals and crosstabulations
will be right without setting a count variable in xtabs or other functions,
using just about any normal call you want (e.g. aggregate(), table(), etc.).
This approach is memory-intensive: the dataframe will have as many rows as
the target population. Also keep in mind that standard errors computed on
the expanded data will be too small, because R will treat every copied row
as an independent observation.
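
To convince yourself that both routes give the same counts, here is a small
check on made-up data (the variables sex and region and their values are
invented for illustration; only sweight follows your naming). It compares a
plain table() on the expanded dataframe with a weighted xtabs() on the
original one:

```r
# Toy data (hypothetical): three respondents with integer frequency weights.
mydata <- data.frame(
  sex     = c("f", "m", "f"),
  region  = c("north", "north", "south"),
  sweight = c(3, 2, 4)
)

# Expand: repeat each row index as many times as its weight.
mydata.x <- mydata[rep(seq_len(nrow(mydata)), times = mydata$sweight), ]

# Plain table on the expanded data ...
tab_expanded <- table(mydata.x$sex, mydata.x$region)

# ... versus weighted xtabs on the original data (weights on the left side).
tab_weighted <- xtabs(sweight ~ sex + region, data = mydata)

nrow(mydata.x)                      # number of rows = sum of the weights
all(tab_expanded == tab_weighted)   # TRUE when both tables agree
```

The weighted xtabs() call never builds the big dataframe, so it is the
cheaper option when all you need are counts.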

However, in order to deal properly with complex sample data you need the
survey package (I think this is the only sound approach to your modelling
problem). This package will let you compute design effects and variance
estimates, and fit regression models that take the survey design into
account, without hitting the RAM as above.

In that case, you must first feed the design variables to a survey design
object, using something like:

> library(survey)
> mydesign <- svydesign(ids=~vill_neigh_code+clust, strata=~stratum,
+                       weights=~sweight, data=mydata)

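Do check the survey package's vignettes and help files, because this part
is tricky. It will also help to have the population of each neighbourhood
(svydesign can use population sizes through its fpc argument). You must
also check the nesting of the cluster ids: if the same ids are reused
across strata, add nest=TRUE to the svydesign call.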
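
To illustrate the nesting issue, here is a minimal sketch on invented data
(the dataframe toy and its values are made up, and it uses a single-stage
design rather than your two-stage one). Cluster ids 1 and 2 appear in both
strata, so nest=TRUE is needed for svydesign to treat them as four distinct
clusters. As far as I know, without it svydesign stops with an error saying
the clusters are not nested in the strata.

```r
library(survey)

# Hypothetical toy data: cluster ids 1 and 2 are reused in both strata.
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 4),
  clust   = rep(c(1, 2, 1, 2), each = 2),
  sweight = rep(c(10, 20), each = 4),
  y       = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# nest = TRUE: cluster 1 in stratum A and cluster 1 in stratum B
# are treated as different primary sampling units.
des <- svydesign(ids = ~clust, strata = ~stratum, weights = ~sweight,
                 data = toy, nest = TRUE)

summary(des)          # shows the strata and the clusters per stratum
est <- svymean(~y, des)
est                   # weighted mean of y with a design-based SE
```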

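Note the survey package has special functions for just about anything
(including getting your frequencies). All of them start with "svy", as in
"svytable", and most of them return variance estimates along with the
point estimates. svytable itself gives only the weighted table; use
svytotal or svymean if you need standard errors for the cells. Note that
your estimates' errors will vary from table to table in such a complex
design. Survey example: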

> data(api)
> xtabs(~sch.wide+stype, data=apipop)
> dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
> summary(dclus1)
> (tbl <- svytable(~sch.wide+stype, dclus1))
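
If you also want standard errors for the cells of that table, one way (a
sketch on the same api example data, not the only option) is to ask svytotal
for the totals of the interaction of the two factors. svychisq then gives a
design-based test of independence for the same table:

```r
library(survey)
data(api)

# Same one-stage cluster design as in the example above.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Estimated population counts per cell, each with a standard error.
cell_totals <- svytotal(~interaction(sch.wide, stype), dclus1)
cell_totals
SE(cell_totals)

# Design-based test of independence between the two factors.
svychisq(~sch.wide + stype, dclus1)
```

Replacing svytotal with svymean gives cell proportions instead of counts.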

Once you've specified your survey design, you can fit a design-conscious
glm model using:

> mymodel <- svyglm(var1~var2+var3, design=mydesign, family=quasibinomial())
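
Since var1, var2 and var3 are placeholders, here is a runnable sketch on the
api example data instead (the choice of sch.wide as outcome and of ell and
meals as predictors is only for illustration). It fits the same logistic
model with svyglm and with plain glm, and puts the two sets of standard
errors side by side so you can judge how much the design matters:

```r
library(survey)
data(api)

# One-stage cluster sample of school districts from the api example data.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Design-based logistic regression; quasibinomial() avoids the warnings
# that binomial() raises when the weights are not integers.
fit_design <- svyglm(sch.wide ~ ell + meals, design = dclus1,
                     family = quasibinomial())

# Naive fit that ignores clustering and weights.
fit_naive <- glm(sch.wide ~ ell + meals, data = apiclus1,
                 family = binomial())

# Standard errors side by side.
cbind(design = sqrt(diag(vcov(fit_design))),
      naive  = sqrt(diag(vcov(fit_naive))))

summary(fit_design)
```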


If you're out of time just use normal xtabs and glm!

