Regarding regression models, there is some debate about whether it is necessary to take the sample design into account (SPSS, for instance, doesn't), so you can run them the normal way without much remorse. Or you can complicate your life (see below).
Your xtabs call seems OK to me. However, regarding tables and totals, you can expand cases the way SPSS and most software do (frequency weights) with this code:

    mydata.x <- mydata[rep(1:nrow(mydata), mydata$sweight), ]

Once your data frame is expanded this way, any totals and cross-tabulations will be right without setting a count variable in xtabs or other functions, using just about any normal call you want (e.g. aggregate(), table(), etc.). This approach is memory-intensive: the data frame will be as large as the target population. It also needs whole-number weights, since rep() truncates fractional counts.

However, to deal properly with complex sample data you need the survey package (I think this is the only sound approach to your modelling problem). This package lets you calculate design effects and variance estimators and fit regression models that take the survey design into account, without hitting the RAM as above. In that case, you must first feed the design variables to a survey design object, using something like:

    library(survey)
    mydesign <- svydesign(ids=~vill_neigh_code+clust, strata=~stratum, weights=~sweight, data=mydata)

Do check the survey package's vignette and help files; this is tricky. It will also help to have the neighbors population. You must also check their nesting (that is, whether the cluster ids reuse names across strata); if they do, svydesign() has a nest=TRUE argument for that case.

Note that the survey package has special functions for just about anything (including getting your frequencies). All of them start with "svy", as in "svytable", and return variance estimators (note that the errors of your estimates will vary from table to table in such a complex design).

Survey example:

    library(survey)
    data(api)
    xtabs(~sch.wide+stype, data=apipop)
    dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
    summary(dclus1)
    (tbl <- svytable(~sch.wide+stype, dclus1))

Once you've specified your survey design, you can fit a design-conscious glm model using:

    mymodel <- svyglm(var1~var2+var3, design=mydesign, family=quasibinomial())

If you're out of time just use normal xtabs and glm!
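If you want to convince yourself that the case-expansion trick gives the same counts as a weighted xtabs call, here is a minimal sketch in base R. The toy data frame and its column values are made up for illustration (only the names mydata, mydata.x and sweight follow the code above), and it assumes the weights are whole numbers.

```r
# Toy data: 'sweight' plays the role of an integer frequency weight.
# The values are hypothetical and only serve the illustration.
mydata <- data.frame(
  sex     = c("f", "m", "f", "m"),
  smoker  = c("no", "no", "yes", "yes"),
  sweight = c(3, 2, 1, 4)
)

# Expand: repeat each row 'sweight' times.
# round() guards against fractional weights being silently truncated by rep().
mydata.x <- mydata[rep(seq_len(nrow(mydata)), round(mydata$sweight)), ]

# Plain crosstab on the expanded data ...
tab_expanded <- xtabs(~ sex + smoker, data = mydata.x)

# ... versus a weighted crosstab on the original, unexpanded data.
tab_weighted <- xtabs(sweight ~ sex + smoker, data = mydata)

print(tab_expanded)
print(tab_weighted)
```

Both tables should hold the same cell counts. That only covers totals and frequencies, though: the expanded data frame pretends you have as many observations as the sum of the weights, so any standard errors computed from it will look far too small, which is one more reason to go the survey route for modelling.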