On Wed, 12 Feb 2003, Thompson, Trevor wrote:

> Hi,
>
> I have been experimenting with the new Survey package. Specifically, I was
> trying to use some of the functions on the public-use survey data from NHIS
> (2000 Sample Adult file).
>
> Error 1): The first error I get is when I try to specify the complex survey
> design.
>
> nhis.design<-svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
> check.strata=TRUE)
> Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data =
> nhis.df, :
>         Clusters not nested in strata
>
> My data are sorted by strata, psu. Can someone tell me what the structure
> has to be for a stratified sample with clustering? Looking at the code, it
> appears to me that it does not allow more than 1 observation per psu [i.e.
> any(sc > 1)].

The problem is probably that your id numbers for PSU start up again in each
stratum (e.g. you have a PSU numbered 1 in each stratum). If so, you need the
nest=TRUE option to tell svydesign() that all the PSUs numbered 1 in
different strata are really different PSUs.

> Error 2). If I go ahead and specify check.strata=FALSE, then svydesign runs
> ok. I then tried using the svymean function. In the following example, if
> I specify na.rm=TRUE, I get the error below:

No, it doesn't run ok; it just doesn't report an error.

> > svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
> Error in rowsum.default(x, strata) : Incorrect length for 'group'
>
> I traced this to the svyCprod call within svymean. SvyCprod calls rowsum
> and the group argument ("strata") appears to be the full length of that
> column rather than the subset with non-missing data.

With missing data you do need to use the data stored in the design object,
not a separate data frame; otherwise svymean() cannot match the design
information to the rows that remain after the missing values are dropped.
That is, you want

    svymean(~crc10yr, design=nhis.design, na.rm=TRUE)

> Error 3). I then tried svymean on another variable with na.rm=FALSE. I got
> the following error:
>
> > svymean(nhis.df$age, design=nhis.design)
> Error in drop(rval) : names attribute must be the same length as the vector
>
> I also traced this error to a call to rowsum within the function svyCprod.
> I'm not sure what names attribute this is referring to because the arguments
> to rowsum and the rval object do not appear to have a names attribute. Does
> anyone know what the problem here might be?

This might be the same problem, in which case

    svymean(~age, design=nhis.design)

should work. You should also make sure you have version 1.0 of `survey'
rather than any of the 0.9-x versions that went up briefly on CRAN.

If you tell me where to find the NHIS data I will look at them. There
shouldn't be any special requirements on the format (other than using
nest=TRUE if PSUs don't have globally unique ids).
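
The two points above (PSU ids that restart in each stratum, and referring to variables by formula) can be sketched on toy data. This is only an illustration under stated assumptions: the data frame `toy` and its values are made up and are not NHIS data, and the `survey` calls are skipped if the package is not installed. The base-R check shows why the ids look non-nested, and `interaction()` shows roughly what nest=TRUE asks svydesign() to do for you.

```r
# Toy data (hypothetical, not NHIS): two strata, PSU ids restart at 1 in each.
toy <- data.frame(
  strata = rep(c(1, 2), each = 4),
  psu    = rep(c(1, 2, 1, 2), each = 2),
  probs  = 0.1,
  age    = c(34, 51, 47, NA, 29, 62, 40, 55)
)

# In how many strata does each PSU id appear? More than one means the ids
# are unique only within a stratum, so they look "not nested" as given.
strata_per_psu <- tapply(toy$strata, toy$psu, function(s) length(unique(s)))
ids_restart <- any(strata_per_psu > 1)

# By-hand equivalent of nesting: build globally unique PSU ids.
toy$psu_unique <- interaction(toy$strata, toy$psu, drop = TRUE)
n_psu <- nlevels(toy$psu_unique)

if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)
  # nest=TRUE: PSU 1 in stratum 1 and PSU 1 in stratum 2 are different PSUs.
  toy.design <- svydesign(ids = ~psu, strata = ~strata, probs = ~probs,
                          data = toy, nest = TRUE)
  # Use a formula so the variable is taken from the design object, which
  # keeps it aligned with the design when missing values are removed.
  print(svymean(~age, design = toy.design, na.rm = TRUE))
}
```

Here `ids_restart` is TRUE and there are four distinct PSUs once stratum and PSU id are combined, which is the situation nest=TRUE is meant for.
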
I've looked at data from some NCHS studies that are used as examples by
Stata, and I don't have any of these problems.

Incidentally, you should try writing to the package maintainer first, rather
than the list. In this case it doesn't matter, since I read the list
frequently, but it might in other cases.

	-thomas

______________________________________________
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help