Model-based inference depends strongly on having the correct model and can be substantially biased when the model is wrong. With the correct model there is a payoff at small sample sizes. While it doesn't directly address the question of using SUDAAN vs. standard software, Hansen, Madow, and Tepping ran simulations comparing five estimators for stratified samples, including both model-based and design-based estimators as well as the simple unbiased stratified estimator. The model-based estimators were substantially biased. At the smaller sample sizes, one of the model-based estimators had a slightly smaller mean squared error (averaged over replications) than the simple estimator. At the larger sample sizes, the model-based estimators had the largest mean squared error of the five: their variances were the smallest, but their bias more than offset that.
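
To make the variance/bias trade-off concrete, here is a toy sketch in Python. It is not the Hansen, Madow, and Tepping simulation; every number in it (two strata, their population shares, means, and standard deviation) is an assumption chosen for illustration. It compares the design-based stratified estimator with an unweighted pooled mean, which would only be appropriate under a (here wrong) model of a common mean across strata. Since MSE = variance + bias^2, the pooled mean can win at small n and lose at large n, where the variance shrinks but the bias does not.

```python
import random
import statistics

# Hypothetical two-stratum setup; all numbers are illustrative assumptions.
WEIGHTS = (0.8, 0.2)   # population shares of the two strata
MEANS = (10.0, 11.0)   # stratum means (they differ, so pooling is biased)
SD = 5.0               # common within-stratum standard deviation
TRUE_MEAN = sum(w * m for w, m in zip(WEIGHTS, MEANS))


def summarize(estimates):
    """Return (bias, variance, mse) of a list of replicated estimates."""
    bias = statistics.fmean(estimates) - TRUE_MEAN
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - TRUE_MEAN) ** 2 for e in estimates])
    return bias, variance, mse


def simulate(n_per_stratum, reps=4000, seed=1):
    """Replicate equal-allocation stratified samples and score both estimators."""
    rng = random.Random(seed)
    design, pooled = [], []
    for _ in range(reps):
        # Equal allocation: the small stratum is oversampled relative to its share.
        stratum_means = [
            sum(rng.gauss(mu, SD) for _ in range(n_per_stratum)) / n_per_stratum
            for mu in MEANS
        ]
        # Design-based: weight each stratum mean by its population share.
        design.append(sum(w * m for w, m in zip(WEIGHTS, stratum_means)))
        # Model-based under a common-mean model: unweighted mean of all
        # observations (equal n per stratum, so the mean of the stratum means).
        pooled.append(statistics.fmean(stratum_means))
    return {"design": summarize(design), "pooled": summarize(pooled)}


for n in (5, 200):
    for name, (bias, variance, mse) in simulate(n).items():
        print(f"n={n:3d} {name:6s} bias={bias:6.3f} var={variance:6.3f} mse={mse:6.3f}")
```

In this setup the pooled mean has the smaller variance at both sample sizes, but its bias of about 0.3 does not shrink as n grows, so its mean squared error ends up the larger of the two once n is large.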
---------------------
Hansen, M; Madow, W, and Tepping, B. "An evaluation of model-dependent and
probability-sampling inferences in sample surveys". Journal of the American
Statistical Association, 12/1983, Vol. 78, No. 384, pp.776-793.

-.- -.. .---- .--. ..-.
Stephen P. Baker, MScPH                      (508) 856-2625
Lecturer in Biostatistics                    (209) 391-7902 fax
Academic Computing Services
University of Massachusetts Medical School
55 Lake Avenue North                         [email protected]
Worcester, MA 01655 USA

----- Original Message -----
From: <[email protected]>
To: <[email protected]>; <[email protected]>
Sent: Sunday, March 18, 2001 1:03 PM
Subject: IMPUTE: Re: Survey analysis: "ordinary" survey software or multiple imputation

> The argument against doing weighted analysis to account
> for oversampling is a strong one, as weighted analyses
> produce estimates with higher variance. Cluster sampling
> is an altogether different issue. To get proper variances,
> clustering must be taken into account. Fortunately, this can
> often be simple, using the cluster bootstrap or the
> cluster version of the Huber sandwich covariance estimator.
>
> Frank Harrell
>
>
> Jan Brogger wrote:
> >
> > After I sent the original mail, I found this in the Encyclopedia of
> > Biostatistics (2):
> >
> > "There is an ongoing debate as to whether the sample design must be
> > considered when deriving statistical models (as opposed to estimates of
> > means, proportions, totals, and ratios) based on sample survey data.
> > Analysts interested in using statistical techniques such as linear
> > regression, logistic regression, survival analysis, or categorical data
> > analysis on survey data are divided as to whether they feel it is necessary
> > to use specialized software. The model-based analysts argue that, as long
> > as the model is specified correctly, they can proceed without recognizing
> > aspects of the survey design (such as stratification, clustering, and
> > unequal selection probabilities), and can therefore use standard
> > statistical packages. The design-based analysts argue to the contrary that
> > it is important to account for the survey design when estimating models.
> > The debate between these two factions has been ongoing for quite awhile and
> > is not likely to be resolved soon (Groves [14], Skinner et al. [29], Korn
> > and Graubard [22], Hansen et al. [16]). A compromise position adopted by
> > some is to use standard statistical software in modeling analyses, but to
> > incorporate into the model the variables that were used to define the
> > strata, the PSUs and the weights. "
> >
> > Most epidemiologists are mode builders, not population describers. If you
> > do a "once-and-for-all" multiple imputation, you can account for many of
> > the features of a two-stage survey (except that I don't know about the
> > clustering thing). Am I right ?
> >
> > Small typo correction:
> >
> > "Case 5: instead of a simple random sample drawn from the non-responders,
> > draw a _stratified sample_ with differential sampling probabilities,
> > depending on Y. "
> > should read
> > "Case 5: instead of a simple random sample drawn from the non-responders,
> > draw a _stratified sample_ from responders with differential sampling
> > probabilities ,
> > depending on Y. "
> >
> > 1. Brogan DJ. Pitfalls of Using Standard Statistical Software Packages for
> > Sample Survey Data. In: Armitage P and Colton P , eds. Encyclopedia of
> > biostatistics. Chichester: John Wiley & Sons Ltd, 1998.
> > http://www.fas.harvard.edu/~stats/survey-soft/blc_eob.html
> > 2. Carlson BL. Software for Statistical Analysis of Sample Survey Data. In:
> > Armitage P and Colton P , eds. Encyclopedia of biostatistics. Chichester:
> > John Wiley & Sons Ltd, 1998.
> > http://www.fas.harvard.edu/~stats/survey-soft/donna_brogan.html
> >
> > Yours,
> > Jan Brogger
>
> --
> Frank E Harrell Jr              Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
>
