Thanks a lot, Rod, for stimulating this discussion.

I would like to offer some comments on points that MI people do not discuss
enough. I am speaking as a user of microdata files (derived from sample surveys
and/or censuses), not of 'biometrics files', where MI seems to be perhaps
overused (even if it is underused in NSIs such as the Census Bureau).

* MI people usually speak only about MI. For me, single imputation (SI) always
comes first, of course using the best possible imputation model and a sound way
to carry out the actual imputation, either with model-estimated values
(model-donor methods) or with values taken from real donors (real-donor
methods). We thus have to concentrate first on SI, in order to obtain estimates
that are as unbiased as possible, including quantiles, relationships between
variables, etc., not only totals or means. Once we have a good SI framework, we
may go on to multiply these imputations for variance estimation (VE), and
perhaps to make some figures more robust if they are too uncertain. But there
are other techniques for VE as well, and many of them seem very competitive
when done well. An advantage of MI for VE is that it is quite easy to apply if
we have a random-based 'proper' technique (it can even be applied without one,
but then the variance estimates may be anything, i.e. badly under- or
overestimated). The big disadvantage (in addition to finding a 'proper'
technique) is that a fairly large number of completed data sets is needed to
obtain reasonably robust variance estimates. A number as small as five data
sets does not seem to be nearly enough with real microdata. Even for a rather
simple variable like employment status, we found that about 30-50 repetitions
are needed; otherwise too much good luck is required to succeed. What happens
with more complex variables (skewed distributions), I do not know.
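To see why the number of completed data sets matters, recall Rubin's combining
rules: the total variance includes a between-imputation component B that is
itself estimated from only m values. A minimal sketch (illustrative only, not a
full MI implementation):

```python
import statistics

def rubin_combine(estimates, within_vars):
    """Pool m completed-data estimates with Rubin's combining rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)    # pooled point estimate
    w_bar = statistics.mean(within_vars)  # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b   # Rubin's total variance
    return q_bar, total_var

# B is a sample variance over only m draws, so with m = 5 it is very
# noisy; a larger m (e.g. 30-50) stabilises the variance estimate.
q, v = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```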

* So, I am first interested in good point estimates (as are most users I
know), and only after that do I hope to obtain good variance estimates as well.

* For many data sets, especially business survey data with a small number of
large businesses, even SI is very difficult, as found in various projects,
e.g. Euredit. There are also different criteria for the performance of SI (see
http://www.cs.york.ac.uk/euredit): one method succeeds by one criterion,
another by another, and so on (see also my papers in the Journal of Applied
Statistics, 2003, and Statistics in Transition, 2002). In the end, it is
difficult for a user to decide which method to use in practice. The same
problem carries over to variance estimation.

* I do not know all the software packages in this area, but, for example, the
Solas procedure for MI under a regression imputation model gives very poor
results when using SI or MI for variables with skewed distributions. A natural
reason is that the random technique used for the multiplication ('proper'
imputation) is not correct: the distributional assumptions do not fit real
data. I suspect this will be a problem for most NSI microdata sets.
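The failure mode is easy to demonstrate. The toy sketch below (hypothetical
log-normal data, not Solas itself) shows how a normal-model random draw,
applied to a strictly positive skewed variable, produces implausible negative
imputations:

```python
import random

random.seed(1)

# Hypothetical strictly positive, skewed variable (log-normal), standing
# in for something like business turnover; illustrative data only.
y = [random.lognormvariate(0.0, 1.5) for _ in range(500)]
mean_y = sum(y) / len(y)
sd_y = (sum((v - mean_y) ** 2 for v in y) / (len(y) - 1)) ** 0.5

# A normal-model 'proper' draw adds N(0, sd) noise to the predicted
# mean; for a skewed positive variable this readily yields negative,
# implausible imputed values.
imputed = [random.gauss(mean_y, sd_y) for _ in range(100)]
n_negative = sum(v < 0 for v in imputed)
```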

* And finally, how many data sets does a standard user of NSI microdata want
to receive: one or fifty? No one has asked for fifty yet. Users do not even
like receiving one data set with imputed values, unless they have not been
told about the imputation. They just want to analyse the data; some, of
course, know how to take the sampling design and the adjustment techniques
(weighting) into account. It will be a long road before MI is in routine use.
MI people are too fanatical about their method. Please at least say that SI
comes first and MI later, and that MI is, in certain applications, very
practicable for variance estimation.

All the best
Seppo (Laaksonen)
Univ. of Helsinki and Statistics Finland




Rod Little  (24.11.2003  13:48):
>
>Happy holidays to all!
>
>At the recent FCSM conference I was discussant at a session on missing
>data. The main thrust of my discussion was that multiple imputation was
>underutilized by survey producers as a method for handling item
>nonresponse in surveys, given that Rubin proposed the original idea over
>30 years ago and software is now much more widely available. Fritz
>Scheuren thought the discussion might stimulate an interesting debate, and
>suggested that I post the discussion on the SRMS list serve. It is
>attached to this message in pdf form, and can also be accessed on my own
>web site at
>
>http://sitemaker.umich.edu/rlittle/files/fcsmdisc.pdf
>
>Some of my remarks are specific to the papers in the session, and hence
>depend on context, but I welcome any comments!
>
>An important issue in survey nonresponse adjustments is how to deal with
>the design variables, and in particular the role of the sampling weights.
>A common approach for unit nonresponse is to multiply the sampling weight
>for each respondent by the nonresponse weight, computed within adjustment
>cells as the sum of the sampling weights for respondents and
>nonrespondents divided by the sum of the sampling weights for respondents.
>In the imputation setting this corresponds to imputing nonrespondent
>values by the sample-weighted respondent mean in the adjustment cell.
>
>In a paper recently published in Statistics in Medicine, "On weighting the
>rates in nonresponse weights", Sonya Vartivarian and I show by simulations
>that this approach is generally biased when the survey outcome is related
>to the design variables. The correct approach is to include the design
>variables as covariates when creating adjustment cells. A draft of this
>paper can be accessed at
>
>http://sitemaker.umich.edu/rlittle/files/vartivrev.pdf
>
>If too many cells result with this approach, they can be reduced by
>methods such as those described in my 2002 and 2003 JSM SRM Proceedings
>papers with Sonya. These papers are available from my web site
>
>http://sitemaker.umich.edu/rlittle
>
>by clicking on the "link to download recent papers".
>
>Rod Little
>
>References:
>
>Little, R.J. and Vartivarian, S. (2003). On weighting the rates in
>nonresponse weights. Statistics in Medicine 22, 1589-1599.
>
>Vartivarian, S. and Little, R.J.A. (2002). On the Formation of Weighting
>Adjustment Cells for Unit Nonresponse. American Statistical Association
>2002, Proceedings of the Survey Research Methods Section, 3553-3558.
>
>Vartivarian, S. and Little, R.J. A. (2003). Weighting adjustments for unit
>Nonresponse with Multiple Outcome Variables. To appear in American
>Statistical Association 2003, Proceedings of the Survey Research Methods
>Section.
>
>_______________________________________________________________________________
>Roderick Little
>Richard D. Remington Collegiate Professor                  (734) 936-1003
>Department of Biostatistics                          Fax:  (734) 763-2215
>U-M School of Public Health
>M4045 SPH II                            [EMAIL PROTECTED]
>1420 Washington Hgts                    http://www.sph.umich.edu/~rlittle/
>Ann Arbor, MI 48109-2029
>
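For readers following Rod's description of the common unit-nonresponse
adjustment, the computation within cells (sum of sampling weights for
respondents and nonrespondents divided by the sum for respondents) can be
sketched as follows; the data and function name here are hypothetical:

```python
from collections import defaultdict

def nonresponse_adjustments(units):
    """units: iterable of (cell, sampling_weight, responded) tuples.
    Returns, per adjustment cell, the nonresponse factor: the sum of
    sampling weights over all units in the cell divided by the sum
    over respondents only."""
    total = defaultdict(float)
    resp = defaultdict(float)
    for cell, w, responded in units:
        total[cell] += w
        if responded:
            resp[cell] += w
    return {c: total[c] / resp[c] for c in total}

# Hypothetical toy sample: two cells, one nonrespondent in cell "A".
units = [("A", 2.0, True), ("A", 2.0, False),
         ("B", 1.0, True), ("B", 3.0, True)]
adj = nonresponse_adjustments(units)  # {"A": 2.0, "B": 1.0}
```

Each respondent's final weight is then the sampling weight multiplied by the
factor for its cell, which (as Rod notes) corresponds to imputing the
sample-weighted respondent mean within the cell.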

