Happy holidays to all!

At the recent FCSM conference I was discussant at a session on missing
data. The main thrust of my discussion was that multiple imputation was
underutilized by survey producers as a method for handling item
nonresponse in surveys, given that Rubin proposed the original idea over
30 years ago and software is now much more widely available. Fritz
Scheuren thought the discussion might stimulate an interesting debate, and
suggested that I post it on the SRMS listserv. It is
attached to this message in pdf form, and can also be accessed on my own
web site at

http://sitemaker.umich.edu/rlittle/files/fcsmdisc.pdf

Some of my remarks are specific to the papers in the session, and hence
depend on context, but I welcome any comments!

An important issue in survey nonresponse adjustments is how to deal with
the design variables, and in particular the role of the sampling weights.
A common approach for unit nonresponse is to multiply the sampling weight
for each respondent by the nonresponse weight, computed within adjustment
cells as the sum of the sampling weights for respondents and
nonrespondents divided by the sum of the sampling weights for respondents.
In the imputation setting this corresponds to imputing nonrespondent
values by the sample-weighted respondent mean in the adjustment cell.
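As a concrete sketch of this adjustment (the cells, weights, and response indicators below are hypothetical, purely for illustration):

```python
# Sketch of the weighting-class nonresponse adjustment described above.
# The cells, sampling weights, and response indicators are made up.
units = [
    # (adjustment cell, sampling weight, responded?)
    ("A", 10.0, True), ("A", 10.0, False), ("A", 20.0, True),
    ("B", 5.0, True),  ("B", 5.0, True),  ("B", 5.0, False),
]

# Nonresponse weight per cell = (sum of sampling weights of respondents
# and nonrespondents) / (sum of sampling weights of respondents).
cell_total, cell_resp = {}, {}
for cell, w, responded in units:
    cell_total[cell] = cell_total.get(cell, 0.0) + w
    if responded:
        cell_resp[cell] = cell_resp.get(cell, 0.0) + w

nr_weight = {c: cell_total[c] / cell_resp[c] for c in cell_total}

# Final weight for each respondent = sampling weight * nonresponse weight;
# the adjusted respondent weights then sum to the full-sample total.
final_weights = [(c, w * nr_weight[c]) for c, w, resp in units if resp]
```

Note that the adjusted respondent weights in each cell sum to the cell's full-sample weight total, which is exactly the property that makes this equivalent to imputing the weighted respondent mean.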

In a paper recently published in Statistics in Medicine, "On weighting the
rates in nonresponse weights", Sonya Vartivarian and I show by simulations
that this approach is generally biased when the survey outcome is related
to the design variables. The correct approach is to include the design
variables as covariates when creating adjustment cells. A draft of this
paper can be accessed at

http://sitemaker.umich.edu/rlittle/files/vartivrev.pdf

If this approach produces too many cells, their number can be reduced by
methods such as those described in my 2002 and 2003 JSM SRM Proceedings
papers with Sonya. These papers are available from my web site

http://sitemaker.umich.edu/rlittle

by clicking on the "link to download recent papers".

Rod Little

References:

Little, R.J. and Vartivarian, S. (2003). On weighting the rates in
nonresponse weights. Statistics in Medicine 22, 1589-1599.

Vartivarian, S. and Little, R.J.A. (2002). On the Formation of Weighting
Adjustment Cells for Unit Nonresponse. American Statistical Association
2002, Proceedings of the Survey Research Methods Section, 3553-3558.

Vartivarian, S. and Little, R.J.A. (2003). Weighting adjustments for unit
nonresponse with multiple outcome variables. To appear in American
Statistical Association 2003, Proceedings of the Survey Research Methods
Section.

___________________________________________________________________________________
Roderick Little
Richard D. Remington Collegiate Professor                  (734) 936-1003
Department of Biostatistics                          Fax:  (734) 763-2215
U-M School of Public Health                         
M4045 SPH II                            [email protected]
1420 Washington Hgts                    http://www.sph.umich.edu/~rlittle/
Ann Arbor, MI 48109-2029
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fcsmdisc.pdf
Type: application/pdf
Size: 24707 bytes
Url: http://lists.utsouthwestern.edu/pipermail/impute/attachments/20031124/748dfeb5/fcsmdisc.pdf
From arturopastori <@t> yahoo.com  Tue Nov 25 04:13:16 2003
From: arturopastori <@t> yahoo.com (Arturo Pastori)
Date: Sun Jun 26 08:25:01 2005
Subject: IMPUTE: Inter-group cross method
Message-ID: <[email protected]>

Dear All,
 
I received from an external reviewer of a placebo-controlled trial protocol
a comment on the use of the inter-group cross method for missing data
imputation. The trial aims to demonstrate superiority of the active
treatment over placebo, with primary efficacy measurements expected to be fairly
constant over time. The reviewer recommended using the inter-group cross
method, in addition to LOCF, as a sensitivity analysis.

The statistical reviewer described the method as follows: missing data in the
placebo group are replaced by random values drawn from the completers' data in
the treated group, while missing data in the treated group are replaced by
random draws from the completers' data in the placebo group.
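As I understand the reviewer's description, the rule could be sketched as follows (all data and variable names below are made up for illustration):

```python
import random

# Sketch of the "inter-group cross" rule as the reviewer describes it:
# a missing value in one arm is replaced by a value drawn at random from
# the completers of the OTHER arm. All data here are hypothetical.
random.seed(0)

placebo = [4.1, None, 3.8, None]   # None marks a missing outcome
treated = [6.0, 5.7, None, 6.3]

placebo_completers = [x for x in placebo if x is not None]
treated_completers = [x for x in treated if x is not None]

# Cross-imputation: placebo gaps filled from treated completers, and vice versa.
placebo_imp = [x if x is not None else random.choice(treated_completers)
               for x in placebo]
treated_imp = [x if x is not None else random.choice(placebo_completers)
               for x in treated]
```

Pulling values across arms in this way visibly shrinks the observed group difference, which is the source of my worry about bias toward the null.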
 
I have no experience with this method and am not really keen on the underlying
principle, as it seems to bias estimates of the treatment effect toward the
null hypothesis of no difference.
 
Does anyone have a reference for, or experience with, this method?
 
Regards,
 


Arturo Pastori
Senior Consulting Statistician

From Seppo.Laaksonen <@t> stat.fi  Thu Nov 27 02:28:00 2003
From: Seppo.Laaksonen <@t> stat.fi (Laaksonen Seppo)
Date: Sun Jun 26 08:25:01 2005
Subject: IMPUTE: Re: Missing data in surveys 
In-Reply-To: <pine.wnt.4.21.0311240648150.1644-101...@little-home>
References: <pine.wnt.4.21.0311240648150.1644-101...@little-home>
Message-ID: <[email protected]>

Thanks a lot, Rod, for stirring up the discussion.

I offer some comments on points that are not discussed enough by MI people. I am
speaking as a user of micro files (derived from sample surveys and/or censuses),
not of 'biometrics files', where MI seems perhaps overused (even if it is
underused in NSIs such as the Census Bureau).

* MI people usually speak only of MI. For me, there is always first SI (single
imputation), using the best possible imputation model and a sensible way to
carry out the actual imputation, either with model-based/estimated values
(model-donor methods) or with values taken from real donors (real-donor
methods). We thus have to concentrate first on SI, in order to get estimates
that are as unbiased as possible, including quantiles and relationships between
variables, not only totals or means. Once we have a sound SI structure, we may
multiply it for variance estimation (VE), and perhaps to make some figures more
robust if they are too uncertain. But there are other techniques for VE as
well, and many of them seem very competitive if done well. An advantage of MI
for VE is that it is quite easy to do if we have a random-based 'proper'
technique (it can be done even without one, but then the VEs may be anything,
i.e. completely under- or overestimated). The big disadvantage (in addition to
finding a 'proper' technique) is that quite a large number of completed data
sets is needed to get reasonably robust VEs. A number such as five data sets
does not seem to be nearly enough with real micro data. Even for a rather
simple variable like employment status, we found that about 30-50 repetitions
are needed; otherwise too much good luck is required to succeed. When moving to
more complex variables (skew distributions), I do not know what happens.
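To show where the number of completed data sets enters, here is a sketch of Rubin's combining rules with made-up per-imputation results:

```python
import statistics

# Sketch of Rubin's MI combining rules, with hypothetical per-imputation
# results, to show where the number of completed data sets m enters.
est = [10.2, 9.8, 10.5, 10.0, 10.1]   # point estimates from m data sets
var = [0.40, 0.38, 0.42, 0.39, 0.41]  # within-imputation variances
m = len(est)

q_bar = statistics.mean(est)      # combined point estimate
u_bar = statistics.mean(var)      # average within-imputation variance
b = statistics.variance(est)      # between-imputation variance (m-1 denominator)
t = u_bar + (1 + 1 / m) * b       # total variance of q_bar

# b is itself estimated from only m values, so with small m both b and t
# are noisy; a larger number of repetitions is what stabilizes them.
```

The between-imputation variance b rests on only m values, which is why five data sets can leave the VE unstable for real micro data.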

* So, I am first interested in good point estimates (as are most users I
know), and after that, naturally, I hope to obtain good VEs as well.

* For many data sets, especially business survey data with a small number of
large businesses, SI is already very difficult, as found in various projects,
e.g. Euredit. There are also different criteria for the performance of SI (see
http://www.cs.york.ac.uk/euredit): one method succeeds by one criterion,
another by another, etc. (see also my papers in the Journal of Applied
Statistics, 2003, and Statistics in Transition, 2002). In the end, it is
difficult for a user to decide which method to use in practice. The same
problem continues when moving on to VEs.

* I do not know all the software packages in this area, but, for example, the
Solas procedure for MI under a regression imputation model gives very poor
results when using SI or MI for variables with skew distributions. A natural
reason is that the random technique for creating the multiple ('proper')
imputations is not correct: the distributional assumptions do not fit the real
data. I suspect this will be a difficulty for most micro data sets of NSIs.

* And finally, how many data sets does a standard user of NSI micro data want
to receive, one or fifty? No one has asked for fifty yet. They do not even like
one with imputed values, unless they have not been told about them. They just
want to analyse the data; some, of course, understand the need to take account
of the sampling design and the adjustment techniques (weighting). It will be a
long way before MIs come into general use. MI people are too fanatical about
their method. Please say at least that SI comes first and MI later, with MI
being very practicable for VEs in certain applications.

All the best
Seppo (Laaksonen)
Univ. of Helsinki and Statistics Finland




Rod Little  (24.11.2003  13:48):
