Hello, does anyone know if there are plans for an update of Little and 
Rubin's Missing Data book (aka Missing Data bible)? -- Possibly Rod or Don 
if they are reading this... :)
Thanks,
cd


*** Note: NEW MAILING ADDRESS ***
____________________________________________________________

Constantine Daskalakis,  ScD
Assistant Professor,
Biostatistics Section, Division of Clinical Pharmacology,
Thomas Jefferson University,
125 S. 9th St. #402, Philadelphia, PA 19107
    Tel: 215-955-5695
    Fax: 215-955-5681
    Email:  [email protected]
____________________________________________________________
From m.bohlig <@t> worldnet.att.net  Sun Apr 22 00:48:03 2001
From: m.bohlig <@t> worldnet.att.net (E. Michael Bohlig)
Date: Sun Jun 26 08:24:58 2005
Subject: IMPUTE: Need help, imputing for dissertation (longish)
Message-ID: <[email protected]>

Imputers,
I am beginning to analyze data for my dissertation.  The first thing I
have to do is deal with missing data.  However, before I pose my
questions, let me give a brief description of my data.  I am analyzing a
problem behavior scale for children and adolescents that consists of 120
items rated on a 3-point scale.  Of these 120 items, 101 items make up 8
subscales and 10 of these double-load.  The remaining 19 are not
assigned to any of the subscales.  The number of items per subscale
ranges from 8 to 25.  My sample size is 754.  Of these respondents, 24%
are missing one or more items, with a maximum of 8 items missing for any
one person (2 respondents with missing data on 8 items).  However, no
more than 2.25% (17/754) of responses are missing on any given item.  Within
subscale, the percent of items with missing data across persons ranges
from 10% (2/20 items) to 38% (3/8 items).

The data are quite skewed.  Among the 101 items that are assigned to
subscales, 46 items were rated as not observed (response = 0) by 80% or
more of the respondents; 20 items were scored as not observed by at
least 90% of the respondents.  The highest response category (response =
2) was not reported by any of the respondents for two of the 101 items.
Only 5% of the respondents used this category for about 45% of the items
and up to 10% of the respondents used the highest category on just over
70% of the items.

To deal with these missing data, I have been considering using multiple
imputation.  I do not have access to S-Plus so I cannot use Schafer's
CAT macro.  I have read, however, that imputation using MCMC methods can
be robust to violations of normality so using a normal model may provide
adequate results (SEMNET discussion list, Oct 2000; Schafer, 1997; PROC
MI Procedure documentation, SAS, 2000).  Although the MI Procedure in
SAS is currently experimental, my data are already in SAS data sets so I
plan to use this software to impute the missing data.

After imputing my data I will be conducting confirmatory factor analyses
and item response theory analyses.  The IRT model I will be implementing
is the Graded Response Model (Samejima, 1969), which estimates one slope
parameter per item and a separate difficulty parameter for each threshold
(number of response options - 1).  Since there are 3 response options on
the instrument, I will be estimating 3 parameters per item (one slope and
two thresholds).
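
To make the parameter count concrete, here is a minimal sketch of the category probabilities for one 3-category item. It assumes the logistic form of the graded response model; the function name, argument names, and numeric values are hypothetical and chosen only for illustration.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta      -- latent trait value of the respondent
    slope      -- the single slope (discrimination) parameter of the item
    thresholds -- increasing threshold (difficulty) parameters, one per
                  boundary between adjacent categories (K - 1 values for
                  K response options)
    """
    # Cumulative probabilities P(X >= k) for k = 1 .. K-1.
    cumulative = [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    # P(X >= 0) is 1 and P(X >= K) is 0 by definition.
    bounds = [1.0] + cumulative + [0.0]
    # P(X = k) is the difference between adjacent cumulative probabilities.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical item: one slope plus two thresholds gives three parameters.
probs = grm_category_probs(theta=0.0, slope=1.5, thresholds=[0.5, 1.5])
print(probs)
```

With three response options (0, 1, 2) there are two thresholds, so each item carries one slope and two thresholds, i.e. three parameters.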

Now for my questions.

1)  Given the limited range of response options (0-2) and the skewed
nature of the data, is MCMC estimation under the assumption of
normality inappropriate?

2)  Assuming that I can proceed with the imputation using a normal
model, should I impute within subscale, or should I impute using the
full instrument?

3)  If I should impute within subscale, how do I deal with the items
that are assigned to more than one subscale?

4)  Once I have imputed the missing data and have several completed
data sets, how should I combine the parameter estimates in the IRT
analysis?  I will have three parameters per item to estimate.  How do I
determine the between-imputation variance, the within-imputation
variance, and the total variance?  How do I determine the relative
efficiency?
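
The quantities named in question 4 are the ones in the standard combining rules for multiple imputation (often called Rubin's rules). The following is a minimal sketch, assuming each IRT parameter is pooled separately as a scalar and that the IRT software reports a squared standard error for it in every completed data set. The function name and the numbers in the usage lines are hypothetical, not results from these data.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates of one scalar parameter.

    estimates -- point estimate from each of the m imputed data sets
    variances -- squared standard error from each imputed data set
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)       # pooled point estimate
    within = mean(variances)      # within-imputation variance
    between = variance(estimates) # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between

    if between == 0.0:
        # No variation across imputations: no information lost to missingness.
        df, frac_missing = math.inf, 0.0
    else:
        # Relative increase in variance due to nonresponse.
        r = (1.0 + 1.0 / m) * between / within
        df = (m - 1) * (1.0 + 1.0 / r) ** 2
        # Fraction of missing information about the parameter.
        frac_missing = (r + 2.0 / (df + 3.0)) / (r + 1.0)

    # Efficiency of m imputations relative to infinitely many.
    rel_eff = 1.0 / (1.0 + frac_missing / m)

    return {"estimate": q_bar, "within": within, "between": between,
            "total": total, "df": df, "fraction_missing": frac_missing,
            "relative_efficiency": rel_eff}


# Hypothetical slope estimates for one item from m = 5 imputed data sets.
pooled = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(pooled)
```

The same function would be applied once per parameter (the slope and each threshold of every item); the pooled standard error is the square root of the total variance.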

Any advice or words of wisdom will be greatly appreciated.
Thanks in advance,
Michael Bohlig
