[Impute] RE: imputing missing X's in survival data with time-varying covariates

gregorich Fri Dec 28 13:37:15 2007

Hi.

Months ago, I posted a query about imputing missing X's in survival data
with time-varying covariates.  I am *not* interested in imputing events
or event times.  Although I did not mention it in the original post, my
goal was to fit a survival model that allowed for 'late entry'
(left-truncation).  I have copied my original post below.


To summarize the complicating factors in this application:
. time-varying covariates with missing values
. survival data, so for each respondent, the number of annual self-report
  assessments modeled will depend upon her event/censor time
. eventually fitting a discrete time survival model (DTSM) that allows for
  'late entry' (I did not mention this in my original post)

I received helpful responses. 

Jon Mohr suggested use of Schafer's PAN program. At the time, I didn't
think PAN would help, but in retrospect I may have unfairly dismissed
that option.  Still, I did not pursue it.

Paul Swank suggested use of Mplus to fit a discrete time survival model
(DTSM) via FIML. At the time, Paul did not know that I was aiming to
fit a survival model with late entry. Unfortunately, the Mplus DTSM
does not allow for late entry (at least that was true at the time of my
inquiry).

Because my prospects were not looking so good, David Judkins suggested
an alternative to MI: carrying forward the last observed covariate value
whenever a value is missing.  I considered that option a 'last resort'.
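
For readers unfamiliar with the carry-forward idea, here is a minimal
sketch in Python. It is only an illustration, not anything David Judkins
or I ran: the function name and the example values are hypothetical, and
None stands in for a missing assessment.

```python
from typing import List, Optional, Sequence


def carry_forward(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the most recent observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None  # nothing observed yet, so leading gaps stay missing
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical annual assessments for one respondent; None marks a missed wave.
waves = [None, 2.0, None, None, 3.5, None]
print(carry_forward(waves))
```

Note that a value missing at baseline has nothing to carry forward, so it
remains missing under this approach.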


Eventually, I realized that PROC MI's monotone data MCMC method offers
a solution. It imputes just enough data to create a monotone missing
data pattern.  Working with a 'wide' data set (i.e., one record per
respondent), this method will impute missing values occurring from
baseline to the event/censor time.  It will not attempt to impute values
beyond the event/censor time.  The imputed data can then be reshaped to
have a 'long' format (one record per person per observation period) and
modeled with PROC PHREG, which will effectively deal with late entry.
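
The reshaping step can be sketched as follows. This is a minimal Python
illustration under assumptions of my own, not the SAS code used for the
analysis: the record layout ('id', 'entry', 'last', 'event', 'x') and the
function name are hypothetical. Each person contributes one row per
period at risk, starting at her entry period (late entry) and stopping at
her event/censor period, in a (start, stop] layout of the kind a
counting-process survival model expects.

```python
from typing import Any, Dict, List


def wide_to_long(person: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand one wide record into one row per observation period at risk.

    Assumed (hypothetical) layout of the wide record:
      id    - respondent identifier
      entry - first period in which the respondent is at risk (late entry)
      last  - period of the event or of censoring
      event - 1 if the respondent had the event in period 'last', else 0
      x     - covariate values for periods 1..K; values after 'last'
              may be missing (None) and are never used
    """
    rows = []
    for period in range(person["entry"], person["last"] + 1):
        rows.append({
            "id": person["id"],
            "start": period - 1,  # interval is (start, stop]
            "stop": period,
            "x": person["x"][period - 1],
            # the event indicator is 1 only in the final period at risk
            "event": int(person["event"] == 1 and period == person["last"]),
        })
    return rows


# Hypothetical respondent: enters at period 2, has the event at period 4.
example = {"id": 1, "entry": 2, "last": 4, "event": 1,
           "x": [1.0, 1.5, 2.0, 2.5, None, None]}
for row in wide_to_long(example):
    print(row)
```

Periods before entry and after the event/censor time produce no rows, so
covariate values left unimputed beyond the event/censor time never reach
the substantive model.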

Steve

------------------------------------------------------------------


> _____________________________________________ 
> From:         Gregorich, Steven  
> Sent: Wednesday, May 23, 2007 3:18 PM
> To:   IMPUTE post
> Cc:   Gregorich, Steven
> Subject:      imputing missing X's in survival data with time-varying
> covariates
> 
> Hi.
> 
> I'm looking for suggestions/literature on methods for imputing
> missing X-values (explanatory variables) in survival data with
> time-varying covariates. I am not focusing on imputation of 
> events and event times. 
> 
> Basically, we are modeling time to surgery.  Given the 
> inclusion of time-varying covariates, the number of 
> repeated assessments modeled for any particular patient 
> will depend upon her event/censor time.  The imputation 
> model should account for intra-person correlation of 
> response across repeated assessments.  
> 
> When performing multiple imputation on repeated measures 
> data with fixed assessments for all participants, I usually fit the
> imputation model to the 'wide' data set (one record of data per 
> participant) and subsequently reshape the data into 'long'
> format (one record per person-assessment) for substantive 
> modeling.  However, for survival data with time-varying 
> covariates, this method is not an attractive option because it 
> would require imputing X values for occasions that occur 
> after observed events--that would constitute a 
> misspecification of the imputation model (even though 
> such imputed values would be ignored in subsequent 
> modeling).  
> 
> I've searched the literature some, but so far no luck.
> 
> Any suggestions?
> 
> Thanks in advance.
> 
> Steve
> 
> ------------------------------------------------------------------
>  Steve Gregorich
>  University of California, San Francisco
>  Department of Medicine
>  3333 California Street, Suite 335, Box 0856
>  San Francisco, CA 94143-0856
>    (FedEx and UPS use zip code 94118)
> ------------------------------------------------------------------
> 
> 
From rhampton <@t> hsc.unt.edu  Mon Dec 31 20:22:32 2007
From: rhampton <@t> hsc.unt.edu (Raquel Hampton)
Date: Mon Dec 31 20:23:15 2007
Subject: [Impute] Rounding option on PROC MI and choosing a final MI dataset

Happy New Year (almost) All:

I am new to the listserv and have spent the past 3 hours or more
reviewing previous posts, and I don't believe there has been an answer
to this specific question.

First, some background: I have a dataset from a survey with several
psychosocial measures - at least 200.  The dataset is mostly complete,
with only 1 or 2% missing on each item, but when all the items are used
together (as in a regression), 15% of the dataset is dropped due to
missing data.

I am new to multiple imputation, specifically PROC MI and MIANALYZE, and
have been reading whatever I can get my hands on, but I still feel very
"foggy" and unclear about the procedure and its assumptions.

My first question is: there is a round option for PROC MI, but I read in
an article (Horton, N.J., Lipsitz, S.R., & Parzen, M. (2003). A
potential for bias when rounding in multiple imputation. The American
Statistician 57(4), 229-232) that using the round option for categorical
data (the items have nominal responses, ranging from 1 to 5) produces
biased estimates, logical though rounding seems.  So what can be done? I
only have access to SAS and Stata, and I am not very familiar with
Stata.  Will this be less of a problem because the proportion missing
for each individual item is small?

Finally, I would like to use the imputed data as a "full" dataset,
rather than using MIANALYZE to combine results across all the imputed
datasets. Again, I read that it is "improper" to choose one of the
imputed datasets.  Conversely, I also read that selecting an imputed
dataset is "acceptable" if the estimates can be considered "close".  Or
I could choose to impute only 1 dataset - what is correct?
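
For context on what combining the imputed datasets involves, the
standard combining rules (Rubin's rules) can be sketched in a few lines
of Python. This is only an illustration with made-up numbers, not the
MIANALYZE implementation; the function name is hypothetical. The point
it shows is that the pooled variance adds a between-imputation component
that an analysis of a single imputed dataset cannot estimate.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m point estimates and their squared standard errors.

    Returns (pooled estimate, total variance), where
      total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up regression coefficients and squared standard errors from m = 3 imputations.
est, tot = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, tot)
```

With a single imputed dataset the between-imputation term cannot be
computed at all, so the reported standard errors reflect only the
within-imputation variance.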

Thank you!

