Re: Imputing panel data, constraining correlations at long lags

Juned Siddique Tue, 20 Sep 2011 07:52:40 -0700

Hi Paul,

If you use a Bayesian approach like Proc MI for the problem below, the 
posterior correlation between waves 1 and 3 is just the prior correlation. So 
one approach might be to use an informative prior for the covariance matrix, 
which you can do in Proc MI.

-Juned

From: Impute -- Imputations in Data Analysis 
[mailto:[email protected]] On Behalf Of Paul von Hippel
Sent: Tuesday, September 20, 2011 8:21 AM
To: [email protected]
Subject: Re: Imputing panel data, constraining correlations at long lags

Thanks, Dave. You've come up with a nicely simplified version of my problem. 
Suppose I had only three waves of data, with every subject missing either wave 
1 (your pattern A) or wave 3 (your pattern B). Ordinarily I would put the data 
in wide format --

A O1 O2 M3
B M1 O2 O3

-- and impute using a multivariate normal model. However, I don't think that 
would work in this case because the MVN model would want to estimate the 
correlation between wave 1 and wave 3, and there are no cases where both wave 1 
and wave 3 are observed.

However, if I could tell the software that this was, say, an AR(1) process -- 
or, equivalently, that the partial correlation between waves 1 and 3 is zero -- I'd 
be in business.
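
To make the constraint concrete, here is a minimal sketch in Python with NumPy. 
It is not something any package mentioned in this thread does out of the box. 
It assumes one score per wave and that wave 2 is observed for everyone. The 
function names (fit_three_wave_markov, impute_wave3) and the simulated data are 
hypothetical. The wave 1 - wave 3 covariance is never estimated; it is filled 
in as s13 = s12 * s23 / s22, which is what a zero partial correlation given 
wave 2 implies. The draw ignores parameter uncertainty, so a proper multiple 
imputation would also need to draw the parameters.

```python
import numpy as np

def fit_three_wave_markov(xa, xb):
    """Fit a 3-wave normal model in which waves 1 and 3 are
    independent given wave 2.

    xa: (n_a, 2) array, columns = waves 1, 2 (pattern A, wave 3 missing)
    xb: (n_b, 2) array, columns = waves 2, 3 (pattern B, wave 1 missing)
    Returns (mu, cov) for waves 1..3.
    """
    # Wave 2 is observed in both patterns, so pool it.
    w2 = np.concatenate([xa[:, 1], xb[:, 0]])
    mu2, s22 = w2.mean(), w2.var(ddof=1)

    ca = np.cov(xa, rowvar=False)   # waves (1, 2)
    cb = np.cov(xb, rowvar=False)   # waves (2, 3)

    # Regress wave 1 on wave 2 (pattern A) and wave 3 on wave 2 (pattern B).
    b1 = ca[0, 1] / ca[1, 1]
    b3 = cb[0, 1] / cb[0, 0]
    r1 = ca[0, 0] - b1 ** 2 * ca[1, 1]   # residual variance, wave 1 | wave 2
    r3 = cb[1, 1] - b3 ** 2 * cb[0, 0]   # residual variance, wave 3 | wave 2
    a1 = xa[:, 0].mean() - b1 * xa[:, 1].mean()
    a3 = xb[:, 1].mean() - b3 * xb[:, 0].mean()

    mu = np.array([a1 + b1 * mu2, mu2, a3 + b3 * mu2])
    s12, s23 = b1 * s22, b3 * s22
    s13 = s12 * s23 / s22                # the zero-partial-correlation constraint
    cov = np.array([
        [r1 + b1 ** 2 * s22, s12, s13],
        [s12, s22, s23],
        [s13, s23, r3 + b3 ** 2 * s22],
    ])
    return mu, cov

def impute_wave3(xa, mu, cov, rng):
    """Draw wave 3 for pattern A. Under the constraint, the conditional
    distribution of wave 3 depends on wave 2 only."""
    b3 = cov[1, 2] / cov[1, 1]
    resid_var = cov[2, 2] - b3 ** 2 * cov[1, 1]
    mean = mu[2] + b3 * (xa[:, 1] - mu[1])
    return mean + rng.normal(0.0, np.sqrt(resid_var), size=len(xa))

# Simulated example (made-up data): AR(1) with lag-1 correlation 0.85.
rng = np.random.default_rng(0)
rho = 0.85
true_cov = rho ** np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
full = rng.multivariate_normal(np.zeros(3), true_cov, size=400)
xa = full[:200, :2]    # pattern A: O1 O2 M3
xb = full[200:, 1:]    # pattern B: M1 O2 O3

mu, cov = fit_three_wave_markov(xa, xb)
y3_imputed = impute_wave3(xa, mu, cov, rng)
```

Building the covariance matrix from the two regressions on wave 2 keeps it 
positive definite, which is not guaranteed if the blocks from patterns A and B 
are simply pasted together.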

This could be done using MVN software that allowed me to impose constraints on 
the covariance matrix, or imputation software for serially correlated data. 
Does such software exist?

Best,
Paul

________________________________
From: David Judkins <[email protected]>
To: [email protected]
Sent: Tuesday, September 20, 2011 7:25 AM
Subject: Re: Imputing panel data, constraining correlations at long lags

Paul,

This sounds pretty challenging.  Reminds me of Andrew Gelman's JSM talk and 
1998 JASA paper on imputation of questions not asked.

It also reminds me of a remark some speaker made this year at JSM about almost 
all natural processes being Markov chains. Not sure I buy that, but I think he 
meant that if you have a rich enough state vector, then one past observation is 
all you need.  Of course, that would be trivially true if the state vector 
contained lagged latent values. In this case, I doubt your state vector is 
rich enough to compensate for the brevity of the student-level time series, but 
I guess you have to work with what you have.

Whatever you do I imagine will involve a lot of custom programming.  However, 
you might be able to use Raghu's IVEware on a series of specially reshaped versions 
of your data.  For example, to impute year 3 for subject A and year 1 for 
subject B, you might create a dataset with only A and B records in it shaped 
like this:

A O1 O2 M3
B M1 O2 O3

Once that was done, you could proceed to imputing Year 4 on A and B records and 
Year 2 on C records with a dataset shaped from A, B, and C records as

A O2 I3 M4
B O2 O3 M4
C M2 O3 O4

And so on.  At the end of that, you would have 4 observed/imputed years per 
subject.

There should then be a way to generalize to more than 4 per subject.  Not very 
elegant, but it might work.
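
The chaining idea can be sketched roughly as follows in Python with NumPy. This 
is a simplified stand-in, not IVEware: it assumes one score per year, uses only 
the adjacent year as a predictor (a first-order Markov assumption), and draws 
from a fitted regression without propagating parameter uncertainty. The 
function names chain_impute and _draw, and the simulated data, are hypothetical.

```python
import numpy as np

def _draw(x_don, y_don, x_new, rng):
    """Fit y ~ x on donor records, then draw y for new records."""
    slope, intercept = np.polyfit(x_don, y_don, 1)
    resid = y_don - (intercept + slope * x_don)
    sd = resid.std(ddof=2)
    return intercept + slope * x_new + rng.normal(0.0, sd, size=len(x_new))

def chain_impute(y, rng):
    """y: (n, T) array with np.nan for missing years; each subject is
    observed for two adjacent years. Returns a filled copy."""
    filled = y.copy()
    observed = ~np.isnan(y)
    T = y.shape[1]

    # Forward pass: impute year t from year t-1 (observed or already imputed),
    # using subjects who were observed in both years as donors.
    for t in range(1, T):
        need = np.isnan(filled[:, t]) & ~np.isnan(filled[:, t - 1])
        donors = observed[:, t] & observed[:, t - 1]
        if need.any() and donors.sum() >= 3:
            filled[need, t] = _draw(y[donors, t - 1], y[donors, t],
                                    filled[need, t - 1], rng)

    # Backward pass: impute year t from year t+1 in the same way.
    for t in range(T - 2, -1, -1):
        need = np.isnan(filled[:, t]) & ~np.isnan(filled[:, t + 1])
        donors = observed[:, t] & observed[:, t + 1]
        if need.any() and donors.sum() >= 3:
            filled[need, t] = _draw(y[donors, t + 1], y[donors, t],
                                    filled[need, t + 1], rng)
    return filled

# Simulated example (made-up data): 4 years, patterns A, B, C.
rng = np.random.default_rng(1)
rho = 0.85
idx = np.arange(4)
true_cov = rho ** np.abs(np.subtract.outer(idx, idx))
complete = rng.multivariate_normal(np.zeros(4), true_cov, size=300)

panel = np.full_like(complete, np.nan)
panel[:100, 0:2] = complete[:100, 0:2]        # A: years 1 & 2
panel[100:200, 1:3] = complete[100:200, 1:3]  # B: years 2 & 3
panel[200:, 2:4] = complete[200:, 2:4]        # C: years 3 & 4

filled = chain_impute(panel, rng)
```

Each pass only ever regresses on the neighbouring year, so the sketch extends 
to more than 4 years per subject without changes.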

--Dave
________________________________
From: Impute -- Imputations in Data Analysis 
[[email protected]] on behalf of Paul von Hippel 
[[email protected]]
Sent: Monday, September 19, 2011 5:58 PM
To: [email protected]
Subject: Imputing panel data, constraining correlations at long lags

I have panel data where different students are tested for overlapping 2-year 
periods.

  *   Subject A is observed for years 1 & 2.
  *   Subject B is observed for years 2 & 3.
  *   Subject C is observed for years 3 & 4.
  *   etc., up to year 12 (of school)

For each observed year there are three separate test occasions (fall, winter, 
spring) and two subjects (reading, math).

It seems to me I can impute the missing test scores provided I am willing to 
assume something about lags that are 2 years or longer. For example, I could 
assume that the partial correlation at lags of 2 years or longer is zero. This 
is not an unreasonable assumption since the correlations at shorter lags are 
very strong (.8-.9).
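
As an illustration of what this assumption implies, here is a small sketch in 
Python with NumPy. It simplifies to one score per year (the real data have 
several occasions and two subjects per year), and the function name markov_corr 
and the value 0.85 are assumptions chosen for the example. If the partial 
correlations at lags of 2 or more are zero, the correlation between two years is 
the product of the lag-1 correlations between them, and the inverse of the 
correlation matrix is tridiagonal.

```python
import numpy as np

def markov_corr(adjacent):
    """Build a T x T correlation matrix from the T-1 lag-1 correlations,
    assuming zero partial correlation at lags of 2 or more, so that
    corr(i, j) = adjacent[i] * adjacent[i+1] * ... * adjacent[j-1]."""
    T = len(adjacent) + 1
    corr = np.eye(T)
    for i in range(T):
        for j in range(i + 1, T):
            corr[i, j] = corr[j, i] = np.prod(adjacent[i:j])
    return corr

# 12 school years with a lag-1 correlation of 0.85 throughout (assumed value).
adjacent = np.full(11, 0.85)
corr = markov_corr(adjacent)

# Implied lag-2 correlation: 0.85 * 0.85 = 0.7225.
lag2 = corr[0, 2]

# The precision matrix should be zero outside the main and first
# off-diagonals, which is the zero-partial-correlation constraint.
precision = np.linalg.inv(corr)
max_off_band = np.abs(np.triu(precision, k=2)).max()
```

The lag-1 correlations can all be estimated from subjects observed in adjacent 
years, so nothing in this matrix depends on pairs of years that are never 
observed together.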

Is there software that will allow me to do this conveniently?

My usual strategy is to reshape the data from long to wide and then impute 
using a multivariate normal model. There are several packages that will permit 
this; however, I am not aware of software that will let me constrain the 
covariance matrix in the way I have described.

I have not used imputation software that is tailored for panel data -- such as 
Schafer et al's PAN package, recently ported from S-Plus to R. I could try 
that, provided there is a convenient way to restrict the long lags.

Thanks!

--
Best wishes,
Paul von Hippel
Assistant Professor
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX  78712

mobile, preferred (614) 282-8963
office (512) 232-3650
