There are two possible ways to conceptualize this problem and use one of the MI 
software packages. Suppose that R stands for reading and M stands for math; 
F, W, and S stand for fall, winter, and spring; and the number stands for the year.

Option 1: Arrange the data as

Subject-A  RF1 RW1 RS1 MF1 MW1 MS1 RF2 RW2 RS2 MF2 MW2 MS2
Subject-B                          RF2 RW2 RS2 MF2 MW2 MS2 RF3 RW3 RS3 MF3 MW3 MS3
Subject-C
Subject-D

This approach will create an n x 72 completed data matrix. You can drop the 
imputations in the non-administered portion of the data set for some analyses 
or retain them, especially in cross-sectional analysis. The partial 
correlation between ab1 and cd3 will be practically zero when IVEware is used. 
We have tested this by using IVEware on a "file-matching" pattern of missing data.
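
As a rough illustration of the Option 1 layout (a sketch only; the helper names 
wide_columns and wide_row are hypothetical, and Python is used just for 
concreteness, not because any MI package requires it), the 72 columns and one 
student's row could be built like this:

```python
from itertools import product

SUBJECTS = ["R", "M"]      # reading, math
SEASONS = ["F", "W", "S"]  # fall, winter, spring
N_YEARS = 12               # school years 1..12, as in the original question


def wide_columns(n_years=N_YEARS):
    """Column names in the order RF1 RW1 RS1 MF1 MW1 MS1 RF2 ... (6 per year)."""
    return [
        f"{subj}{season}{year}"
        for year in range(1, n_years + 1)
        for subj, season in product(SUBJECTS, SEASONS)
    ]


def wide_row(scores, n_years=N_YEARS):
    """Place one student's observed scores into the wide layout.

    `scores` maps column names (e.g. "RF2") to values; every test that was
    not administered to this student is left as None, i.e. missing.
    """
    return [scores.get(col) for col in wide_columns(n_years)]
```

A student tested in years 2 and 3 would have 12 filled cells and 60 missing 
ones; the imputation software is then run on the full n x 72 matrix.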

Option 2:

Though I am not sure, one may be able to use the following structure under some 
assumptions:

Subject A:   RF1 RW1 RS1 MF1 MW1 MS1 RF2 RW2 RS2 MF2 MW2 MS2 Year=1.5
Subject B:   RF2 RW2 RS2 MF2 MW2 MS2 RF3 RW3 RS3 MF3 MW3 MS3 Year=2.5
Subject C:   RF3 RW3 RS3 MF3 MW3 MS3 RF4 RW4 RS4 MF4 MW4 MS4 Year=3.5

Use Year as a covariate, possibly with some interactions. This assumes that 
the regression relationships are stable over time and that the residual 
covariance matrix is block diagonal with a common 12 by 12 block.
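
A minimal sketch of the Option 2 reshaping, under the assumption that each 
student's two observed years are relabeled as generic positions "a" (first 
year) and "b" (second year). The function name stacked_row and the column 
names such as RF_a are hypothetical, introduced here only for illustration:

```python
def stacked_row(scores, first_year):
    """Map a student's two observed years onto 12 generic columns plus Year.

    `scores` maps names like "RF2" to values; `first_year` is the first of
    the student's two observed years. Year is coded as the midpoint of the
    two years, matching the Year=1.5, 2.5, 3.5 coding in the layout above.
    """
    row = {}
    for offset, tag in ((0, "a"), (1, "b")):
        for subj in "RM":          # reading, math
            for season in "FWS":   # fall, winter, spring
                source = f"{subj}{season}{first_year + offset}"
                row[f"{subj}{season}_{tag}"] = scores.get(source)
    row["Year"] = first_year + 0.5
    return row
```

Every student then contributes one row with the same 12 score columns, and 
Year (plus any interactions) enters the imputation model as a covariate.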

My own preference is to use Option 1 if the sample size is large and 
Option 2 if the sample size is small.

Interesting problem.

Raghu


From: Impute -- Imputations in Data Analysis 
[[email protected]] on behalf of Paul von Hippel 
[[email protected]]
Sent: Tuesday, September 20, 2011 10:53 AM
To: [email protected]
Subject: Re: Imputing panel data, constraining correlations at long lags

Thanks, I thought a little about this. It's not obvious to me what the prior 
would be. Any recommendations?

On Tue, Sep 20, 2011 at 9:41 AM, Juned Siddique <[email protected]> wrote:
Hi Paul,

If you use a Bayesian approach like Proc MI for the problem below, the 
posterior correlation between waves 1 and 3 is just the prior correlation. So 
one approach might be to use an informative prior for the covariance matrix, 
which you can do in Proc MI.

-Juned



From: Impute -- Imputations in Data Analysis [mailto:[email protected]] On Behalf Of Paul von Hippel
Sent: Tuesday, September 20, 2011 8:21 AM
To: [email protected]
Subject: Re: Imputing panel data, constraining correlations at long lags

Thanks, Dave. You've come up with a nicely simplified version of my problem. 
Suppose I had only three waves of data, with every subject missing either wave 
1 (your pattern A) or wave 3 (your pattern B). Ordinarily I would put the data 
in wide format --

A O1 O2 M3
B M1 O2 O3

-- and impute using a multivariate normal model. However, I don't think that 
would work in this case because the MVN model would want to estimate the 
correlation between wave 1 and wave 3, and there are no cases where both wave 1 
and wave 3 are observed.

However, if I could tell the software that this was, say, an AR(1) process -- 
or, equivalently, that the partial correlation between waves 1 and 3 is zero -- 
I'd be in business.

This could be done using MVN software that allowed me to impose constraints on 
the covariance matrix, or imputation software for serially correlated data. 
Does such software exist?
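
To spell out why that assumption is enough in the three-wave case, here is a 
short worked step using the standard first-order partial correlation formula. 
The notation \rho_{jk} for the correlation between waves j and k is introduced 
here for illustration only:

```latex
\rho_{13\cdot 2}
  = \frac{\rho_{13} - \rho_{12}\,\rho_{23}}
         {\sqrt{(1-\rho_{12}^{2})(1-\rho_{23}^{2})}}
  = 0
\quad\Longrightarrow\quad
\rho_{13} = \rho_{12}\,\rho_{23}
```

Since \rho_{12} can be estimated from the pattern A cases and \rho_{23} from 
the pattern B cases, the constraint pins down \rho_{13} even though waves 1 
and 3 are never observed together. In the stationary AR(1) special case with 
\rho_{12} = \rho_{23} = \rho, this gives \rho_{13} = \rho^2.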

Best,
Paul


________________________________
From: David Judkins <[email protected]>
To: [email protected]
Sent: Tuesday, September 20, 2011 7:25 AM
Subject: Re: Imputing panel data, constraining correlations at long lags

Paul,

This sounds pretty challenging.  Reminds me of Andrew Gelman's JSM talk and 
1998 JASA paper on imputation of questions not asked.

It also reminds me of a remark some speaker made this year at JSM about almost 
all natural processes being Markov chains. Not sure I buy that, but I think he 
meant that if you have a rich enough state vector, then one past observation is 
all you need.  Of course, that would be trivially true if the state vector 
contained lagged latent values.  In this case, I doubt your state vector is 
rich enough to compensate for the brevity of the student-level time series, but 
I guess you have to work with what you have.

Whatever you do I imagine will involve a lot of custom programming.  However, 
you might be able to use Raghu's IVEware on a series of specially reshaped 
versions of your data.  For example, to impute year 3 for subject A and year 1 
for subject B, you might create a dataset with only A and B records in it 
shaped like this:

A O1 O2 M3
B M1 O2 O3

Once that was done, you could proceed to imputing Year 4 on A and B records and 
Year 2 on C records with a dataset shaped from A, B, and C records (I3 being 
the Year 3 value imputed for A in the previous step) as

A O2 I3 M4
B O2 O3 M4
C M2 O3 O4

And so on.  At the end of that, you would have 4 observed/imputed years per 
subject.

There should then be a way to generalize to more than 4 per subject.  Not very 
elegant, but it might work.
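
For concreteness, here is a minimal sketch of what one such step might look 
like for the first A/B dataset. It is not IVEware: it assumes a simple linear 
regression of wave 3 on wave 2, fitted on the B records, and relies on the 
zero-partial-correlation assumption so that wave 1 is ignored. It also makes a 
single stochastic draw with fixed parameter estimates, whereas proper multiple 
imputation would also draw the regression parameters. The function names 
fit_line and impute_wave3 are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope, resid_sd)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_sd = (sum(r * r for r in resid) / (n - 2)) ** 0.5
    return intercept, slope, resid_sd


def impute_wave3(records_a, records_b, rng):
    """Fill in wave 3 for pattern-A records.

    records_a: list of (o1, o2) pairs, wave 3 missing.
    records_b: list of (o2, o3) pairs, wave 1 missing.
    Returns a list of (o1, o2, i3) with i3 drawn around the fitted line.
    """
    x = [o2 for o2, _ in records_b]
    y = [o3 for _, o3 in records_b]
    b0, b1, sd = fit_line(x, y)
    return [(o1, o2, b0 + b1 * o2 + rng.gauss(0.0, sd)) for o1, o2 in records_a]
```

Calling impute_wave3(records_a, records_b, random.Random(1)) gives one 
completed version of the A records; the mirror-image step (wave 1 for B, 
regressed on wave 2 using the A records) would follow the same pattern.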

--Dave
________________________________
From: Impute -- Imputations in Data Analysis [[email protected]] on behalf of Paul von Hippel [[email protected]]
Sent: Monday, September 19, 2011 5:58 PM
To: [email protected]
Subject: Imputing panel data, constraining correlations at long lags

I have panel data where different students are tested for overlapping 2-year 
periods.

  *   Subject A is observed for years 1 & 2.
  *   Subject B is observed for years 2 & 3.
  *   Subject C is observed for years 3 & 4.
  *   etc., up to year 12 (of school)

For each observed year there are three separate test occasions (fall, winter, 
spring) and two subjects (reading, math).

It seems to me I can impute the missing test scores provided I am willing to 
assume something about lags that are 2 years or longer. For example, I could 
assume that the partial correlation at lags of 2 years or longer is zero. This 
is not an unreasonable assumption since the correlations at shorter lags are 
very strong (.8-.9).
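
To give a sense of how much of the covariance matrix such an assumption has to 
carry, here is a small counting sketch. It assumes the design is exactly as 
described above: 12 years, 6 tests per year, and every student observed in two 
consecutive years. The variable and function names are hypothetical.

```python
from itertools import combinations

N_YEARS = 12
TESTS_PER_YEAR = 6  # 3 occasions (fall, winter, spring) x 2 subjects

# Each wide-format variable is identified by (year, test index within year).
variables = [(y, t) for y in range(1, N_YEARS + 1) for t in range(TESTS_PER_YEAR)]


def jointly_observed(v1, v2):
    """True if some student can have both scores: same year or adjacent years."""
    return abs(v1[0] - v2[0]) <= 1


pairs = list(combinations(variables, 2))
identified = sum(jointly_observed(a, b) for a, b in pairs)
unidentified = len(pairs) - identified

print(identified, unidentified)  # prints: 576 1980
```

Under these assumptions, 1980 of the 2556 pairwise covariances have no cases 
where both variables are observed, so they can only come from a constraint or 
a prior, not from the data.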

Is there software that will allow me to do this conveniently?

My usual strategy is to reshape the data from long to wide and then impute 
using a multivariate normal model. There are several packages that will permit 
this; however, I am not aware of software that will let me constrain the 
covariance matrix in the way I have described.

I have not used imputation software that is tailored for panel data -- such as 
Schafer et al.'s PAN package, recently ported from S-Plus to R. I could try 
that, provided there is a convenient way to restrict the long lags.

Thanks!

--
Best wishes,
Paul von Hippel
Assistant Professor
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX  78712

mobile, preferred (614) 282-8963
office (512) 232-3650



