Thanks, I thought a little about this. It's not obvious to me what the prior would be. Any recommendations?
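
One way to make the question about the prior concrete, as a minimal sketch under the AR(1) assumption from the thread below: if the partial correlation between waves 1 and 3 given wave 2 is zero, then the implied wave 1-3 correlation is the product r12 * r23. A correlation matrix built that way could serve as the center of an informative prior. The function names and the 0.85 inputs are illustrative assumptions (0.85 sits in the .8-.9 range mentioned for short lags), and how such a matrix would be passed to Proc MI is not shown here.

```python
import math


def ar1_implied_corr(r12, r23):
    """3x3 correlation matrix in which waves 1 and 3 are related only through wave 2."""
    r13 = r12 * r23  # zero partial correlation given wave 2 implies this product
    return [
        [1.0, r12, r13],
        [r12, 1.0, r23],
        [r13, r23, 1.0],
    ]


def partial_corr_13_given_2(corr):
    """Partial correlation between waves 1 and 3, controlling for wave 2."""
    r12, r13, r23 = corr[0][1], corr[0][2], corr[1][2]
    return (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))


# Hypothetical adjacent-wave correlations; in practice they would be
# estimated from the cases where both adjacent waves are observed.
prior_center = ar1_implied_corr(0.85, 0.85)
check = partial_corr_13_given_2(prior_center)  # zero by construction
```

How tightly the prior should concentrate around this matrix is a separate choice that the sketch does not address.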
On Tue, Sep 20, 2011 at 9:41 AM, Juned Siddique <[email protected]> wrote:

> Hi Paul,
>
> If you use a Bayesian approach like Proc MI for the problem below, the
> posterior correlation between wave 1 and 3 is just the prior correlation. So
> one approach might be to use an informative prior for the covariance matrix,
> which you can do in Proc MI.
>
> -Juned
>
> From: Impute -- Imputations in Data Analysis [mailto:[email protected]]
> On Behalf Of Paul von Hippel
> Sent: Tuesday, September 20, 2011 8:21 AM
> To: [email protected]
> Subject: Re: Imputing panel data, constraining correlations at long lags
>
> Thanks, Dave. You've come up with a nicely simplified version of my
> problem. Suppose I had only three waves of data, with every subject missing
> either wave 1 (your pattern B) or wave 3 (your pattern A). Ordinarily I
> would put the data in wide format --
>
> A O1 O2 M3
> B M1 O2 O3
>
> -- and impute using a multivariate normal model. However, I don't think
> that would work in this case because the MVN model would want to estimate
> the correlation between wave 1 and wave 3, and there are no cases where both
> wave 1 and wave 3 are observed.
>
> However, if I could tell the software that this was, say, an AR(1) process
> -- or, equivalently, that the partial correlation between waves 1 and 3 is
> zero -- I'd be in business.
>
> This could be done using MVN software that allowed me to impose constraints
> on the covariance matrix, or imputation software for serially correlated
> data. Does such software exist?
>
> Best,
> Paul
>
> ------------------------------
>
> From: David Judkins <[email protected]>
> To: [email protected]
> Sent: Tuesday, September 20, 2011 7:25 AM
> Subject: Re: Imputing panel data, constraining correlations at long lags
>
> Paul,
>
> This sounds pretty challenging. Reminds me of Andrew Gelman's JSM talk and
> 1998 JASA paper on imputation of questions not asked.
>
> It also reminds me of a remark some speaker made this year at JSM about
> almost all natural processes being Markov chains. Not sure I buy that, but I
> think he meant that if you have a rich enough state vector, then one past
> observation is all you need. Of course, that would be trivially true if the
> state vector contained lagged latent values. In this case, I doubt your
> state vector is rich enough to compensate for the brevity of the
> student-level time series, but I guess you have to work with what you have.
>
> Whatever you do I imagine will involve a lot of custom programming.
> However, you might be able to use Raghu's IVEware on a series of specially
> reshaped versions of your data. For example, to impute year 3 for subject A
> and year 1 for subject B, you might create a dataset with only A and B
> records in it shaped like this:
>
> A O1 O2 M3
> B M1 O2 O3
>
> Once that was done, you could proceed to imputing Year 4 on A and B records
> and Year 2 on C records with a dataset shaped from B and C records as
>
> A O2 I3 M4
> B O2 O3 M4
> C M2 O3 O4
>
> And so on. At the end of that, you would have 4 observed/imputed years per
> subject.
>
> There should then be a way to generalize to more than 4 per subject. Not
> very elegant, but it might work.
>
> --Dave
>
> ------------------------------
>
> From: Impute -- Imputations in Data Analysis [[email protected]]
> on behalf of Paul von Hippel [[email protected]]
> Sent: Monday, September 19, 2011 5:58 PM
> To: [email protected]
> Subject: Imputing panel data, constraining correlations at long lags
>
> I have panel data where different students are tested for overlapping
> 2-year periods.
>
> - Subject A is observed for years 1 & 2.
> - Subject B is observed for years 2 & 3.
> - Subject C is observed for years 3 & 4.
> - etc., up to year 12 (of school)
>
> For each observed year there are three separate test occasions (fall,
> winter, spring) and two subjects (reading, math).
>
> It seems to me I can impute the missing test scores provided I am willing
> to assume something about lags that are 2 years or longer. For example, I
> could assume that the partial correlation at lags of 2 years or longer is
> zero. This is not an unreasonable assumption since the correlations at
> shorter lags are very strong (.8-.9).
>
> Is there software that will allow me to do this conveniently?
>
> My usual strategy is to reshape the data from long to wide and then impute
> using a multivariate normal model. There are several packages that will
> permit this; however, I am not aware of software that will let me constrain
> the covariance matrix in the way I have described.
>
> I have not used imputation software that is tailored for panel data --
> such as Schafer et al.'s PAN package, recently ported from S-Plus to R. I
> could try that, provided there is a convenient way to restrict the long
> lags.
>
> Thanks!
>
> --
> Best wishes,
> Paul von Hippel
> Assistant Professor
> LBJ School of Public Affairs
> Sid Richardson Hall 3.251
> University of Texas, Austin
> 2315 Red River, Box Y
> Austin, TX 78712
>
> mobile, preferred (614) 282-8963
> office (512) 232-3650

--
Best wishes,
Paul von Hippel
Assistant Professor
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX 78712

mobile, preferred (614) 282-8963
office (512) 232-3650
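
P.S. The three-wave case from the thread (pattern A: O1 O2 M3; pattern B: M1 O2 O3) can be sketched as follows, assuming zero partial correlation between waves 1 and 3 given wave 2. Under that assumption, wave 3 depends only on wave 2, so it can be imputed for A records from a regression fitted on B records, and likewise wave 1 for B records from a regression fitted on A records. This is a simplified illustration, not a full multiple-imputation procedure: it adds residual noise but does not draw the regression parameters from their posterior, and the helper names and toy scores are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept, slope, and residual SD for y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = (sum(e * e for e in resid) / (len(x) - 2)) ** 0.5
    return intercept, slope, sd


def impute_three_waves(rows, rng):
    """rows: [w1, w2, w3] lists, each missing (None) exactly wave 1 or wave 3."""
    a = [r for r in rows if r[2] is None]  # pattern A: O1 O2 M3
    b = [r for r in rows if r[0] is None]  # pattern B: M1 O2 O3
    # Wave 3 given wave 2 is estimated from B; wave 1 given wave 2 from A.
    i3, s3, sd3 = fit_line([r[1] for r in b], [r[2] for r in b])
    i1, s1, sd1 = fit_line([r[1] for r in a], [r[0] for r in a])
    completed = []
    for w1, w2, w3 in rows:
        if w3 is None:
            w3 = i3 + s3 * w2 + rng.gauss(0, sd3)
        if w1 is None:
            w1 = i1 + s1 * w2 + rng.gauss(0, sd1)
        completed.append([w1, w2, w3])
    return completed


# Toy scores, made up purely for illustration.
rows = [
    [50.0, 52.0, None], [60.0, 61.0, None], [55.0, 58.0, None],
    [None, 51.0, 54.0], [None, 63.0, 64.0], [None, 57.0, 60.0],
]
completed = impute_three_waves(rows, random.Random(0))
```

Repeating the call with different seeds would give several completed datasets, though proper multiple imputation would also need to reflect uncertainty in the fitted regressions.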
