Hi Everyone,

I am new to multiple imputation, and would like to get your advice about a project that I am working on. I am interested in testing a model involving a latent interaction term using 2SLS.

My data are drawn from a 4-wave, 13-year longitudinal study of married couples. The sample consists of 154 couples. Data for the early years of marriage were gathered at three annual intervals beginning when couples were newlyweds in 1980-1981. Follow-up data were then gathered on a fourth occasion in 1994-1995.

Each phase of the study consisted of a primary face-to-face interview and a series of follow-up daily diary telephone interviews. Primary interviews were used to gather data on people's perceptions of their partner's traits and their feelings of marital satisfaction. The follow-up interviews were used to gather "quasi-observational" data on socioemotional behaviors in marriage, such as affection, negativity, and sexual expression. During the 13-year follow-up, data were also gathered on marital stability. In conjunction with the data on marital satisfaction, this made it possible to categorize couples into four marital outcome groups: happily married, unhappily married, early divorced, and later divorced.

In my study, I would like to test whether people's tendency to idealize their partner in the early years of marriage is associated with marital outcomes 13 years after couples were first married. In my analysis, I want to use 2SLS to regress a manifest measure of the extent to which people perceive that their intimate partner is pleasant to be around on a latent measure of the partner's tendency to engage in pleasant behavior, a latent measure of the partner's tendency to engage in unpleasant behavior, and the interaction between the two. Then I want to save the residual from this prediction as a measure of idealization for use in subsequent analyses.

The latent measure of pleasant behavior and the latent measure of unpleasant behavior both have four indicators. The latent interaction has 10 indicators: a scaling indicator that is formed by multiplying the scaling indicators of the latent measure of pleasant behavior and the latent measure of unpleasant behavior, and 9 additional indicators that are formed by multiplying each of the non-scaling indicators of the latent measure of pleasant behavior with each of the non-scaling indicators of the latent measure of unpleasant behavior.

A 2SLS analysis of the model described above consists of two steps. First, the scaling indicator for each latent is regressed on the non-scaling indicators for that latent as well as the non-scaling indicators from the other latents in the model, and predicted values are obtained. Then the criterion is regressed on the predicted values for each of the scaling indicators. The basic idea is that because the non-scaling indicators are correlated with the scaling indicators, but not with the disturbance term in their measurement equations, they can be used to purge the scaling indicators of measurement error, thereby yielding estimates of the underlying latents.

Anyway, I'm trying to use multiple imputation to deal with missing observations in a data set that contains the variables that I will need to test my model for husbands and wives at three points in time during the early years of marriage. This data set consists of slightly more than 100 variables, roughly half of which are interaction terms. Ideally, I would like to be able to preserve associations in these variables across both gender and time. So my sense is that I need to avoid dividing my imputation data set into a number of smaller data sets (e.g., male variables at time 1, female variables at time 1, etc.).

Unfortunately, when I try to use this larger data set in both SAS PROC MI and NORM, I have trouble getting the EM algorithm to converge. In SAS, this data set also creates an error that causes the program to terminate. SAS isn't too specific about the nature of the problem. It just says things like "invalid operation" and "generic error." So I'm hoping that people can give me some advice on what I should do next.

The indicators for the latent predictors tend to be non-normal with a lot of positive skew. In addition, there are often one or two outliers in the high end of the distribution that are detached from the rest of the distribution. So, for example, if one looks at people's reports of the average number of times per day that their spouse expresses physical affection toward them (e.g., kissing, hugging, cuddling), most scores are in the lower end of the distribution but there are some extremely high scores as well.

So far, I've tried removing the extreme scores from the data, but this doesn't fix the problems with non-convergence and crashing that I get with the larger imputation data set. The next thing I'm planning to do is to transform the variables in the hope of attaining multivariate normality. My guess is that this will not be successful, but I'll give it a try.

My sense, though, is that the failure to attain multivariate normality may not be at the heart of the problem. According to the documentation for SAS PROC MI, this typically is not a problem unless a large percentage of the data are missing. My rate of missing information never exceeds 24%. Mind you, the SAS documentation doesn't say what constitutes a large amount of missing data. So maybe this is a lot.

Another possibility might be large discrepancies in variance among the variables. Some of the behaviors in my data occur much more frequently than do others. For example, positive behaviors such as expressing physical affection tend to occur more frequently than do negative behaviors such as showing anger or impatience by snapping, yelling, or raising one's voice. So I was thinking that it might be possible to artificially increase the variance of some variables or decrease the variance of others. Maybe this could be accomplished by multiplying or dividing (or adding or subtracting) a constant. But I don't know if this sort of thing is normally done, or if large discrepancies in the frequency at which different behaviors occur are even likely to be part of the problem.

At any rate, any advice that people could give me about what to do next would be greatly appreciated.
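For readers who want to see the two-step 2SLS procedure described earlier in this message in concrete form, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions, not the poster's actual analysis: the function names, the simulated data, and the choice to compute residuals from the observed scaling indicators (the usual 2SLS convention) are all hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, scaling, instruments):
    """Hypothetical helper illustrating the two steps described above.

    y           : (n,) criterion variable
    scaling     : (n, k) scaling indicators, one column per latent
    instruments : (n, m) non-scaling indicators, with m >= k
    """
    n = y.shape[0]
    ones = np.ones(n)

    # Step 1: regress every scaling indicator on all non-scaling indicators
    # and keep the predicted values.
    Z = np.column_stack([ones, instruments])
    first_stage = ols_fit(Z, scaling)            # shape (m + 1, k)
    scaling_hat = Z @ first_stage                # shape (n, k)

    # Step 2: regress the criterion on the predicted scaling indicators.
    beta = ols_fit(np.column_stack([ones, scaling_hat]), y)

    # Assumption: residuals use the observed scaling indicators with the
    # second-stage coefficients, which is the conventional 2SLS residual.
    residuals = y - np.column_stack([ones, scaling]) @ beta
    return beta, residuals


def simulate_example(n=5000, seed=0):
    """Synthetic data: one latent, one scaling indicator, two non-scaling indicators."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)
    x1 = latent + 0.5 * rng.normal(size=n)       # scaling indicator (with error)
    x2 = latent + 0.5 * rng.normal(size=n)       # non-scaling indicator
    x3 = latent + 0.5 * rng.normal(size=n)       # non-scaling indicator
    y = 2.0 + 1.5 * latent + rng.normal(size=n)  # criterion
    return y, x1[:, None], np.column_stack([x2, x3])


if __name__ == "__main__":
    y, scaling, instruments = simulate_example()
    beta, residuals = two_stage_least_squares(y, scaling, instruments)
    print("2SLS intercept and slope:", beta)
```

In this simulation, a plain regression of `y` on the error-laden scaling indicator would understate the slope, while the two-stage estimate should land close to the value used to generate the data. With missing data, a sketch like this would be run separately on each imputed data set.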
Sorry for what is admittedly a long posting, but I felt that I would be most likely to get good advice if I took the time to give a detailed description of my project.

Thanks,

Paul

From link <@t> umich.edu Mon Jul 28 15:44:28 2003
From: link <@t> umich.edu (Steve Peck)
Date: Sun Jun 26 08:25:00 2005
Subject: IMPUTE: Re: combine before instead of after?
References: <[email protected]>
Message-ID: <[email protected]>

OK, this makes sense to me. Thank you for the succinct replies.

I now have 5 imputed versions of each of my regression models, and I would like to describe these models using standardized betas. I plan to use MIANALYZE to combine the 5 results. The output from SPSS appears to write only the unstandardized betas (for input into MIANALYZE). Any ideas about how I can get the standardized beta estimates?

Donald Rubin wrote:

> Yup, wrong answer, unless the statistic is linear in all the missing data,
> i.e., this could only work in your case if the only variable with
> missingness is y. And even then, all the standard errors and tests are
> wrong. Not a very successful path to follow.
>
> On Fri, 25 Jul 2003, Steve Peck wrote:
>
>> Assuming a set of 20 continuous variables,
>> are there specific reasons for *not* combining the
>> results of 5 MI data sets before doing regression analyses
>> (e.g., by computing value estimates by averaging across
>> the 5 values per variable) instead of combining the parameter
>> estimates that are generated from each of the 5 models run
>> separately?
>>
>> thanks,
>> Steve

--
Stephen C. Peck
Senior Research Associate Social Science
University of Michigan
204 S. State St.
# 1239
Ann Arbor, MI 48109-1290
(734) 647-3683; fax (734) 936-7370
[email protected]
http://www.rcgd.isr.umich.edu/garp/

From newgardc <@t> ohsu.edu Wed Jul 30 12:13:40 2003
From: newgardc <@t> ohsu.edu (Craig Newgard)
Date: Sun Jun 26 08:25:00 2005
Subject: IMPUTE: Re: Beginner Question about Nonconvergence of EM Algorithm
Message-ID: <[email protected]>

Paul,

I'm not sure that I follow your full description of the question, but I have a few suggestions.

For using MI when your hypothesis centers on testing an interaction term, Paul Allison (see below) offers a nice explanation of how to design your imputation model to maximize the statistical efficiency of this term (i.e., parallel chains of MI, split on one of the interaction terms). This design will need to be adjusted if both terms in the interaction have missing values.

If you are having a difficult time getting the SAS PROC MI algorithm to converge, you may want to try another program, such as IVEware, that uses more flexible models based on each variable being imputed (a beta version of IVEware can be downloaded for free).

Craig

Allison PD. (2001) Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-136. Thousand Oaks, CA: Sage.

Raghunathan TE, Lepkowski, Van Hoewyk J, Solenberger PW. A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology 2001;27:85-95.

Craig D. Newgard, MD, MPH
Assistant Professor
Department of Emergency Medicine
Department of Public Health & Preventative Medicine
Oregon Health & Science University
3181 Sam Jackson Park Road
Mail Code CR-114
Portland, OR 97201-3098
(503) 494-1668 (Office)
(503) 494-4640 (Fax)
[email protected]

From G.Raab <@t> napier.ac.uk Wed Jul 30 18:22:13 2003
From: G.Raab <@t> napier.ac.uk (Raab, Gillian)
Date: Sun Jun 26 08:25:00 2005
Subject: IMPUTE: Re: Beginner Question about Nonconvergence of EM Algorithm
Message-ID: <[email protected]>

I agree with Craig's suggestion to use IVEware. It is much better than NORM etc. for categorical data. I assume that you were using the facilities in MI that pull back the imputed normal values to categories.
When I tried to use this a couple of years back with SAS 8.1 (and also 8.2), there were some nasty bugs that would result in errors in a few variables, but only sporadically. What happened was that sometimes I would find that a handful of non-missing values had been replaced. I sent all kinds of messages to SAS about this but did not get much joy from them; they just said that the procedure was unsupported, so I gave up and fudged that problem in some other way. Has anyone else on this list had similar problems? And are they fixed now?

Since then I have found IVEware much better. It will do everything that MI does and more, though it is a bit harder to use. I started to write some notes as a bit of an idiot's guide to it, but they are still incomplete. The only problem I faced with it was the need to ensure that all continuous variables were approximately centred to avoid computational difficulties.

Good luck,

Gillian Raab
Edinburgh
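The centring advice above, and the earlier question in this thread about variables with very different variances, can be illustrated with a short Python/NumPy sketch. It is a hedged illustration only: the function names are hypothetical, missing values are assumed to be coded as NaN, and nothing here reproduces what IVEware, NORM, or SAS do internally. Note that adding or subtracting a constant only shifts a variable's mean, while multiplying or dividing by a constant is what changes its variance.

```python
import numpy as np


def standardize_with_missing(X):
    """Centre and scale each column using only its observed (non-NaN) values.

    Hypothetical helper: returns the rescaled data plus the means and
    standard deviations needed to undo the rescaling after imputation.
    """
    X = np.asarray(X, dtype=float)
    means = np.nanmean(X, axis=0)
    sds = np.nanstd(X, axis=0, ddof=1)
    # Guard against constant (or single-observation) columns.
    sds = np.where(sds > 0, sds, 1.0)
    return (X - means) / sds, means, sds


def unstandardize(Z, means, sds):
    """Return imputed, standardized values to the original measurement scale."""
    return np.asarray(Z, dtype=float) * sds + means


if __name__ == "__main__":
    # Synthetic example: a frequent behavior and a rare one, each with a gap.
    X = np.array([[12.0, 0.2],
                  [15.0, np.nan],
                  [np.nan, 0.1],
                  [40.0, 0.6]])
    Z, means, sds = standardize_with_missing(X)
    print(Z)
```

The rescaled matrix would be passed to the imputation software, and `unstandardize` applied to each completed data set afterwards so that analyses are run on the original scale.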
