Re: [R] Is there a good package for multiple imputation of missing values in R?
Robert,

Try Amelia, which can be used with Zelig for post-imputation estimation. I find it a very helpful combination.

Shige

On Mon, Jun 30, 2008 at 3:02 PM, Robert A. LaBudde [EMAIL PROTECTED] wrote:

> I'm looking for a package that has a state-of-the-art method of imputation of missing values in a data frame with both continuous and factor columns.
>
> I've found transcan() in 'Hmisc', which appears to be possibly suited to my needs, but I haven't been able to figure out how to get a new data frame with the imputed values replaced (I don't have Harrell's book).
>
> Any pointers would be appreciated.
>
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> Vere scire est per causas scire

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there a good package for multiple imputation of missing values in R?
Robert A. LaBudde wrote:

> I'm looking for a package that has a state-of-the-art method of imputation of missing values in a data frame with both continuous and factor columns.
>
> I've found transcan() in 'Hmisc', which appears to be possibly suited to my needs, but I haven't been able to figure out how to get a new data frame with the imputed values replaced (I don't have Harrell's book).
>
> Any pointers would be appreciated.

In Hmisc, the aregImpute function works much better than transcan for multiple imputation. The fit.mult.impute function will draw the imputed values to fit a regression model multiple times and average the regression coefficient estimates. Type ?aregImpute to find out how to get an imputed dataset if you are not using fit.mult.impute.

Frank

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
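As a rough illustration of the workflow described in this reply, the sketch below runs aregImpute() and fit.mult.impute() and then extracts one completed data frame with the impute.transcan() idiom shown on the ?aregImpute help page. The 'felon' data frame here is simulated purely for illustration: its values, the sample size, the seed and the 15% missingness rate are all assumptions, and only the variable names are borrowed from the thread.

```r
library(Hmisc)

# Simulated stand-in for the poster's data (an assumption, for illustration only)
set.seed(2008)
n <- 300
felon <- data.frame(
  felony  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  natamer = factor(sample(c("no", "yes"), n, replace = TRUE))
)
felon$hired <- rbinom(n, 1, plogis(0.5 - 1.0 * (felon$felony == "yes")))
for (v in c("felony", "gender", "natamer")) {
  felon[[v]][sample(n, 45)] <- NA      # make 15% of each predictor missing
}

# Imputation model over all four variables, 5 imputations
a <- aregImpute(~ hired + felony + gender + natamer, data = felon,
                n.impute = 5, pr = FALSE)

# Refit the logistic model on each imputed dataset and combine the fits
fit <- fit.mult.impute(hired ~ felony + gender + natamer, glm, a,
                       data = felon, family = binomial, pr = FALSE)
print(coef(fit))                # coefficients averaged over the imputations
print(sqrt(diag(vcov(fit))))    # standard errors adjusted for imputation

# One completed data frame (here the 1st imputation), without fit.mult.impute
filled <- impute.transcan(a, imputation = 1, data = felon,
                          list.out = TRUE, pr = FALSE, check = FALSE)
completed <- felon
completed[names(filled)] <- filled
print(colSums(is.na(completed)))   # remaining NAs per column
```

Because glm is not one of Frank's own fitting functions, it seems safer to read standard errors from vcov(fit) as above rather than from summary(fit).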
Re: [R] Is there a good package for multiple imputation of missing values in R?
At 03:02 AM 6/30/2008, Robert A. LaBudde wrote:

> I'm looking for a package that has a state-of-the-art method of imputation of missing values in a data frame with both continuous and factor columns.
>
> I've found transcan() in 'Hmisc', which appears to be possibly suited to my needs, but I haven't been able to figure out how to get a new data frame with the imputed values replaced (I don't have Harrell's book).
>
> Any pointers would be appreciated.

Thanks to paulandpen, Frank and Shige for suggestions.

I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.

I still haven't mastered the methodology for using aregImpute() in 'Hmisc' based on the help information. I think I'll have to get hold of Frank's book to see how it's used in a complete example.

'Amelia' and 'norm' appear to be focused solely on continuous, multivariate normal variables, but my needs typically involve datasets with both factors and continuous variables.

The function mice() in 'mice' appears to best suit my needs: the help file was intelligible, and it works on both factors and continuous variables.

For those in the audience with similar issues, here is a code snippet showing how some of these functions work ('felon' is a data frame with categorical and continuous predictors of the binary variable 'hired'):

library('mice')            # missing data imputation: md.pattern(), mice(), complete()
names(felon)               # show variable names
md.pattern(felon[,1:4])    # show patterns for missing data in 1st 4 vars

library('Hmisc')           # package for na.pattern() and impute()
na.pattern(felon[,1:4])    # show patterns for missing data in 1st 4 vars

# simple imputation can be done by
felon2 <- felon                              # make copy
felon2$felony  <- impute(felon2$felony)      # impute NAs (most frequent)
felon2$gender  <- impute(felon2$gender)      # impute NAs
felon2$natamer <- impute(felon2$natamer)     # impute NAs
na.pattern(felon2[,1:4])                     # show no NAs left in these vars
fit2 <- glm(hired ~ felony + gender + natamer, data=felon2, family=binomial)
summary(fit2)

# better, multiple imputation can be done via mice():
imp <- mice(felon[,1:4])   # do multiple imputation (default is 5 realizations)
for (iSet in 1:5) {        # show results for the 5 imputation datasets
  fit <- glm(hired ~ felony + gender + natamer,
             data=complete(imp, iSet), family=binomial)  # fit to iSet-th realization
  print(summary(fit))
}

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire
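The loop above prints five separate model summaries, one per imputed dataset. One possible next step, sketched here under stated assumptions, is to let 'mice' fit the model on every imputation with with() and combine the fits with pool(), which applies Rubin's rules to give a single set of estimates. The 'felon' data frame below is simulated only so that the example runs on its own: the values, sample size, seeds and missingness rate are assumptions, not the poster's data.

```r
library(mice)

# Simulated stand-in for the poster's data (an assumption, for illustration only)
set.seed(2008)
n <- 300
felon <- data.frame(
  felony  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  natamer = factor(sample(c("no", "yes"), n, replace = TRUE))
)
felon$hired <- rbinom(n, 1, plogis(0.5 - 1.0 * (felon$felony == "yes")))
for (v in c("felony", "gender", "natamer")) {
  felon[[v]][sample(n, 45)] <- NA      # make 15% of each predictor missing
}

# Five imputed datasets, as in the default used in the thread
imp <- mice(felon, m = 5, printFlag = FALSE, seed = 1)

# Fit the same logistic model on each imputed dataset
fits <- with(imp, glm(hired ~ felony + gender + natamer, family = binomial))

# Combine the five fits into one table of estimates and standard errors
pooled <- summary(pool(fits))
print(pooled)
```

The pooled standard errors include the between-imputation variance, so they are usually somewhat larger than those reported for any single completed dataset.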
Re: [R] Is there a good package for multiple imputation of missing values in R?
Robert A LaBudde wrote:

> At 03:02 AM 6/30/2008, Robert A. LaBudde wrote:
>
>> I'm looking for a package that has a state-of-the-art method of imputation of missing values in a data frame with both continuous and factor columns.
>>
>> I've found transcan() in 'Hmisc', which appears to be possibly suited to my needs, but I haven't been able to figure out how to get a new data frame with the imputed values replaced (I don't have Harrell's book).
>>
>> Any pointers would be appreciated.
>
> Thanks to paulandpen, Frank and Shige for suggestions.
>
> I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.
>
> I still haven't mastered the methodology for using aregImpute() in 'Hmisc' based on the help information. I think I'll have to get hold of Frank's book to see how it's used in a complete example.

It's not in the book; it will be in the 2nd edition someday.

Frank

> 'Amelia' and 'norm' appear to be focused solely on continuous, multivariate normal variables, but my needs typically involve datasets with both factors and continuous variables.
>
> The function mice() in 'mice' appears to best suit my needs: the help file was intelligible, and it works on both factors and continuous variables.
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University