Re: [R] Is there a good package for multiple imputation of missing values in R?

2008-06-30 Thread Shige Song
Robert,

Try Amelia, which can be used with Zelig for post-imputation estimation. I
find it a very helpful combination.
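
A minimal sketch of the Amelia side of this suggestion. It assumes the Amelia package is installed and uses its bundled freetrade panel data, not any data from this thread. `ts` and `cs` name the time and cross-section columns, and `noms` marks a nominal variable. The Zelig estimation step is not shown.

```r
library(Amelia)
data(freetrade)   # example panel data shipped with Amelia

# m = 5 imputed datasets; year/country describe the panel structure,
# "signed" is treated as nominal; p2s = 0 silences progress output
a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country",
                noms = "signed", p2s = 0)

completed1 <- a.out$imputations[[1]]   # first completed data frame
print(summary(completed1$tariff))
```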

Shige

On Mon, Jun 30, 2008 at 3:02 PM, Robert A. LaBudde [EMAIL PROTECTED] wrote:

> I'm looking for a package that has a state-of-the-art method of imputation
> of missing values in a data frame with both continuous and factor columns.
>
> I've found transcan() in 'Hmisc', which appears to be possibly suited to my
> needs, but I haven't been able to figure out how to get a new data frame
> with the imputed values replaced (I don't have Harrell's book).
>
> Any pointers would be appreciated.
>
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> Vere scire est per causas scire
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there a good package for multiple imputation of missing values in R?

2008-06-30 Thread Frank E Harrell Jr

In Hmisc the aregImpute function works much better than transcan for
multiple imputation.  The fit.mult.impute function fits the regression
model once for each set of imputed values and averages the regression
coefficient estimates.  Type ?aregImpute to find out how to get an
imputed dataset if you are not using fit.mult.impute.
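
A minimal sketch of that workflow, assuming Hmisc is installed. The data frame `d` and its columns `y`, `x1`, `x2` are hypothetical simulated stand-ins for real data such as 'felon'. aregImpute() builds the imputations, fit.mult.impute() fits the model to each completed dataset and combines the estimates, and impute.transcan() with `imputation = 1` extracts one completed data frame. The settings `n.impute = 5` and `nk = 3` are illustrative only.

```r
library(Hmisc)

set.seed(1)
n  <- 500
x1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))  # categorical predictor
x2 <- (x1 == "b") + 3 * (x1 == "c") + rnorm(n)              # continuous predictor
y  <- x2 + (x1 == "c") + rnorm(n)                           # continuous outcome
x1[1:100]   <- NA   # introduce missing values
x2[101:150] <- NA
d <- data.frame(y, x1, x2)

# 5 sets of imputations from flexible additive regression models
a <- aregImpute(~ y + x1 + x2, data = d, n.impute = 5, nk = 3, pr = FALSE)

# Fit once per completed dataset and combine; for a binary outcome
# one would pass glm as the fitter together with family = binomial
fmi <- fit.mult.impute(y ~ x1 + x2, lm, a, data = d, pr = FALSE)
print(coef(fmi))

# Pull out the first completed dataset as an ordinary data frame
imputed <- impute.transcan(a, imputation = 1, data = d,
                           list.out = TRUE, pr = FALSE, check = FALSE)
d1 <- d
d1[names(imputed)] <- imputed
print(colSums(is.na(d1)))   # imputed columns should show zero NAs
```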


Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University


Re: [R] Is there a good package for multiple imputation of missing values in R?

2008-06-30 Thread Robert A LaBudde

Thanks to paulandpen, Frank and Shige for suggestions.

I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.

I still haven't mastered the methodology for using aregImpute() in 
'Hmisc' based on the help information. I think I'll have to get hold 
of Frank's book to see how it's used in a complete example.


'Amelia' and 'norm' appear to be focused solely on continuous, 
multivariate normal variables, but my needs typically involve 
datasets with both factors and continuous variables.


The function mice() in 'mice' appears to suit my needs best: its help
file was intelligible, and it works on both factors and continuous
variables.
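
One way to see how mice() handles mixed column types is to look at the method it picks for each variable. This short sketch assumes the mice package is installed and uses its bundled nhanes2 dataset rather than any data from this post. In nhanes2, age and hyp are factors and bmi and chl are numeric. With default settings, numeric columns are imputed by predictive mean matching ("pmm"), two-level factors by logistic regression ("logreg"), and fully observed columns get an empty method.

```r
library(mice)

str(nhanes2)   # age and hyp are factors; bmi and chl are numeric

imp <- mice(nhanes2, m = 5, seed = 123, printFlag = FALSE)
print(imp$method)             # default imputation method chosen per column
print(head(complete(imp, 1))) # first completed dataset; factors stay factors
```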


For those in the audience with similar issues, here is a code snippet 
showing how some of these functions work ('felon' is a data frame 
with categorical and continuous predictors of the binary variable 'hired'):


library('mice')   # missing-data imputation: md.pattern(), mice(), complete()

names(felon)               # show variable names
md.pattern(felon[, 1:4])   # show missing-data patterns in the first 4 variables

library('Hmisc')           # package for na.pattern() and impute()
na.pattern(felon[, 1:4])   # show missing-data patterns in the first 4 variables

# simple (single) imputation can be done by:
felon2 <- felon                            # make a copy
felon2$felony  <- impute(felon2$felony)    # replace NAs (most frequent level)
felon2$gender  <- impute(felon2$gender)    # replace NAs
felon2$natamer <- impute(felon2$natamer)   # replace NAs
na.pattern(felon2[, 1:4])                  # confirm no NAs remain in these variables
fit2 <- glm(hired ~ felony + gender + natamer, data = felon2, family = binomial)
summary(fit2)

# better, multiple imputation can be done via mice():
imp <- mice(felon[, 1:4])        # multiple imputation (default m = 5 datasets)
for (iSet in seq_len(imp$m)) {   # show results for each completed dataset
  fit <- glm(hired ~ felony + gender + natamer,
             data = complete(imp, iSet), family = binomial)  # fit to iSet-th dataset
  print(summary(fit))
}
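
The loop prints five separate summaries but never combines them into one set of estimates. The mice package can do that step itself: with() fits the model to every completed dataset, and pool() combines the results using Rubin's rules. A minimal sketch, assuming mice is installed and using its bundled nhanes data in place of 'felon':

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Fit the same model to each of the m completed datasets
fits <- with(imp, lm(chl ~ age + bmi))

# Combine the m sets of estimates with Rubin's rules
pooled <- pool(fits)
s <- summary(pooled)
print(s)
```

With the data from this post, the corresponding calls would presumably be fits <- with(imp, glm(hired ~ felony + gender + natamer, family = binomial)) followed by summary(pool(fits)). This is untested, since 'felon' is not included in the post.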



Re: [R] Is there a good package for multiple imputation of missing values in R?

2008-06-30 Thread Frank E Harrell Jr

Robert A LaBudde wrote:

> I still haven't mastered the methodology for using aregImpute() in
> 'Hmisc' based on the help information. I think I'll have to get hold of
> Frank's book to see how it's used in a complete example.


It's not in the book; it will be in the 2nd edition someday.
Frank