Re: [R] imputation in mice

2012-12-08 Thread David L Carlson
What do 

> str(data)
> summary(data)

show you? The str() function shows what kind of variable each column is, and
summary() shows the range of the values and, for any column with missing
data, how many NAs it contains.
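Here is a minimal illustration of those two checks on a made-up data frame. The column names come from the original post, but the values are invented. The call `colSums(is.na(...))` gives the same per-column NA counts on a single line.

```r
# Made-up example data; column names borrowed from the post, values invented
df <- data.frame(
  assignment = c(0, 1, 0, 1, 1),
  totalexp   = c(3, NA, 10, 7, NA),
  urbanicity = factor(c("urban", "rural", "urban", "suburban", "rural"))
)

str(df)      # type of each column (num, Factor, ...)
summary(df)  # ranges for numeric columns, plus an "NA's" row where values are missing

# Number of missing values in each column
na_counts <- colSums(is.na(df))
na_counts
```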

You seem to be overwriting your original data frame "data" (really a bad
name to use, since data() is a function in R) after the imputation. Your code
does not show us where "data" comes from originally. The "weight" variable
also seems to exist in something called "lbdata." The error message suggests
that what is in "data" when you try to compute your propensity scores is not
what you think it is.
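One way to avoid this is to keep the raw data, the completed data, and the weights in separately named objects. You can then check row counts and remaining NAs explicitly before fitting anything. The sketch below does this under some assumptions. The names `raw`, `completed`, and `weights_source` are hypothetical, and `weights_source` stands in for "lbdata". The mean fill is only a placeholder for the real mice step, not a recommended imputation method.

```r
# Hypothetical objects: 'raw' plays the role of the original data,
# 'weights_source' plays the role of "lbdata"
raw <- data.frame(
  assignment = c(0, 1, 0, 1),
  totalexp   = c(3, NA, 10, 7)
)
weights_source <- data.frame(weight = c(1.2, 0.8, 1.0, 1.5))

# Placeholder for the imputation step (simple mean fill, NOT mice);
# the result gets its own name instead of overwriting 'raw'
completed <- raw
completed$totalexp[is.na(completed$totalexp)] <- mean(raw$totalexp, na.rm = TRUE)

# Copying weights across by position is only safe if the rows match one-to-one
stopifnot(nrow(weights_source) == nrow(completed))
completed$weight <- weights_source$weight

# Confirm that the object about to be modelled has no missing values left
stopifnot(!anyNA(completed))
```

If the data have an ID column, merging on it with merge() would be a safer way to bring the weights back than copying them by position.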

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Elizabeth Fuller Bettini
> Sent: Friday, December 07, 2012 10:55 PM
> To: r-help@r-project.org
> Subject: [R] imputation in mice

[R] imputation in mice

2012-12-07 Thread Elizabeth Fuller Bettini
Hello!  If I understand this listserv correctly, I can email this address
to get help when I am struggling with code.  If this is inaccurate, please
let me know, and I will unsubscribe.
I have been struggling with the same error message for a while, and I can't
seem to get past it.
Here is the issue:
I am using a data set that uses the codes -1 to -9 to indicate various kinds
of missing data.  I changed all of these to NA, regardless of the cause of the
missing data. I am trying to do propensity score matching with these data, but
I cannot get the propensity scores to compute, regardless of which method I
use. I have tried the following methods:
1. Optimal propensity score matching, using the MatchIt package:
m.out <- matchit(assignment ~ totalexp + yrschool + new + cert + age + STratio +
  percminority + urbanicity + povproblem + numthreats + numbattack + weight,
  data = data, distance = "logit", method = "optimal", ratio = 1)
2. Nearest neighbor propensity score matching, using the MatchIt package:
mout <- matchit(assignment ~ totalexp + yrschool + new + cert + age + STratio +
  percminority + urbanicity + povproblem + numthreats + numbattack,
  distance = "logit", replace = TRUE, data = data, method = "nearest",
  m.order = "largest", caliper = 0.10)
3. Just calculating the propensity scores using the glm function:
ps.model <- glm(assignment ~ totalexp + yrschool + new + cert + age + STratio +
  percminority + urbanicity + povproblem + numthreats + numbattack,
  family = "binomial", data = data)
data$propensityscores <- fitted(ps.model)
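The recoding described at the start of this message treats the codes -1 through -9 as missing. It can be sketched in base R as shown below. The data frame and its values are made up. This blanket rule would also wipe out any legitimate negative values, so apply it only to columns where negative codes always mean "missing".

```r
# Made-up example: negative codes -1 to -9 mark different kinds of missingness
raw <- data.frame(
  totalexp = c(12, -1, 5, -9, 20),
  yrschool = c(-3, 4, 4, 7, -8)
)

missing_codes <- -1:-9  # same as c(-1, -2, ..., -9)

# Replace every missing-data code with NA, column by column,
# keeping 'recoded' a data frame
recoded <- raw
recoded[] <- lapply(raw, function(x) {
  x[x %in% missing_codes] <- NA
  x
})

colSums(is.na(recoded))  # totalexp: 2, yrschool: 2
```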

In each case, I have tried running the code after performing zero, one, and
five imputations.  A colleague looked at my code and assured me that I was
doing the imputations correctly.  However, even after the imputation, one of
the continuous variables still has NAs.  This is the code that I am using for
5 imputations:
library(mice)
# Remove weights
data$weight <- NULL
# Perform the imputation
imputed.data <- mice(data, m = 5, diagnostics = FALSE)
# Reinsert the weights
imputed.data.final <- complete(imputed.data)
imputed.data.final$weight <- lbdata$weight
# Rename the imputed dataset "data"
data <- imputed.data.final
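Two things may be worth checking here. First, by default `complete(imputed.data)` returns only the first of the five completed data sets, so the code above carries a single imputation forward. Second, mice can leave a variable unimputed when that variable is constant or an exact linear combination of other columns. It records such removals in the `loggedEvents` element of the object returned by `mice()`. The base-R sketch below uses made-up data to flag constant and collinear columns before imputation. It is a diagnostic under those assumptions, not a reproduction of mice's own checks.

```r
# Made-up data: 'totalexp' is exactly age - 20, and 'cert' never varies
df <- data.frame(
  age      = c(25, 40, 33, 51, 29, 45),
  yrschool = c(3, 15, 8, 26, 4, 12),
  totalexp = c(5, NA, 13, 31, 9, 25),
  cert     = c(1, 1, 1, 1, 1, 1)
)

# Constant columns (ignoring NAs) give an imputation model nothing to work with
is_constant <- vapply(df, function(x) length(unique(x[!is.na(x)])) <= 1, logical(1))
names(df)[is_constant]  # "cert"

# Exact linear dependence, checked on complete rows with an intercept column;
# columns pivoted past the numerical rank are linear combinations of the others
cc <- df[complete.cases(df), ]
X  <- cbind(Intercept = 1, as.matrix(cc))
q  <- qr(X)
aliased <- colnames(X)[q$pivot[-seq_len(q$rank)]]
aliased  # contains "totalexp" and "cert"
```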

When I run optimal or nearest neighbor propensity score matching (regardless
of how many imputations I perform), I get the following error:
Error in matchit(assignment ~ totalexp + yrschool + new + cert + age +  :
Missing values exist in the data
I tried running these with just two of the categorical covariates, but I
still got this error, even though those variables have no missing data.
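Whatever check inside matchit() is triggering this message, you can see which NAs are involved by looking only at the variables named in the formula. Then, as a diagnostic, build a data frame that contains just those columns with no missing values. In this made-up example the NA sits in `weight`, a column copied in from another object. Given the `lbdata$weight` step in the imputation code, that possibility may be worth ruling out. Dropping incomplete rows is a complete-case analysis, so this sketch is for locating the problem, not a substitute for imputation.

```r
# Formula and data are made up; column names loosely follow the post
f <- assignment ~ totalexp + age + weight
dat <- data.frame(
  assignment = c(0, 1, 0, 1, 1, 0),
  totalexp   = c(5, 12, 9, 20, 7, 3),
  age        = c(30, 41, 35, 50, 28, 26),
  weight     = c(1.1, NA, 0.9, 1.3, 1.0, 0.8),  # e.g. copied in from another object
  notes      = c(NA, "a", NA, "b", NA, NA)      # not in the formula
)

vars <- all.vars(f)                  # "assignment" "totalexp" "age" "weight"
na_by_var <- colSums(is.na(dat[vars]))
na_by_var[na_by_var > 0]             # only 'weight' has NAs among the model variables

# Complete-case subset of just the model variables (diagnostic only)
model_data <- dat[complete.cases(dat[vars]), vars]
stopifnot(!anyNA(model_data))
nrow(model_data)                     # 5
```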

When I use the glm function to get the propensity scores, I get the error
below. It indicates that, for some reason, the number of rows is being
reduced, which makes me think that glm is doing listwise deletion:
Error in `$<-.data.frame`(`*tmp*`, "propensityscores", value =
c(0.116801691392172,  :
replacement has 15934 rows, data has 16844
However, this method works if I remove the covariate that has missing data.
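The row-count error is consistent with glm()'s default handling of missing data. With the usual `na.action = na.omit`, incomplete rows are dropped before fitting. fitted() then returns one value per complete row (15934 here) rather than one per row of the data frame (16844). The minimal sketch below uses made-up data. Passing `na.action = na.exclude` still fits on complete cases, but fitted() pads the dropped rows with NA, so the result can be assigned back as a column. This only avoids the assignment error; it does not impute anything.

```r
# Made-up data: two rows are missing 'totalexp'
dat <- data.frame(
  assignment = c(0, 1, 0, 1, 1, 0, 0, 1),
  totalexp   = c(5, NA, 9, 4, 12, NA, 10, 7)
)

# Default na.action (na.omit): fitted() only covers the 6 complete rows,
# so assigning it to 'dat' would fail with "replacement has 6 rows, data has 8"
m_omit <- glm(assignment ~ totalexp, family = binomial, data = dat)
length(fitted(m_omit))  # 6

# na.exclude: same fit, but fitted() is padded back to one value per row of 'dat'
m_excl <- glm(assignment ~ totalexp, family = binomial, data = dat,
              na.action = na.exclude)
dat$propensityscore <- fitted(m_excl)
dat  # rows 2 and 6 get NA propensity scores
```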


So, I guess my question is: how do I get mice to impute the variable that it
is not imputing?  Or do I just need to chuck this variable?  And if I do need
to chuck this variable, how do I get the optimal propensity score matching to
work?  Currently it doesn't work even when I chuck this variable.

Thank you for any help or advice!
Liz


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.