Hello everybody,

This is my first question about R, so I apologize if I have sent it to the 
wrong list or if I am not very clear.

My problem concerns rfImpute from the randomForest package. I am interested 
in imputing missing values, and I read that randomForest can do this. So I 
wrote the following code:

set.seed(100)
library(mlbench)
library(randomForest)
data(BreastCancer)
summary(BreastCancer)
data <- BreastCancer[, -1]                      # drop the Id column
data <- data[!is.na(data[, "Bare.nuclei"]), ]   # drop rows where Bare.nuclei is missing
summary(data)


is.factor(data$Cl.thickness)  # OK: it is a factor


########## select the rows to set to missing ##########
x <- 1:nrow(data)
sample1 <- sample(x, 70)   # rows for column 1
sample3 <- sample(x, 70)   # rows for column 3
sample5 <- sample(x, 70)   # rows for column 5


########## replace the selected values with NA ##########
data_missing <- data
data_missing[sample1, 1] <- NA
data_missing[sample3, 3] <- NA
data_missing[sample5, 5] <- NA
summary(data_missing)


is.factor(data_missing$Cl.thickness)  # OK: still a factor


########## imputation by random forest ##########
data_imputed <- rfImpute(Class ~ ., data_missing, iter = 5, ntree = 1000)


is.factor(data_imputed$Cl.thickness)  # Not OK: no longer a factor



As you can see, rfImpute changes the type of one explanatory variable. 
Before imputation it was a factor; after imputation it is a quantitative 
(numeric) variable. I don't understand why this happens. Maybe I need to set 
an option in rfImpute...
I would be grateful if someone could help me understand.
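
In case it helps to see exactly which columns are affected, here is a small check that compares the class of every column before and after imputation. It is only a sketch: compare_classes and the toy data frames are names I made up for illustration and are not part of randomForest. It matches columns by name because the imputed data frame may not keep the original column order.

```r
# Sketch: list the columns whose class differs between two data frames.
# compare_classes is a hypothetical helper, not a randomForest function.
compare_classes <- function(before, after) {
  # Collapse multi-class columns (e.g. ordered factors) into one string.
  class_of <- function(df) {
    sapply(df, function(col) paste(class(col), collapse = "/"))
  }
  cb <- class_of(before)
  ca <- class_of(after)
  # Match by column name, since the column order may differ.
  common  <- intersect(names(cb), names(ca))
  changed <- common[cb[common] != ca[common]]
  data.frame(column = changed,
             before = unname(cb[changed]),
             after  = unname(ca[changed]),
             stringsAsFactors = FALSE)
}

# Toy illustration with made-up data (not BreastCancer):
toy_before <- data.frame(a = factor(c("1", "2")), b = c(1.5, 2.5))
toy_after  <- data.frame(b = c(1.5, 2.5), a = c(1, 2))
print(compare_classes(toy_before, toy_after))
```

With the objects from my code above, the call would be compare_classes(data_missing, data_imputed).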

Thank you very much.





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
