[R] RandomForest and Missing Values

Lorenzo Isella Mon, 28 Jan 2013 08:39:18 -0800

Dear All,
I would like to use a randomForest algorithm on a dataset.
The set is not particularly large/difficult to handle, but it has some
missing values (both factors and numerical values).
According to what I found in these earlier threads

https://stat.ethz.ch/pipermail/r-help/2005-September/078880.html
https://stat.ethz.ch/pipermail/r-help/2007-January/123117.html

the randomForest package cannot handle missing data directly
(essentially you have to resort to some "trick" to fill in the gaps
before fitting: a median value, the most common factor level, a linear
interpolation, etc.).
Since I could not find a clear workaround for this (and I cannot
be the only one who wants to run a randomForest on a less than
perfect data set), can anyone help me out?
I am concerned about the consequences of filling in the missing
values in the data set.
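
To make the question concrete, here is a minimal sketch of the kind of
fill-in workaround I mean. It assumes the randomForest package is
installed and uses its na.roughfix() (median / most frequent level) and
rfImpute() (proximity-based) helpers. The data are purely illustrative:
iris with an extra factor column (the hypothetical name "Size") and a
few values knocked out at random. It is not my actual data set.

```r
# Minimal sketch, not a recommendation: two ways to fill in NAs
# before fitting a forest with the randomForest package.
library(randomForest)

set.seed(42)

# Illustrative data only: iris plus a made-up factor predictor "Size",
# with some numeric and factor values set to NA at random.
dat <- iris
dat$Size <- factor(ifelse(dat$Petal.Length > 4, "large", "small"))
dat$Sepal.Length[sample(nrow(dat), 10)] <- NA
dat$Size[sample(nrow(dat), 10)] <- NA

# Option 1: rough fix. Numeric NAs get the column median,
# factor NAs get the most frequent level.
dat_rough <- na.roughfix(dat)
fit_rough <- randomForest(Species ~ ., data = dat_rough, ntree = 200)

# Option 2: proximity-based imputation. This starts from the rough fix
# and refines it using forest proximities; the response (Species)
# must not contain NAs.
dat_prox <- rfImpute(Species ~ ., data = dat, iter = 3, ntree = 100)
fit_prox <- randomForest(Species ~ ., data = dat_prox, ntree = 200)

# Compare the out-of-bag error of the two fits.
print(fit_rough)
print(fit_prox)
```

The rough fix can also be applied inside the call itself via
na.action = na.roughfix. Either way the forest is trained on filled-in
values rather than on the observed data alone, which is exactly the
part I am unsure about.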
The cforest function in the "party" package does not seem to have this
limitation, but on the other hand the randomForest package has stood
the test of time, so should I drop it in this case?
Any suggestion is appreciated.
Cheers

Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
