Hi,
I have the following code to replace missing values with the column median,
assuming missing values occur only in continuous variables:

trn1 <- read.table('trn1.fv', header=FALSE, na.strings='.', sep='|')

# median
m.trn1 <- sapply(1:ncol(trn1), function(i) median(trn1[,i], na.rm=TRUE))

#replace
trn2<-trn1
for (each in 1:nrow(trn1)) {
  index.missing <- which(is.na(trn1[each,]))
  trn2[each,] <- replace(trn1[each,], index.missing, m.trn1[index.missing])
}


Can anyone suggest ways to speed this up? Replacing the missing values in
just 10 rows takes about 1.5 seconds:
> system.time(for (each in 1:10){index.missing=which(is.na(trn1[each,]));
trn2[each,]<-replace(trn1[each,], index.missing, m.trn1[index.missing]);})
[1] 1.53 0.00 1.53 0.00 0.00
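
One likely cause of the slowdown is that each trn1[each,] builds a one-row
data frame and each trn2[each,]<- goes through the data frame assignment
method, which is expensive per call. A minimal sketch of a column-wise
alternative follows, assuming every column is numeric. The toy data frame
and the names j and miss are made up for illustration, since the real
trn1.fv is not shown here.

```r
# Toy stand-in for trn1 (hypothetical data; the real file is not shown).
trn1 <- data.frame(V1 = c(1, NA, 3, 4),
                   V2 = c(NA, 10, 20, NA),
                   V3 = c(5, 6, 7, 8))

# Column medians, ignoring NAs.
m.trn1 <- sapply(trn1, median, na.rm = TRUE)

# Loop over columns (few) instead of rows (many): each column is a plain
# vector, so the logical-index assignment is vectorized.
trn2 <- trn1
for (j in seq_along(trn2)) {
  miss <- is.na(trn2[[j]])          # TRUE where column j is missing
  trn2[[j]][miss] <- m.trn1[[j]]    # fill with that column's median
}

print(trn2)
```

Wrapping the loop in system.time() on the real data would show whether this
helps in practice; the gain should grow with the number of rows, since the
loop now runs once per column rather than once per row.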


A more general question: are there any R packages for handling missing
values?
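
As a partial starting point, base R already ships a few helpers for
inspecting and dropping missing values, sketched below on a made-up data
frame (df and the result names are hypothetical). Contributed packages such
as Hmisc (its impute function) and mice go further and do actual imputation,
but they are not assumed to be installed here.

```r
# Hypothetical example data with one NA in each column.
df <- data.frame(x = c(1, NA, 3),
                 y = c("a", "b", NA))

# Number of missing values per column.
na.per.col <- colSums(is.na(df))

# Logical vector: TRUE for rows that contain no NA at all.
full.rows <- complete.cases(df)

# Copy of df with every row containing an NA dropped.
df.complete <- na.omit(df)

print(na.per.col)
print(full.rows)
print(df.complete)
```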

Thanks,

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html