Hi, I have the following code to replace missing values with the column median, assuming missing values occur only in continuous variables:

trn1 <- read.table('trn1.fv', header=FALSE, na.strings='.', sep='|')

# median of each column, ignoring missing values
m.trn1 <- sapply(1:ncol(trn1), function(i) median(trn1[,i], na.rm=TRUE))

# replace the missing values, one row at a time
trn2 <- trn1
for (each in 1:nrow(trn1)) {
  index.missing <- which(is.na(trn1[each,]))
  trn2[each,] <- replace(trn1[each,], index.missing, m.trn1[index.missing])
}

Can anyone suggest ways to improve it? Replacing the missing values in just 10 rows takes about 1.5 seconds:

> system.time(for (each in 1:10){index.missing=which(is.na(trn1[each,])); trn2[each,]<-replace(trn1[each,], index.missing, m.trn1[index.missing]);})
[1] 1.53 0.00 1.53 0.00 0.00

A more general question: are there any R packages for handling missing data?

Thanks,

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
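
One possible direction, offered as a sketch rather than a definitive answer: row-wise indexing such as trn1[each,] builds a one-row data frame on every iteration, which tends to be slow, whereas each column of a data frame is an ordinary vector. Looping over columns instead should therefore need only ncol(trn1) passes. The toy data frame below is hypothetical and only stands in for the contents of 'trn1.fv', which are not shown in the post; all columns are assumed to be numeric.

```r
# Hypothetical toy data standing in for trn1 (the real file is not available)
trn1 <- data.frame(V1 = c(1, NA, 3, 4),
                   V2 = c(NA, 10, 20, 30),
                   V3 = c(5, 6, 7, 8))

# Column medians, ignoring missing values (assumes every column is numeric)
m.trn1 <- sapply(trn1, median, na.rm = TRUE)

# Fill the missing entries one column at a time instead of one row at a time
trn2 <- trn1
for (j in seq_along(trn2)) {
  miss <- is.na(trn2[[j]])       # logical mask of missing entries in column j
  trn2[[j]][miss] <- m.trn1[j]   # overwrite them with that column's median
}
```

Columns without missing values are left unchanged, and trn1 itself is not modified. Timing the loop with system.time() on the real data would show whether it helps in practice.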