Thanks Tyler. This function has some useful features.
Tim

From: Tyler Rinker [mailto:tyler_rin...@hotmail.com]
Sent: Tuesday, April 19, 2011 3:52 PM
To: tesutton; r-help@r-project.org
Subject: RE: [R] Simple Missing cases Function

I use the following code/function which gives me some quick descriptives about each variable (ie. n of missing values, % missing, case #'s missing, etc.):

Fairly quick, maybe not pretty but effective on either single variables or entire data sets.

NAhunter<-function(dataset)
{
    find.NA<-function(variable)
    {
        if(is.numeric(variable)){
            n<-length(variable)
            mean<-mean(variable, na.rm=T)
            median<-median(variable, na.rm=T)
            sd<-sd(variable, na.rm=T)
            NAs<-is.na(variable)
            total.NA<-sum(NAs)
            percent.missing<-total.NA/n
            descriptives<-data.frame(n,mean,median,sd,total.NA,percent.missing)
            rownames(descriptives)<-c(" ")
            Case.Number<-1:n
            Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
            missing.value<-data.frame(Case.Number,Missing.Values)
            missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
            list("NUMERIC DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
        } else{
            n<-length(variable)
            NAs<-is.na(variable)
            total.NA<-sum(NAs)
            percent.missing<-total.NA/n
            descriptives<-data.frame(n,total.NA,percent.missing)
            rownames(descriptives)<-c(" ")
            Case.Number<-1:n
            Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
            missing.value<-data.frame(Case.Number,Missing.Values)
            missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
            list("CATEGORICAL DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
        }
    }
    dataset<-data.frame(dataset)
    options(scipen=100)
    options(digits=2)
    lapply(dataset,find.NA)
}

> From: tesut...@hku.hk
> To: r-help@r-project.org
> Date: Tue, 19 Apr 2011 15:29:08 +0800
> Subject: [R] Simple Missing cases Function
>
> Dear all
>
> I have written a function to perform a very simple but useful task which I
> do regularly. It is designed to show how many values are missing from each
> variable in a data.frame.
> In its current form it works but is slow because I
> have used several loops to achieve this simple task.
>
> Can anyone see a more efficient way to get the same results? Or is there
> existing function which does this?
>
> Thanks for your help
>
> Tim
>
> Function:
>
> miss <- function (data)
> {
>     miss.list <- list(NA)
>     for (i in 1:length(data)) {
>         miss.list[[i]] <- table(is.na(data[i]))
>     }
>     for (i in 1:length(miss.list)) {
>         if (length(miss.list[[i]]) == 2) {
>             miss.list[[i]] <- miss.list[[i]][2]
>         }
>     }
>     for (i in 1:length(miss.list)) {
>         if (names(miss.list[[i]]) == "FALSE") {
>             miss.list[[i]] <- 0
>         }
>     }
>     data.frame(names(data), as.numeric(miss.list))
> }
>
> Example:
>
> data(ToothGrowth)
> data.m <- ToothGrowth
> data.m$supp[sample(1:nrow(data.m), size=25)] <- NA
> miss(data.m)
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
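
On the original question of a loop-free way to get the same per-variable counts, one possible sketch is below. It relies on is.na() returning a logical matrix when given a data.frame, so colSums() can count the TRUE values in each column. The name miss2 and the output column names are made up for illustration and do not come from the thread; this is a minimal sketch, not a drop-in replacement tested on large data.

```r
# Hypothetical vectorised variant of miss(); miss2 is an illustrative name only.
miss2 <- function(data) {
  # is.na() on a data.frame gives a logical matrix (rows x variables);
  # colSums() then counts the TRUE entries, i.e. the NAs, per variable.
  n.missing <- colSums(is.na(data))
  data.frame(variable = names(data), missing = as.numeric(n.missing))
}

# Same example data as in the original post
data(ToothGrowth)
data.m <- ToothGrowth
data.m$supp[sample(1:nrow(data.m), size = 25)] <- NA
miss2(data.m)
```

Because sample() is drawn without replacement here, the supp row should report 25 missing values and the other two variables none, matching what miss() returns for the same data.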