Thanks, Tyler.

This function has some useful features.

Tim

 

From: Tyler Rinker [mailto:tyler_rin...@hotmail.com] 
Sent: Tuesday, April 19, 2011 3:52 PM
To: tesutton; r-help@r-project.org
Subject: RE: [R] Simple Missing cases Function

 

I use the following function, which gives me some quick descriptives for
each variable (i.e., number of missing values, proportion missing, case
numbers of the missing values, etc.). It is fairly quick and maybe not
pretty, but it works on either single variables or entire data sets.
 
NAhunter <- function(dataset)
{
  # Summarise one variable: descriptives plus the case numbers of missing values
  find.NA <- function(variable)
  {
    if (is.numeric(variable)) {
      n <- length(variable)
      mean <- mean(variable, na.rm = TRUE)
      median <- median(variable, na.rm = TRUE)
      sd <- sd(variable, na.rm = TRUE)
      NAs <- is.na(variable)
      total.NA <- sum(NAs)
      percent.missing <- total.NA / n  # a proportion (0 to 1), despite the name
      descriptives <- data.frame(n, mean, median, sd, total.NA, percent.missing)
      rownames(descriptives) <- c(" ")
      Case.Number <- seq_len(n)  # safe for zero-length input, unlike 1:n
      Missing.Values <- ifelse(NAs > 0, "Missing Value", " ")
      missing.value <- data.frame(Case.Number, Missing.Values)
      missing.values <- missing.value[which(Missing.Values == "Missing Value"), ]
      list("NUMERIC DATA", "DESCRIPTIVES" = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.values[, 1])
    } else {
      n <- length(variable)
      NAs <- is.na(variable)
      total.NA <- sum(NAs)
      percent.missing <- total.NA / n  # a proportion (0 to 1), despite the name
      descriptives <- data.frame(n, total.NA, percent.missing)
      rownames(descriptives) <- c(" ")
      Case.Number <- seq_len(n)
      Missing.Values <- ifelse(NAs > 0, "Missing Value", " ")
      missing.value <- data.frame(Case.Number, Missing.Values)
      missing.values <- missing.value[which(Missing.Values == "Missing Value"), ]
      list("CATEGORICAL DATA", "DESCRIPTIVES" = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.values[, 1])
    }
  }
  dataset <- data.frame(dataset)
  # Note: these options are set globally and persist after the call
  options(scipen = 100)
  options(digits = 2)
  lapply(dataset, find.NA)
}
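
For the count-only summary asked for in the original question, a loop-free
sketch may be enough. It relies on is.na() returning a logical matrix when
given a data frame, and on colSums() to total each column; both are base R.
The helper name miss_summary and its output column names are made up for
illustration and do not come from this thread, so treat it as one possible
approach rather than a definitive answer.

```r
# Hypothetical helper (not from the thread): count missing values per column
# without explicit loops.
miss_summary <- function(data) {
  data <- as.data.frame(data)
  # is.na() on a data frame gives a logical matrix; colSums() counts the TRUEs
  n.missing <- colSums(is.na(data))
  data.frame(variable = names(data),
             n.missing = as.integer(n.missing),
             prop.missing = as.numeric(n.missing) / nrow(data),
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Same setup as the example in the original question
data.m <- ToothGrowth
set.seed(1)  # assumption: added only so the NA positions are reproducible
data.m$supp[sample(seq_len(nrow(data.m)), size = 25)] <- NA
miss_summary(data.m)
```

Because the counting is vectorised, this should scale better than the three
loops in the original miss() function, at the cost of dropping the per-case
detail that NAhunter reports.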

 
> From: tesut...@hku.hk
> To: r-help@r-project.org
> Date: Tue, 19 Apr 2011 15:29:08 +0800
> Subject: [R] Simple Missing cases Function
> 
> Dear all
>
> I have written a function to perform a very simple but useful task which I
> do regularly. It is designed to show how many values are missing from each
> variable in a data.frame. In its current form it works but is slow because I
> have used several loops to achieve this simple task.
>
> Can anyone see a more efficient way to get the same results? Or is there
> an existing function which does this?
>
> Thanks for your help
>
> Tim
>
> Function:
>
> miss <- function (data)
> {
>   miss.list <- list(NA)
>   for (i in 1:length(data)) {
>     miss.list[[i]] <- table(is.na(data[i]))
>   }
>   for (i in 1:length(miss.list)) {
>     if (length(miss.list[[i]]) == 2) {
>       miss.list[[i]] <- miss.list[[i]][2]
>     }
>   }
>   for (i in 1:length(miss.list)) {
>     if (names(miss.list[[i]]) == "FALSE") {
>       miss.list[[i]] <- 0
>     }
>   }
>   data.frame(names(data), as.numeric(miss.list))
> }
>
> Example:
>
> data(ToothGrowth)
> data.m <- ToothGrowth
> data.m$supp[sample(1:nrow(data.m), size=25)] <- NA
> miss(data.m)
>
> [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

