On Oct 23, 2012, at 11:17 AM, Lopez, Dan wrote:

> Hi,
>
> Is there a function I can use on my dataframe to give me a concise summary of
> variables that are NA, blank, etc.? Basically all Null values, Empty strings,
> white space, blank values. Ideally it would look something like the below:
>
> # it should only include the fields with NAs, blanks, etc. Added bonus would
> be to include column Index.
> #Valid Records = records that are not NA, blank, etc.
> #ColIndex - what place is column in the original dataframe...1,2,3, ...xth
>
> Valid Records Null (NA?) Empty String White Space
> Blank Value ColIndex

Would a "Valid Record" be defined by `grep("[^ ]", column)`, i.e. a value that has at least one non-space character in it?

What is a "ColIndex"? How is an "Empty String" different from "White Space" or a "Blank Value"?

> Var1 52 8
> 2
> Var2 40 20
> 10 10
> 3
> Var3 58
> 2
> 20
> ..
>

I generally use `describe` from package:Hmisc. There are other versions of `describe` in other packages. It will not, however, group values made up entirely of a varying number of spaces and other non-printing characters such as tabs into a single category. It is also unclear what operational definition you would use to separate blanks from white space. You will probably need to code that yourself; the code for Hmisc::describe might be a good starting point.

> I know there is summary() but I am not sure if that always displays NAs and
> blanks especially with factor variables that have several levels (lumps them
> in 'Other' when I run the entire dataframe).
> In these instances I can run the individual field separately and see all
> levels but that would be inefficient to do for a dataframe with over 50
> variables.

How were you going to "run the individual field"? If you show us your code, progress is likely to be faster. It would probably be very easy to turn that into a function that could then be applied to every column with `lapply`.

--
David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
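A minimal sketch of the "write a per-column function, then run it with `lapply`" approach. The category definitions are assumptions, since the thread leaves the distinction between "Empty String", "White Space" and "Blank Value" open: here an empty string is exactly `""`, white space is one or more characters that are all whitespace (matched with `[[:space:]]` rather than a literal space, so tab-only values count too), and "Blank Value" is not separated out. The function names `na_blank_counts` and `na_blank_summary` are hypothetical.

```r
# Hypothetical helper: count NA / empty / whitespace-only / valid values in one column.
# Assumed definitions (not settled in the thread):
#   NAs         : is.na(x)
#   EmptyString : exactly "" (zero characters)
#   WhiteSpace  : one or more characters, all of them whitespace (spaces, tabs, ...)
#   Valid       : everything else, i.e. at least one non-whitespace character
na_blank_counts <- function(x) {
  is_na    <- is.na(x)
  s        <- as.character(x)                    # factors become their labels
  is_empty <- !is_na & s == ""                   # NA positions forced to FALSE
  is_ws    <- !is_na & !is_empty & grepl("^[[:space:]]+$", s)
  c(Valid       = sum(!is_na & !is_empty & !is_ws),
    NAs         = sum(is_na),
    EmptyString = sum(is_empty),
    WhiteSpace  = sum(is_ws))
}

# Hypothetical wrapper: one row per column, plus its position in the data frame,
# keeping only columns that contain at least one NA, empty or whitespace value.
na_blank_summary <- function(df) {
  counts <- do.call(rbind, lapply(df, na_blank_counts))
  out <- data.frame(counts, ColIndex = seq_along(df), row.names = names(df))
  out[out$NAs + out$EmptyString + out$WhiteSpace > 0, , drop = FALSE]
}

# Small made-up example
df <- data.frame(
  id   = 1:5,
  name = c("a", "", "  ", NA, "b"),
  code = factor(c("x", "\t", "y", "y", NA)),
  stringsAsFactors = FALSE
)
res <- na_blank_summary(df)
print(res)  # reports 'name' and 'code'; 'id' has no problem values and is dropped
```

Because every column goes through `as.character`, factor levels are examined individually, so nothing gets lumped into "(Other)" the way `summary()` does for a whole data frame.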