> On Jul 29, 2016, at 5:52 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Jul 29, 2016, at 5:08 PM, Jun Shen <jun.shen...@gmail.com> wrote:
>>
>> Thanks Jeff/David for the reply. I wasn't clear in the previous message. The
>> problem with using na.omit is that it will omit the whole row wherever there
>> is at least one NA, even when some variables do have non-NA values.
>
> Did you actually run the example I offered, or did you just guess at what
> would happen and complain? When applied only to a vector there is no such
> thing as a "column".
>
> What you are describing would only have happened if `na.omit` were applied to
> an object that was a dataframe. That was not what was offered in the example.
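The vector-versus-data-frame distinction argued here is easy to check directly. Below is a minimal sketch using a small made-up data frame `toy` (a hypothetical name, not from the thread): `na.omit()` on a data frame drops every row that contains an NA, whereas applying it to each column separately drops only that column's own NAs.

```r
# Hypothetical toy data (not from the thread): one NA in CL only
toy <- data.frame(CL = c(NA, 1, 2), V1 = c(4, 5, 6))

# On a data frame, na.omit() drops whole rows, so V1 also loses its first value
nrow(na.omit(toy))             # 2

# On each column (a plain vector), na.omit() drops only that column's NAs
lengths(lapply(toy, na.omit))  # CL: 2, V1: 3
```

The per-column form is what `lapply(data[summary.var], na.omit)` does, whereas `na.omit(data[summary.var])` is the whole-row form.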
And then I looked at the code again and realized you were not looping over the
columns as I thought was happening. So what you want is:

do.stats <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, lapply(data[summary.var], na.omit))), 3))

--
David

>
> --
> David.
>>
>> For example, let's define a new function:
>> N <- function(x) length(x[!is.na(x)])
>>
>> test <-
>> data.frame(ID=1:100,CL=rnorm(100),V1=rnorm(100),V2=rnorm(100),ALPHA=rnorm(100))
>> test$CL[1] <- NA
>>
>> do.stats(test, stats.func=c('mean','sd','median','min','max','N'),
>> summary.var=c('CL','V1', 'V2','ALPHA'))
>>
>> gives
>>
>>          mean    sd  median   min  max  N
>> CL    -0.0232 0.918 -0.0786 -2.14 3.14 99
>> V1    -0.0410 0.936 -0.1160 -2.86 2.67 99
>> V2    -0.1760 0.978 -0.1490 -2.31 2.15 99
>> ALPHA -0.1380 0.960 -0.2160 -2.41 2.20 99
>>
>> One non-missing value in each of V1, V2 and ALPHA (the one in the row where
>> CL is NA) is omitted.
>>
>>
>> On Fri, Jul 29, 2016 at 2:29 AM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>>> On Jul 28, 2016, at 7:37 PM, Jun Shen <jun.shen...@gmail.com> wrote:
>>>
>>> Because in reality the NA may appear in one variable but not others. For
>>> example, for ID=1 CL may be NA but not the others; for ID=2, V1 may be NA,
>>> etc. To keep all the IDs and all the variables in one data frame, it's
>>> inevitable to see some NAs.
>>
>> That doesn't seem to acknowledge Newmiller's advice. In particular, this
>> would have seemed to be an obvious response to that suggestion:
>>
>> do.stats <- function(data, stats.func, summary.var)
>>   as.data.frame(signif(sapply(stats.func, function(func)
>>     mapply(func, na.omit(data[summary.var]))), 3))
>>
>> And please also heed the advice in the Posting Guide to use plain text.
>>
>> --
>> David.
>>
>>>
>>> On Thu, Jul 28, 2016 at 10:22 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>>> wrote:
>>>
>>>> Why not remove it yourself before passing it to those functions?
>>>> --
>>>> Sent from my phone.
>>>> Please excuse my brevity.
>>>>
>>>> On July 28, 2016 5:51:47 PM PDT, Jun Shen <jun.shen...@gmail.com> wrote:
>>>>> Dear list,
>>>>>
>>>>> I wrote a small function to calculate multiple stats on multiple
>>>>> variables and export them in exactly the format I want. Everything
>>>>> seems fine until an NA appears in the data.
>>>>>
>>>>> Here is my function:
>>>>>
>>>>> do.stats <- function(data, stats.func, summary.var)
>>>>>   as.data.frame(signif(sapply(stats.func, function(func)
>>>>>     mapply(func, data[summary.var])), 3))
>>>>>
>>>>> A test dataset:
>>>>> test <-
>>>>> data.frame(ID=1:100,CL=rnorm(100),V1=rnorm(100),V2=rnorm(100),ALPHA=rnorm(100))
>>>>>
>>>>> A command like the following
>>>>> do.stats(test, stats.func=c('mean','sd','median','min','max'),
>>>>> summary.var=c('CL','V1', 'V2','ALPHA'))
>>>>>
>>>>> gives me
>>>>>
>>>>>          mean    sd  median   min  max
>>>>> CL     0.1030 0.917  0.0363 -2.32 2.47
>>>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
>>>>> V2     0.0600 1.000  0.0621 -2.80 2.62
>>>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
>>>>>
>>>>> However, if I have an NA in the data
>>>>> test$CL[1] <- NA
>>>>>
>>>>> the same command gives me
>>>>>
>>>>>          mean    sd  median   min  max
>>>>> CL         NA    NA      NA    NA   NA
>>>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
>>>>> V2     0.0600 1.000  0.0621 -2.80 2.62
>>>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
>>>>>
>>>>> I know this is because those functions (mean, sd, etc.) all have
>>>>> na.rm=FALSE by default. How can I pass na.rm=TRUE to all these
>>>>> functions without manually redefining those stats functions?
>>>>>
>>>>> Appreciate any comment.
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> Jun
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA
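For reference, here is a self-contained sketch that puts the final per-column suggestion together with the `N()` helper from the thread. The seed and the second function, `do.stats.narm()` (a hypothetical name), are assumptions added for illustration. `do.stats.narm()` answers the original question literally by passing `na.rm = TRUE` through `mapply()`'s `MoreArgs` argument. That only works when every function in `stats.func` accepts an `na.rm` argument, which `mean`, `sd`, `median`, `min` and `max` do but `N()` does not.

```r
# N(): number of non-missing values, as defined in the thread
N <- function(x) length(x[!is.na(x)])

# Per-column suggestion from the thread, restructured slightly: na.omit() is
# applied to each selected column separately, so an NA in one variable does
# not drop values from the others.
do.stats <- function(data, stats.func, summary.var) {
  cols <- lapply(data[summary.var], na.omit)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, cols)), 3))
}

# Hypothetical alternative: pass na.rm = TRUE to every function via MoreArgs.
# Only valid when every function in stats.func accepts an na.rm argument
# (mean, sd, median, min, max do; N() above does not).
do.stats.narm <- function(data, stats.func, summary.var) {
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, data[summary.var], MoreArgs = list(na.rm = TRUE))), 3))
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

res <- do.stats(test,
                stats.func  = c("mean", "sd", "median", "min", "max", "N"),
                summary.var = c("CL", "V1", "V2", "ALPHA"))
print(res$N)  # 99 100 100 100: only CL loses a value
print(res)

res2 <- do.stats.narm(test,
                      stats.func  = c("mean", "sd", "median", "min", "max"),
                      summary.var = c("CL", "V1", "V2", "ALPHA"))
print(res2)
```

Removing NAs per column before calling the statistics keeps `do.stats()` independent of the functions' signatures, which is why it also works with a user-defined helper such as `N()`. The `MoreArgs` route avoids copying the data but fails with an "unused argument" error as soon as a function without `na.rm` is included.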