> On Jul 29, 2016, at 5:52 PM, David Winsemius <dwinsem...@comcast.net> wrote:
> 
> 
>> On Jul 29, 2016, at 5:08 PM, Jun Shen <jun.shen...@gmail.com> wrote:
>> 
>> Thanks Jeff/David for the reply. I wasn't clear in the previous message. The 
>> problem with using na.omit is that it omits the whole row where there is at 
>> least one NA, even when other variables in that row have non-NA values. 
> 
> Did you actually run the example I offered, or did you just guess at what 
> would happen and complain? When applied only to a vector there is no such 
> thing as a "column". 
> 
> What you are describing would only have happened if `na.omit` were applied to 
> an object that was a dataframe. That was not what was offered in the example.

And then I looked at the code again and realized you were not looping over the 
columns as I thought was happening. So what you want is:

do.stats <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, lapply(data[summary.var], na.omit))), 3))
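
As a rough check of the difference, here is a minimal sketch that reuses the `test` data and the `N` function from the quoted message below. The `set.seed()` call and the `vars` and `res` names are my additions for illustration only. It compares `na.omit` applied to the whole data frame (row-wise removal) with `na.omit` applied to each column in turn:

```r
# do.stats as given above: na.omit() is applied to each column separately
do.stats <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, lapply(data[summary.var], na.omit))), 3))

# Count of non-missing values, as defined in the quoted message
N <- function(x) length(x[!is.na(x)])

set.seed(1)  # assumption: seed added only so the sketch is reproducible
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

vars <- c("CL", "V1", "V2", "ALPHA")  # hypothetical helper name

# Data-frame method: the whole first row is dropped,
# so every column loses one observation
nrow(na.omit(test[vars]))

# Default (vector) method, column by column:
# only CL loses its missing value
sapply(lapply(test[vars], na.omit), length)

res <- do.stats(test,
                stats.func = c("mean", "sd", "median", "min", "max", "N"),
                summary.var = vars)
res$N  # N should now be 99 for CL and 100 for the other columns
```

The design choice is simply where `na.omit` sits: wrapped around `data[summary.var]` it dispatches to the data frame method and removes rows, whereas inside `lapply` it sees one vector at a time.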

-- 
David


> 
> -- 
> David.
>> 
>> For example: let's define a new function
>> N <- function(x) length(x[!is.na(x)])
>> 
>> test <- 
>> data.frame(ID=1:100,CL=rnorm(100),V1=rnorm(100),V2=rnorm(100),ALPHA=rnorm(100))
>> test$CL[1] <- NA
>> 
>> do.stats(test, stats.func=c('mean','sd','median','min','max','N'), 
>> summary.var=c('CL','V1', 'V2','ALPHA'))
>> 
>> gives
>> 
>>         mean    sd  median   min  max  N
>> CL    -0.0232 0.918 -0.0786 -2.14 3.14 99
>> V1    -0.0410 0.936 -0.1160 -2.86 2.67 99
>> V2    -0.1760 0.978 -0.1490 -2.31 2.15 99
>> ALPHA -0.1380 0.960 -0.2160 -2.41 2.20 99
>> 
>> 
>> One non-missing value in each of V1, V2 and ALPHA is omitted.
>> 
>> 
>> On Fri, Jul 29, 2016 at 2:29 AM, David Winsemius <dwinsem...@comcast.net> 
>> wrote:
>> 
>>> On Jul 28, 2016, at 7:37 PM, Jun Shen <jun.shen...@gmail.com> wrote:
>>> 
>>> Because in reality the NA may appear in one variable but not others. For
>>> example, for ID=1, CL may be NA but not the others; for ID=2, V1 may be NA,
>>> etc. To keep all the IDs and all the variables in one data frame, it's
>>> inevitable to see some NA.
>> 
>> That doesn't seem to acknowledge Newmiller's advice. In particular this 
>> would have seemed to be an obvious response to that suggestion:
>> 
>> do.stats <- function(data, stats.func, summary.var)
>>          as.data.frame(signif(sapply(stats.func,function(func)
>> mapply( func,  na.omit( data[summary.var]) )), 3))
>> 
>> 
>> And please also heed the advice in the Posting Guide to use plain text.
>> 
>> --
>> David.
>> 
>> 
>> 
>>> 
>>> On Thu, Jul 28, 2016 at 10:22 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>>> wrote:
>>> 
>>>> Why not remove it yourself before passing it to those functions?
>>>> --
>>>> Sent from my phone. Please excuse my brevity.
>>>> 
>>>> On July 28, 2016 5:51:47 PM PDT, Jun Shen <jun.shen...@gmail.com> wrote:
>>>>> Dear list,
>>>>> 
>>>>> I wrote a small function to calculate multiple stats on multiple
>>>>> variables and export them in exactly the format I want. Everything
>>>>> seems fine until NA appears in the data.
>>>>> 
>>>>> Here is my function:
>>>>> 
>>>>> do.stats <- function(data, stats.func, summary.var)
>>>>>          as.data.frame(signif(sapply(stats.func,function(func)
>>>>> mapply(func,data[summary.var])),3))
>>>>> 
>>>>> A test dataset:
>>>>> test <-
>>>>> data.frame(ID=1:100,CL=rnorm(100),V1=rnorm(100),V2=rnorm(100),ALPHA=rnorm(100))
>>>>> 
>>>>> a command like the following
>>>>> do.stats(test, stats.func=c('mean','sd','median','min','max'),
>>>>> summary.var=c('CL','V1', 'V2','ALPHA'))
>>>>> 
>>>>> gives me
>>>>> 
>>>>>       mean    sd  median   min  max
>>>>> CL     0.1030 0.917  0.0363 -2.32 2.47
>>>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
>>>>> V2     0.0600 1.000  0.0621 -2.80 2.62
>>>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
>>>>> 
>>>>> 
>>>>> However if I have a NA in the data
>>>>> test$CL[1] <- NA
>>>>> 
>>>>> The same command run gives me
>>>>>       mean    sd  median   min  max
>>>>> CL         NA    NA      NA    NA   NA
>>>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
>>>>> V2     0.0600 1.000  0.0621 -2.80 2.62
>>>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
>>>>> 
>>>>> I know this is because those functions (mean, sd etc.) all have
>>>>> na.rm=F by default. How can I pass na.rm=T to all these functions
>>>>> without manually redefining those stats functions?
>>>>> 
>>>>> Appreciate any comment.
>>>>> 
>>>>> Thanks for your help.
>>>>> 
>>>>> 
>>>>> Jun
>>>>> 
>>>>>     [[alternative HTML version deleted]]
>>>>> 
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 
>> 
> 
> David Winsemius
> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA

