I received two additional suggestions, one off-list, both appended below. Both helped me to learn a bit more about how to get what I want.

First, the aggregate() function is in package:stats, it provides the numbers I needed, but I don't like the output format as much as I liked the format from doBy:summaryBy(). Here it is:

aggregate(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, 
function(x) c(mean=mean(x), sd=sd(x), quantile(x), N=length(x)))
   Generation Zygosity    Sex Cohort ESstatus    Age.mean      Age.sd      
Age.0%     Age.25%     Age.50%     Age.75%    Age.100%       Age.N
1   Offspring       DZ Female     11       ES  17.7852830   0.3535863  
16.9300000  17.6000000  17.7750000  17.9650000  18.9200000 106.0000000
2      Parent       DZ Female     11       ES  44.6151240   5.1246314  
32.1700000  41.3400000  44.6800000  48.2800000  57.9500000 121.0000000
3   Offspring       MZ Female     11       ES  17.8762755   0.4506530  
16.8600000  17.6775000  17.8050000  18.1000000  19.1200000 196.0000000
4      Parent       MZ Female     11       ES  44.2347573   5.0214627  
29.5500000  40.6925000  44.1250000  47.7300000  56.7300000 206.0000000
5   Offspring       DZ   Male     11       ES  17.7614925   0.3467540  
17.1800000  17.5150000  17.7150000  18.0000000  18.7100000 134.0000000
6      Parent       DZ   Male     11       ES  44.6020635   4.5605484  
34.3100000  41.4475000  44.8900000  47.4975000  58.7500000 126.0000000
7   Offspring       MZ   Male     11       ES  17.7717436   0.3236917  
16.8400000  17.5800000  17.7900000  17.9700000  19.0200000 195.0000000
8      Parent       MZ   Male     11       ES  43.4078680   5.3507439  
31.2800000  39.9700000  43.4400000  46.4800000  64.6500000 197.0000000
9   Offspring       DZ Female     11    notES  18.1367901   0.5555968  
16.7600000  17.8525000  18.1900000  18.4575000  19.5000000 162.0000000
10     Parent       DZ Female     11    notES  42.5434579   4.3670998  
34.0300000  39.3450000  42.1100000  45.5500000  57.0600000 107.0000000
11  Offspring       MZ Female     11    notES  18.0573883   0.6103713  
16.7600000  17.6300000  18.0500000  18.4200000  19.7000000 291.0000000
12     Parent       MZ Female     11    notES  42.3198837   5.3622671  
30.3100000  38.6050000  41.8350000  46.0175000  56.5800000 172.0000000
13  Offspring       DZ   Male     11    notES  17.8766667   0.5187333  
16.8300000  17.4600000  17.8600000  18.2400000  19.0200000 153.0000000
14     Parent       DZ   Male     11    notES  42.7112102   4.9600561  
32.0500000  39.2400000  42.7600000  45.2700000  58.2000000 157.0000000
15  Offspring       MZ   Male     11    notES  17.8771831   0.6472397  
16.5600000  17.3300000  17.8550000  18.2100000  20.0100000 284.0000000
16     Parent       MZ   Male     11    notES  41.5636254   4.6564818  
32.1000000  38.0250000  41.3900000  44.6450000  65.2900000 331.0000000
17  Offspring       DZ Female     17    notES  17.4752880   0.4569588  
16.5600000  17.0700000  17.5900000  17.8700000  18.2900000 191.0000000
18     Parent       DZ Female     17    notES  46.3055882   4.9177705  
36.1000000  42.7275000  45.7650000  48.3350000  62.6900000  68.0000000
19  Offspring       MZ Female     17    notES  17.4106076   0.4956190  
16.5500000  16.9700000  17.3400000  17.8200000  18.4500000 395.0000000
20     Parent       MZ Female     17    notES  46.3649032   5.1770435  
34.8800000  42.4200000  45.9500000  49.4950000  63.1800000 155.0000000
21  Offspring       DZ   Male     17    notES  17.5041818   0.3915823  
16.7300000  17.1900000  17.5300000  17.8300000  18.5200000 165.0000000
22     Parent       DZ   Male     17    notES  46.7745763   4.0226198  
40.1800000  44.1250000  46.0000000  48.8200000  61.1200000  59.0000000
23  Offspring       MZ   Male     17    notES  17.4911446   0.3961757  
16.6500000  17.1775000  17.5000000  17.8100000  18.3500000 332.0000000
24     Parent       MZ   Male     17    notES  46.6929771   5.2421896  
34.4500000  43.1500000  45.8900000  49.0050000  63.8000000 131.0000000

That's great but there are two things I didn't like: (1) There too many digits, especially on the integers in the last column. I thought five digits to the right of the decimal was more than enough but here we have seven, even for integers. (2) The ordering of levels within factors implied by the right side of the formula is not honored -- it looks like it used the order Cohort, ESstatus, Sex, Zygosity, Generation. Unlike doBy::summaryBy(), it does not accept an order=T argument (that is the default in doBy::summaryBy()).

One thing both suggestions taught me was to use names in function definitions so that I always get correct column headings on output. This was in the documentation for doBy::summaryBy(), but I didn't understand it when I first read it. Using that naming concept, I created this function:

descriptivefun <- function(x, ...){c(mean=mean(x, ...), sd=sd(x, ...), 
quantile(x, ...), N=sum(!is.na(x)), NAs=sum(is.na(x)))}

That will allow me to feed the na.rm=T argument to the mean, sd and quantile functions. By not naming the quantile function (e.g., not using q=quantile(x, ...)) I allow the builtin column names to be used unaltered (i.e., 0%, 25%, 50%, 75%, 100%). I also did not use length() because it will count NA values and I want to see the sample sizes used for mean, sd and quantile. To deal with that problem I created a function with output named "N" to count those sample sizes and one with output named "NAs" to count the number of NAs. Then I do this:

summaryBy(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, 
FUN=descriptivefun, na.rm=T)
   Generation Zygosity    Sex Cohort ESstatus Age.mean    Age.sd Age.0% Age.25% 
Age.50% Age.75% Age.100% Age.N Age.NAs
1   Offspring       DZ Female     11       ES 17.78528 0.3535863  16.93 17.6000 
 17.775 17.9650    18.92   106       0
2   Offspring       DZ Female     11    notES 18.13679 0.5555968  16.76 17.8525 
 18.190 18.4575    19.50   162       0
3   Offspring       DZ Female     17    notES 17.47529 0.4569588  16.56 17.0700 
 17.590 17.8700    18.29   191       0
4   Offspring       DZ   Male     11       ES 17.76149 0.3467540  17.18 17.5150 
 17.715 18.0000    18.71   134       0
5   Offspring       DZ   Male     11    notES 17.87667 0.5187333  16.83 17.4600 
 17.860 18.2400    19.02   153       0
6   Offspring       DZ   Male     17    notES 17.50418 0.3915823  16.73 17.1900 
 17.530 17.8300    18.52   165       0
7   Offspring       MZ Female     11       ES 17.87628 0.4506530  16.86 17.6775 
 17.805 18.1000    19.12   196       0
8   Offspring       MZ Female     11    notES 18.05739 0.6103713  16.76 17.6300 
 18.050 18.4200    19.70   291       0
9   Offspring       MZ Female     17    notES 17.41061 0.4956190  16.55 16.9700 
 17.340 17.8200    18.45   395       0
10  Offspring       MZ   Male     11       ES 17.77174 0.3236917  16.84 17.5800 
 17.790 17.9700    19.02   195       0
11  Offspring       MZ   Male     11    notES 17.87718 0.6472397  16.56 17.3300 
 17.855 18.2100    20.01   284       0
12  Offspring       MZ   Male     17    notES 17.49114 0.3961757  16.65 17.1775 
 17.500 17.8100    18.35   332       0
13     Parent       DZ Female     11       ES 44.61512 5.1246314  32.17 41.3400 
 44.680 48.2800    57.95   121       0
14     Parent       DZ Female     11    notES 42.54346 4.3670998  34.03 39.3450 
 42.110 45.5500    57.06   107       0
15     Parent       DZ Female     17    notES 46.30559 4.9177705  36.10 42.7275 
 45.765 48.3350    62.69    68       0
16     Parent       DZ   Male     11       ES 44.60206 4.5605484  34.31 41.4475 
 44.890 47.4975    58.75   126       0
17     Parent       DZ   Male     11    notES 42.71121 4.9600561  32.05 39.2400 
 42.760 45.2700    58.20   157       0
18     Parent       DZ   Male     17    notES 46.77458 4.0226198  40.18 44.1250 
 46.000 48.8200    61.12    59       0
19     Parent       MZ Female     11       ES 44.23476 5.0214627  29.55 40.6925 
 44.125 47.7300    56.73   206       0
20     Parent       MZ Female     11    notES 42.31988 5.3622671  30.31 38.6050 
 41.835 46.0175    56.58   172       0
21     Parent       MZ Female     17    notES 46.36490 5.1770435  34.88 42.4200 
 45.950 49.4950    63.18   155       0
22     Parent       MZ   Male     11       ES 43.40787 5.3507439  31.28 39.9700 
 43.440 46.4800    64.65   197       0
23     Parent       MZ   Male     11    notES 41.56363 4.6564818  32.10 38.0250 
 41.390 44.6450    65.29   331       0
24     Parent       MZ   Male     17    notES 46.69298 5.2421896  34.45 43.1500 
 45.890 49.0050    63.80   131       0

I think that output looks very nice. One thing that I don't understand is why my function produces %.5f output for every value but the doBy::summaryBy() function uses different formats in different columns. Compare the above output with this output:

descriptivefun(x$Age)
      mean         sd         0%        25%        50%        75%       100%    
      N        NAs
  28.49302   13.29077   16.55000   17.65000   18.23000   42.25500   65.29000 
4434.00000    0.00000

It's not a big deal, but it would be cool if I could tell doBy::summaryBy() how to format the numbers using something like format=c(rep("%.2f",7), rep("%d",2)).

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota



On Mon, 5 Aug 2013, David Carlson wrote:

This is a bit simpler. The function quantile() labels the output whereas fivenum() does not:

aggregate(Age ~ Generation + Zygosity + Sex + Cohort +
ESstatus, data=x,
   function(x) c(mean=mean(x), sd=sd(x), quantile(x)))


On Mon, 5 Aug 2013, Dr. Thomas W. MacFarland wrote:

Dear Dr. Miller:

Pasted below is syntax that should mostly answer your recent question to the R mailing list:

descriptivefun <- function(x, ...){
 c(m=mean(x, ...), sd=sd(x, ...), l=length(x))
 }

doBy::summaryBy(Final ~ Method.recode +
 ComCol.recode,
 data=Final.table,
 FUN=descriptivefun,
 na.rm=TRUE,
 keep.names=TRUE,
 order=TRUE)

I go into far more detail on this package::function and similar functions in my recent text on Twoway ANOVA,
http://www.springer.com/statistics/social+sciences+%26+law/book/978-1-4614-2133-7.

Best wishes.

Tom

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to