On Jan 11, 2011, at 5:41 PM, Dickison, Daniel wrote:
The documentation for `aggregate` makes it sound like
aggregate.formula should behave identically to aggregate.data.frame
(apart from the way the parameters are passed). But it looks like
aggregate.formula is quietly removing rows where any of the "output"
variables (those on the LHS of the formula) are NA. This differs
from how aggregate.data.frame works. Is this expected behavior?
Here are a couple of examples:
> d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3))
> aggregate(d["b"], d["a"], mean)
  a   b
1 1 1.5
2 2  NA
> aggregate(b ~ a, d, mean)
  a   b
1 1 1.5
2 2 3.0
It's removing whole rows even if just one of the columns is NA, i.e.:
> d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3),
+                 c=c(NA,2,3,NA))
> aggregate(cbind(b,c) ~ a, d, mean)
  a b c
1 1 2 2
The help page for aggregate gives the calling defaults for
aggregate.formula as:
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)
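So the behavior you describe is what I would have expected (had I
initially read the help page): with the default na.action = na.omit,
any row with an NA in a variable used in the formula is dropped before
FUN is applied.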
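
If you want the formula method to behave like the data frame method, something along these lines should work. This is only a sketch using the data from your examples: na.pass and na.rm are standard R, while the object names (d2, res_omit, res_pass, res_cols) are just illustrative.

```r
# Data from the first example
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3))

# Default formula method: na.action = na.omit drops the NA row first
res_omit <- aggregate(b ~ a, data = d, FUN = mean)

# na.action = na.pass keeps the NA row, so mean() sees the NA,
# matching aggregate(d["b"], d["a"], mean)
res_pass <- aggregate(b ~ a, data = d, FUN = mean, na.action = na.pass)

# Data from the second example (two LHS columns)
d2 <- data.frame(a = rep(1:2, each = 2),
                 b = c(1, 2, NA, 3),
                 c = c(NA, 2, 3, NA))

# Keep all rows, then let mean() drop NAs separately within each column;
# na.rm = TRUE is passed through ... to FUN
res_cols <- aggregate(cbind(b, c) ~ a, data = d2, FUN = mean,
                      na.action = na.pass, na.rm = TRUE)

print(res_omit)
print(res_pass)
print(res_cols)
```

The last call avoids losing a whole row just because one of the LHS columns is NA, since the NA removal then happens inside mean() for each column rather than on the model frame.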
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help