Re: [R] aggregate.formula implicitly removes rows containing NA

David Winsemius Tue, 11 Jan 2011 15:57:25 -0800


On Jan 11, 2011, at 5:41 PM, Dickison, Daniel wrote:

The documentation for `aggregate` makes it sound like aggregate.formula should behave identically to aggregate.data.frame (apart from the way the parameters are passed). But it looks like aggregate.formula is quietly removing rows where any of the "output" variables (those on the LHS of the formula) are NA. This differs from how aggregate.data.frame works. Is this expected behavior?
Here are a couple of examples:
d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3))
aggregate(d["b"], d["a"], mean)
 a   b
1 1 1.5
2 2  NA
aggregate(b ~ a, d, mean)
 a   b
1 1 1.5
2 2 3.0

It's removing whole rows even if just one of the columns is NA, i.e.:
d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3),
+                 c=c(NA,2,3,NA))
aggregate(cbind(b,c) ~ a, d, mean)
 a b c
1 1 2 2
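
To see which rows survive in the example above, one can check for complete cases directly. This is only an illustrative sketch: the names `keep`, `kept_rows`, `res_formula` and `res_kept` are made up here and are not part of the original post.

```r
# Same data as in the example above.
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# Flag the rows that have no NA in any column.
keep <- complete.cases(d)

# Only the second row (a = 1, b = 2, c = 2) is complete.
kept_rows <- d[keep, ]
print(kept_rows)

# The formula call on the full data ...
res_formula <- aggregate(cbind(b, c) ~ a, d, mean)

# ... gives the same values as the data-frame call on the complete rows only.
res_kept <- aggregate(kept_rows[c("b", "c")], kept_rows["a"], mean)
print(res_formula)
print(res_kept)
```

Group 2 disappears entirely because neither of its rows is complete.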

The help page for aggregate gives the calling defaults for aggregate.formula as:

## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)

So the description you give seems to be adhering to what I would have expected (had I initially read the help page.)
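
If the aggregate.data.frame behavior is what is wanted from the formula interface, overriding the default with na.action = na.pass should do it. The following is a minimal sketch using the data from the first example; the result names (`res_default`, `res_pass`, `res_narm`) are made up for illustration.

```r
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3))

# Default: na.action = na.omit drops the row with b = NA before mean() is called.
res_default <- aggregate(b ~ a, d, mean)

# na.pass keeps every row, so group 2 comes out as NA,
# matching aggregate(d["b"], d["a"], mean).
res_pass <- aggregate(b ~ a, d, mean, na.action = na.pass)

# Arguments in ... are handed on to FUN, so NAs can instead be
# dropped inside mean() rather than by removing whole rows.
res_narm <- aggregate(b ~ a, d, mean, na.action = na.pass, na.rm = TRUE)

print(res_default)
print(res_pass)
print(res_narm)
```

With several LHS columns, the na.pass plus na.rm = TRUE combination removes NAs column by column instead of discarding every row that has an NA in any column.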

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
