Can someone please tell me what is up with na.action in aggregate?

Here is my (somewhat) reproducible example. I say "somewhat" because some lines wouldn't run in a separate session; more on that below.

set.seed(100)
dat=data.frame(
  x1=sample(c(NA,'m','f'), 100, replace=TRUE),
  x2=sample(c(NA, 1:10), 100, replace=TRUE),
  x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
  x4=sample(c(NA,T,F), 100, replace=TRUE),
  y=sample(c(rep(NA,5), rnorm(95))))
dat

## The total from dat:
sum(dat$y, na.rm=T)

## The total from aggregate:
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y) ## <--- This line gave an error in a separate R instance

## The aggregate formula is excluding NA
## So, let's try to include NAs
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)

## The aggregate formula is STILL excluding NA
## In fact, the formula doesn't seem to notice the na.action
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)

## Hmmmm... that error surprised me (since the previous two things ran)
## So, let's try to change the global options
## (not mentioned in the help, but after reading the help
## 100 times, I thought I would go above and beyond to avoid
## any r list flames from people complaining
## that I didn't read the help... but that's a separate topic)
options(na.action ="na.pass")
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## (NAs are still omitted)

## Even more frustrating...
## Why don't any of these work???
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)

## This does work, but in my real data set, I want NA to really be NA
for(j in 1:4)
  dat[is.na(dat[,j]),j] = 'NA'
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)

## My first session info
#
#> sessionInfo()
#R version 2.12.0 (2010-10-15)
#Platform: i386-pc-mingw32/i386 (32-bit)
#
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] plyr_1.2.1 zoo_1.6-4 gdata_2.8.1 rj_0.5.0-5
#
#loaded via a namespace (and not attached):
#[1] grid_2.12.0 gtools_2.6.2 lattice_0.19-13 rJava_0.8-8
#[5] tools_2.12.0

I tried running that example in a different version of R, and I got completely different results. The other version of R wouldn't recognize the formula at all. My other version of R:

# My second session info
#> sessionInfo()
#R version 2.10.1 (2009-12-14)
#i386-pc-mingw32
#
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base

PS: Also, I have read the help on aggregate, factor, as.factor, and several other topics. If I missed something, please let me know. Some people like to reply to questions by telling the sender that R has documentation. Please don't. The R help archives are littered with reminders, friendly and otherwise, of R's documentation.
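
For anyone who wants a smaller case to poke at, here is a cut-down sketch of the same comparison. The four-row data frame `d` and the column `g2` are made up for illustration and are not part of the example above. It assumes a recent R version and the behavior noted in ?aggregate that rows with missing values in any of the grouping variables are omitted, which would explain why na.action=na.pass changes nothing when the NAs sit in the grouping columns. Turning NA into an explicit factor level with addNA() is one possible way to keep those rows while NA stays a genuine NA level rather than the string 'NA'; I have not checked it against R 2.12.0 or 2.10.1.

```r
# Tiny made-up data set: one grouping value is NA, one response value is NA
d <- data.frame(
  g = c("a", "a", NA, "b"),
  y = c(1, NA, 10, 100)
)

# Total over all rows, ignoring the missing response
total <- sum(d$y, na.rm = TRUE)

# Default formula method: na.action = na.omit drops rows 2 and 3 before summing
f_default <- aggregate(y ~ g, data = d, FUN = sum, na.rm = TRUE)

# na.pass keeps row 2 (NA in y), but row 3 is still dropped because its group is NA
f_pass <- aggregate(y ~ g, data = d, FUN = sum, na.rm = TRUE,
                    na.action = na.pass)

# Make NA an explicit factor level so the NA group survives the grouping step
d$g2 <- addNA(factor(d$g))
f_all <- aggregate(y ~ g2, data = d, FUN = sum, na.rm = TRUE,
                   na.action = na.pass)

# Compare the three aggregate totals with the plain total
c(total = total,
  default = sum(f_default$y),
  pass = sum(f_pass$y),
  all = sum(f_all$y))
```

In this sketch only the addNA() version should reproduce the plain total, since it is the only one that keeps the row whose group is missing.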