Look at just the rows of your data in that first id category and I bet
you can figure it out!

> myData[myData$id=='0m11',]
    var1  var2   id
10 30.79 32.15 0m11
11 30.79 32.39 0m11
12 30.94    NA 0m11

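The formula interface to aggregate defaults to na.action = na.omit, so
any row containing an NA is dropped before mean is ever called. For 0m11
only rows 10 and 11 remain, hence a var1 mean of 30.79. data.table and
plyr apply na.rm to each column separately, so var1 uses all three rows:
(30.79 + 30.79 + 30.94) / 3 = 30.84.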
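
If you want aggregate to agree with the other two, one option is to keep
the NA rows and let na.rm do the work per column. A minimal sketch,
assuming only base R; the names sub, r_default, r_pass and r_by are
made up for illustration, and sub holds just the three 0m11 rows shown
above:

```r
# The three rows for id "0m11" printed above (illustrative subset only)
sub <- data.frame(var1 = c(30.79, 30.79, 30.94),
                  var2 = c(32.15, 32.39, NA),
                  id   = "0m11")

# Default na.action = na.omit: row 12 is dropped before mean() runs,
# so var1 is averaged over two rows only
r_default <- aggregate(. ~ id, data = sub, FUN = mean, na.rm = TRUE)
print(r_default)   # var1 30.79, var2 32.27

# na.action = na.pass keeps every row; na.rm = TRUE is then applied
# inside mean(), column by column
r_pass <- aggregate(. ~ id, data = sub, FUN = mean, na.rm = TRUE,
                    na.action = na.pass)
print(r_pass)      # var1 30.84, var2 32.27 (after rounding)

# The non-formula interface never applies na.action to the data,
# so it gives the same per-column result
r_by <- aggregate(sub[c("var1", "var2")], by = list(id = sub$id),
                  FUN = mean, na.rm = TRUE)
print(r_by)
```

The same na.action = na.pass argument should work on the full myData
call as well.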


Justin

On Tue, Nov 29, 2011 at 12:21 PM, Juliet Hannah <juliet.han...@gmail.com> wrote:

> I am calculating the mean of each column grouped by the variable 'id'.
> I do this using aggregate, data.table, and plyr. My aggregate results
> do not match the other two, and I am trying to figure out what is
> incorrect with my syntax. Any suggestions? Thanks.
>
> Here is the data.
>
> myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61,
> 30.59, 30.84, 30.98, 30.79, 30.79, 30.94, 31.08, 31.27, 31.11,
> 30.42, 30.37, 30.29, 30.06, 30.3, 30.43, 30.61, 30.64, 30.75,
> 30.39, 30.1, 30.25, 31.55, 31.96, 31.87, 30.29, 30.15, 30.37,
> 29.59, 29.52, 28.96, 29.69, 29.58, 29.52, 30.21, 30.3, 30.25,
> 30.23, 30.29, 30.39), var2 = c(33.78, 33.25, NA, 32.05, 32.59,
> NA, 32.24, NA, NA, 32.15, 32.39, NA, 32.4, 31.6, NA, 30.5, 30.66,
> NA, 30.6, 29.95, NA, 31.24, 30.73, NA, 30.51, 30.43, 31.17, 31.44,
> 31.17, 31.18, 31.01, 30.98, 31.25, 30.44, 30.47, NA, 30.47, 30.56,
> NA, 30.6, 30.57, NA, 31, 30.8, NA), id = c("0m4", "0m4", "0m4",
> "0m5", "0m5", "0m5", "0m6", "0m6", "0m6", "0m11", "0m11", "0m11",
> "0m12", "0m12", "0m12", "205m1", "205m1", "205m1", "205m4", "205m4",
> "205m4", "205m5", "205m5", "205m5", "205m6", "205m6", "205m6",
> "205m7", "205m7", "205m7", "600m1", "600m1", "600m1", "600m3",
> "600m3", "600m3", "600m4", "600m4", "600m4", "600m5", "600m5",
> "600m5", "600m7", "600m7", "600m7")), .Names = c("var1", "var2",
> "id"), row.names = c(NA, -45L), class = "data.frame")
>
> > head(myData)
>   var1  var2  id
> 1 31.59 33.78 0m4
> 2 32.21 33.25 0m4
> 3 31.78    NA 0m4
> 4 31.34 32.05 0m5
> 5 31.61 32.59 0m5
> 6 31.61    NA 0m5
>
>
>
> results1 <- aggregate(. ~ id, data=myData, FUN=mean, na.rm=TRUE)
>  head(results1,1)
> #    id  var1  var2
> # 1 0m11 30.79 32.27
>
> library(data.table)
> mydt <- data.table(myData)
> setkey(mydt,id)
> results2 <- mydt[,lapply(.SD,mean,na.rm=TRUE),by=id]
>  head(results2,1)
> #       id  var1  var2
> # [1,] 0m11 30.84 32.27
>
> library(plyr)
> results3 <- ddply(myData,.(id),colwise(mean),na.rm=TRUE)
>  head(results3,1)
> #    id  var1  var2
> # 1 0m11 30.84 32.27
>
> > sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] plyr_1.6         data.table_1.7.3
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
