Re: [R] means, SD's and tapply
Hi Christopher, I think you have the same problem I had today :) Have a look at this post, I think you can find the solution there: http://r.789695.n4.nabble.com/group-by-in-data-frame-tc3324240.html
zem
--
View this message in context: http://r.789695.n4.nabble.com/means-SD-s-and-tapply-tp3325158p3325191.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] means, SD's and tapply
Chris,

It seems like you need the plyr package, especially ddply(). For example:

    library(plyr)
    stems353 <- data.frame(Time = rep(c("Modern", "Old"), 4),
                           SizeClass = rep(c("class1", "class2"), each = 4),
                           Species = rep(c("a", "b"), each = 4),
                           Stems = seq(1, 8, 1))
    # one row per Species/SizeClass/Time combination, holding the mean of Stems
    ddply(stems353, .(Species, SizeClass, Time), summarise, mean = mean(Stems))

On Friday, February 25, 2011 at 2:09 PM, Christopher R. Dolanc wrote:
> I'd like to see means/SD of "Stems" stratified by "Species", "Time" and
> "SizeClass".
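For readers without plyr, here is a rough base-R sketch of the same split-apply-combine idea, reusing Justin's toy stems353 data (it is not how ddply() works internally). split() cuts Stems into one piece per observed Species/SizeClass/Time combination, and sapply() applies mean() to each piece. The names split() builds, such as "a.class1.Modern", are close to the combined labels Chris asked for. The variable names groups and group_means are made up for this example.

```r
# Justin's toy data (not Chris's real stems353)
stems353 <- data.frame(Time      = rep(c("Modern", "Old"), 4),
                       SizeClass = rep(c("class1", "class2"), each = 4),
                       Species   = rep(c("a", "b"), each = 4),
                       Stems     = seq(1, 8, 1))

# Split: one Stems vector per observed Species/SizeClass/Time combination;
# drop = TRUE discards combinations that never occur in the data
groups <- split(stems353$Stems,
                list(stems353$Species, stems353$SizeClass, stems353$Time),
                drop = TRUE)

# Apply + combine: one named mean per group, e.g. "a.class1.Modern"
group_means <- sapply(groups, mean)
print(group_means)
```

Swapping mean for sd (or any other one-number summary) in the sapply() call gives the other statistics the same way.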
Re: [R] means, SD's and tapply
Hi:

On Fri, Feb 25, 2011 at 12:09 PM, Christopher R. Dolanc <crdol...@ucdavis.edu> wrote:
> I'd like to see means/SD of "Stems" stratified by "Species", "Time" and
> "SizeClass".

There are several approaches here, including the aggregate() function in base R, the doBy package and the plyr package, among others:

    # Requires R 2.11.0 or above (formula interface of aggregate()):
    aggregate(Stems ~ Species + Time + SizeClass, data = stems353, FUN = mean)

    # To get more than one output per group, one can use either of the above packages:
    library(plyr)
    ddply(stems353, .(Species, Time, SizeClass), summarise,
          avgStems = mean(Stems), sdStems = sd(Stems))

    library(doBy)
    f <- function(x) c(mean = mean(x), sd = sd(x))
    summaryBy(Stems ~ Species + Time + SizeClass, data = stems353, FUN = f)

    # Another possibility is the data.table package
    # (in a comma-separated 'by' string, spaces count as part of the column names,
    #  so the grouping columns are given as a character vector instead):
    library(data.table)
    dt <- data.table(stems353, key = c("Species", "Time", "SizeClass"))
    dt[, list(avgStems = mean(Stems), sdStems = sd(Stems)),
       by = c("Species", "Time", "SizeClass")]

All of this is untested, so caveat emptor. Other possibilities include the sqldf package (if you are comfortable with SQL syntax), the remix package and the Hmisc package. In other words, R has a number of efficient ways to summarize data.

HTH,
Dennis
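Chris also asked about SE, which none of the calls above compute. A base-R sketch, assuming SE is taken as sd/sqrt(n): give aggregate() one function that returns several named statistics. aggregate() stores them as a single matrix column, and do.call(data.frame, ...) expands that into ordinary columns. The toy data and the helper name summ are invented for illustration.

```r
# Hypothetical toy data (values made up for illustration)
stems <- data.frame(Species   = rep(c("ABCO", "ABMA"), each = 4),
                    SizeClass = "Class1",
                    Time      = rep(c("Modern", "Old"), times = 4),
                    Stems     = c(3, 1, 5, 2, 0, 4, 2, 6))

# One function returning several named statistics; SE = sd / sqrt(n)
summ <- function(x) c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))

res <- aggregate(Stems ~ Species + SizeClass + Time, data = stems, FUN = summ)

# aggregate() keeps the three statistics in one matrix column called "Stems";
# do.call(data.frame, ...) splits it into Stems.mean, Stems.sd and Stems.se
res <- do.call(data.frame, res)
print(res)
```

The same summ function could be passed as FUN to doBy's summaryBy(), in the style of Dennis's f.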
Re: [R] means, SD's and tapply
On Feb 25, 2011, at 3:09 PM, Christopher R. Dolanc wrote:

> I'd like to see means/SD of "Stems" stratified by "Species", "Time" and "SizeClass".
> but I really need to see each species by SizeClass and Time so that each value would be labeled something like "ABCOSizeClass1TimeModern". Adding 2 variables to the function doesn't seem to work
>
> tapply(stems353$Stems, stems353$Species, stems353$SizeClass, stems353$Time, mean)

Some functions let you put an arbitrary number of items after the first (aggregate() always confuses me because it _does_ this), but tapply expects the grouping variables to be in a list, so try:

    with(stems353, tapply(Stems, list(Species, SizeClass, Time), mean))

with() improves readability.

> Error in match.fun(FUN) :
>   'stems353$SizeClass' is not a function, character or symbol

The third item in your arguments got matched to what tapply was expecting to be a function name.

> I've already created proper subsets for each of these groups, e.g. one subset is called "stems353ABCO1" and I can run analyses on this. But, trying to extract means straight from those subsets doesn't seem to work
>
> mean(stems353ABCO1)
> [1] NA
> Warning message:
> In mean.default(stems353ABCO1) :
>   argument is not numeric or logical: returning NA

David Winsemius, MD
West Hartford, CT
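To illustrate David's point on hypothetical toy data (the species codes mirror Chris's, the Stems values are invented): tapply()'s arguments are matched by position as X, INDEX, FUN, so in the original call the third argument, stems353$SizeClass, landed in FUN. Passing one list as INDEX returns a Species x SizeClass x Time array, which can be indexed by level names or flattened into a long data frame with as.table() and as.data.frame(). The names means, sds and flat are made up here.

```r
# Hypothetical toy data: two observations per Species/SizeClass/Time cell
stems <- data.frame(Species   = rep(c("ABCO", "ABMA"), times = 8),
                    SizeClass = rep(c("Class1", "Class2"), each = 2, times = 4),
                    Time      = rep(c("Modern", "Old"), each = 8),
                    Stems     = 1:16)

# All grouping variables go into ONE list (the INDEX argument); mean is FUN.
# Naming the list elements names the dimensions of the resulting array.
means <- with(stems, tapply(Stems, list(Species = Species, SizeClass = SizeClass,
                                        Time = Time), mean))
sds   <- with(stems, tapply(Stems, list(Species = Species, SizeClass = SizeClass,
                                        Time = Time), sd))

# Look up a single cell by level names
print(means["ABCO", "Class1", "Modern"])

# Flatten to one row per combination; as.vector() walks the sd array in the
# same order, so the sd values line up with the rows
flat <- as.data.frame(as.table(means), responseName = "mean")
flat$sd <- as.vector(sds)
# Combined label in the spirit of "ABCOSizeClass1TimeModern"
flat$label <- paste0(flat$Species, flat$SizeClass, flat$Time)
print(flat)
```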
[R] means, SD's and tapply
I'm trying to use tapply to output means and SD or SE for my data but seem to be limited by how many times I can subset it. Here's a snippet of my data:

> stems353[1:10,]
     Time DataSource   Plot Elevation Aspect Slope     Type Species SizeClass Stems
1  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABCO    Class1     3
2  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABMA    Class1     0
3  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ACMA    Class1     0
4  Modern    Cameron 70F221      1730    ESE    20 Hardwood    AECA    Class1     0
5  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ARME    Class1     0
6  Modern    Cameron 70F221      1730    ESE    20  Conifer    CADE    Class1    15
7  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CELE    Class1     0
8  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CONU    Class1     0
9  Modern    Cameron 70F221      1730    ESE    20  Conifer    JUCA    Class1     0
10 Modern    Cameron 70F221      1730    ESE    20  Conifer    JUOC    Class1     0

I'd like to see means/SD of "Stems" stratified by "Species", "Time" and "SizeClass". I can get R to give me this for means by species:

> tapply(stems353$Stems, stems353$Species, mean)
        ABCO         ABMA         ACMA         AECA         ARME         CADE         CELE
0.7305240793 0.8569405099 0.0003541076 0.0010623229 0.0017705382 0.4684844193 0.0063739377
        CONU         JUCA         JUOC         LIDE         PIAL         PICO         PIJE
0.0017705382 0.0003541076 0.0959631728 0.0138101983 0.3905807365 1.5651558074 0.2315864023
        PILA         PIMO        PIMO2         PIPO         PISA         POTR         PSME
0.1774079320 0.1880311615 0.0311614731 0.6735127479 0.0237252125 0.0506373938 0.2000708215
        QUCH         QUDO         QUDU         QUKE         QULO         QUWI        Salix
0.0474504249 0.1203966006         0.00 0.2071529745 0.0003541076 0.0548866856 0.0003541076
        SEGI         TSME
0.0021246459 0.5017705382

but I really need to see each species by SizeClass and Time so that each value would be labeled something like "ABCOSizeClass1TimeModern". Adding 2 variables to the function doesn't seem to work:

> tapply(stems353$Stems, stems353$Species, stems353$SizeClass, stems353$Time, mean)
Error in match.fun(FUN) :
  'stems353$SizeClass' is not a function, character or symbol

I've already created proper subsets for each of these groups, e.g. one subset is called "stems353ABCO1" and I can run analyses on this. But trying to extract means straight from those subsets doesn't seem to work:

> mean(stems353ABCO1)
[1] NA
Warning message:
In mean.default(stems353ABCO1) :
  argument is not numeric or logical: returning NA

Thanks,
Chris Dolanc

--
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)
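As for why mean(stems353ABCO1) returns NA: assuming stems353ABCO1 is a data-frame subset of stems353 (as the name suggests), mean() expects a numeric vector, and in current versions of R a whole data frame falls through to mean.default(), which warns and returns NA. A minimal sketch with a made-up stand-in subset (sub and the other names are hypothetical):

```r
# Hypothetical stand-in for a subset such as stems353ABCO1 (made-up values)
sub <- data.frame(Species   = "ABCO",
                  SizeClass = "Class1",
                  Time      = c("Modern", "Modern", "Old"),
                  Stems     = c(3, 0, 6))

# mean() on the whole data frame: mean.default() warns
# "argument is not numeric or logical: returning NA" and returns NA
whole <- suppressWarnings(mean(sub))

# Pull out the numeric column first
stem_mean <- mean(sub$Stems)
stem_se   <- sd(sub$Stems) / sqrt(length(sub$Stems))

# Or take the mean of every numeric column at once
num_means <- colMeans(sub[sapply(sub, is.numeric)])

print(c(whole = whole, mean = stem_mean, se = stem_se))
print(num_means)
```

The grouped approaches in the replies make the hand-built subsets unnecessary, but the column-extraction step above is what makes them usable if they are kept.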