Re: [R] means, SD's and tapply

2011-02-25 Thread zem

Hi Christopher,

I think you have the same problem I had today :)
See this post:
http://r.789695.n4.nabble.com/group-by-in-data-frame-tc3324240.html
I think you can find the solution there.

zem

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] means, SD's and tapply

2011-02-25 Thread Scott Chamberlain
Chris, it seems like you need the plyr package, especially ddply. For example:

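library(plyr)

# Small made-up data frame with the same column names as the real data
stems353 <- data.frame(Time = rep(c("Modern", "Old"), 4),
                       SizeClass = rep(c("class1", "class2"), each = 4),
                       Species = rep(c("a", "b"), each = 4),
                       Stems = seq(1, 8, 1))

# One row per Species/SizeClass/Time combination, with the mean of Stems
ddply(stems353, .(Species, SizeClass, Time), summarise,
      mean = mean(Stems))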


Re: [R] means, SD's and tapply

2011-02-25 Thread Dennis Murphy
Hi:

There are several approaches here, including the aggregate() function in
base R, the doBy package or the plyr package, among others:

# Requires R 2.11.0 or above:
aggregate(Stems ~ Species + Time + SizeClass, data = stems353, FUN = mean)

# To get more than one output per group, one can use either of the above
# packages:

library(plyr)
ddply(stems353, .(Species, Time, SizeClass), summarise,
      avgStems = mean(Stems), sdStems = sd(Stems))

library(doBy)
f <- function(x) c(mean = mean(x), sd = sd(x))
summaryBy(Stems ~ Species + Time + SizeClass, data = stems353, FUN = f)

# Another possibility is package data.table. Spaces inside a single
# comma-separated key/by string are significant, so the column names are
# given as a character vector here:
library(data.table)
dt <- data.table(stems353, key = c("Species", "Time", "SizeClass"))
dt[, list(avgStems = mean(Stems), sdStems = sd(Stems)),
   by = c("Species", "Time", "SizeClass")]

All of this is untested, so caveat emptor. Other possibilities include
package sqldf, if you are comfortable with SQL syntax, package remix or
package Hmisc. In other words, R has a number of efficient ways to summarize
data.
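The original question also asked for SE, which none of the calls above compute. A minimal base-R sketch, assuming SE means SD / sqrt(n): aggregate() accepts a function that returns a named vector, in which case the statistics come back as one matrix column. The toy data frame and the helper name stats_fun are made up for illustration and are not the real stems353 data.

```r
# Toy stand-in for stems353 (hypothetical values, not the real data)
toy <- data.frame(
  Species   = rep(c("ABCO", "ABMA"), each = 4),
  Time      = rep(c("Modern", "Old"), times = 4),
  SizeClass = "Class1",
  Stems     = c(3, 0, 5, 2, 0, 1, 4, 1)
)

# Hypothetical helper: mean, SD and SE (SD / sqrt(n)) for one group
stats_fun <- function(x) {
  c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
}

res <- aggregate(Stems ~ Species + Time + SizeClass, data = toy, FUN = stats_fun)

# The three statistics sit in a single matrix column named "Stems";
# do.call(data.frame, res) flattens it into Stems.mean, Stems.sd, Stems.se
flat <- do.call(data.frame, res)
print(flat)
```

Swapping toy for the real data frame should give one row per Species/Time/SizeClass combination that actually occurs in the data.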

HTH,
Dennis


Re: [R] means, SD's and tapply

2011-02-25 Thread David Winsemius


On Feb 25, 2011, at 3:09 PM, Christopher R. Dolanc wrote:


> but I really need to see each species by SizeClass and Time so that each
> value would be labeled something like "ABCOSizeClass1TimeModern".
> Adding 2 variables to the function doesn't seem to work
>
>  > tapply(stems353$Stems, stems353$Species, stems353$SizeClass,
> stems353$Time, mean)

Some functions let you put an arbitrary number of items after the first
(aggregate() always confuses me because it _does_ this), but tapply expects
them to be in a list or vector, so try:

with(stems353, tapply(Stems, list(Species, SizeClass, Time), mean))

with() improves readability.

> Error in match.fun(FUN) :
>   'stems353$SizeClass' is not a function, character or symbol

The third item in your arguments got matched to what tapply was expecting
to be a function name.
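
To make the shape of the result concrete, here is a small sketch on made-up data (the toy data frame is hypothetical, not the real stems353). With three grouping factors in the list, tapply returns a three-dimensional array with one dimension per factor; converting it to a table and then to a data frame is one way to get a labelled row per combination.

```r
# Toy stand-in for stems353 (hypothetical values, not the real data)
toy <- data.frame(
  Species   = rep(c("ABCO", "ABMA"), each = 4),
  SizeClass = rep(c("Class1", "Class2"), times = 4),
  Time      = rep(c("Modern", "Modern", "Old", "Old"), times = 2),
  Stems     = c(3, 0, 5, 2, 0, 1, 4, 1)
)

# All grouping factors go into ONE list passed as the second argument (INDEX);
# the function stays in third position (FUN)
m <- with(toy, tapply(Stems, list(Species, SizeClass, Time), mean))

dim(m)                          # one dimension per grouping factor
m["ABCO", "Class1", "Modern"]   # pick out a single cell by its labels

# Naming the list elements and converting to a table gives a long layout
# with one row per Species/SizeClass/Time combination
m2 <- with(toy, tapply(Stems,
                       list(Species = Species, SizeClass = SizeClass, Time = Time),
                       mean))
long <- as.data.frame(as.table(m2), responseName = "meanStems")
print(long)
```

Replacing mean with sd in the same call should give the matching array of standard deviations.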





David Winsemius, MD
West Hartford, CT



[R] means, SD's and tapply

2011-02-25 Thread Christopher R. Dolanc
I'm trying to use tapply to output means and SD or SE for my data but 
seem to be limited by how many times I can subset it.  Here's a snippet 
of my data

 > stems353[1:10,]
     Time DataSource   Plot Elevation Aspect Slope     Type Species SizeClass Stems
1  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABCO    Class1     3
2  Modern    Cameron 70F221      1730    ESE    20  Conifer    ABMA    Class1     0
3  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ACMA    Class1     0
4  Modern    Cameron 70F221      1730    ESE    20 Hardwood    AECA    Class1     0
5  Modern    Cameron 70F221      1730    ESE    20 Hardwood    ARME    Class1     0
6  Modern    Cameron 70F221      1730    ESE    20  Conifer    CADE    Class1    15
7  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CELE    Class1     0
8  Modern    Cameron 70F221      1730    ESE    20 Hardwood    CONU    Class1     0
9  Modern    Cameron 70F221      1730    ESE    20  Conifer    JUCA    Class1     0
10 Modern    Cameron 70F221      1730    ESE    20  Conifer    JUOC    Class1     0

I'd like to see means/SD of "Stems" stratified by "Species", "Time" and 
"SizeClass".  I can get R to give me this for means by species:

 > tapply(stems353$Stems, stems353$Species, mean)
        ABCO         ABMA         ACMA         AECA         ARME         CADE         CELE
0.7305240793 0.8569405099 0.0003541076 0.0010623229 0.0017705382 0.4684844193 0.0063739377
        CONU         JUCA         JUOC         LIDE         PIAL         PICO         PIJE
0.0017705382 0.0003541076 0.0959631728 0.0138101983 0.3905807365 1.5651558074 0.2315864023
        PILA         PIMO        PIMO2         PIPO         PISA         POTR         PSME
0.1774079320 0.1880311615 0.0311614731 0.6735127479 0.0237252125 0.0506373938 0.2000708215
        QUCH         QUDO         QUDU         QUKE         QULO         QUWI        Salix
0.0474504249 0.1203966006         0.00 0.2071529745 0.0003541076 0.0548866856 0.0003541076
        SEGI         TSME
0.0021246459 0.5017705382
 >

but I really need to see each species by SizeClass and Time so that each 
value would be labeled something like "ABCOSizeClass1TimeModern".  
Adding 2 variables to the function doesn't seem to work

 > tapply(stems353$Stems, stems353$Species, stems353$SizeClass, 
stems353$Time, mean)
Error in match.fun(FUN) :
   'stems353$SizeClass' is not a function, character or symbol

I've already created proper subsets for each of these groups, e.g. one 
subset is called "stems353ABCO1" and I can run analyses on this.  But, 
trying to extract means straight from those subsets doesn't seem to work

 > mean(stems353ABCO1)
[1] NA
Warning message:
In mean.default(stems353ABCO1) :
   argument is not numeric or logical: returning NA
 >
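
The class of stems353ABCO1 is not shown, so the following is only a guess at the cause: if the subset is still a whole data frame (or a list) rather than a numeric vector, mean() has nothing numeric to average and, in current versions of R, returns NA with this warning. Pointing mean() at the Stems column should work. The object stems_sub below is a hypothetical stand-in that only mimics the structure.

```r
# Hypothetical stand-in for one of the subsets (not the real data)
stems_sub <- data.frame(
  Species   = "ABCO",
  SizeClass = "Class1",
  Stems     = c(3, 0, 5, 2)
)

# mean() on the whole data frame: NA plus the "not numeric or logical" warning
whole <- suppressWarnings(mean(stems_sub))

# mean() and sd() on the numeric column work as expected
avg <- mean(stems_sub$Stems)
s   <- sd(stems_sub$Stems)
se  <- s / sqrt(nrow(stems_sub))   # SE taken as SD / sqrt(n)
```

Running str(stems353ABCO1) on the real object would show whether this guess applies.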

Thanks,
Chris Dolanc

-- 
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)

