Re: [R] Calculation of group summaries

Søren Højsgaard Fri, 15 Jul 2005 02:59:05 -0700

Perhaps I lost track of what the original question was, but on my homepage 
http://genetics.agrsci.dk/~sorenh/misc/ there is a package called doBy in which 
there is a function called summaryBy (which mimics proc summary from sas). For 
example
        summaryBy(cbind(Weight,Feed)~Evit+Cu,  data=dietox12, 
FUN=c(mean,myvar))  
Søren



-----Oprindelig meddelelse-----
Fra: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] På vegne af Gabor Grothendieck
Sendt: 15. juli 2005 04:43
Til: [EMAIL PROTECTED]
Cc: R-Help
Emne: Re: [R] Calculation of group summaries

1. Try using more spaces so your code is easier to read.

2. Use data.frame to define your data frame (since the method in your post 
creates data frames of factors rather than the desired classes).  

3. Given the appropriate function, f, a single 'by' statement rbind'ed 
together, as shown, will create the result.

nsites <- 6
yearList <- 1999:2001
fakesub <- data.frame(
        year = rep(yearList, nsites/length(yearList), each = 11),
        site_id  = rep(c('site1','site2'), each = 11*nsites),
        visit_no = rep(1, 11*2*nsites),
        transect = rep(LETTERS[1:11], nsites, each = 2),
        transdir = rep(c('LF','RT'), 11*nsites),
        undercut = abs(rnorm(11*2*nsites, 10)),
        angle    = runif(11*2*nsites, 0, 180)
)


f <- function(x) cbind(year = x[1,1], site_id = x[1,2], visit_no = x[1,3], 
        mean = mean(x[,6]), sd = sd(x[,6]), length = length(x[,6])) 
do.call("rbind", by(fakesub, fakesub[,1:3], f))





On 7/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Several people suggested specific functions (by, tapply, sapply and 
> others); thanks for not blowing off a simple question regarding how to 
> do the following SQL in R:
> >   select year,
> >          site_id,
> >          visit_no,
> >          mean(undercut) AS meanUndercut,
> >          count(undercut) AS nUndercut,
> >          std(undercut) AS stdUndercut
> >   from channelMorphology
> >   group by year, site_id, visit_no
> >   ;
> 
> I'd spent quite a bit of time with the suggested functions earlier but 
> had no luck as I'd misread the docs and put the entire dataframe where 
> it only wants the columns to be processed.  Sometimes it's the 
> simplest of things.
> 
> This has lead to another confoundment-- sd() acts differently than
> mean() for some reason, at least with R 1.9.0.  For some reason, means 
> generate NA results and a warning message for each group:
> 
>  argument is not numeric or logical: returning NA in:
> mean.default(data[x, ], ...)
> 
> Of course, the argument is numeric, or there'd be no sd value.  Or 
> more likely, I'm still missing something really basic. If I wrap the 
> value in
> as.numeric() things work fine.  Why should I have to do this for mean 
> and median, but not sd? The code below should reproduce this error
> 
>  # Fake data for demo:
>  nsites<-6
>  yearList<-1999:2001
>  fakesub<-as.data.frame(cbind(
>                 year     =rep(yearList,nsites/length(yearList),each=11)
>                ,site_id  =rep(c('site1','site2'),each=11*nsites)
>                ,visit_no =rep(1,11*2*nsites)
>                ,transect =rep(LETTERS[1:11],nsites,each=2)
>                ,transdir =rep(c('LF','RT'),11*nsites)
>                ,undercut =abs(rnorm(11*2*nsites,10))
>                ,angle    =runif(11*2*nsites,0,180)
>                ))
> 
>  # Create group summaries:
>  sdmets<-by(fakesub$undercut
>            ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
>            ,sd
>            )
>  nmets<-by(fakesub$undercut
>           ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
>           ,length
>           )
>  xmets<-by(fakesub$undercut
>           ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
>           ,mean
>           )
>   xmets<-by(as.numeric(fakesub$undercut)
>           ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
>           ,mean
>           )
> 
>  # Put site id values (year, site_id and visit_no) into results:
>  # List unique id combinations as a list of lists.  Then  # reorganize 
> that into 3 vectors for final results.
>  # Certainly, there MUST be a better way...
>  foo<-strsplit(unique(paste(fakesub$year
>                            ,fakesub$site_id
>                            ,fakesub$visit_no
>                            ,sep='#'))
>               ,split='#'
>               )
>  year<-list()
>  for(i in 1:length(foo)) {year<-rbind(year,foo[[i]][1])}
>  site_id<-list()
>  for(i in 1:length(foo)) {site_id<-rbind(site_id,foo[[i]][2])}
>  visit_no<-list()
>  for(i in 1:length(foo)) {visit_no<-rbind(visit_no,foo[[i]][3])}
> 
>  # Final result, more or less
>  data.frame(cbind(a=year,b=site_id,c=visit_no,sdmets,nmets,xmets))
> 
> 
> cur
> 
> --
> Curt Seeliger, Data Ranger
> CSC, EPA/WED contractor
> 541/754-4638
> [EMAIL PROTECTED]
> 
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Calculation of group summaries

Reply via email to