Thanks Josh. I built on your example and ended up with the code below--if you or anyone sees any issues please let me know. It would be great if there were a slicker way to get these kinds of summary stats in R, but this gets the job done.
## load the packages first: doBy for summaryBy(), Hmisc for wtd.var()
library(doBy)
library(Hmisc)

## takes a data frame z with columns 'response' and 'weights';
## returns weighted mean, weighted SE, and N
msenw <- function(z) {
  ## rows with a non-missing response
  i <- which(!is.na(z$response))
  c(W.M  = weighted.mean(z$response, z$weights, na.rm = TRUE),
    W.SE = sqrt(wtd.var(z$response, weights = z$weights)) / sqrt(sum(z$weights[i])),
    N    = length(i))
}

## make up some data (easier)
mydata <- data.frame(response = rnorm(100),
  group = rep(1:5, each = 20), weights = runif(100, 0, 1))

xy <- by(mydata, mydata$group, msenw)

data.frame(group = names(c(xy)), do.call(rbind, xy))

## can be extended to other data using:
xy <- by(data.frame(response = mydata$response, weights = mydata$weights),
  mydata$group, msenw)

Solomon Messing
www.stanford.edu/~messing

On Jan 16, 2011, at 11:16 PM, Joshua Wiley wrote:

> Dear Solomon,
>
> On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing
> <solomon.mess...@gmail.com> wrote:
>> Dear Soren and R users:
>>
>> I am trying to use the summaryBy function with weights. Is this possible?
>> An example that illustrates what I am trying to do follows:
>>
>> library(doBy)
>> ## make up some data
>> response = rnorm(100)
>> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
>> weights = runif(100, 0, 1)
>> mydata = data.frame(response,group,weights)
>>
>> ## run summaryBy without weights:
>> summaryBy(response~group, data = mydata, FUN = mean)
>>
>> ## attempt to run summaryBy with weights, throws error
>> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights )
>>
>> ## throws the error:
>> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { :
>> # arguments must have same length
>>
>> My guess is that summaryBy is not giving weighted.mean() each group of
>> weights, but instead is passing all of the weights in the data set each time
>> it calls weighted.mean().
>
> Yes, of course. It has no way of knowing that the weights should also
> be being broken down by group....they are not in the formula.
>
>> Do you know if there is some way to get summaryBy to pass weights to
>> weighted.mean() only for each group?
>
> Ideally there would be a way to pass more than one variable to a
> function (e.g., response and weights) or just an entire object
> (mydata) broken down by group. Then you would just make a wrapper
> function to pass the right values to the x and w arguments of
> weighted.mean. Instead here is a somewhat hacked version:
>
> library(doBy)
> ## make up some data (easier)
> mydata <- data.frame(response = rnorm(100),
>   group = rep(1:5, each = 20), weights = runif(100, 0, 1))
>
> ## manually compute weighted mean
> tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
> tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
> tmp ## weighted means
>
> ## here's the 'problem', if you will, even with +, they are passed
> ## one at a time
> summaryBy(response + weights ~ group, data = mydata, FUN = str)
> summaryBy(mydata ~ group, data = mydata, FUN = str)
>
> ## here is an option using by():
> xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response,
>   z$weights))
> xy
> ## if you don't like the formatting....
> data.frame(group = names(c(xy)), weighted.mean = c(xy))
>
> HTH,
>
> Josh
>
>>
>> I suspect this functionality would be a tremendous benefit to R users who
>> regularly work with weighted data, such as myself.
>>
>> Thanks,
>>
>> Solomon Messing
>> www.stanford.edu/~messing
>>
>> PS I know this basic example can be done using lapply(split(...)) approach
>> referenced here:
>>
>> http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg12349.html
>>
>> but for more complex tasks the lapply approach will mean writing a lot of
>> extra code to run everything and then to get things formatted as nicely as
>> summaryBy() was designed to do.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
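
For comparison, the lapply(split(...)) approach mentioned in the PS can be sketched in base R alone, without doBy or Hmisc. This is only a rough sketch: the helper name wmean_by and its argument names are made up for illustration and do not come from any package, and it returns just the weighted mean and N per group, not a weighted SE.

```r
## Hypothetical helper (not from doBy or Hmisc): weighted mean and N by group.
## Assumes 'data' has the columns named in 'response', 'weights' and 'group'.
wmean_by <- function(data, response = "response", weights = "weights",
                     group = "group") {
  ## split the whole data frame so each piece keeps its own weights
  pieces <- split(data, data[[group]])
  out <- lapply(pieces, function(z) {
    ## keep only rows where both the response and the weight are observed
    ok <- !is.na(z[[response]]) & !is.na(z[[weights]])
    c(W.M = weighted.mean(z[[response]][ok], z[[weights]][ok]),
      N   = sum(ok))
  })
  ## one row per group, formatted like the by() example
  data.frame(group = names(out), do.call(rbind, out), row.names = NULL)
}

## usage with the same made-up data as in the thread
mydata <- data.frame(response = rnorm(100),
  group = rep(1:5, each = 20), weights = runif(100, 0, 1))
wmean_by(mydata)
```

Because each piece from split() is still a data frame, the response and the weights stay aligned within a group, which is exactly what summaryBy() cannot do when the weights are not in the formula.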