Thanks Josh.  I built on your example and ended up with the code below--if you 
or anyone sees any issues please let me know.  It would be great if there were 
a slicker way to get these kinds of summary stats in R, but this gets the job 
done.

# takes data frame z with columns response and weights; returns weighted mean,
# weighted SE, and N (wtd.var() comes from Hmisc, loaded below)
msenw = function(z){
        N = length(na.omit(z)$response)
        i = which(!is.na(z$response))   # rows with a non-missing response
        return(
                c( W.M = weighted.mean(z$response, z$weights, na.rm = TRUE),
                   W.SE = sqrt(wtd.var(z$response, weights = z$weights)) /
                          sqrt(sum(z$weights[i])),
                   N = N ) )
}

library(doBy)
library(Hmisc)
## make up some data (easier)
mydata <- data.frame(response = rnorm(100),
                group = rep(1:5, each = 20), weights = runif(100, 0, 1)) 

xy <- by(mydata, mydata$group, msenw)
data.frame( group = names(c(xy)), do.call(rbind, xy) )

## can be extended to other data using:
xy <- by(data.frame(response = mydata$response, weights = mydata$weights),
         mydata$group, msenw)
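
For anyone without Hmisc installed, here is a minimal base-R sketch of the same
by-group summary using split() and lapply(). The names wsummary and toy are made
up for illustration, and the variance formula is an assumption on my part
(sum(w * (x - xbar)^2) / (sum(w) - 1)), so check it against whatever definition
you actually need before relying on the SE.

```r
## weighted mean, weighted SE, and N for one group, base R only
## (wsummary is a hypothetical helper, not part of any package)
wsummary <- function(z) {
  ok <- !is.na(z$response) & !is.na(z$weights)
  x <- z$response[ok]
  w <- z$weights[ok]
  wm <- sum(w * x) / sum(w)
  ## assumed variance definition: sum(w * (x - wm)^2) / (sum(w) - 1)
  wv <- sum(w * (x - wm)^2) / (sum(w) - 1)
  c(W.M = wm, W.SE = sqrt(wv) / sqrt(sum(w)), N = length(x))
}

## made-up data with the same layout as mydata
set.seed(42)
toy <- data.frame(response = rnorm(100),
                  group = rep(1:5, each = 20),
                  weights = runif(100, 0, 1))

## split by group, summarise each piece, bind the rows back together
res <- do.call(rbind, lapply(split(toy, toy$group), wsummary))
out <- data.frame(group = rownames(res), res)
out
```

This keeps the one-row-per-group layout of the by() version above; the only
real difference is that the variance is computed by hand.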


Solomon Messing
www.stanford.edu/~messing



On Jan 16, 2011, at 11:16 PM, Joshua Wiley wrote:

> Dear Solomon,
> 
> On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing
> <solomon.mess...@gmail.com> wrote:
>> Dear Soren and R users:
>> 
>> I am trying to use the summaryBy function with weights.  Is this possible?  
>> An example that illustrates what I am trying to do follows:
>> 
>> library(doBy)
>> ## make up some data
>> response = rnorm(100)
>> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
>> weights = runif(100, 0, 1)
>> mydata = data.frame(response,group,weights)
>> 
>> ## run summaryBy without weights:
>> summaryBy(response~group, data = mydata, FUN = mean)
>> 
>> ## attempt to run summaryBy with weights, throws error
>> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights )
>> 
>> ## throws the error:
>> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { :
>> #                                       arguments must have same length
>> 
>> My guess is that summaryBy is not giving weighted.mean() each group of 
>> weights, but instead is passing all of the weights in the data set each time 
>> it calls weighted.mean().
> 
> Yes, of course.  It has no way of knowing that the weights should also
> be broken down by group; they are not in the formula.
> 
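
A small base-R sketch of the length mismatch described above. It calls tapply()
directly as a stand-in, since the error message mentions tapply; I have not
traced what summaryBy does internally, so treat this as an illustration of the
idea rather than of doBy itself. The data frame d is made up.

```r
set.seed(1)
d <- data.frame(response = rnorm(100),
                group = rep(1:5, each = 20),
                weights = runif(100, 0, 1))

## tapply() splits only its first argument by group; w is passed whole
## (length 100) to every call, so weighted.mean() sees 20 values and 100 weights
bad <- try(tapply(d$response, d$group, weighted.mean, w = d$weights),
           silent = TRUE)
inherits(bad, "try-error")

## workaround: split the row indices instead, then subset both columns
good <- tapply(seq_len(nrow(d)), d$group,
               function(i) weighted.mean(d$response[i], d$weights[i]))
good
```

Splitting row indices rather than the response itself lets the function see any
number of columns for each group.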
>>  Do you know if there is some way to get summaryBy to pass weights to 
>> weighted.mean() only for each group?
> 
> Ideally there would be a way to pass more than one variable to a
> function (e.g., response and weights) or just an entire object
> (mydata) broken down by group.  Then you would just make a wrapper
> function to pass the right values to the x and w arguments of
> weighted.mean.  Instead here is a somewhat hacked version:
> 
> library(doBy)
> ## make up some data (easier)
> mydata <- data.frame(response = rnorm(100),
> group = rep(1:5, each = 20), weights = runif(100, 0, 1))
> 
> ## manually compute weighted mean
> tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
> tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
> tmp ## weighted means
> 
> ## here's the 'problem', if you will: even with +, they are passed
> ## one at a time
> summaryBy(response + weights ~ group, data = mydata, FUN = str)
> summaryBy(mydata ~ group, data = mydata, FUN = str)
> 
> ## here is an option using by():
> xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, 
> z$weights))
> xy
> ## if you don't like the formatting....
> data.frame(group = names(c(xy)), weighted.mean = c(xy))
> 
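
A quick check that the manual calculation and the by() approach above give the
same weighted means. This is a sketch in base R only: tapply() stands in for the
summaryBy() call, and the data frame d is made up.

```r
set.seed(2)
d <- data.frame(response = rnorm(100),
                group = rep(1:5, each = 20),
                weights = runif(100, 0, 1))

## manual: sum(response * weights) / sum(weights) within each group
manual <- with(d, tapply(response * weights, group, sum) /
                  tapply(weights, group, sum))

## by(): hand each group's rows to weighted.mean()
viaby <- by(d, d$group, function(z) weighted.mean(z$response, z$weights))

## compare the two sets of group means
all.equal(as.numeric(manual), as.numeric(c(viaby)))
```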
> HTH,
> 
> Josh
> 
>> 
>> I suspect this functionality would be a tremendous benefit to R users who 
>> regularly work with weighted data, such as myself.
>> 
>> Thanks,
>> 
>> Solomon Messing
>> www.stanford.edu/~messing
>> 
>> PS I know this basic example can be done using the lapply(split(...)) approach
>> referenced here:
>> 
>> http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg12349.html
>> 
>> but for more complex tasks the lapply approach will mean writing a lot of 
>> extra code to run everything and then to get things formatted as nicely as 
>> summaryBy() was designed to do.
>> 
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/

