Dear Solomon,

On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing
<solomon.mess...@gmail.com> wrote:
> Dear Soren and R users:
>
> I am trying to use the summaryBy function with weights.  Is this possible?  
> An example that illustrates what I am trying to do follows:
>
> library(doBy)
> ## make up some data
> response = rnorm(100)
> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
> weights = runif(100, 0, 1)
> mydata = data.frame(response,group,weights)
>
> ## run summaryBy without weights:
> summaryBy(response~group, data = mydata, FUN = mean)
>
> ## attempt to run summaryBy with weights, throws error
> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights )
>
> ## throws the error:
> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { :
> #                                       arguments must have same length
>
> My guess is that summaryBy is not giving weighted.mean() each group of 
> weights, but instead is passing all of the weights in the data set each time 
> it calls weighted.mean().

Yes, of course.  It has no way of knowing that the weights should also
be broken down by group; they are not in the formula.
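
To make the length mismatch concrete, here is a minimal sketch that calls
weighted.mean() directly on one group's values with the full weights vector.
It only illustrates the mismatch; it is not claimed to reproduce the exact
error that summaryBy() reports, and the name x1 is just an illustration.

```r
## sketch: one group of 20 values, but all 100 weights
set.seed(1)
mydata <- data.frame(response = rnorm(100),
 group = rep(1:5, each = 20), weights = runif(100, 0, 1))

x1 <- mydata$response[mydata$group == 1]   # 20 values from group 1

## lengths differ (20 vs 100), so weighted.mean() stops with an error
res <- tryCatch(weighted.mean(x1, w = mydata$weights),
 error = function(e) e)
conditionMessage(res)

## with only that group's weights the call succeeds
weighted.mean(x1, w = mydata$weights[mydata$group == 1])
```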

>  Do you know if there is some way to get summaryBy to pass weights to 
> weighted.mean() only for each group?

Ideally there would be a way to pass more than one variable to a
function (e.g., response and weights) or just an entire object
(mydata) broken down by group.  Then you would just make a wrapper
function to pass the right values to the x and w arguments of
weighted.mean.  Instead here is a somewhat hacked version:

library(doBy)
## make up some data (easier)
mydata <- data.frame(response = rnorm(100),
 group = rep(1:5, each = 20), weights = runif(100, 0, 1))

## manually compute weighted mean
tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
tmp ## weighted means

## here's the 'problem', if you will: even with +, the variables are
## passed one at a time
summaryBy(response + weights ~ group, data = mydata, FUN = str)
summaryBy(mydata ~ group, data = mydata, FUN = str)

## here is an option using by():
xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights))
xy
## if you don't like the formatting....
data.frame(group = names(c(xy)), weighted.mean = c(xy))
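
The wrapper idea described earlier can also be sketched in base R with
split() and sapply().  Note that wmean_by() is a hypothetical helper name
made up for this illustration; it is not part of doBy, and the column names
are passed as strings purely for convenience.

```r
## hypothetical helper (not part of doBy): weighted mean of one column by group
wmean_by <- function(data, value, weight, by) {
  pieces <- split(data, data[[by]])          # one data frame per group
  res <- sapply(pieces, function(d) weighted.mean(d[[value]], d[[weight]]))
  data.frame(group = names(res), weighted.mean = unname(res))
}

set.seed(1)
mydata <- data.frame(response = rnorm(100),
 group = rep(1:5, each = 20), weights = runif(100, 0, 1))

out <- wmean_by(mydata, "response", "weights", "group")
out

## cross-check against the manual sum(x * w) / sum(w) computation
manual <- with(mydata,
 tapply(response * weights, group, sum) / tapply(weights, group, sum))
all.equal(out$weighted.mean, as.vector(manual))
```

Because each piece is a full data frame, the same pattern should extend to
functions that need more than two columns.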

HTH,

Josh

>
> I suspect this functionality would be a tremendous benefit to R users who 
> regularly work with weighted data, such as myself.
>
> Thanks,
>
> Solomon Messing
> www.stanford.edu/~messing
>
> PS I know this basic example can be done using lapply(split(...)) approach 
> referenced here:
>
> http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg12349.html
>
> but for more complex tasks the lapply approach will mean writing a lot of 
> extra code to run everything and then to get things formatted as nicely as 
> summaryBy() was designed to do.
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
