Jason Turner <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] wrote:
> 
> > How do I go about generating a WEIGHTED mean (and standard error) of a
> > variable (e.g., expenditures) for each level of a categorical variable
> > (e.g., geographic region)?  I'm looking for something comparable to PROC
> > MEANS in SAS with both a class and weight statement.
> 
> That's two questions.
> 1) to apply a weighted mean to a vector, see ?weighted.mean
> 
> 2) to apply a function to data grouped by categorical variable, you
> probably need "by" or "tapply".  See the help pages and examples for
> both.

Three, actually. No one seems to have answered how to get the SD, and
that's a little trickier.

The simplest (well, the quickest) way to get the weighted SD is to do
a weighted regression analysis with just an intercept term:

x <- c(3,4,5); w <- c(2,5,7) # just for testing
summary(lm(x~1,weights=w))$sigma

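# this is the square root of the weighted sum of squares divided by
# its N-1 DF

m <- weighted.mean(x,w) # the weighted mean, needed for the deviations
wss <- sum((x-m)^2*w)
sqrt(wss/(length(x)-1)) # N-1 = 2 DF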


Notice, however, that SAS also does frequency weighting, where
(x=2.7,w=5) means that there are five observations of 2.7.

In that case, the brute-force approach is 


sd(rep(x,w))

# which is the same as

sqrt(wss/(sum(w)-1)) # sum(w)-1 = 13 DF
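
To tie this back to the original question (one weighted mean and SD per
level of a categorical variable), here is a minimal sketch that puts the
pieces together with split() and lapply(). The data frame d, its columns
expend, w and region, and the helper wstats() are made-up names for
illustration only, and the numbers are invented. It reports both versions
of the SD, since which one is appropriate depends on what the weights mean.

```r
# Hypothetical test data: expend, w and region are illustrative names only
d <- data.frame(
  expend = c(3, 4, 5, 6, 8, 10),
  w      = c(2, 5, 7, 1, 3, 4),
  region = factor(c("A", "A", "A", "B", "B", "B"))
)

# Weighted mean and both flavours of weighted SD for one group
wstats <- function(g) {
  m   <- weighted.mean(g$expend, g$w)
  wss <- sum(g$w * (g$expend - m)^2)        # weighted sum of squares
  c(mean    = m,
    sd.reg  = sqrt(wss / (nrow(g) - 1)),    # N-1 DF, as in the lm() approach
    sd.freq = sqrt(wss / (sum(g$w) - 1)))   # sum(w)-1 DF, frequency weights
}

# One row per level of region
res <- do.call(rbind, lapply(split(d, d$region), wstats))
res
```

For region "A" the data are the same x and w as above, so sd.reg should
agree with the lm() result and sd.freq with sd(rep(x,w)).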

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])             FAX: (+45) 35327907

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
