Weighting can be confusing: there are three standard forms of weighting that 
you need to be careful not to mix up, and I suspect that imputation weights 
are really a fourth.

First, there is case (replication) versus precision weighting. A weight of 10 
means one of:

- I have 10 observations identical to this one.
- This observation has variance sigma^2/10, as if it were the average of 10 
observations.

There are also sampling weights:

- For each observation like this, there are 10 similar observations in the 
population (and I want to estimate a population parameter, such as the 
national average income or the percentage of votes at a hypothetical general 
election).

What R does in lm/glm is precision weighting. Notice that when the variance is 
estimated from the data, the weights are really only relative: if all 
observations are weighted equally (all 10, say), you get a tenfold increase in 
the estimated sigma^2 and a tenfold decrease in the unscaled 
variance-covariance matrix, so the net result is that the standard errors are 
unchanged (they won't be if the weights are unequal).
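A quick sketch of that point (my own check, assuming the default gaussian family, with the sleep data used later in the thread):

```r
# Equal precision weights: the sigma^2 estimate scales up tenfold, the
# unscaled vcov scales down tenfold, and the standard errors are unchanged.
data(sleep)
f1 <- glm(extra ~ group, data = sleep)
f2 <- glm(extra ~ group, data = sleep, weights = rep(10, nrow(sleep)))
se1 <- coef(summary(f1))[, "Std. Error"]
se2 <- coef(summary(f2))[, "Std. Error"]
all.equal(se1, se2)
```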

The three weighting schemes share the same formula for the estimates, but 
differ both in the estimated variance and df, and in the formula for the 
standard errors. 

Sampling weights are the domain of the survey package, though I don't think it 
does replication weights (someone called Thomas may chime in and educate me 
otherwise). I'm not quite sure, but I think you can get from a 
precision-weighted analysis to a case-weighted one just by adjusting the df 
for error (changing the residual df to df + sum(w) - n, and scaling sigma^2 
proportionally).
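To make that concrete, here is a sketch of the df adjustment (my own construction, so treat it with the same "I'm not quite sure" as the text): rescale the dispersion to the case-weight df and compare against literally replicating the rows.

```r
# Hedged sketch: turn a precision-weighted fit into a case-weighted one by
# moving the residual df from n - p to df + sum(w) - n, i.e. sum(w) - p.
data(sleep)
n <- nrow(sleep)
w <- rep(10, n)
fit <- glm(extra ~ group, data = sleep, weights = w)
df.case <- fit$df.residual + sum(w) - n       # = sum(w) - p
# dispersion rescaled to the case-weight df (weighted RSS / df.case)
disp.case <- sum(fit$weights * residuals(fit, "working")^2) / df.case
se.case <- sqrt(diag(summary(fit)$cov.unscaled * disp.case))
# Reference analysis: actually replicate each row 10 times
fit10 <- glm(extra ~ group, data = sleep[rep(1:n, 10), ])
all.equal(se.case, coef(summary(fit10))[, "Std. Error"],
          check.attributes = FALSE)
```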

Imputation weights look like the opposite of case weights: you supply 10 
observations when in fact you have only one. An educated guess is that you 
could do something similar to the case-weight adjustment -- here sum(w) will 
be much less than n, so you decrease the residual df rather than increase it. 
I have a nagging feeling that this might still not be quite right, though: in 
the cases where the imputations actually differ, do we get the extra 
variability of the variance estimate right? Or maybe we don't need to care. 
There is a literature on the subject....
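In the degenerate case where the M stacked copies are identical, that educated guess can be checked exactly (again my own sketch, using the sleep example from the thread below):

```r
# Imputation-style weights 1/M on M stacked copies: sum(w) equals the
# original n, so the effective residual df is sum(w) - p, far below the
# nominal df of the stacked data.
data(sleep)
M <- 10
sleep10 <- sleep[rep(1:nrow(sleep), M), ]
w <- rep(1/M, nrow(sleep10))
fit <- glm(extra ~ group, data = sleep10, weights = w)
df.adj <- sum(w) - length(coef(fit))          # 18, not the nominal 198
disp.adj <- sum(fit$weights * residuals(fit, "working")^2) / df.adj
se.adj <- sqrt(diag(summary(fit)$cov.unscaled * disp.adj))
naive <- glm(extra ~ group, data = sleep)     # single copy, no weights
all.equal(se.adj, coef(summary(naive))[, "Std. Error"],
          check.attributes = FALSE)
```

With identical copies this recovers the single-copy standard errors; whether it remains right when the imputations actually differ is exactly the open question.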

On May 25, 2012, at 09:21 , ilai wrote:

> I'm confused (I bet David is too). First and last models are "the
> same", so what do SEs have to do with anything?
> 
> naive <- glm(extra ~ group, data=sleep)
> imputWrong <- glm(extra ~ group, data=sleep10)
> imput <- glm(extra ~ group, data=sleep10,weights=rep(0.1,nrow(sleep10)))
> lapply(list(naive,imputWrong,imput),anova)
> sapply(list(naive,imputWrong,imput),function(x) vcov(x)[1,1]/vcov(x)[2,2])
> # or another way to see it  (adjust for the DF)
> coef(summary(naive))[2,2] - sqrt(198)/sqrt(18) * coef(summary(imput))[2,2]
> coef(summary(naive))[2,2] - sqrt(198)/sqrt(18) * coef(summary(imputWrong))[2,2]
> 
> Are you sure you are interpreting Wood et al. correctly ? (I haven't
> read it, this is not rhetorical)
> 
> On Wed, May 23, 2012 at 7:49 PM, Steve Taylor <steve.tay...@aut.ac.nz> wrote:
>> Re:
>> coef(summary(glm(extra ~ group, data=sleep[ rep(1:nrow(sleep), 10L), ] )))
>> 
>> Your (corrected) suggestion is the same as one of mine, and doesn't do what 
>> I'm looking for.
>> 
>> 
>> -----Original Message-----
>> From: David Winsemius [mailto:dwinsem...@comcast.net]
>> Sent: Tuesday, 22 May 2012 3:37p
>> To: Steve Taylor
>> Cc: r-help@r-project.org
>> Subject: Re: [R] glm(weights) and standard errors
>> 
>> 
>> On May 21, 2012, at 10:58 PM, Steve Taylor wrote:
>> 
>>> Is there a way to tell glm() that rows in the data represent a certain
>>> number of observations other than one?  Perhaps even fractional
>>> values?
>>> 
>>> Using the weights argument has no effect on the standard errors.
>>> Compare the following; is there a way to get the first and last models
>>> to produce the same results?
>>> 
>>> data(sleep)
>>> coef(summary(glm(extra ~ group, data=sleep)))
>>> coef(summary(glm(extra ~ group, data=sleep, weights=rep(10L,nrow(sleep)))))
>> 
>> Here's a reasonably simple way to do it:
>> 
>> coef(summary(glm(extra ~ group, data=sleep[ rep(10L,nrow(sleep)), ] )))
>> 
>> 
>> --
>> David.
>> 
>>> sleep10 = sleep[rep(1:nrow(sleep),10),]
>>> coef(summary(glm(extra ~ group, data=sleep10)))
>>> coef(summary(glm(extra ~ group, data=sleep10, weights=rep(0.1,nrow(sleep10)))))
>>> 
>>> My reason for asking is so that I can fit a model to a stacked
>>> multiple imputation data set, as suggested by:
>>> 
>>> Wood, A. M., White, I. R. and Royston, P. (2008), How should variable
>>> selection be performed with multiply imputed data?.
>>> Statist. Med., 27: 3227-3246. doi: 10.1002/sim.3177
>>> 
>>> Other suggestions would be most welcome.
>>> 
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
