Dear all, I'm struggling with weighted least squares, where something that I had assumed to be true appears not to be the case. Take the following data set as an example:
df <- data.frame(x = runif(100, 0, 100)) df$y <- df$x + 1 + rnorm(100, sd=15) I had expected that: summary(lm(y ~ x, data=df, weights=rep(2, 100))) summary(lm(y ~ x, data=rbind(df,df))) would be equivalent, but they are not. I suspect the difference is how the degrees of freedom is calculated - I had expected it to be sum(weights), but seems to be sum(weights > 0). This seems unintuitive to me: summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50))) summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50))) What am I missing? And what is the usual way to do a linear regression when you have aggregated data? Thanks, Hadley ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.