Re: [R] Weighted least squares

2007-06-11 Thread Tim Hesterberg
As John noted, there are different kinds of weights, and 
different terminology:
* inverse-variance weights (accuracy weights)
* case weights (frequencies, counts)
* sampling weights (selection probability weights)

I'll add:
* inverse-variance weights, where var(y for observation) = 1/weight
  (as opposed to just being inversely proportional to the weight)
* weights used as part of an algorithm (e.g. for robust estimation,
  or glm's using iteratively-reweighted least-squares).

For linear regression, the type of weights doesn't affect the calculation
of the regression coefficients, but it does affect inferences such as
standard errors for the regression coefficients, degrees of freedom for
variance estimates, etc.  

lm() inferences assume the first type.  
Other formulae are appropriate for inferences with types 2-4.
Combinations of types 1-4 require yet other formulae; this gets nontrivial.
For the 5th type, inferences need to be handled by the algorithm that
is using weighted linear regression.
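
For type 2 (case weights), one ad hoc fix is to recompute the residual
variance with the correct degrees of freedom.  A minimal sketch (mine, not
code from this thread; the aggregated data frame 'agg' is hypothetical):

  ## Each row of 'agg' stands for 'freq' identical cases.
  agg <- data.frame(y = c(0, 0, 1, 2),
                    x = c(1.1, 2.2, 3.3, 4.4),
                    freq = c(1, 3, 2, 1))
  fit <- lm(y ~ x, data = agg, weights = freq)
  p <- length(coef(fit))
  n.eff <- sum(agg$freq)                 # true number of cases
  ## Case-weight residual variance, using df = sum(freq) - p:
  sigma2 <- sum(weighted.residuals(fit)^2) / (n.eff - p)
  ## lm()'s standard errors, rescaled to that variance estimate:
  se.case <- coef(summary(fit))[, "Std. Error"] *
    sqrt(df.residual(fit) / (n.eff - p))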

Tim Hesterberg

John Fox wrote:
I think that the problem is that the term "weights" has different meanings,
which, although they are related, are not quite the same. 

The weights used by lm() are (inverse-)variance weights, reflecting the
variances of the errors, with observations that have low-variance errors
therefore being accorded greater weight in the resulting WLS regression.
What you have are sometimes called case weights, and I'm unaware of a
general way of handling them in R, although you could regenerate the
unaggregated data. As you discovered, you get the same coefficients with
case weights as with variance weights, but different standard errors.
Finally, there are sampling weights, which are inversely proportional to
the probability of selection; these are accommodated by the survey package. 

To complicate matters, this terminology isn't entirely standard.



Re: [R] Weighted least squares

2007-05-09 Thread hadley wickham
Thanks John,

That's just the explanation I was looking for. I had hoped that there
would be a built-in way of dealing with them in R, but obviously
not.

Given that explanation, it still seems to me that the way R
calculates n is suboptimal, as demonstrated by my second example:

summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))

The weights are only very slightly different, but the estimates of
residual standard error are quite different (20 vs. 14 in my run).

Hadley

On 5/8/07, John Fox [EMAIL PROTECTED] wrote:
 Dear Hadley,

 I think that the problem is that the term "weights" has different meanings,
 which, although they are related, are not quite the same.

 The weights used by lm() are (inverse-)variance weights, reflecting the
 variances of the errors, with observations that have low-variance errors
 therefore being accorded greater weight in the resulting WLS regression.
 What you have are sometimes called case weights, and I'm unaware of a
 general way of handling them in R, although you could regenerate the
 unaggregated data. As you discovered, you get the same coefficients with
 case weights as with variance weights, but different standard errors.
 Finally, there are sampling weights, which are inversely proportional to
 the probability of selection; these are accommodated by the survey package.

 To complicate matters, this terminology isn't entirely standard.

 I hope this helps,
  John

 
 John Fox, Professor
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox
 

  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of hadley wickham
  Sent: Tuesday, May 08, 2007 5:09 AM
  To: R Help
  Subject: [R] Weighted least squares
 
  Dear all,
 
  I'm struggling with weighted least squares, where something
  that I had assumed to be true appears not to be the case.
  Take the following data set as an example:
 
   df <- data.frame(x = runif(100, 0, 100))
   df$y <- df$x + 1 + rnorm(100, sd=15)
 
  I had expected that:
 
   summary(lm(y ~ x, data=df, weights=rep(2, 100)))
   summary(lm(y ~ x, data=rbind(df,df)))
 
   would be equivalent, but they are not.  I suspect the
   difference is how the degrees of freedom is calculated - I
   had expected it to be sum(weights), but it seems to be
   sum(weights > 0).  This seems unintuitive to me:
 
  summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
  summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
  What am I missing?  And what is the usual way to do a linear
  regression when you have aggregated data?
 
  Thanks,
 
  Hadley
 






Re: [R] Weighted least squares

2007-05-09 Thread hadley wickham
On 5/9/07, Adaikalavan Ramasamy [EMAIL PROTECTED] wrote:
 http://en.wikipedia.org/wiki/Weighted_least_squares gives a formulaic
 description of what you have said.

Except it doesn't describe what I think is important in my case - how
do you calculate the degrees of freedom/n for weighted linear
regression?

 I believe the original poster has converted something like this

 y x
 0   1.1
 0   2.2
 0   2.2
 0   2.2
 1   3.3
 1   3.3
 2   4.4
  ...

 into something like the following

 y x freq
 0   1.1  1
 0   2.2  3
 1   3.3  2
 2   4.4  1
  ...

Exactly!  Thanks for providing that example.


 Now, the variance of the mean of each row in the table above is ZERO because
 the individual elements that comprise each row are identical. Therefore
 your method of using inverse-variance weights will not work here.

 Is it then valid to use lm( y ~ x, weights=freq ) ?

 Regards, Adai



 S Ellison wrote:
  Hadley,
 
  You asked
  .. what is the usual way to do a linear
  regression when you have aggregated data?
 
  Least squares generally uses inverse variance weighting. For aggregated 
  data fitted as mean values, you just need the variances for the _means_.
 
  So if you have individual means x_i and sd's s_i that arise from aggregated 
  data with n_i observations in group i, the natural weighting is by inverse 
  squared standard error of the mean. The appropriate weight for x_i would 
  then be n_i/(s_i^2). In R, that's n/(s^2), as n and s would be vectors with 
  the same length as x. If all the groups had the same variance, or nearly 
  so, s is a scalar; if they have the same number of observations, n is a 
  scalar.
 
  Of course, if they have the same variance and same number of observations, 
  they all have the same weight and you needn't weight them at all: see 
  previous posting!
 
  Steve E
 
 
 
 
 
 





Re: [R] Weighted least squares

2007-05-09 Thread John Fox
Dear Adai,

 -Original Message-
 From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, May 08, 2007 8:38 PM
 To: S Ellison
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; R-help@stat.math.ethz.ch
 Subject: Re: [R] Weighted least squares
 
 http://en.wikipedia.org/wiki/Weighted_least_squares gives a 
 formulaic description of what you have said.
 
 I believe the original poster has converted something like this
 
   y x
   0   1.1
   0   2.2
   0   2.2
   0   2.2
   1   3.3
   1   3.3
   2   4.4
  ...
 
 into something like the following
 
   y x freq
   0   1.1  1
   0   2.2  3
   1   3.3  2
   2   4.4  1
  ...
 
 Now, the variance of the mean of each row in the table above is ZERO 
 because the individual elements that comprise each row are 
 identical. Therefore your method of using inverse-variance 
 weights will not work here.
 
 Is it then valid to use lm( y ~ x, weights=freq ) ?

No, because the weights argument gives inverse-variance weights, not case
weights.

Regards,
 John
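
A minimal sketch of the regenerate-the-unaggregated-data route suggested
earlier (mine; 'agg' is hypothetical aggregated data): expand each row by
its frequency and fit an ordinary unweighted regression, which gets both
the coefficients and the degrees of freedom right for case weights.

  agg <- data.frame(y = c(0, 0, 1, 2),
                    x = c(1.1, 2.2, 3.3, 4.4),
                    freq = c(1, 3, 2, 1))
  ## One row per underlying case:
  expanded <- agg[rep(seq_len(nrow(agg)), agg$freq), c("y", "x")]
  summary(lm(y ~ x, data = expanded))   # n = sum(agg$freq) = 7 cases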

 
 Regards, Adai
 
 
 
 S Ellison wrote:
  Hadley,
  
  You asked
  .. what is the usual way to do a linear regression when you have 
  aggregated data?
  
  Least squares generally uses inverse variance weighting. 
 For aggregated data fitted as mean values, you just need the 
 variances for the _means_. 
  
  So if you have individual means x_i and sd's s_i that arise 
 from aggregated data with n_i observations in group i, the 
 natural weighting is by inverse squared standard error of the 
 mean. The appropriate weight for x_i would then be 
 n_i/(s_i^2). In R, that's n/(s^2), as n and s would be 
 vectors with the same length as x. If all the groups had the 
 same variance, or nearly so, s is a scalar; if they have the 
 same number of observations, n is a scalar. 
  
  Of course, if they have the same variance and same number 
 of observations, they all have the same weight and you 
 needn't weight them at all: see previous posting!
  
  Steve E
  
  
  
  
  
 




Re: [R] Weighted least squares

2007-05-09 Thread John Fox
Dear Hadley,

 -Original Message-
 From: hadley wickham [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, May 09, 2007 2:21 AM
 To: John Fox
 Cc: R-help@stat.math.ethz.ch
 Subject: Re: [R] Weighted least squares
 
 Thanks John,
 
 That's just the explanation I was looking for. I had hoped 
 that there would be a built-in way of dealing with them in 
 R, but obviously not.
 
 Given that explanation, it still seems to me that the way R 
 calculates n is suboptimal, as demonstrated by my second example:
 
 summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50))) 
 summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
 The weights are only very slightly different, but the 
 estimates of residual standard error are quite different (20 
 vs. 14 in my run).
 

Observations with 0 weight are literally excluded, while those with very
small weight (relative to others) don't contribute much to the fit.
Consequently you get very similar coefficients but different numbers of
observations.
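
A quick check of this point (my sketch, on Hadley's simulated data with a
hypothetical seed): the residual degrees of freedom count only the cases
with nonzero weight.

  set.seed(42)
  df <- data.frame(x = runif(100, 0, 100))
  df$y <- df$x + 1 + rnorm(100, sd = 15)
  f0 <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
  f1 <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))
  df.residual(f0)   # 48: the fifty zero-weight cases are excluded
  df.residual(f1)   # 98: all one hundred cases are counted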

I hope this helps,
 John

 Hadley
 
 On 5/8/07, John Fox [EMAIL PROTECTED] wrote:
  Dear Hadley,
 
  I think that the problem is that the term "weights" has different 
  meanings, which, although they are related, are not quite the same.
 
  The weights used by lm() are (inverse-)variance weights, 
 reflecting 
  the variances of the errors, with observations that have 
 low-variance 
  errors therefore being accorded greater weight in the 
 resulting WLS regression.
  What you have are sometimes called case weights, and I'm 
 unaware of 
  a general way of handling them in R, although you could 
 regenerate the 
  unaggregated data. As you discovered, you get the same coefficients 
  with case weights as with variance weights, but different 
 standard errors.
  Finally, there are sampling weights, which are inversely 
  proportional to the probability of selection; these are 
 accommodated by the survey package.
 
  To complicate matters, this terminology isn't entirely standard.
 
  I hope this helps,
   John
 
  
  John Fox, Professor
  Department of Sociology
  McMaster University
  Hamilton, Ontario
  Canada L8S 4M4
  905-525-9140x23604
  http://socserv.mcmaster.ca/jfox
  
 
   -Original Message-
   From: [EMAIL PROTECTED] 
   [mailto:[EMAIL PROTECTED] On Behalf Of hadley 
   wickham
   Sent: Tuesday, May 08, 2007 5:09 AM
   To: R Help
   Subject: [R] Weighted least squares
  
   Dear all,
  
   I'm struggling with weighted least squares, where 
 something that I 
   had assumed to be true appears not to be the case.
   Take the following data set as an example:
  
   df <- data.frame(x = runif(100, 0, 100))
   df$y <- df$x + 1 + rnorm(100, sd=15)
  
   I had expected that:
  
   summary(lm(y ~ x, data=df, weights=rep(2, 100)))
   summary(lm(y ~ x, data=rbind(df,df)))
  
   would be equivalent, but they are not.  I suspect the difference is 
   how the degrees of freedom is calculated - I had expected it to be 
   sum(weights), but it seems to be sum(weights > 0).  This seems 
   unintuitive to me:
  
   summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50))) 
   summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
  
   What am I missing?  And what is the usual way to do a linear 
   regression when you have aggregated data?
  
   Thanks,
  
   Hadley
  
 
 
 




Re: [R] Weighted least squares

2007-05-09 Thread S Ellison

 Adaikalavan Ramasamy [EMAIL PROTECTED] 09/05/2007 01:37:31 
..the variance of means of each row in table above is ZERO because 
the individual elements that comprise each row are identical. 
... Then is it valid then to use lm( y ~ x, weights=freq ) ?

ermmm... probably not, because if that happened I'd strongly suspect we'd 
substantially violated some assumptions. 

We are given a number of groups of identical observations. But we are seeking a 
solution to a problem that posits an underlying variance. If it's not visible 
within the groups, where is it? Has it disappeared in numerical precision, or 
is something else going on?

If we did this regression, we would see identical residuals for all members of 
a group. That would imply that the variance arises entirely from between-group 
effects and not at all from within-group effects. To me, that would in turn 
imply that the number of observations in the group is irrelevant; we should be 
using unweighted regression on the group 'means' in this situation if we're 
using least squares at all. 

If we genuinely have independent observations and by some coincidence they have 
the same value within available precision, we might be justified in saying we 
can't see the variance within groups, but we can estimate it from the residual 
variance. That would be equivalent to assuming constant variance, and my 
n/(s^2) reduces to n except for a scaling factor. Using n alone would then be 
consistent with one's assumptions, I think. On the kind of data I get, though 
(mostly chemical measurement with continuous scales), I'd have considerable 
difficulty justifying that assumption. And if I didn't have that kind of data 
(or a reasonable approximation thereto) I'd be wondering whether I should be 
using linear regression at all. 
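
A sketch of that alternative (mine, on hypothetical group 'means'): fit the
distinct rows once each, unweighted, so the frequencies drop out entirely.

  means <- data.frame(x = c(1.1, 2.2, 3.3, 4.4),  # hypothetical group 'means'
                      y = c(0, 0, 1, 2))
  summary(lm(y ~ x, data = means))   # each group counted once, unweighted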

S



[R] Weighted least squares

2007-05-08 Thread hadley wickham
Dear all,

I'm struggling with weighted least squares, where something that I had
assumed to be true appears not to be the case.  Take the following
data set as an example:

df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd=15)

I had expected that:

summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))

would be equivalent, but they are not.  I suspect the difference is
how the degrees of freedom is calculated - I had expected it to be
sum(weights), but it seems to be sum(weights > 0).  This seems
unintuitive to me:

summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))

What am I missing?  And what is the usual way to do a linear
regression when you have aggregated data?

Thanks,

Hadley



Re: [R] Weighted least squares

2007-05-08 Thread Adaikalavan Ramasamy
See below.

hadley wickham wrote:
 Dear all,
 
 I'm struggling with weighted least squares, where something that I had
 assumed to be true appears not to be the case.  Take the following
 data set as an example:
 
 df <- data.frame(x = runif(100, 0, 100))
 df$y <- df$x + 1 + rnorm(100, sd=15)
 
 I had expected that:
 
 summary(lm(y ~ x, data=df, weights=rep(2, 100)))
 summary(lm(y ~ x, data=rbind(df,df)))

You assign weights to different points according to some external 
quality or reliability measure, not the number of times the data point 
was measured.

Look at the estimates and standard error of the two models below:

  coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
              Estimate Std. Error   t value     Pr(>|t|)
  (Intercept) 1.940765 3.30348066  0.587491 5.582252e-01
  x           0.982610 0.05893262 16.673448 2.264258e-30

  coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
              Estimate Std. Error    t value     Pr(>|t|)
  (Intercept) 1.940765 2.32408609  0.8350659 4.046871e-01
  x           0.982610 0.04146066 23.6998165 1.012067e-59

You can see that they have the same coefficient estimates, but the second 
one has smaller variances.

The repeated values artificially deflate the variance and thus inflate 
the precision. This is why you cannot treat replicate data as 
independent observations.


 would be equivalent, but they are not.  I suspect the difference is
 how the degrees of freedom is calculated - I had expected it to be
 sum(weights), but it seems to be sum(weights > 0).  This seems
 unintuitive to me:
 
 summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
 summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
 What am I missing?  And what is the usual way to do a linear
 regression when you have aggregated data?

It would be best to use the individual data points instead of aggregated 
data, as that allows you to estimate the within-group variations as well.

If you had individual data points, you could try something as follows. 
Please check the code, as I am no expert in the area of repeated measures.

  x  <- runif(100, 0, 100)
  y1 <- x + rnorm(100, mean=1, sd=15)
  y2 <- y1 + rnorm(100, sd=5)

  df <- data.frame( y = c(y1, y2),
                    x = c(x, x),
                    subject = factor(rep( paste("p", 1:100, sep=""), 2 )) )

  library(nlme)
  summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )

Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related 
material for more information. Hope this helps.

 Thanks,
 
 Hadley

Regards, Adai



Re: [R] Weighted least squares

2007-05-08 Thread hadley wickham
On 5/8/07, Adaikalavan Ramasamy [EMAIL PROTECTED] wrote:
 See below.

 hadley wickham wrote:
  Dear all,
 
  I'm struggling with weighted least squares, where something that I had
  assumed to be true appears not to be the case.  Take the following
  data set as an example:
 
  df <- data.frame(x = runif(100, 0, 100))
  df$y <- df$x + 1 + rnorm(100, sd=15)
 
  I had expected that:
 
  summary(lm(y ~ x, data=df, weights=rep(2, 100)))
  summary(lm(y ~ x, data=rbind(df,df)))

 You assign weights to different points according to some external
 quality or reliability measure, not the number of times the data point
 was measured.

That is one type of weighting - but what if I have already aggregated
data?  That is a perfectly valid type of weighting too.

 Look at the estimates and standard error of the two models below:

   coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
               Estimate Std. Error   t value     Pr(>|t|)
   (Intercept) 1.940765 3.30348066  0.587491 5.582252e-01
   x           0.982610 0.05893262 16.673448 2.264258e-30

   coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
               Estimate Std. Error    t value     Pr(>|t|)
   (Intercept) 1.940765 2.32408609  0.8350659 4.046871e-01
   x           0.982610 0.04146066 23.6998165 1.012067e-59

 You can see that they have the same coefficient estimates, but the second
 one has smaller variances.

 The repeated values artificially deflate the variance and thus inflate
 the precision. This is why you cannot treat replicate data as
 independent observations.

Hardly artificially - I have repeated observations.

  would be equivalent, but they are not.  I suspect the difference is
  how the degrees of freedom is calculated - I had expected it to be
  sum(weights), but it seems to be sum(weights > 0).  This seems
  unintuitive to me:
 
  summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
  summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
  What am I missing?  And what is the usual way to do a linear
  regression when you have aggregated data?

 It would be best to use the individual data points instead of aggregated
 data, as that allows you to estimate the within-group variations as well.

There is no within-group variation - these are observations that occur
with the same values many times in the dataset, so they have been
aggregated into a contingency-table-like format.

 If you had individual data points, you could try something as follows.
 Please check the code, as I am no expert in the area of repeated measures.

   x  <- runif(100, 0, 100)
   y1 <- x + rnorm(100, mean=1, sd=15)
   y2 <- y1 + rnorm(100, sd=5)

   df <- data.frame( y = c(y1, y2),
                     x = c(x, x),
                     subject = factor(rep( paste("p", 1:100, sep=""), 2 )) )

   library(nlme)
   summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )

 Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related
 material for more information. Hope this helps.

I'm not interested in a mixed model, and I don't have individual data points.

Hadley



Re: [R] Weighted least squares

2007-05-08 Thread Adaikalavan Ramasamy
Sorry, you did not explain in the original post that your weights 
correspond to frequencies. I assumed they were repeated 
measurements with within-group variation.

I was merely responding to your query about why the following differed.
summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))

Let me also clarify my statement about "artificial". If one treats 
repeated observations as independent, then one obtains estimates with 
inflated precision. I was not calling your data artificial in any way.

Using frequencies as weights may be valid. Your data points appear to 
arise from a discrete distribution, so I am not entirely sure if you can 
use the linear model, which assumes the errors are normally distributed.

Regards, Adai



hadley wickham wrote:
 On 5/8/07, Adaikalavan Ramasamy [EMAIL PROTECTED] wrote:
 See below.

 hadley wickham wrote:
  Dear all,
 
  I'm struggling with weighted least squares, where something that I had
  assumed to be true appears not to be the case.  Take the following
  data set as an example:
 
  df <- data.frame(x = runif(100, 0, 100))
  df$y <- df$x + 1 + rnorm(100, sd=15)
 
  I had expected that:
 
  summary(lm(y ~ x, data=df, weights=rep(2, 100)))
  summary(lm(y ~ x, data=rbind(df,df)))

 You assign weights to different points according to some external
 quality or reliability measure, not the number of times the data point
 was measured.
 
 That is one type of weighting - but what if I have already aggregated
 data?  That is a perfectly valid type of weighting too.
 
 Look at the estimates and standard error of the two models below:

   coefficients( summary(f.w <- lm(y ~ x, data=df, weights=rep(2, 100))) )
               Estimate Std. Error   t value     Pr(>|t|)
   (Intercept) 1.940765 3.30348066  0.587491 5.582252e-01
   x           0.982610 0.05893262 16.673448 2.264258e-30

   coefficients( summary( f.u <- lm(y ~ x, data=rbind(df,df) ) ) )
               Estimate Std. Error    t value     Pr(>|t|)
   (Intercept) 1.940765 2.32408609  0.8350659 4.046871e-01
   x           0.982610 0.04146066 23.6998165 1.012067e-59

 You can see that they have the same coefficient estimates, but the second
 one has smaller variances.

 The repeated values artificially deflate the variance and thus inflate
 the precision. This is why you cannot treat replicate data as
 independent observations.
 
 Hardly artificially - I have repeated observations.
 
  would be equivalent, but they are not.  I suspect the difference is
  how the degrees of freedom is calculated - I had expected it to be
  sum(weights), but it seems to be sum(weights > 0).  This seems
  unintuitive to me:
 
  summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
  summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
  What am I missing?  And what is the usual way to do a linear
  regression when you have aggregated data?

 It would be best to use the individual data points instead of aggregated
 data, as that allows you to estimate the within-group variations as well.
 
 There is no within-group variation - these are observations that occur
 with the same values many times in the dataset, so they have been
 aggregated into a contingency-table-like format.
 
 If you had individual data points, you could try something as follows.
 Please check the code, as I am no expert in the area of repeated 
 measures.

   x  <- runif(100, 0, 100)
   y1 <- x + rnorm(100, mean=1, sd=15)
   y2 <- y1 + rnorm(100, sd=5)

   df <- data.frame( y = c(y1, y2),
                     x = c(x, x),
                     subject = factor(rep( paste("p", 1:100, sep=""), 2 )) )

   library(nlme)
   summary( lme( y ~ x, random = ~ 1 | subject, data=df ) )

 Try reading Pinheiro and Bates (http://tinyurl.com/yvvrr7) or related
 material for more information. Hope this helps.
 
 I'm not interested in a mixed model, and I don't have individual data 
 points.
 
 Hadley
 
 




Re: [R] Weighted least squares

2007-05-08 Thread John Fox
Dear Hadley,

I think that the problem is that the term "weights" has different meanings,
which, although they are related, are not quite the same. 

The weights used by lm() are (inverse-)variance weights, reflecting the
variances of the errors, with observations that have low-variance errors
therefore being accorded greater weight in the resulting WLS regression.
What you have are sometimes called case weights, and I'm unaware of a
general way of handling them in R, although you could regenerate the
unaggregated data. As you discovered, you get the same coefficients with
case weights as with variance weights, but different standard errors.
Finally, there are sampling weights, which are inversely proportional to
the probability of selection; these are accommodated by the survey package. 
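
For the sampling-weight case, a hedged sketch of the survey-package route
(the data and variable names here are hypothetical, not from this thread):

  library(survey)
  d <- data.frame(x = runif(100), w = runif(100, 1, 10))  # w: sampling weights
  d$y <- d$x + rnorm(100)
  des <- svydesign(ids = ~1, weights = ~w, data = d)
  summary(svyglm(y ~ x, design = des))   # design-based standard errors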

To complicate matters, this terminology isn't entirely standard.

I hope this helps,
 John


John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of hadley wickham
 Sent: Tuesday, May 08, 2007 5:09 AM
 To: R Help
 Subject: [R] Weighted least squares
 
 Dear all,
 
 I'm struggling with weighted least squares, where something 
 that I had assumed to be true appears not to be the case.  
 Take the following data set as an example:
 
 df <- data.frame(x = runif(100, 0, 100))
 df$y <- df$x + 1 + rnorm(100, sd=15)
 
 I had expected that:
 
 summary(lm(y ~ x, data=df, weights=rep(2, 100)))
 summary(lm(y ~ x, data=rbind(df,df)))
 
 would be equivalent, but they are not.  I suspect the 
 difference is how the degrees of freedom is calculated - I 
 had expected it to be sum(weights), but it seems to be 
 sum(weights > 0).  This seems unintuitive to me:
 
 summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50))) 
 summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
 
 What am I missing?  And what is the usual way to do a linear 
 regression when you have aggregated data?
 
 Thanks,
 
 Hadley
 




Re: [R] Weighted least squares

2007-05-08 Thread S Ellison
Doubling the length of the data doubles the apparent number of observations. 
You would expect the standard error to reduce by sqrt(2) (which it just about 
does, though I'm not clear on why it's not exact here).

Weights are not as simple as they look. You have given all your data the same 
weight, so the answer is independent of the weights (!). Try again with 
weights=rep(4,100) etc. Equal weights simply cancel out in the lm process. In 
fact, some linear regression algorithms rescale all weights to sum to 1; in 
others, weights are scaled to average 1; done 'naturally', the weights simply 
appear in two places which cancel out in the final covariance matrix 
calculation (e.g. in the weighted 'residual sd' and in the Hessian for the 
chi-squared function, if I remember correctly). 

Bottom line - equal weights make no difference in lm, so choose what you like. 
1 is a good number, though.
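
A quick check of this (my sketch, on simulated data with a hypothetical
seed): a constant weight vector leaves both the coefficients and the
standard errors of lm() unchanged.

  set.seed(1)
  df <- data.frame(x = runif(100, 0, 100))
  df$y <- df$x + 1 + rnorm(100, sd = 15)
  all.equal(coef(summary(lm(y ~ x, data = df))),
            coef(summary(lm(y ~ x, data = df, weights = rep(4, 100)))))
  ## TRUE: the constant weight cancels out of the covariance calculation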

Steve e

 hadley wickham [EMAIL PROTECTED] 08/05/2007 10:08:34 
Dear all,

I'm struggling with weighted least squares, where something that I had
assumed to be true appears not to be the case.  Take the following
data set as an example:




Re: [R] Weighted least squares

2007-05-08 Thread S Ellison
Hadley,

You asked
 .. what is the usual way to do a linear 
 regression when you have aggregated data?

Least squares generally uses inverse variance weighting. For aggregated data 
fitted as mean values, you just need the variances for the _means_. 

So if you have individual means x_i and sd's s_i that arise from aggregated 
data with n_i observations in group i, the natural weighting is by inverse 
squared standard error of the mean. The appropriate weight for x_i would then 
be n_i/(s_i^2). In R, that's n/(s^2), as n and s would be vectors with the same 
length as x. If all the groups had the same variance, or nearly so, s is a 
scalar; if they have the same number of observations, n is a scalar. 

Of course, if they have the same variance and same number of observations, they 
all have the same weight and you needn't weight them at all: see previous 
posting!

Steve E
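
A sketch of this weighting (mine, with hypothetical group summaries): weight
each group mean by n/s^2, the inverse squared standard error of the mean.

  grp <- data.frame(x = c(10, 20, 30, 40),      # hypothetical group means
                    y = c(11, 19, 33, 41),
                    n = c(5, 8, 6, 7),          # observations per group
                    s = c(2.1, 1.8, 2.5, 2.0))  # within-group sd of y
  summary(lm(y ~ x, data = grp, weights = n / s^2))  # weight = 1/SEM^2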





Re: [R] Weighted least squares

2007-05-08 Thread Adaikalavan Ramasamy
http://en.wikipedia.org/wiki/Weighted_least_squares gives a formulaic 
description of what you have said.

I believe the original poster has converted something like this

y x
0   1.1
0   2.2
0   2.2
0   2.2
1   3.3
1   3.3
2   4.4
 ...

into something like the following

y x freq
0   1.1  1
0   2.2  3
1   3.3  2
2   4.4  1
 ...

Now, the variance of the mean of each row in the table above is ZERO because 
the individual elements that comprise each row are identical. Therefore 
your method of using inverse-variance weights will not work here.

Is it then valid to use lm( y ~ x, weights=freq ) ?

Regards, Adai



S Ellison wrote:
 Hadley,
 
 You asked
 .. what is the usual way to do a linear 
 regression when you have aggregated data?
 
 Least squares generally uses inverse variance weighting. For aggregated data 
 fitted as mean values, you just need the variances for the _means_. 
 
 So if you have individual means x_i and sd's s_i that arise from aggregated 
 data with n_i observations in group i, the natural weighting is by inverse 
 squared standard error of the mean. The appropriate weight for x_i would then 
 be n_i/(s_i^2). In R, that's n/(s^2), as n and s would be vectors with the 
 same length as x. If all the groups had the same variance, or nearly so, s is 
 a scalar; if they have the same number of observations, n is a scalar. 
 
 Of course, if they have the same variance and same number of observations, 
 they all have the same weight and you needn't weight them at all: see 
 previous posting!
 
 Steve E
 
 
 
 
 




[R] Weighted Least Squares with glm

2005-10-23 Thread v . schlecht
Hello all,

I have a question concerning the weights used in the glm function.

I need to build a linear model (family=gaussian) with only one regressor. Sadly 
I have only 6 different sets:

y_i = alpha + beta*x_i , i = 1,...,6.

Each of i = 1,...,5 has been observed 60 times, while i = 6 has only been 
observed 30 times. This is why I wanted to use a weighted least squares 
approach instead of ordinary least squares. 

I used the weight w1 <- c(1, 1, 1, 1, 1, 0.5).

Can anybody tell me if this is correct?

THX. 




Re: [R] Weighted least squares

2005-01-18 Thread Prof Brian Ripley
On Tue, 18 Jan 2005, Prof Brian Ripley wrote:
On Mon, 17 Jan 2005, Ming Hsu wrote:
I would like to run a weighted least squares with the weighting matrix 
W.
This is generalized, not weighted, least squares if W really is a matrix and 
not a vector of case-by-case weights.

I ran the following two regressions,
(W^-1)Y = Xb + e
Y = WXb + We
If W is a diagonal matrix, the weights for the second are W^(-2) and you used 
W below.

In both cases, E[bhat] = b.
I used the following commands in R
lm1 <- lm(Y/W ~ X)
lm2 <- lm(Y ~ W:X, weights = W)
where
Y <- rnorm(10,1)
X <- Y + rnorm(10,1)
W <- 1:10
That W is not a matrix!
In lm2, I believe W is applied to the error term, resulting in WLS. However
the estimated coefficients in lm1 and lm2 are really different.  I tried glm
as well, but the result was the same as lm.

Any advice would be appreciated,
Please do check that you supply an example that agrees with your words.
Another difference is that your equations have no intercept, but the lm 
calls do, so you need to add +0 to their rhs.

Use lm.gls in MASS or gls in nlme for generalized least squares, if that is 
what you meant.

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax:  +44 1865 272595


[R] Weighted least squares

2005-01-17 Thread Ming Hsu
Hi,

I would like to run a weighted least squares with the weighting matrix
W.  I ran the following two regressions,

(W^-1)Y = Xb + e
Y = WXb + We

In both cases, E[bhat] = b.

I used the following commands in R

lm1 <- lm(Y/W ~ X)
lm2 <- lm(Y ~ W:X, weights = W)

where 

Y <- rnorm(10,1)
X <- Y + rnorm(10,1)
W <- 1:10

In lm2, I believe W is applied to the error term, resulting in WLS.  However
the estimated coefficients in lm1 and lm2 are really different.  I tried glm
as well, but the result was the same as lm.

Any advice would be appreciated,

Hsu Ming



Re: [R] Weighted least squares

2005-01-17 Thread Prof Brian Ripley
On Mon, 17 Jan 2005, Ming Hsu wrote:
I would like to run a weighted least squares with the weighting 
matrix W.
This is generalized, not weighted, least squares if W really is a matrix and 
not a vector of case-by-case weights.

I ran the following two regressions,
(W^-1)Y = Xb + e
Y = WXb + We
If W is a diagonal matrix, the weights for the second are W^(-2) and you 
used W below.

In both cases, E[bhat] = b.
I used the following commands in R
lm1 <- lm(Y/W ~ X)
lm2 <- lm(Y ~ W:X, weights = W)
where
Y <- rnorm(10,1)
X <- Y + rnorm(10,1)
W <- 1:10
That W is not a matrix!
In lm2, I believe W is applied to the error term, resulting in WLS.  However
the estimated coefficients in lm1 and lm2 are really different.  I tried glm
as well, but the result was the same as lm.
Any advice would be appreciated,
Please do check that you supply an example that agrees with your words.
Use lm.gls in MASS or gls in nlme for generalized least squares, if that 
is what you meant.
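
A hedged sketch of the generalized-least-squares route with MASS::lm.gls
(the data, seed, and covariance matrix here are hypothetical):

  library(MASS)
  set.seed(3)
  dat <- data.frame(X = rnorm(10, 1))
  dat$Y <- dat$X + rnorm(10, 1)
  V <- diag(1:10)   # error covariance matrix, up to scale
  ## inverse = TRUE tells lm.gls that W is a variance matrix:
  fit <- lm.gls(Y ~ X, data = dat, W = V, inverse = TRUE)
  coef(fit)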

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax:  +44 1865 272595


[R] weighted least squares

2003-03-28 Thread Paul, David A

I apologize, in advance, for cross-posting this to 
the R listserv.  I have submitted this query twice 
to the S listserv (yesterday and this morning) and 
neither post has made it; I am not sure why.

When I run the code

gls.1 <- gls(y ~ x, data = foo.frame, 
             weights = varPower(form = ~ fitted(.) | group),
             control = list(maxIter = 500, msMaxIter = 500), 
             verbose = TRUE)

I get the message

Warning messages:
a database with name controlvals already in search 
list: access by where=controlvals may not work in: 
attach(controlvals)

and S-Plus tells me that Maximum number of iterations
reached without convergence.

What's going on?  I don't remember ever attaching a
database named controlvals.


Respectfully  Much Thanks in Advance,

 David Paul
