Re: [R] extend summary.lm for hccm?

2009-01-04 Thread Achim Voß

Dear John (and other readers of this mailing list),


Thanks for your help. Your answer raises two further questions: one directly
related to R and probably easy to answer, the other a little off-topic.


John Fox wrote:

... (BTW, one
would not normally call summary.lm() directly, but rather use the
generic summary() function instead.) ...


Is there any difference for R between calling summary.lm() and calling
summary()? Or is it just that in the second case, R recognizes that the
input is an "lm" object and then dispatches to summary.lm()?
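To make my question concrete, here is what I mean (a minimal sketch using the built-in cars data rather than my own):

```r
fit <- lm(dist ~ speed, data = cars)  # cars ships with base R
class(fit)                            # "lm"

s1 <- summary(fit)     # the generic
s2 <- summary.lm(fit)  # the method, called directly

# Do the two calls give the same result?
identical(coef(s1), coef(s2))
```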



That said, it's hard for me to understand why it's interesting to have
standard errors for the individual coefficients of a high-degree
polynomial, and I'd also be concerned about the sensibleness of fitting
a fifth-degree polynomial in the first place.


I am trying to estimate some Engel curves, i.e. functions relating the
income of a household to its demand share for certain goods. As I want to
estimate them for only one good, the only restriction that arises from
Gorman (1981) seems to be that in a pure Engel curve model (including only
income and the demand share), the demand share should be a sum of products
of polynomials in the natural logarithm of income.


I have not yet found a theoretical reason for a limit on the number of
polynomial terms, and I know too little maths to say whether it is
impossible to estimate the influence of x^5 if you have already included
x to x^4. So I thought I might just compare models with different numbers
of terms using information criteria like Amemiya's Prediction Criterion.


I guess that with x^1 to x^5 it will hardly be possible to estimate the
influence of any single one of these five terms, as each of them can be
closely approximated by the other four. But where to draw the line? If
anybody could tell me where to read about how many polynomial terms to
include at most, I'd be grateful.
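To illustrate what I mean with artificial numbers (just a sketch, the incomes here are made up):

```r
set.seed(1)
x <- log(runif(1000, 1e3, 1e5))    # made-up log incomes
X <- sapply(1:5, function(k) x^k)  # columns x^1 ... x^5

cor(X[, 4], X[, 5])  # adjacent powers are almost perfectly correlated
kappa(cbind(1, X))   # condition number of the raw design matrix
```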



Regards,
Achim



Gorman (1981) is: Gorman, W. M. (1981), "Some Engel Curves," in A. Deaton
(ed.), Essays in the Theory and Measurement of Consumer Behaviour in
Honour of Sir Richard Stone, Cambridge University Press.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extend summary.lm for hccm?

2009-01-03 Thread John Fox
Dear Achim,

I suspect that the problem, involving a fifth-degree raw polynomial, is
very ill-conditioned, and that the computation in linear.hypothesis()
fails because it is not as stable as lm() and summary.lm(). (BTW, one
would not normally call summary.lm() directly, but rather use the
generic summary() function instead.) A possible solution would be to
use a fifth-degree orthogonal polynomial, with the formula energyshare
~ poly(x.log, 5). (You don't need the 1 for the constant, since it's
implied.)
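For instance (a sketch with simulated data standing in for your ev data; the x.log and energyshare values here are made up):

```r
set.seed(1)
x.log <- log(runif(1000, 1e3, 1e5))  # hypothetical log incomes
energyshare <- 0.2 - 0.01 * x.log + rnorm(1000, sd = 0.03)

fit.raw  <- lm(energyshare ~ I(x.log) + I(x.log^2) + I(x.log^3) +
                 I(x.log^4) + I(x.log^5))
fit.orth <- lm(energyshare ~ poly(x.log, 5))  # orthogonal polynomial

# Same fitted values: the two bases span the same column space.
all.equal(fitted(fit.raw), fitted(fit.orth))

# But the raw basis is far worse conditioned than the orthogonal one:
kappa(model.matrix(fit.raw))
kappa(model.matrix(fit.orth))
```

The orthogonal parameterization changes the individual coefficients but not the fit, which is why it can rescue computations that fail on the raw powers.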

That said, it's hard for me to understand why it's interesting to have
standard errors for the individual coefficients of a high-degree
polynomial, and I'd also be concerned about the sensibleness of fitting
a fifth-degree polynomial in the first place.

I hope this helps,
 John


John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/


-------- original message --------
Achim Voß <achim.voss at uni-muenster.de>
Tue Dec 30 14:15:37 CET 2008

Hi!


I am trying to estimate Engel curves on a large sample (>42,000
observations) with lm(), taking heteroskedasticity into account via the
summaryHCCM() function posted here by John Fox (Mon Dec 25 16:01:59 CET
2006).

Having used the SIC (via stepAIC from MASS) to determine how many powers
to use, I estimate the model:

> # ============================================================
> summary.lm(fit.lm.5)

Call:
lm(formula = energyshare ~ 1 + I(x.log) + I(x.log^2) + I(x.log^3) +
I(x.log^4) + I(x.log^5), data = ev)

Residuals:
      Min        1Q    Median        3Q       Max
-0.098819 -0.023682 -0.007043  0.013924  0.486615

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.452e+01  1.234e+01  -4.418 9.97e-06 ***
I(x.log)     3.177e+01  6.966e+00   4.560 5.13e-06 ***
I(x.log^2)  -7.330e+00  1.567e+00  -4.677 2.93e-06 ***
I(x.log^3)   8.395e-01  1.757e-01   4.778 1.78e-06 ***
I(x.log^4)  -4.775e-02  9.814e-03  -4.865 1.15e-06 ***
I(x.log^5)   1.079e-03  2.185e-04   4.939 7.90e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03748 on 42738 degrees of freedom
Multiple R-squared: 0.09236,    Adjusted R-squared: 0.09226
F-statistic: 869.8 on 5 and 42738 DF,  p-value: < 2.2e-16
> # ============================================================
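The selection step itself was along these lines (a sketch; ev is my data set, which I cannot post here):

```r
library(MASS)  # for stepAIC()

fit.full <- lm(energyshare ~ I(x.log) + I(x.log^2) + I(x.log^3) +
                 I(x.log^4) + I(x.log^5), data = ev)

# Setting k = log(n) turns the AIC penalty into the SIC/BIC penalty.
fit.lm.5 <- stepAIC(fit.full, k = log(nrow(ev)))
```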




Now I use summaryHCCM(fit.lm.5):

> # ============================================================
> summaryHCCM(fit.lm.5)
Fehler in solve.default(L %*% V %*% t(L)) :
  System ist für den Rechner singulär: reziproke Konditionszahl = 6.98689e-19
> # ============================================================

("Error in solve.default(L %*% V %*% t(L)) :
  system is computationally singular: reciprocal condition number = 6.98689e-19")



This does not happen if I omit I(x.log^5). I do not know what the error
means, and I'd be grateful if anyone could help. I'd also like to add a
(more or less) related question: can I still use AIC, SIC, etc. if I know
there is a heteroskedasticity problem?
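For context, what I am after are the usual sandwich-corrected coefficient tests, which (if I understand correctly) can also be obtained directly; a sketch using the car and lmtest packages:

```r
library(car)     # hccm(): heteroskedasticity-consistent covariance matrix
library(lmtest)  # coeftest(): coefficient tests with a supplied vcov

# HC3-corrected standard errors and t tests for the fitted model
coeftest(fit.lm.5, vcov. = hccm(fit.lm.5, type = "hc3"))
```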


Thanks in advance,
Achim


