Re: [R] extend summary.lm for hccm?
Dear John (and other readers of this mailing list),

thanks for your help. It raises two further questions: one directly related to R and probably easy to answer, the other a little off-topic.

John Fox wrote:
> ... (BTW, one would not normally call summary.lm() directly, but rather
> use the generic summary() function instead.) ...

Is there any difference for R between calling summary.lm() and calling summary()? Or is it just that in the second case R recognizes that the input is an "lm" object and then calls summary.lm()?

John Fox wrote:
> That said, it's hard for me to understand why it's interesting to have
> standard errors for the individual coefficients of a high-degree
> polynomial, and I'd also be concerned about the sensibleness of fitting
> a fifth-degree polynomial in the first place.

I am trying to estimate some Engel curves, i.e. functions describing the relationship between a household's income and its demand share for certain goods. As I want to estimate the curve for only one good, the only restriction arising from Gorman (1981) seems to be that in a pure Engel curve model (including only income and the demand share) the share should be a sum of terms in powers of the natural logarithm of income. I have not yet found a theoretical reason for a limit on the number of powers, and I know too little maths to say whether it is impossible to estimate the influence of x^5 if x through x^4 are already included. So I thought I might simply compare models with different numbers of powers using information criteria such as Amemiya's Prediction Criterion.

I suspect that with x^1 to x^5 it will hardly be possible to estimate the influence of any single one of these five terms, as each of them could be approximated using the other four. But where to draw the line? If anybody could tell me where to read about how many powers to include at most, I'd be grateful.

Regards,
Achim

Gorman (1981) is: Gorman, W. M.
(1981), "Some Engel Curves," in Essays in the Theory and Measurement of Consumer Behaviour in Honor of Sir Richard Stone.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
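Regarding the dispatch question above: summary() is an S3 generic, and for an object of class "lm" it simply dispatches to the method summary.lm(), so the two calls return the same result. A minimal sketch with made-up data (none of the names below come from the thread):

```r
# summary() is an S3 generic; UseMethod() looks at class(fit) and
# dispatches to summary.lm() for "lm" objects.
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

class(fit)             # "lm"
s1 <- summary(fit)     # generic call, dispatches to summary.lm()
s2 <- summary.lm(fit)  # direct method call
identical(coef(s1), coef(s2))  # TRUE: same computation either way
```

Calling the generic is preferred because it keeps working if the class of the object changes (a "glm" fit, for instance, would dispatch to summary.glm() instead).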
Re: [R] extend summary.lm for hccm?
Dear Achim,

I suspect that the problem, involving a fifth-degree raw polynomial, is very ill-conditioned, and that the computation in linear.hypothesis() fails because it is not as stable as lm() and summary.lm(). (BTW, one would not normally call summary.lm() directly, but rather use the generic summary() function instead.)

A possible solution would be to use a fifth-degree orthogonal polynomial, with the formula energyshare ~ poly(x.log, 5). (You don't need the 1 for the constant, since it's implied.)

That said, it's hard for me to understand why it's interesting to have standard errors for the individual coefficients of a high-degree polynomial, and I'd also be concerned about the sensibleness of fitting a fifth-degree polynomial in the first place.

I hope this helps,
John

John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

-------- original message --------
Achim Voß, achim.voss at uni-muenster.de, Tue Dec 30 14:15:37 CET 2008

Hi!

I am trying to estimate Engel curves using a big sample (>42,000 observations) with lm(), taking heteroskedasticity into account by using the summaryHCCM() posted here by John Fox (Mon Dec 25 16:01:59 CET 2006). Having used the SIC (with stepAIC from MASS) to determine how many powers to use, I estimate the model:

> summary.lm(fit.lm.5)

Call:
lm(formula = energyshare ~ 1 + I(x.log) + I(x.log^2) + I(x.log^3) +
    I(x.log^4) + I(x.log^5), data = ev)

Residuals:
      Min        1Q    Median        3Q       Max
-0.098819 -0.023682 -0.007043  0.013924  0.486615

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.452e+01  1.234e+01  -4.418 9.97e-06 ***
I(x.log)     3.177e+01  6.966e+00   4.560 5.13e-06 ***
I(x.log^2)  -7.330e+00  1.567e+00  -4.677 2.93e-06 ***
I(x.log^3)   8.395e-01  1.757e-01   4.778 1.78e-06 ***
I(x.log^4)  -4.775e-02  9.814e-03  -4.865 1.15e-06 ***
I(x.log^5)   1.079e-03  2.185e-04   4.939 7.90e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03748 on 42738 degrees of freedom
Multiple R-squared: 0.09236,    Adjusted R-squared: 0.09226
F-statistic: 869.8 on 5 and 42738 DF,  p-value: < 2.2e-16

Now I use summaryHCCM(fit.lm.5):

> summaryHCCM(fit.lm.5)
Fehler in solve.default(L %*% V %*% t(L)) :
  System ist für den Rechner singulär: reziproke Konditionszahl = 6.98689e-19

("Error in solve.default(L %*% V %*% t(L)): system is computationally singular: reciprocal condition number = 6.98689e-19")

This does not happen if I omit I(x.log^5). I do not know what it means and I'd be grateful if anyone could help.

And I'd like to add a (more or less) related question: can I still use the AIC, SIC etc. if I know there is a heteroskedasticity problem?

Thanks in advance,
Achim
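A quick numerical illustration of why the orthogonal-polynomial suggestion helps. This is a sketch with simulated log incomes (the original ev data set is not available); it compares the condition number of the raw-power model matrix with that of the poly() basis.

```r
# Simulated stand-in for the original x.log variable.
set.seed(42)
x.log <- log(runif(1000, min = 5000, max = 100000))

# Model matrix with raw powers, as in the original fit ...
X.raw <- model.matrix(~ I(x.log) + I(x.log^2) + I(x.log^3) +
                        I(x.log^4) + I(x.log^5))

# ... and with an orthogonal polynomial basis, as suggested.
X.ortho <- model.matrix(~ poly(x.log, 5))

# kappa() estimates a matrix's condition number: the raw basis is many
# orders of magnitude worse conditioned than the orthogonal one.
kappa(X.raw)
kappa(X.ortho)
```

Both parameterizations span the same column space, so fitted values, residuals and the overall F statistic are unchanged; only the individual coefficients (and their covariance matrix, which is what solve() has to invert) differ.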
Re: [R] extend summary.lm for hccm?
Hi!

I am trying to estimate Engel curves using a big sample (>42,000 observations) with lm(), taking heteroskedasticity into account by using the summaryHCCM() posted here by John Fox (Mon Dec 25 16:01:59 CET 2006). Having used the SIC (with stepAIC from MASS) to determine how many powers to use, I estimate the model:

> summary.lm(fit.lm.5)

Call:
lm(formula = energyshare ~ 1 + I(x.log) + I(x.log^2) + I(x.log^3) +
    I(x.log^4) + I(x.log^5), data = ev)

Residuals:
      Min        1Q    Median        3Q       Max
-0.098819 -0.023682 -0.007043  0.013924  0.486615

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.452e+01  1.234e+01  -4.418 9.97e-06 ***
I(x.log)     3.177e+01  6.966e+00   4.560 5.13e-06 ***
I(x.log^2)  -7.330e+00  1.567e+00  -4.677 2.93e-06 ***
I(x.log^3)   8.395e-01  1.757e-01   4.778 1.78e-06 ***
I(x.log^4)  -4.775e-02  9.814e-03  -4.865 1.15e-06 ***
I(x.log^5)   1.079e-03  2.185e-04   4.939 7.90e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03748 on 42738 degrees of freedom
Multiple R-squared: 0.09236,    Adjusted R-squared: 0.09226
F-statistic: 869.8 on 5 and 42738 DF,  p-value: < 2.2e-16

Now I use summaryHCCM(fit.lm.5):

> summaryHCCM(fit.lm.5)
Fehler in solve.default(L %*% V %*% t(L)) :
  System ist für den Rechner singulär: reziproke Konditionszahl = 6.98689e-19

("Error in solve.default(L %*% V %*% t(L)): system is computationally singular: reciprocal condition number = 6.98689e-19")

This does not happen if I omit I(x.log^5). I do not know what it means and I'd be grateful if anyone could help.

And I'd like to add a (more or less) related question: can I still use the AIC, SIC etc. if I know there is a heteroskedasticity problem?

Thanks in advance,
Achim
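The summaryHCCM() helper referred to in this thread is a user-posted function and is not reproduced here. As a hedged sketch under that caveat, the same kind of heteroskedasticity-consistent inference can be obtained with the CRAN packages sandwich and lmtest; the data below are simulated stand-ins for the original ev data set.

```r
library(sandwich)  # vcovHC(): heteroskedasticity-consistent covariance
library(lmtest)    # coeftest(): coefficient tests with a custom vcov

# Simulated stand-in for the 'ev' data, with heteroskedastic errors.
set.seed(1)
x.log <- log(runif(500, min = 5000, max = 100000))
energyshare <- 0.3 - 0.02 * x.log + rnorm(500, sd = 0.002 * x.log)

# Orthogonal polynomial basis, per the advice earlier in the thread.
fit <- lm(energyshare ~ poly(x.log, 5))

# Coefficient table with HC3 standard errors instead of the OLS ones.
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))
```

Using the poly() basis here also sidesteps the "computationally singular" error reported above, since the covariance matrix being inverted is well conditioned.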