Dear R., The problem you constructed is too ill-conditioned for the method that Anova() uses to compute type-II sums of squares and the associated degrees of freedom, with an immense condition number of the coefficient covariance matrix:
> library(car) Loading required package: carData > mod <- lm(prestige ~ women * type * income * education, data=Prestige) > e <- eigen(vcov(mod))$values > max(e)/min(e) [1] 2.776205e+17 Simply centering the numerical predictors reduces the condition number by a factor of 10^3, which allows Anova() to work, even though the problem is still extremely ill-conditioned: > Prestige.c <- within(Prestige, { + income <- income - mean(income) + education <- education - mean(education) + women <- women - mean(women) + }) > mod.c <- lm(prestige ~ women * type * income * education, data=Prestige.c) > e.c <- eigen(vcov(mod.c))$values > max(e)/min(e) [1] 2.776205e+17 > Anova(mod.c) Anova Table (Type II tests) Response: prestige Sum Sq Df F value Pr(>F) women 167.29 1 4.9516 0.0291142 * type 744.30 2 11.0150 6.494e-05 *** income 789.00 1 23.3529 7.112e-06 *** education 699.54 1 20.7050 2.057e-05 *** women:type 140.32 2 2.0766 0.1326023 women:income 33.14 1 0.9807 0.3252424 type:income 653.40 2 9.6697 0.0001859 *** women:education 30.36 1 0.8986 0.3462316 type:education 0.72 2 0.0107 0.9893462 income:education 7.88 1 0.2331 0.6306681 women:type:income 136.80 2 2.0245 0.1393087 women:type:education 140.18 2 2.0745 0.1328633 women:income:education 100.42 1 2.9722 0.0888832 . type:income:education 82.02 2 1.2138 0.3029069 women:type:income:education 2.05 2 0.0303 0.9701334 Residuals 2500.16 74 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > mod.c.2 <- update(mod.c, . ~ . - women:type:income:education) > sum(residuals(mod.c.2)^2) - sum(residuals(mod.c)^2) [1] 2.049735 Beyond demonstrating that the algorithm that Anova() uses can be made to fail if the coefficient covariance matrix is sufficiently ill-conditioned problem, I’m not sure what the point of this is. I suppose that we could try to detect this condition, which falls in the small region between where lm() detects a singularity and the projections used by Anova() break down. Best, John ------------------------------------------------- John Fox, Professor Emeritus McMaster University Hamilton, Ontario, Canada Web: http::/socserv.mcmaster.ca/jfox > On Dec 5, 2018, at 7:33 PM, Ramon Diaz-Uriarte <rdia...@gmail.com> wrote: > > > Dear All, > > I do not understand the degrees of freedom returned by car::Anova under > some models. They seem to be too many (e.g., numerical variables getting > more than 1 df, factors getting more df than levels there are). > > This is a reproducible example: > > library(car) > data(Prestige) > > ## Make sure no issues from NAs in comparisons of SS below > prestige_nona <- na.omit(Prestige) > > Anova(lm(prestige ~ women * type * income * education, > data = prestige_nona)) > > ## Notice how women, a numerical variable, has 3 df > ## and type (factor with 3 levels) has 4 df. > > > ## In contrast this seems to get the df right: > Anova(lm(prestige ~ women * type * income * education, > data = prestige_nona), type = "III") > > ## And also gives the df I'd expect > anova(lm(prestige ~ women * type * income * education, > data = prestige_nona)) > > > > ## Type II SS for women in the above model I do not understand either. > m_1 <- lm(prestige ~ type * income * education, data = prestige_nona) > m_2 <- lm(prestige ~ type * income * education + women, data = prestige_nona) > ## Does not match women SS > sum(residuals(m_1)^2) - sum(residuals(m_2)^2) > > ## See [1] below for examples where they match. > > > Looking at the code, I do not understand what the call from > linearHypothesis returns here (specially compared to other models), and the > problem seems to be in the return from ConjComp, possibly due to the the > vcov of the model? (But this is over my head). > > > I understand this is not a reasonable model to fit, and there are possibly > serious collinearity problems. But I was surprised by the dfs in the > absence of any warning of something gone wrong. So I think there is > something very basic I do not understand. > > > > Thanks, > > > R. > > > [1] In contrast, in other models I see what I'd expect. For example: > > ## 1 df for women, 2 for type > Anova(lm(prestige ~ type * income * women, data = prestige_nona)) > m_1 <- lm(prestige ~ type * income, data = prestige_nona) > m_2 <- lm(prestige ~ type * income + women, data = prestige_nona) > ## Type II SS for women > sum(residuals(m_1)^2) - sum(residuals(m_2)^2) > > ## 1 df for women, income, education > Anova(lm(prestige ~ education * income * women, data = prestige_nona)) > m_1 <- lm(prestige ~ education * income, data = prestige_nona) > m_2 <- lm(prestige ~ education * income + women, data = prestige_nona) > ## Type II SS for women > sum(residuals(m_1)^2) - sum(residuals(m_2)^2) > > > > > -- > Ramon Diaz-Uriarte > Department of Biochemistry, Lab B-25 > Facultad de Medicina > Universidad Autónoma de Madrid > Arzobispo Morcillo, 4 > 28029 Madrid > Spain > > Phone: +34-91-497-2412 > > Email: rdia...@gmail.com > ramon.d...@iib.uam.es > > http://ligarto.org/rdiaz > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.