I am using the step() function to select a model using backward
elimination, with AIC as the selection criterion. The full regression
model contains three predictors, plus all the second order terms and
two-way interactions. The full model is fit via lm() using two different
model formulae. One formula uses explicitly defined variables for the
second-order and interaction terms and the other formula uses the I(x^2)
and colon operators. The fit generated by lm() is exactly the same for
both models, but when I pass these fitted models to the step() function, I
get two different results. Apparently, step() does not recognize the three
main predictors unless the second order and interaction terms are
explicitly defined as separate variables.
I assigned this problem to my first-year graduate students, not realizing
that R would give two different answers. Now I have to re-grade their
homework, but I would really like to give them a reasonable explanation for
the discrepancy.
The complete code is given below.
Could anyone shed some light on this mystery?
Thanks in advance,
Karen Keating
Kansas State University
# Exercise 9.13, Kutner, Nachtsheim, Neter & Li
temp<- scan()
49.0 45.0 36.0 45.0
55.0 30.0 28.0 40.0
85.0 11.0 16.0 42.0
32.0 30.0 46.0 40.0
26.0 39.0 76.0 43.0
28.0 42.0 78.0 27.0
95.0 17.0 24.0 36.0
26.0 63.0 80.0 42.0
74.0 25.0 12.0 52.0
37.0 32.0 27.0 35.0
31.0 37.0 37.0 55.0
49.0 29.0 34.0 47.0
38.0 26.0 32.0 28.0
41.0 38.0 45.0 30.0
12.0 38.0 99.0 26.0
44.0 25.0 38.0 47.0
29.0 27.0 51.0 44.0
40.0 37.0 32.0 54.0
31.0 34.0 40.0 36.0
dat<- matrix(temp,ncol=4,nrow=length(temp)/4,byrow=T)
colnames(dat)<-c('Y','X1','X2','X3')
dat <- data.frame(dat)
attach(dat)
# second order terms and interactions
X12<-X1*X2
X13<-X1*X3
X23<-X2*X3
X1sq <- X1^2
X2sq <- X2^2
X3sq <- X3^2
fit1 <- lm(Y~ X1sq + X2sq + X3sq +X1+X2+X3+ X12 + X13 + X23 )
fit2 <- lm(Y~I(X1^2)+I(X2^2)+I(X3^2)+X1+X2+X3+X1:X2+X1:X3+X2:X3)
sum( abs(fit1$res - fit2$res) ) # 0, so fitted models are the same
dim(model.matrix(fit1)) # 19 x 10
dim(model.matrix(fit2)) # 19 x 10
dim(fit1$model) # 19 x 10
dim(fit2$model) # 19 x 7 -- could this cause the discrepancy?
back1 <- step(fit1,direction='backward')
back2 <- step(fit2,direction='backward')
# Note that 'back1' considers the three primary predictors X1, X2 and X3,
# while 'back2' does not.
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.