This question sits on the border between statistics and R.
When I fit a glm with many candidate predictors and select a model using
stepAIC, many independent variables are retained even though there is no
relationship between the dependent variable and the predictors (all are
random numbers).
Does anyone have a solution to prevent this? Is it related to the
Bonferroni correction?
Is there a ratio of number of independent variables to number of
observations that is safe for stepAIC?
Thanks
Marc
Example code. When 2 independent variables are included, none is
selected; when 11 are included, 7 to 8 are selected.
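library(MASS)  # provides stepAIC()
# Response and 11 candidate predictors, all independent random noise
x <- rnorm(15, 15, 2)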
A <- rnorm(15, 20, 5)
B <- rnorm(15, 20, 5)
C <- rnorm(15, 20, 5)
D <- rnorm(15, 20, 5)
E <- rnorm(15, 20, 5)
F <- rnorm(15, 20, 5)
G <- rnorm(15, 20, 5)
H <- rnorm(15, 20, 5)
I <- rnorm(15, 20, 5)
J <- rnorm(15, 20, 5)
K <- rnorm(15, 20, 5)
df <- data.frame(x=x, A=A, B=B, C=C, D=D,
                 E=E, F=F, G=G, H=H, I=I, J=J,
                 K=K)
# Model with 2 candidate predictors
G1 <- glm(formula = x ~ A + B,
          data=df, family = gaussian(link = "identity"))
g1 <- stepAIC(G1)
summary(g1)
# Model with 11 candidate predictors
G2 <- glm(formula = x ~ A + B + C + D + E + F + G + H + I + J + K,
          data=df, family = gaussian(link = "identity"))
g2 <- stepAIC(G2)
summary(g2)
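
Since a single run varies a lot, a rough way to look at the ratio question is to repeat the simulation and count how many pure-noise predictors stepAIC keeps each time. This is only a sketch under the same assumptions as the example above (15 observations, normal noise); the helper name count_selected and the choice of 50 repetitions are mine and purely illustrative.

```r
library(MASS)  # provides stepAIC()

# Hypothetical helper: simulate n observations of a noise response and
# p noise predictors, run stepAIC, return the number of predictors kept.
count_selected <- function(n, p) {
  df <- as.data.frame(matrix(rnorm(n * p, 20, 5), nrow = n))
  df$x <- rnorm(n, 15, 2)
  full <- glm(x ~ ., data = df, family = gaussian(link = "identity"))
  sel <- stepAIC(full, trace = FALSE)
  length(coef(sel)) - 1  # do not count the intercept
}

set.seed(1)  # arbitrary seed, only for repeatability
n_sel_2  <- replicate(50, count_selected(15, 2))
n_sel_11 <- replicate(50, count_selected(15, 11))

# Average number of noise predictors retained in each setting
mean(n_sel_2)
mean(n_sel_11)
```

Changing the first argument of count_selected (the number of observations) while holding the second fixed gives a feel for how the number of spuriously retained predictors depends on the ratio.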