[R] Stepwise logistic regression with significance testing - stepAIC

2009-05-05 Thread Peter-Heinz Fox
Hello R-Users,
 
I have one binary dependent variable and a set of independent variables 
(glm(formula,…,family="binomial") ) and I am using the function stepAIC 
("MASS") for choosing an optimal model. However, I am not sure whether stepAIC 
considers significance criteria such as the likelihood ratio test and the Wald 
test (see example below).  
 
 library(MASS)
 y <- rbinom(30,1,0.4)
 x1 <- rnorm(30)
 x2 <- rnorm(30)
 x3 <- rnorm(30)
 xdata <- data.frame(x1,x2,x3)
 
 fit1 <- glm(y~ . ,family=binomial,data=xdata)
 stepAIC(fit1,trace=FALSE)
 
Call:  glm(formula = y ~ x3, family = binomial, data = xdata) 
 
Coefficients:
(Intercept)           x3  
    -0.3556       0.8404  
 
Degrees of Freedom: 29 Total (i.e. Null);  28 Residual
Null Deviance:      40.38 
Residual Deviance: 37.86        AIC: 41.86 
 
 fit <- glm( stepAIC(fit1,trace=FALSE)$formula  ,family=binomial)
 my.summ <- summary(fit)
 # Wald Test 
 print(my.summ$coeff[,4])
(Intercept)          x3 
  0.3609638   0.1395215 
 
 my.anova <- anova(fit,test="Chisq")
 #LR Test
 print(my.anova$P[2])
[1] 0.1121783
  
 
Is there an alternative function or a possible way of checking if the added 
variable and the new model are significant within the regression steps? 
 
Thanks in advance for your help
 
Regards
 
Peter-Heinz Fox



  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Stepwise logistic regression with significance testing - stepAIC

2009-05-05 Thread Greg Snow
There is no meaningful alternative, because the approach you propose is not 
meaningful in itself.  The Wald tests have some known problems even in the 
well-defined cases.  Both types of tests are designed to test a predefined 
hypothesis, not a hypothesis that is conditional on the stepwise procedure.  
Stepwise selection has been shown to give biased results, so it is best to use 
another approach, such as the lasso.  If you need to use stepwise, then you 
should bootstrap the entire selection process to get better estimates/standard 
errors.  
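
On the narrower question of whether stepAIC looks at significance at all, a small sketch may help. It assumes the default penalty k = 2 and a term with one degree of freedom; the seed and the data frame name `dat` are made up for illustration. Under those assumptions, dropping a term lowers the AIC exactly when its likelihood ratio statistic is below 2, which corresponds to a p-value above roughly 0.157. That is consistent with x3 being retained at p = 0.112 in the example, even though it is not significant at the 5% level.

```r
library(MASS)  # stepAIC, dropterm

set.seed(42)   # arbitrary seed, not from the original post
n <- 30
dat <- data.frame(y  = rbinom(n, 1, 0.4),
                  x1 = rnorm(n),
                  x2 = rnorm(n),
                  x3 = rnorm(n))

full <- glm(y ~ ., family = binomial, data = dat)

# One backward step: AIC and likelihood ratio statistic for dropping each term.
tab <- dropterm(full, test = "Chisq")
print(tab)

# p-value at which AIC (k = 2) and the LR test agree for a 1-df term.
threshold <- pchisq(2, df = 1, lower.tail = FALSE)
threshold

# Recompute the LR p-values from the table and compare the two rules
# (row 1 of the table is the full model, "<none>").
p_lrt            <- pchisq(tab$LRT[-1], df = tab$Df[-1], lower.tail = FALSE)
aic_prefers_drop <- tab$AIC[-1] < tab$AIC[1]
lrt_above_cutoff <- p_lrt > threshold

data.frame(term = rownames(tab)[-1], p_lrt, aic_prefers_drop, lrt_above_cutoff)
```

The last two columns should agree row by row. So stepAIC behaves like a sequence of LR tests at a fixed, fairly liberal level, which does not remove the problem that the resulting p-values ignore the selection step.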

Frank Harrell's book and package go into more detail on this and provide some 
tools to help (as well as the other packages that can be used).

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111





Re: [R] Stepwise logistic regression with significance testing - stepAIC

2009-05-05 Thread Dimitris Rizopoulos

Greg Snow wrote:
> There is not a meaningful alternative way since the way you propose is not
> meaningful.  The Wald tests have some known problems even in the well defined
> cases.  Both types of tests are designed to test a predefined hypothesis, not
> a conditional hypothesis on the stepwise procedure.  It is best to use other
> approaches than stepwise selection (it has been shown to give biased results)
> such as the lasso.  If you need to use stepwise, then you should bootstrap
> the entire selection process to get better estimates/standard errors.


For bootstrapping the stepAIC procedure you may have a look at package 
bootStepAIC.
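
The idea can also be sketched by hand with MASS alone. The following is not the bootStepAIC interface, only a minimal illustration under stated assumptions: simulated data like in the original post (with n = 100 rather than 30 to make the refits more stable), an arbitrary seed, B = 200 resamples, and made-up object names. Each resample repeats the whole selection, so the spread of the results reflects the selection step as well as the estimation step.

```r
library(MASS)  # stepAIC

set.seed(1)    # arbitrary seed, for reproducibility only
n <- 100       # assumption: larger than the n = 30 of the original post
dat <- data.frame(y  = rbinom(n, 1, 0.4),
                  x1 = rnorm(n),
                  x2 = rnorm(n),
                  x3 = rnorm(n))

B    <- 200                      # number of bootstrap resamples (assumption)
vars <- c("x1", "x2", "x3")

# One row per resample; a coefficient stays 0 when the term is not selected.
coefs <- matrix(0, nrow = B, ncol = length(vars),
                dimnames = list(NULL, vars))

for (b in seq_len(B)) {
  # Resample rows with replacement, then rerun the full selection.
  boot   <- dat[sample(n, replace = TRUE), ]
  full   <- glm(y ~ ., family = binomial, data = boot)
  chosen <- stepAIC(full, trace = FALSE)

  est  <- coef(chosen)
  kept <- intersect(names(est), vars)
  coefs[b, kept] <- est[kept]
}

# Share of resamples in which each variable survived the selection.
selection_freq <- colMeans(coefs != 0)
selection_freq

# Bootstrap spread of the coefficients, including the zeros from
# resamples where the term was dropped.
apply(coefs, 2, sd)
```

The loop is written at top level on purpose: stepAIC refits the model by re-evaluating the original call, so the resampled data frame has to be visible where that call is evaluated.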


Best,
Dimitris






--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
