Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?

2010-11-23 Thread Greg Snow
What Frank was trying to tell you is that the p-values don't have much meaning 
if you do stepwise regression (sometimes they are worse than useless).  The 
p-values are computed under certain assumptions.  Once you remove a variable 
because it is "not significant" and then recompute, those assumptions no longer 
hold, so the p-values no longer answer the question that you are asking.
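
A small simulation can make this concrete. The sketch below is only an
illustration under assumed settings, not anything posted in the thread: the
sample size, number of predictors and replicate count are arbitrary, and every
predictor is pure noise by construction. It runs backward selection with step()
and then reads the usual Wald p-values off each selected model.

```r
# Pure-noise simulation; all settings here are illustrative assumptions.
set.seed(42)
n    <- 200   # observations per simulated data set
k    <- 10    # candidate predictors, none related to the response
reps <- 40    # number of simulated data sets

retained_p <- numeric(0)
for (i in seq_len(reps)) {
  d   <- as.data.frame(matrix(rnorm(n * k), n, k))  # columns V1..V10
  d$y <- rbinom(n, 1, 0.5)                          # response independent of V1..V10
  full <- glm(y ~ ., family = binomial, data = d)
  sel  <- step(full, direction = "backward", trace = 0)
  # Wald p-values of whatever survived selection (intercept row dropped)
  retained_p <- c(retained_p, coef(summary(sel))[-1, "Pr(>|z|)"])
}

# Share of surviving noise predictors reported as "significant" at 0.05.
# For an honest test of a noise variable this share would be about 0.05.
prop_sig <- mean(retained_p < 0.05)
print(length(retained_p))
print(prop_sig)
```

Because the survivors are exactly the variables that happened to look good in
that sample, the share printed at the end should sit well above the nominal
0.05, even though no predictor has any real effect.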

I remember the first time I read about this and had the knee-jerk reaction that 
stepwise regression was useful, based mainly on having learned it from a 
textbook and used it several times to get something that looked good.  But my 
personal epiphany came when I asked myself, "What question does stepwise 
regression answer?"  I still have not found the question that it answers, 
but I have determined that none of the questions that I am interested in 
answering fit.

With modern software (R, for example) there are better (actually correct) tools 
for answering the questions that stepwise regression used to be applied to, and 
it is better to use those tools.  Which tool is best depends on what question 
you are actually interested in answering.  Stepwise procedures continue to be 
taught, mostly out of historical inertia ("well, I learned it when I took 
regression"), but things are shifting away from them now.  They should probably 
still be mentioned, so that new graduates can get jobs when asked about them in 
interviews, and as history not to be repeated.
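
The message does not name a specific tool, so the following is only one
possible illustration. If the question is "does this pre-specified group of
terms matter?", a single likelihood-ratio test of the whole group can be run
with anova() on two nested fits, with no data-driven deletion along the way.
The data and the variable names (d, x1, x2, x3) are simulated and hypothetical.

```r
# Simulated data; names and effect sizes are assumptions for illustration.
set.seed(1)
n   <- 300
d   <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * d$x1))  # only x1 has a real effect

# Both models are written down before looking at any p-values.
reduced <- glm(y ~ x1,           family = binomial, data = d)
full    <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

# One 2-df likelihood-ratio test for x2 and x3 jointly.
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

The point of the design is that the hypothesis is fixed in advance, so the
reported p-value answers the question it appears to answer.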

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of shubha
> Sent: Monday, November 22, 2010 3:10 PM
> To: r-help@r-project.org
> Subject: Re: [R] how do remove those predictor which have p value
> greater than 0.05 in GLM?
> 
> 
> Thanks for the response, Frank.
> I am not saying that I want to delete a variable because of p>0.5. But my
> concern was: I am using backward stepwise logistic regression, which keeps a
> variable in the final model if the variable contributes significantly to
> the model. Otherwise, it should not be in the final model.
> Using other software, I get correct results. But with R, I did not. I want
> to keep those variables with p<0.05 and exclude the others from the model.
> If you include those variables, it will affect the log-likelihood ratio and
> AIC. I want to set a p-value criterion of <=0.05 in the model.  Any
> suggestions?
> Thanks
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/how-do-
> remove-those-predictor-which-have-p-value-greater-than-0-05-in-GLM-
> tp3053921p3054540.html
> Sent from the R help mailing list archive at Nabble.com.
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?

2010-11-22 Thread David Winsemius


On Nov 22, 2010, at 5:10 PM, shubha wrote:

> Thanks for the response, Frank.
> I am not saying that I want to delete a variable because of p>0.5.

Presumably that was meant to be p > 0.05.

> But my concern was: I am using backward stepwise logistic regression,
> which keeps a variable in the final model if the variable contributes
> significantly to the model.

Isn't that what backwards selection does?

> Otherwise, it should not be in the final model.

You're sure? How did you arrive at that conclusion?

> Using other software, I get correct results.

Correct? Please describe your standards for correctness.

> But with R, I did not. I want to keep those variables with p<0.05 and
> exclude the others from the model.

But you said above that was _not_ what you wanted. I'm confused about 
your posture here.

> If you include those variables, it will affect the log-likelihood
> ratio and AIC.

Yes, perhaps it will ... so is the standard a p-value or is it a 
penalized estimate? When you toss out a variable, you are deluding 
yourself if you later ignore that act of deletion when specifying your 
degrees of freedom for the multiple hypothesis testing effort you have 
been conducting.

> I want to set a p-value criterion of <=0.05 in the model.  Any
> suggestions?

More reading. Less reliance on canned software.

> Thanks

--



David Winsemius, MD
West Hartford, CT



Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?

2010-11-22 Thread shubha

Thanks for the response, Frank.
I am not saying that I want to delete a variable because of p>0.5. But my
concern was: I am using backward stepwise logistic regression, which keeps a
variable in the final model if the variable contributes significantly to
the model. Otherwise, it should not be in the final model.
Using other software, I get correct results. But with R, I did not. I want
to keep those variables with p<0.05 and exclude the others from the model.
If you include those variables, it will affect the log-likelihood ratio and
AIC. I want to set a p-value criterion of <=0.05 in the model.  Any
suggestions?
Thanks

-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-do-remove-those-predictor-which-have-p-value-greater-than-0-05-in-GLM-tp3053921p3054540.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?

2010-11-22 Thread Frank Harrell

What would make you want to delete a variable because P > 0.05?  That will
invalidate every aspect of statistical inference for the model.

Frank


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-do-remove-those-predictor-which-have-p-value-greater-than-0-05-in-GLM-tp3053921p3054478.html
Sent from the R help mailing list archive at Nabble.com.



[R] how do remove those predictor which have p value greater than 0.05 in GLM?

2010-11-22 Thread shubha

Hi R users,
I am an intermediate user of R. I am now fitting a GLM (libraries
MASS, VEGUS). I ran a backward stepwise logistic regression, but I have a
problem removing the predictors whose p-values are above 0.05. I don't want to
include variables with p-values above 0.05 in the final backward stepwise
logistic regression model.

For example, first I fit the model:

name <- glm(dep ~ env1 + env2..., family = binomial, data = new)

After that, I ran stepwise selection on name:

name.step <- step(name, direction = "backward")

Here I still got variables that were not significant. For example,
SECCHI was not significant (see the example below), but it was still in the
model. How can I remove variables that are not significant in
forward/backward stepwise selection?

Another question: when I wrote direction="backward", I got the same results
as with "forward". It is really strange. Why are the results the same
for backward and forward? I checked in two other statistical packages
(Statistica and SYSTAT), and they provided correct results, I think. But I
need to use R for further analysis, so I need to fix the problem. I
have spent a lot of time trying to figure it out, but I could not. Could you
please give me your suggestions? It would be a great help. Please see the
example below of predictors with p-values greater than 0.05 being retained
after stepwise logistic regression.
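
On the forward/backward question: as documented in ?step, when no scope is
supplied the starting model is taken as the upper limit, so
direction = "forward" from the full model has nothing to add and hands back the
starting model. Whether that explains the behaviour described here is a guess,
since the forward call itself is not shown. A sketch with simulated,
hypothetical data (d, x1, x2, x3):

```r
# Simulated data; names and effect sizes are assumptions for illustration.
set.seed(2)
n   <- 300
d   <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$x1))  # strong x1 effect; x2, x3 are noise

full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
null <- glm(y ~ 1,            family = binomial, data = d)

# Forward from the full model with no scope: nothing can be added.
fwd_from_full <- step(full, direction = "forward", trace = 0)

# Forward from the intercept-only model, with an explicit upper scope.
fwd_from_null <- step(null, direction = "forward", trace = 0,
                      scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3))

print(formula(fwd_from_full))
print(formula(fwd_from_null))
```

A forward search only does any work when it starts from a small model and is
told, through scope, which terms it is allowed to add.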

Thanks,
Shubha Pandit, PhD
University of Windsor
Windsor, ON, Canada

 

> summary(step.glm.int.ag1)

Call:
glm(formula = ag1less ~ GEARTEMP + DOGEAR + GEARDEPTH + SECCHI +
GEARTEMP:SECCHI + DOGEAR:SECCHI + GEARTEMP:DOGEAR + GEARTEMP:GEARDEPTH +
DOGEAR:GEARDEPTH, family = binomial, data = training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1983  -0.8272  -0.4677   0.8014   2.6502

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         3.231623   1.846593   1.750 0.080110 .
GEARTEMP           -0.004408   0.085254  -0.052 0.958761
DOGEAR             -0.732805   0.182285  -4.020 5.82e-05 ***
GEARDEPTH          -0.249237   0.060825  -4.098 4.17e-05 ***
SECCHI              0.311875   0.297594   1.048 0.294645
GEARTEMP:SECCHI    -0.080664   0.010079  -8.003 1.21e-15 ***
DOGEAR:SECCHI       0.066555   0.022181   3.000 0.002695 **
GEARTEMP:DOGEAR     0.030988   0.008907   3.479 0.000503 ***
GEARTEMP:GEARDEPTH  0.008856   0.002122   4.173 3.01e-05 ***
DOGEAR:GEARDEPTH    0.006680   0.004483   1.490 0.136151
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3389.5  on 2751  degrees of freedom
Residual deviance: 2720.4  on 2742  degrees of freedom
AIC: 2740.4

Number of Fisher Scoring iterations: 6

==

> glm.int.ag1<-glm(ag1less~GEARTEMP+DOGEAR+GEARDEPTH+SECCHI+SECCHI*GEARTEMP+SECCHI*DOGEAR+SECCHI*GEARDEPTH+GEARTEMP*DOGEAR+GEARTEMP*GEARDEPTH+GEARDEPTH*DOGEAR,data=training,
> family=binomial)
> summary(glm.int.ag1)

Call:
glm(formula = ag1less ~ GEARTEMP + DOGEAR + GEARDEPTH + SECCHI +
SECCHI * GEARTEMP + SECCHI * DOGEAR + SECCHI * GEARDEPTH +
GEARTEMP * DOGEAR + GEARTEMP * GEARDEPTH + GEARDEPTH * DOGEAR,
family = binomial, data = training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1990  -0.8287  -0.4668   0.8055   2.6673

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.909805   1.928375   1.509 0.131314
GEARTEMP            0.005315   0.087159   0.061 0.951379
DOGEAR             -0.721864   0.183708  -3.929 8.52e-05 ***
GEARDEPTH          -0.235961   0.064828  -3.640 0.000273 ***
SECCHI              0.391445   0.326542   1.199 0.230622
GEARTEMP:SECCHI    -0.082296   0.010437  -7.885 3.14e-15 ***
DOGEAR:SECCHI       0.065572   0.022319   2.938 0.003305 **
GEARDEPTH:SECCHI   -0.003176   0.005295  -0.600 0.548675
GEARTEMP:DOGEAR     0.030571   0.008961   3.412 0.000646 ***
GEARTEMP:GEARDEPTH  0.008692   0.002159   4.027 5.66e-05 ***
DOGEAR:GEARDEPTH    0.006544   0.004495   1.456 0.145484
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 3389.5  on 2751  degrees of freedom
Residual deviance: 2720.0  on 2741  degrees of freedom
AIC: 2742

Number of Fisher Scoring iterations: 6
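
Two documented behaviours of step() seem sufficient to account for the selected
model in the first summary, although this is an interpretation rather than
something stated in the thread. First, step() compares models by AIC with
penalty k = 2, not by p-values; for a term with one coefficient, deletion lowers
AIC only when the likelihood-ratio statistic is below 2, which corresponds to a
p-value of roughly 0.157, so a term such as DOGEAR:GEARDEPTH (Wald p = 0.136)
can survive. Second, a main effect is not a candidate for deletion while an
interaction containing it remains in the model, which would keep SECCHI and
GEARTEMP in. The sketch below uses simulated data with hypothetical names
(d, x1, x2, x3).

```r
# p-value cutoff implied by AIC (k = 2) for a term with 1 degree of freedom
aic_cutoff <- pchisq(2, df = 1, lower.tail = FALSE)
print(aic_cutoff)

# Simulated data; names and effect sizes are assumptions for illustration.
set.seed(3)
n   <- 400
d   <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 * d$x1 * d$x2))

fit <- glm(y ~ x1 + x2 + x3 + x1:x2, family = binomial, data = d)

# drop1() lists only the terms eligible for deletion: x3 and x1:x2.
# x1 and x2 are absent because x1:x2 is still in the model.
candidates <- drop1(fit, test = "Chisq")
print(candidates)
```

step() does accept a different penalty through its k argument, but the replies
in this thread explain why p-values read off a model chosen this way should not
be taken at face value.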



-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-do-remove-those-predictor-which-have-p-value-greater-than-0-05-in-GLM-tp3053921p3053921.html
Sent from the R help mailing list archive at Nabble.com.
