Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?
What Frank was trying to tell you is that the p-values don't have much meaning if you do stepwise regression (sometimes they are worse than useless). The p-values are computed based on certain assumptions. Once you remove a variable because it is "not significant" and then recompute, those assumptions no longer hold, so the p-values are not answering the question that you are asking.

I remember the first time I read about this: my knee-jerk reaction was that stepwise regression was useful, based mainly on having learned it from a textbook and having used it several times to get something that looked good. But my personal epiphany came when I asked myself, "What question does stepwise regression answer?" I still have not found the answer to that, but I have determined that none of the questions that I am interested in answering fit.

With modern software (R as an example) there are better (actually correct) tools for answering the questions that used to be answered with stepwise regression, and it is better to use those tools. Which tool is best depends on what question you are actually interested in answering.

Stepwise procedures continue to be taught, but mostly due to historical inertia ("well, I learned it when I took regression"). Things are shifting away from it now, though it should probably still be mentioned, so that new graduates can still get jobs when asked about it in interviews, and as history not to be repeated.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of shubha
> Sent: Monday, November 22, 2010 3:10 PM
> To: r-help@r-project.org
> Subject: Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?
>
> Thanks for the response, Frank.
> I am not saying that I want to delete a variables because of p>0.5. But my
> concern was: I am using backward stepwise logistic regression, it keeps the
> variables in the final model if the variable significantly contributing in
> the model. Otherwise, it should not be in the final model.
> Using other software, they give correct results. But R, did not. I want
> those variables if p<0.05, otherwise exclude from the model. If you include
> that variables, it will affect the Log likelihood ratio and AIC. I want to
> change a P-value criterion <=0.05 in the model. Any suggestions.
> thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
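The claim that p-values lose their meaning after selection can be checked with a small simulation. The following is only a sketch under stated assumptions, not something posted in the thread: the sample size, the number of predictors, the seed, and all variable names are hypothetical. The outcome is generated independently of every predictor, so any predictor that survives backward step() with a small p-value is a false positive.

```r
# Pure-noise simulation: the outcome is unrelated to every predictor.
# All sizes and names below are hypothetical choices for illustration.
set.seed(1)        # arbitrary seed, for reproducibility only
n_obs  <- 200      # observations per simulated data set
n_pred <- 10       # number of noise predictors (V1..V10)
n_sim  <- 50       # number of simulated data sets

min_p <- rep(NA_real_, n_sim)
for (i in seq_len(n_sim)) {
  dat <- as.data.frame(matrix(rnorm(n_obs * n_pred), n_obs, n_pred))
  dat$y <- rbinom(n_obs, 1, 0.5)            # outcome independent of V1..V10
  full <- glm(y ~ ., family = binomial, data = dat)
  sel  <- step(full, direction = "backward", trace = 0)
  p <- coef(summary(sel))[-1, "Pr(>|z|)"]   # Wald p-values, intercept dropped
  if (length(p) > 0) min_p[i] <- min(p)     # stays NA if nothing was retained
}

# Share of noise-only data sets in which the selected model still
# reports at least one predictor with p < 0.05.
mean(!is.na(min_p) & min_p < 0.05)
```

If the reported p-values were still valid after selection, a single pre-specified noise predictor would fall below 0.05 in about 5% of data sets. Comparing that figure with the share printed here gives a rough sense of how far the post-selection p-values drift.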
Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?
On Nov 22, 2010, at 5:10 PM, shubha wrote:

> Thanks for the response, Frank.
> I am not saying that I want to delete a variables because of p>0.5.

Presumably that was meant to be p > 0.05

> But my concern was: I am using backward stepwise logistic regression, it
> keeps the variables in the final model if the variable significantly
> contributing in the model.

Isn't that what backwards selection does?

> Otherwise, it should not be in the final model.

You're sure? How did you arrive at that conclusion?

> Using other software, they give correct results.

Correct? Please describe your standards for correctness.

> But R, did not. I want those variables if p<0.05, otherwise exclude from
> the model.

But you said above that was _not_ what you wanted. I'm confused about your posture here.

> If you include that variables, it will affect the Log likelihood ratio
> and AIC.

Yes, perhaps it will, ... so is the standard a p-value or is it a penalized estimate? When you toss out a variable, you are deluding yourself to then later ignore that act of deletion when specifying your degrees of freedom for the multiple hypothesis testing effort you have been conducting.

> I want to change a P-value criterion <=0.05 in the model. Any suggestions.

More reading. Less reliance on canned software.

> thanks

--
David Winsemius, MD
West Hartford, CT
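On the question of whether the standard is a p-value or a penalized criterion: step() compares models by AIC by default (penalty k = 2 per parameter), not by the Wald p-values that summary() prints. The short sketch below is an illustration added for context, and the variable names in it are hypothetical. It shows the p-value cutoff implied by the AIC penalty for a term with one degree of freedom, and the penalty that would correspond to a 0.05 likelihood-ratio cutoff.

```r
# Dropping a 1-df term lowers AIC only when its likelihood-ratio
# statistic is smaller than the penalty k (k = 2 for AIC).
# The implied likelihood-ratio p-value cutoff is therefore:
p_equiv_aic <- pchisq(2, df = 1, lower.tail = FALSE)
p_equiv_aic    # roughly 0.157

# Penalty whose 1-df cutoff matches a 0.05 likelihood-ratio test:
k_05 <- qchisq(0.95, df = 1)
k_05           # roughly 3.84
```

So a one-degree-of-freedom term with a p-value between 0.05 and about 0.157 can stay in an AIC-selected model. Passing a larger penalty through the k argument of step() would change which terms are retained, but it does not address the objection raised in this thread: inference computed after the selection still ignores the selection.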
Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?
Thanks for the response, Frank.

I am not saying that I want to delete variables because of p>0.5. My concern is this: I am using backward stepwise logistic regression, which should keep a variable in the final model if it contributes significantly to the model; otherwise, the variable should not be in the final model.

Other software gives correct results, but R did not. I want to keep those variables with p<0.05 and exclude the rest from the model. If you include those variables, it will affect the log likelihood ratio and AIC. I want to set a p-value criterion of <=0.05 in the model. Any suggestions?

Thanks
Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?
What would make you want to delete a variable because P > 0.05? That will invalidate every aspect of statistical inference for the model.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
[R] how do remove those predictor which have p value greater than 0.05 in GLM?
Hi R users,

I am an intermediate user of R. I am fitting a GLM (libraries MASS, VEGUS). I used backward stepwise logistic regression, but I have a problem removing those predictors whose p-values are above 0.05. I don't want to include those variables in the final backward stepwise logistic regression model.

For example, first I fit the model:

    name <- glm(dep ~ env1 + env2..., family = binomial, data = new)

After that, I ran the stepwise procedure on it:

    name.step <- step(name, direction = "backward")

Here I still got variables that were not significant. For example, SECCHI was not significant (see the example below), but it was still in the model. How can I remove variables that are not significant in forward/backward stepwise selection?

Another question: when I wrote direction="backward", I got the same results as with "forward". It is really strange. Why are the results the same for backward and forward?

I checked in two other statistical packages (Statistica and SYSTAT), and they gave correct results, I think. But I need to use R for further analysis, so I need to fix the problem. I have spent a lot of time trying to figure it out, but I could not. Could you please give your suggestions? It would be a great help.

Please see the example below, in which predictors with p-values greater than 0.05 are retained after stepwise logistic regression.

Thanks,
Shubha Pandit, PhD
University of Windsor
Windsor, ON, Canada

> summary(step.glm.int.ag1)

Call:
glm(formula = ag1less ~ GEARTEMP + DOGEAR + GEARDEPTH + SECCHI +
    GEARTEMP:SECCHI + DOGEAR:SECCHI + GEARTEMP:DOGEAR + GEARTEMP:GEARDEPTH +
    DOGEAR:GEARDEPTH, family = binomial, data = training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1983  -0.8272  -0.4677   0.8014   2.6502

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         3.231623   1.846593   1.750 0.080110 .
GEARTEMP           -0.004408   0.085254  -0.052 0.958761
DOGEAR             -0.732805   0.182285  -4.020 5.82e-05 ***
GEARDEPTH          -0.249237   0.060825  -4.098 4.17e-05 ***
SECCHI              0.311875   0.297594   1.048 0.294645
GEARTEMP:SECCHI    -0.080664   0.010079  -8.003 1.21e-15 ***
DOGEAR:SECCHI       0.066555   0.022181   3.000 0.002695 **
GEARTEMP:DOGEAR     0.030988   0.008907   3.479 0.000503 ***
GEARTEMP:GEARDEPTH  0.008856   0.002122   4.173 3.01e-05 ***
DOGEAR:GEARDEPTH    0.006680   0.004483   1.490 0.136151
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3389.5  on 2751  degrees of freedom
Residual deviance: 2720.4  on 2742  degrees of freedom
AIC: 2740.4

Number of Fisher Scoring iterations: 6

==

> glm.int.ag1 <- glm(ag1less ~ GEARTEMP + DOGEAR + GEARDEPTH + SECCHI +
+     SECCHI*GEARTEMP + SECCHI*DOGEAR + SECCHI*GEARDEPTH + GEARTEMP*DOGEAR +
+     GEARTEMP*GEARDEPTH + GEARDEPTH*DOGEAR, data = training, family = binomial)
> summary(glm.int.ag1)

Call:
glm(formula = ag1less ~ GEARTEMP + DOGEAR + GEARDEPTH + SECCHI +
    SECCHI * GEARTEMP + SECCHI * DOGEAR + SECCHI * GEARDEPTH +
    GEARTEMP * DOGEAR + GEARTEMP * GEARDEPTH + GEARDEPTH * DOGEAR,
    family = binomial, data = training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1990  -0.8287  -0.4668   0.8055   2.6673

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.909805   1.928375   1.509 0.131314
GEARTEMP            0.005315   0.087159   0.061 0.951379
DOGEAR             -0.721864   0.183708  -3.929 8.52e-05 ***
GEARDEPTH          -0.235961   0.064828  -3.640 0.000273 ***
SECCHI              0.391445   0.326542   1.199 0.230622
GEARTEMP:SECCHI    -0.082296   0.010437  -7.885 3.14e-15 ***
DOGEAR:SECCHI       0.065572   0.022319   2.938 0.003305 **
GEARDEPTH:SECCHI   -0.003176   0.005295  -0.600 0.548675
GEARTEMP:DOGEAR     0.030571   0.008961   3.412 0.000646 ***
GEARTEMP:GEARDEPTH  0.008692   0.002159   4.027 5.66e-05 ***
DOGEAR:GEARDEPTH    0.006544   0.004495   1.456 0.145484
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3389.5  on 2751  degrees of freedom
Residual deviance: 2720.0  on 2741  degrees of freedom
AIC: 2742

Number of Fisher Scoring iterations: 6
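One possible reading of the output above, offered as a hedged sketch and not as a diagnosis of the poster's data: step() only considers dropping terms that are not contained in a higher-order term still in the model, so a main effect such as SECCHI is not a candidate while interactions involving it remain. The example below uses simulated data with hypothetical names (sim, x1, x2, x3) to show which terms drop1() offers for removal, and what forward selection does when no scope is supplied.

```r
# Hypothetical simulated data; only the x1:x2 interaction affects y.
set.seed(42)       # arbitrary seed, for reproducibility only
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$x1 * sim$x2))

fit <- glm(y ~ x1 * x2 + x3, family = binomial, data = sim)

# Terms that backward selection may consider dropping, with
# likelihood-ratio tests. x1 and x2 are not offered while x1:x2
# is in the model (marginality).
candidates <- drop1(fit, test = "Chisq")
rownames(candidates)

# With no 'scope' argument, forward selection has no terms to add
# to the starting model, so the same set of terms comes back.
fwd <- step(fit, direction = "forward", trace = 0)
setequal(attr(terms(fwd), "term.labels"),
         attr(terms(fit), "term.labels"))
```

The Wald p-value that summary() prints for a main effect inside an interaction tests that coefficient at a particular value of the other variable, so it is not a test of whether the variable matters overall. None of this changes the advice given in the replies: the p-values of a model chosen by stepwise selection should not be read at face value.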