Sorry. Let me try again then.
I am trying to find "significant" predictors" from a list of about 44 independent variables. So I started with all 44 variables and ran drop1(sep22lr, test="Chisq")... and then dropped the highest p value from the run. Then I reran the drop1. Model: MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_MST_1 + SOIL_NUTR + cE + cN + cELEV + cDIAM_125 + cCRCLS + cCULM_125 + cSPH + cAGE + cVRI_NONPINE + cVRI_nonpineCFR + cVRI_BLEAF + cvol_125 + cstrDST_SW + cwaterDST_SW + cSEEDSRCE_SW + cMAT + cMWMT + cMCMT + cTD + cMAP + cMSP + cAHM + cSHM + cMATMAP + cddless0 + cddless18 + cddgrtr0 + cddgrtr18 + cNFFD + cbFFP + ceFFP + cPAS + cDD5_100 + cEXT_Cold + cS_INDX Df Deviance AIC LRT Pr(Chi) <none> 814.21 938.21 ORG_CODE 4 824.97 940.97 10.76 0.0294100 * BECLBL08 9 845.61 951.61 31.41 0.0002519 *** PEM_SScat 10 829.11 933.11 14.90 0.1357580 SOIL_MST_1 1 814.63 936.63 0.43 0.5135094 SOIL_NUTR 2 818.49 938.49 4.28 0.1175411 cE 1 814.37 936.37 0.16 0.6886085 cN 1 814.40 936.40 0.20 0.6566765 cELEV 1 814.35 936.35 0.14 0.7044864 cDIAM_125 1 817.98 939.98 3.78 0.0519554 . cCRCLS 1 819.32 941.32 5.11 0.0237598 * cCULM_125 1 816.17 938.17 1.97 0.1606846 cSPH 1 816.62 938.62 2.41 0.1204141 cAGE 1 815.92 937.92 1.72 0.1902314 cVRI_NONPINE 1 818.04 940.04 3.84 0.0501149 . cVRI_nonpineCFR 1 821.17 943.17 6.96 0.0083197 ** cVRI_BLEAF 1 818.78 940.78 4.58 0.0324286 * cvol_125 1 814.67 936.67 0.47 0.4949495 cstrDST_SW 1 814.63 936.63 0.42 0.5169757 cwaterDST_SW 1 814.75 936.75 0.55 0.4592643 cSEEDSRCE_SW 1 817.73 939.73 3.53 0.0604234 . cMAT 1 814.27 936.27 0.06 0.8002333 cMWMT 1 814.49 936.49 0.28 0.5942246 cMCMT 1 819.39 941.39 5.18 0.0228425 * cTD 1 816.20 938.20 1.99 0.1580332 cMAP 1 814.25 936.25 0.04 0.8386626 cMSP 1 818.41 940.41 4.20 0.0404411 * cAHM 1 815.66 937.66 1.46 0.2276311 cSHM 1 819.95 941.95 5.75 0.0165227 * cMATMAP 1 814.91 936.91 0.71 0.4001878 cddless0 1 818.04 940.04 3.83 0.0502153 . cddless18 1 817.81 939.81 3.60 0.0576931 . cddgrtr0 1 816.64 938.64 2.44 0.1184235 cddgrtr18 1 815.77 937.77 1.57 0.2104958 cNFFD 1 815.38 937.38 1.18 0.2782582 cbFFP 1 814.39 936.39 0.18 0.6677481 ceFFP 1 820.22 942.22 6.01 0.0141863 * cPAS 1 814.21 936.21 0.01 0.9347654 cDD5_100 1 814.79 936.79 0.58 0.4447531 cEXT_Cold 1 816.99 938.99 2.78 0.0954512 . cS_INDX 1 815.21 937.21 1.01 0.3157208 And then systematically reran the drop1, removing the HIGHEST p value (least significant)from each resultant until only significant (0.10) variables remained. Model: MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_NUTR + cSEEDSRCE_SW + cMSP + ceFFP + cEXT_Cold Df Deviance AIC LRT Pr(Chi) <none> 884.20 946.20 ORG_CODE 4 916.38 970.38 32.18 1.757e-06 *** BECLBL08 9 940.66 984.66 56.46 6.418e-09 *** PEM_SScat 11 906.20 946.20 22.00 0.0243795 * SOIL_NUTR 2 894.19 952.19 9.99 0.0067557 ** cSEEDSRCE_SW 1 894.41 954.41 10.21 0.0013983 ** cMSP 1 896.97 956.97 12.77 0.0003516 *** ceFFP 1 928.50 988.50 44.30 2.812e-11 *** cEXT_Cold 1 923.35 983.35 39.15 3.921e-10 *** I didn't create any kind of dummy or factor variables for my categorical data (at least, not on purpose). With a remaining 8 variables, I tried to run a logistic regression (glm) against my dependent variable(MIN_Mstocked). When I do a summary of the glm, I am provided with the usual table of estimate, std error, z value, and Pr(>|z|)... BUT there are some coefficients missing in the list. None of the categorical data is complete. Some are missing only one category, while others are missing 4 or 5 categories. e.g. Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.324e+02 1.363e+03 -0.097 0.922611 ORG_CODE[T.DLA] -1.504e+01 1.363e+03 -0.011 0.991192 ORG_CODE[T.DMO] -1.494e+01 1.363e+03 -0.011 0.991253 ORG_CODE[T.DPG] -1.766e+01 1.363e+03 -0.013 0.989658 ORG_CODE[T.DVA] -1.841e+01 1.363e+03 -0.014 0.989220 BECLBL08[T.SBS dw 2] -6.733e-01 5.903e-01 -1.141 0.254033 BECLBL08[T.SBS dw 3] -1.094e+00 5.714e-01 -1.914 0.055586 . BECLBL08[T.SBS mc 2] 1.573e-01 5.004e-01 0.314 0.753211 BECLBL08[T.SBS mc 3] 1.402e+00 5.824e-01 2.408 0.016043 * BECLBL08[T.SBS mk 1] -2.388e+00 7.529e-01 -3.172 0.001514 ** BECLBL08[T.SBS mw] -1.672e+01 1.393e+03 -0.012 0.990425 BECLBL08[T.SBS vk] -1.614e+01 1.243e+03 -0.013 0.989640 BECLBL08[T.SBS wk 1] -3.640e+00 8.174e-01 -4.453 8.48e-06 *** BECLBL08[T.SBS wk 3] -1.838e+01 1.363e+03 -0.013 0.989240 PEM_SScat[T.B] -1.815e+01 3.956e+03 -0.005 0.996339 PEM_SScat[T.C] 1.998e-01 3.925e-01 0.509 0.610792 PEM_SScat[T.D] -2.314e-01 3.215e-01 -0.720 0.471621 PEM_SScat[T.E] 5.581e-01 3.433e-01 1.626 0.104020 PEM_SScat[T.F] -1.113e+00 5.782e-01 -1.926 0.054153 . PEM_SScat[T.G] 1.780e-01 4.420e-01 0.403 0.687150 PEM_SScat[T.H] 1.670e+01 3.956e+03 0.004 0.996633 PEM_SScat[T.I] 2.751e-01 9.313e-01 0.295 0.767705 PEM_SScat[T.J] -2.623e-01 9.693e-01 -0.271 0.786649 PEM_SScat[T.K] -1.862e+01 3.956e+03 -0.005 0.996244 PEM_SScat[T.L] -1.661e+01 1.211e+03 -0.014 0.989056 SOIL_NUTR[T.C] -1.119e+00 3.781e-01 -2.960 0.003073 ** SOIL_NUTR[T.D] -7.912e-02 9.049e-01 -0.087 0.930320 cSEEDSRCE_SW -1.512e-03 4.930e-04 -3.066 0.002170 ** cMSP 1.808e-02 5.304e-03 3.409 0.000652 *** ceFFP 2.889e-01 4.662e-02 6.196 5.80e-10 *** cEXT_Cold -1.880e+00 3.330e-01 -5.647 1.63e-08 *** There should be a PEM_Sscat[T.A]. It is the most prevalent occurrence in this category. ORG_CODE is missing more than 6 categories in the list SOIL_NUTR should have a [T.B] Does that help? -----Original Message----- From: Kevin E. Thorpe [mailto:[EMAIL PROTECTED] Sent: Saturday, September 27, 2008 6:21 AM To: Darin Brooks Cc: r-help@r-project.org Subject: Re: [R] logistic regression Darin Brooks wrote: > Good afternoon > > I have what I hope is a simple logistic regression issue. > > I started with 44 independent variables and then used the drop1, > test="chisq" to reduce the list to 8 significant independent variables. > > drop1(sep22lr, test="Chisq") and wound up with this model: > > Model: MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_NUTR + > cSEEDSRCE_SW + cMSP + ceFFP + cEXT_Cold > > 4 of the remaining variables are categorical and 4 are continuous. > > However, when I run a glm and then a summary on the glm - some of the > categorical data is missing from the output. > > The PEM_SScat is missing only one variable ... the BECLBL08 is missing > several variables ... the ORG_CODE is missing 4 .. and the SOIL_NUTR > is missing 1 variable. > > It seems arbitrary to the number of variables missing. Is there > something wrong with my syntax in calling the logistic model? Am I not understanding > the inputs correctly? > > Any help would be appreciated. > I'm not sure I fully understand your question. It sounds like you created your own dummy variables for your categorical variables. Did you? Or did you use factor variables for your categorical variables? If the latter, then I REALLY don't understand your question. Kevin -- Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Dalla Lana School of Public Health University of Toronto email: [EMAIL PROTECTED] Tel: 416.864.5776 Fax: 416.864.6057 No virus found in this incoming message. Checked by AVG - http://www.avg.com 6:55 PM ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.