> On Mar 10, 2016, at 2:21 PM, Michael Artz <michaelea...@gmail.com> wrote:
>
> Here are the results of the logistic regression model. Is it because of the
> NA values?

It's unclear. The InternetServiceNo (and other "No") levels could well be the
cause. Questionnaires are often encoded in a manner that causes complete
collinearity; the glm function then "aliases" those levels and displays an NA
result for their coefficients. I don't remember the predict function emitting
that warning in such cases, but it seems possible, and reporting the column
names of the aliased factors would be well-mannered behavior for software. At
any rate, I don't see the absurd sorts of coefficients (such as 10 or 20) that
I associate with severe numerical pathology.

>
> Call:
> glm(formula = TARGET_A ~ Contract + Dependents + DeviceProtection +
>     gender + InternetService + MonthlyCharges + MultipleLines +
>     OnlineBackup + OnlineSecurity + PaperlessBilling + Partner +
>     PaymentMethod + PhoneService + SeniorCitizen + StreamingMovies +
>     StreamingTV + TechSupport + tenure + TotalCharges,
>     family = binomial(link = "logit"), data = churn_training)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -1.8943  -0.6867  -0.2863   0.7378   3.4259
>
> Coefficients: (7 not defined because of singularities)
>                                        Estimate Std. Error z value Pr(>|z|)
> (Intercept)                           1.0664928  1.7195494   0.620   0.5351
> ContractOne year                     -0.6874005  0.1314227  -5.230 1.69e-07 ***
> ContractTwo year                     -1.2775385  0.2101193  -6.080 1.20e-09 ***
> DependentsYes                        -0.1485301  0.1095348  -1.356   0.1751
> DeviceProtectionNo internet service  -1.5547306  0.9661837  -1.609   0.1076
> DeviceProtectionYes                   0.0459115  0.2114253   0.217   0.8281
> genderMale                           -0.0350970  0.0776896  -0.452   0.6514
> InternetServiceFiber optic            1.4800374  0.9545398   1.551   0.1210
> InternetServiceNo                            NA         NA      NA       NA
> MonthlyCharges                       -0.0324614  0.0379646  -0.855   0.3925
> MultipleLinesNo phone service         0.0808745  0.7736359   0.105   0.9167
> MultipleLinesYes                      0.3990450  0.2131343   1.872   0.0612 .
> OnlineBackupNo internet service              NA         NA      NA       NA
> OnlineBackupYes                      -0.0328892  0.2081145  -0.158   0.8744
> OnlineSecurityNo internet service            NA         NA      NA       NA
> OnlineSecurityYes                    -0.2760602  0.2132917  -1.294   0.1956
> PaperlessBillingYes                   0.3509944  0.0890884   3.940 8.15e-05 ***
> PartnerYes                            0.0306815  0.0940650   0.326   0.7443
> PaymentMethodCredit card (automatic) -0.0710923  0.1377252  -0.516   0.6057
> PaymentMethodElectronic check         0.3074078  0.1137939   2.701   0.0069 **
> PaymentMethodMailed check            -0.0201076  0.1377539  -0.146   0.8839
> PhoneServiceYes                              NA         NA      NA       NA
> SeniorCitizen                         0.1856454  0.1023527   1.814   0.0697 .
> StreamingMoviesNo internet service           NA         NA      NA       NA
> StreamingMoviesYes                    0.5260087  0.3899615   1.349   0.1774
> StreamingTVNo internet service               NA         NA      NA       NA
> StreamingTVYes                        0.4781321  0.3905777   1.224   0.2209
> TechSupportNo internet service               NA         NA      NA       NA
> TechSupportYes                       -0.2511197  0.2181612  -1.151   0.2497
> tenure                               -0.0702813  0.0077113  -9.114  < 2e-16 ***
> TotalCharges                          0.0004276  0.0000874   4.892 9.97e-07 ***
>
> On Thu, Mar 10, 2016 at 4:05 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>>
>>> On Mar 10, 2016, at 8:08 AM, Michael Artz <michaelea...@gmail.com> wrote:
>>>
>>> Hi all,
>>> I have the following error -
>>> > resultVector <- predict(logitregressmodel, dataset1, type='response')
>>> Warning message:
>>> In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
>>> prediction from a rank-deficient fit may be misleading
>>
>> It wasn't an R error. It was an R warning. Was the `summary` output on
>> logitregressmodel informative? Does the resultVector look sensible given
>> its inputs?
>>
>>
>>> I have seen on the internet that there may be some collinearity in the
>>> data and this is causing that. How can I be sure?
>>
>> Do some diagnostics.
>> After looking carefully at the output of summary(logitregressmodel), and
>> perhaps summary(dataset1) if it was the original input to the modeling
>> functions, you could move on to looking at cross-correlations among the
>> variables you think are continuous, crosstabs on the factor variables, and
>> the condition number of the full data matrix.
>>
>> Lots of stuff turns up on a search for "detecting collinearity condition
>> number in r"
>>
>>>
>>> Thanks
>>>

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
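
The diagnostics suggested in this thread (NA coefficients in the summary, crosstabs on factors, cross-correlations on continuous variables, and a condition number) can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame `toy` is simulated and is not the poster's churn_training, and the names `make_addon`, `aliased`, `rank_deficit` and `cond` are made up for this example. The toy data copies just one feature that the summary output above hints at, namely that customers without internet service carry a "No internet service" level in every add-on column.

```r
# Hypothetical toy data (NOT the poster's churn_training): customers with no
# internet service get "No internet service" in every add-on column.
set.seed(42)
n <- 300
internet <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))

# Hypothetical helper: a Yes/No add-on that is forced to
# "No internet service" whenever InternetService is "No".
make_addon <- function() {
  x <- sample(c("No", "Yes"), n, replace = TRUE)
  x[internet == "No"] <- "No internet service"
  factor(x)
}

toy <- data.frame(
  InternetService  = internet,
  DeviceProtection = make_addon(),
  OnlineBackup     = make_addon(),
  tenure           = rpois(n, 30),
  MonthlyCharges   = runif(n, 20, 110)
)
toy$TotalCharges <- toy$tenure * toy$MonthlyCharges + rnorm(n, sd = 50)
toy$churn <- rbinom(n, 1, plogis(-0.5 - 0.03 * toy$tenure +
                                   (toy$InternetService == "Fiber optic")))

fit <- glm(churn ~ DeviceProtection + InternetService + OnlineBackup +
             tenure + MonthlyCharges + TotalCharges,
           family = binomial(link = "logit"), data = toy)

# 1. Coefficients that glm aliased are reported as NA.
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# 2. alias() shows how each aliased column depends on the retained ones.
print(alias(fit)$Complete)

# 3. Crosstab of two factors: the "No" row of InternetService falls entirely
#    in the "No internet service" column, so the two dummies are identical.
xt <- table(InternetService = toy$InternetService,
            OnlineBackup    = toy$OnlineBackup)
print(xt)

# 4. Cross-correlations among the variables treated as continuous.
print(round(cor(toy[, c("tenure", "MonthlyCharges", "TotalCharges")]), 2))

# 5. Rank and condition number of the full model matrix. Columns are scaled
#    to unit length first so the result does not just reflect units.
X <- model.matrix(fit)
rank_deficit <- ncol(X) - qr(X)$rank
Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")
sv <- svd(Xs)$d
cond <- max(sv) / min(sv)   # enormous (or Inf) when columns are collinear
print(c(rank_deficit = rank_deficit, condition_number = cond))
```

On real data the same calls would be pointed at the fitted model and its input data frame instead of `fit` and `toy`. A rank deficit equal to the number of NA coefficients, together with crosstabs showing one factor level nested entirely inside another, would suggest that the warning comes from structural coding of the questionnaire rather than from numerical trouble.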