Sorry, I just saw that you are not using the liblinear solver. I agree with Sebastian: you should subtract the mean, not the median.
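As a rough sketch of what Sebastian suggests (X_dummies below is only a placeholder for your dummy-coded covariate matrix; in practice you would scale whatever design matrix you actually fit on):

    from sklearn.preprocessing import StandardScaler

    # center each column at mean zero and divide by its standard
    # deviation, i.e. turn the columns into "z" scores
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_dummies)

The fitted scaler keeps the column means and standard deviations in scaler.mean_ and scaler.scale_, so the same transform can be applied to new data with scaler.transform().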
On 15 Dec 2016 11:02 pm, "Sean Violante" <sean.viola...@gmail.com> wrote:

> The problem is the (stupid!) liblinear solver, which also penalises the
> intercept (in the regularisation). Use a different solver or change the
> intercept_scaling parameter.
>
> On 15 Dec 2016 10:44 pm, "Sebastian Raschka" <se.rasc...@gmail.com> wrote:
>
>> Subtracting the median wouldn't result in normalizing in the usual sense,
>> since subtracting a constant just shifts the values by a constant. Instead,
>> for logistic regression & most optimizers, I would recommend subtracting the
>> mean to center the features at mean zero and dividing by the standard
>> deviation to get "z" scores (e.g., this can be done with StandardScaler()).
>>
>> Best,
>> Sebastian
>>
>> > On Dec 15, 2016, at 4:02 PM, Rachel Melamed <mela...@uchicago.edu> wrote:
>> >
>> > I just tried it and it did not appear to change the results at all.
>> > I ran it as follows:
>> >
>> > 1) Normalize the dummy variables (by subtracting the median) to make a
>> > matrix of about 10000 x 5.
>> >
>> > 2) For each of the 1000 output variables:
>> >
>> > a. Each output variable uses the same dummy variables, but not all
>> > settings of the covariates are observed for every output variable. So I
>> > create the design matrix using patsy per output variable to include
>> > pairwise interactions. This gives an around 10000 x 350 design matrix,
>> > plus a matrix I call "success_fail" that holds, for each setting, the
>> > number of successes and the number of failures, so it is of size
>> > 10000 x 2.
>> >
>> > b. Run the regression using:
>> >
>> > import numpy as np
>> > from sklearn import linear_model
>> >
>> > # stack the design twice: one copy for the "success" rows (label 1)
>> > # and one copy for the "fail" rows (label 0)
>> > skdesign = np.vstack((design, design))
>> >
>> > sklabel = np.hstack((np.ones(success_fail.shape[0]),
>> >                      np.zeros(success_fail.shape[0])))
>> >
>> > # weight each row by its success/fail count
>> > skweight = np.hstack((success_fail['success'], success_fail['fail']))
>> >
>> > logregN = linear_model.LogisticRegression(C=1,
>> >                                           solver='lbfgs',
>> >                                           fit_intercept=False)
>> > logregN.fit(skdesign, sklabel, sample_weight=skweight)
>> >
>> >> On Dec 15, 2016, at 2:16 PM, Alexey Dral <aad...@gmail.com> wrote:
>> >>
>> >> Could you try to normalize the dataset after the feature dummy encoding
>> >> and see if the behavior is reproducible?
>> >>
>> >> 2016-12-15 22:03 GMT+03:00 Rachel Melamed <mela...@uchicago.edu>:
>> >>
>> >> Thanks for the reply. The covariates ("X") are all dummy/categorical
>> >> variables. So I guess no, nothing is normalized.
>> >>
>> >>> On Dec 15, 2016, at 1:54 PM, Alexey Dral <aad...@gmail.com> wrote:
>> >>>
>> >>> Hi Rachel,
>> >>>
>> >>> Do you have your data normalized?
>> >>>
>> >>> 2016-12-15 20:21 GMT+03:00 Rachel Melamed <mela...@uchicago.edu>:
>> >>>
>> >>> Hi all,
>> >>> Does anyone have any suggestions for this problem:
>> >>> http://stackoverflow.com/questions/41125342/sklearn-logistic-regression-gives-biased-results
>> >>>
>> >>> I am running around 1000 similar logistic regressions, with the same
>> >>> covariates but slightly different data and response variables. All of
>> >>> my response variables have sparse successes (p(success) < .05 usually).
>> >>>
>> >>> I noticed that with the regularized regression, the results are
>> >>> consistently biased to predict more "successes" than are observed in
>> >>> the training data. When I relax the regularization, this bias goes
>> >>> away. The bias is unacceptable for my use case, but the more-regularized
>> >>> model does otherwise seem a bit better.
>> >>>
>> >>> Below, I plot the results of the 1000 different regressions for 2
>> >>> different values of C.
>> >>>
>> >>> I looked at the parameter estimates for one of these regressions:
>> >>> below, each point is one parameter.
>> >>> It seems like the intercept (the point on the bottom left) is too high
>> >>> for the C=1 model.
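PS: to make the earlier liblinear point concrete, here is a rough sketch of the two workarounds (X, y and w below are placeholders for a design matrix, 0/1 labels and sample weights, and the constants are arbitrary):

    from sklearn.linear_model import LogisticRegression

    # Workaround 1: solvers other than liblinear (lbfgs, newton-cg, sag)
    # leave the intercept out of the regularisation penalty altogether
    clf = LogisticRegression(C=1, solver='lbfgs', fit_intercept=True)
    clf.fit(X, y, sample_weight=w)

    # Workaround 2: liblinear appends a synthetic constant column equal to
    # intercept_scaling and penalises its weight like any other feature, so
    # making intercept_scaling large shrinks the effective penalty on the
    # intercept
    clf = LogisticRegression(C=1, solver='liblinear',
                             fit_intercept=True, intercept_scaling=1000.0)
    clf.fit(X, y, sample_weight=w)

But as I said above, this only matters if the liblinear solver is actually being used.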
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn