The problem is the (stupid!) liblinear solver, which also penalises the intercept as part of the regularisation. Use a different solver, or increase the intercept_scaling parameter.
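For example, a minimal sketch of both workarounds (the data here is made up purely to illustrate; with liblinear the intercept is implemented as an extra constant feature folded into the penalised weight vector, so a large intercept_scaling makes its effective penalty negligible):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Made-up data with rare successes, roughly mirroring the setting below.
    rng = np.random.RandomState(0)
    X = rng.randn(10000, 5)
    y = (rng.rand(10000) < 0.05).astype(int)

    # Option 1: a solver that does not regularise the intercept, e.g. lbfgs.
    clf_lbfgs = LogisticRegression(C=1, solver='lbfgs').fit(X, y)

    # Option 2: keep liblinear but raise intercept_scaling; scaling up the
    # synthetic intercept feature reduces how strongly the penalty shrinks it.
    clf_liblinear = LogisticRegression(C=1, solver='liblinear',
                                       intercept_scaling=100).fit(X, y)

    print(clf_lbfgs.intercept_, clf_liblinear.intercept_)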
On 15 Dec 2016 10:44 pm, "Sebastian Raschka" <se.rasc...@gmail.com> wrote:
> Subtracting the median wouldn’t result in normalizing in the usual sense,
> since subtracting a constant just shifts the values by a constant. Instead,
> for logistic regression and most optimizers, I would recommend subtracting
> the mean to center the features at mean zero and dividing by the standard
> deviation to get “z” scores (e.g., this can be done with StandardScaler()).
>
> Best,
> Sebastian
>
> > On Dec 15, 2016, at 4:02 PM, Rachel Melamed <mela...@uchicago.edu> wrote:
> >
> > I just tried it and it did not appear to change the results at all.
> > I ran it as follows:
> >
> > 1) Normalize the dummy variables (by subtracting the median) to make a
> > matrix of about 10000 x 5.
> >
> > 2) For each of the 1000 output variables:
> >
> > a. Each output variable uses the same dummy variables, but not all
> > settings of covariates are observed for all output variables. So I create
> > the design matrix per output variable using patsy, including pairwise
> > interactions. That gives an approximately 10000 x 350 design matrix, plus
> > a matrix I call “success_fail” that holds, for each setting, the number
> > of successes and the number of failures, so it is of size 10000 x 2.
> >
> > b. Run the regression using:
> >
> > skdesign = np.vstack((design, design))
> > sklabel = np.hstack((np.ones(success_fail.shape[0]),
> >                      np.zeros(success_fail.shape[0])))
> > skweight = np.hstack((success_fail['success'], success_fail['fail']))
> >
> > logregN = linear_model.LogisticRegression(C=1,
> >                                           solver='lbfgs',
> >                                           fit_intercept=False)
> > logregN.fit(skdesign, sklabel, sample_weight=skweight)
> >
> >> On Dec 15, 2016, at 2:16 PM, Alexey Dral <aad...@gmail.com> wrote:
> >>
> >> Could you try to normalize the dataset after feature dummy encoding and
> >> see if the behavior is reproducible?
> >>
> >> 2016-12-15 22:03 GMT+03:00 Rachel Melamed <mela...@uchicago.edu>:
> >> Thanks for the reply. The covariates (“X”) are all dummy/categorical
> >> variables. So I guess no, nothing is normalized.
> >>
> >>> On Dec 15, 2016, at 1:54 PM, Alexey Dral <aad...@gmail.com> wrote:
> >>>
> >>> Hi Rachel,
> >>>
> >>> Do you have your data normalized?
> >>>
> >>> 2016-12-15 20:21 GMT+03:00 Rachel Melamed <mela...@uchicago.edu>:
> >>> Hi all,
> >>> Does anyone have any suggestions for this problem:
> >>> http://stackoverflow.com/questions/41125342/sklearn-logistic-regression-gives-biased-results
> >>>
> >>> I am running around 1000 similar logistic regressions, with the same
> >>> covariates but slightly different data and response variables. All of
> >>> my response variables have sparse successes (p(success) < .05, usually).
> >>>
> >>> I noticed that with the regularized regression, the results are
> >>> consistently biased to predict more "successes" than are observed in
> >>> the training data. When I relax the regularization, this bias goes
> >>> away. The bias observed is unacceptable for my use case, but the
> >>> more-regularized model does seem a bit better.
> >>>
> >>> Below, I plot the results for the 1000 different regressions for 2
> >>> different values of C:
> >>> [plots not included]
> >>>
> >>> I looked at the parameter estimates for one of these regressions;
> >>> below, each point is one parameter. It seems like the intercept (the
> >>> point on the bottom left) is too high for the C=1 model.
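For reference, a minimal sketch of Sebastian's suggestion applied to the success/fail setup above (design and success_fail here are small made-up stand-ins for the objects described in the thread; note that once the features are centered, the intercept can no longer live inside the design matrix, hence fit_intercept=True):

    import numpy as np
    import pandas as pd
    from sklearn import linear_model
    from sklearn.preprocessing import StandardScaler

    # Stand-ins for the objects described above (shapes shrunk for brevity).
    rng = np.random.RandomState(0)
    design = rng.randint(0, 2, size=(100, 5)).astype(float)
    success_fail = pd.DataFrame({'success': rng.poisson(1, 100),
                                 'fail': rng.poisson(20, 100)})

    # Center to mean zero and scale to unit variance (z-scores), as
    # suggested, instead of subtracting the median.
    design_z = StandardScaler().fit_transform(design)

    # Same weighted two-copy trick as in the thread: one copy of the design
    # labelled 1 (successes), one labelled 0 (failures), each row weighted
    # by its count.
    skdesign = np.vstack((design_z, design_z))
    sklabel = np.hstack((np.ones(design_z.shape[0]),
                         np.zeros(design_z.shape[0])))
    skweight = np.hstack((success_fail['success'], success_fail['fail']))

    logregN = linear_model.LogisticRegression(C=1, solver='lbfgs',
                                              fit_intercept=True)
    logregN.fit(skdesign, sklabel, sample_weight=skweight)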