see https://github.com/scikit-learn/scikit-learn/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aclosed+scale_C+ for a historical perspective on this issue.
Alex

On Wed, May 29, 2019 at 11:32 PM Stuart Reynolds <stu...@stuartreynolds.net> wrote:
>
> I looked into this a while ago. There were differences in which algorithms
> regularize the intercept and which ones do not (I believe liblinear does,
> lbfgs does not). All of the algorithms disagreed with logistic regression
> in scipy.
>
> - Stuart
>
> On Wed, May 29, 2019 at 10:50 AM Andreas Mueller <t3k...@gmail.com> wrote:
>>
>> That is indeed not ideal.
>> I think we just went with what liblinear did, and when saga was introduced
>> we kept that behavior.
>> It should probably be scaled as in Lasso, I would imagine?
>>
>>
>> On 5/29/19 1:42 PM, Michael Eickenberg wrote:
>>
>> Hi Jesse,
>>
>> I think there was an effort to compare normalization methods for the data
>> attachment term between Lasso and Ridge regression back in 2012/13, but
>> this may not have been finished or extended to logistic regression.
>>
>> If it is not documented well, it could definitely benefit from a
>> documentation update.
>>
>> As for changing it to a more consistent state, that would require adding a
>> keyword argument for this behavior and, after discussion, possibly changing
>> the default value after some deprecation cycles (though this seems like a
>> dangerous one to change at all, imho).
>>
>> Michael
>>
>>
>> On Wed, May 29, 2019 at 10:38 AM Jesse Livezey <jesse.live...@gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> I noticed recently that in the Lasso implementation (and docs), the MSE
>>> term is normalized by the number of samples
>>> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
>>>
>>> but for LogisticRegression + L1, the log loss does not seem to be
>>> normalized by the number of samples. One consequence is that the strength
>>> of the regularization depends explicitly on the number of samples. For
>>> instance, with Lasso, if you tile a dataset N times, you will learn the
>>> same coef, but with LogisticRegression you will learn a different coef.
>>>
>>> Is this the intended behavior of LogisticRegression? I was surprised by
>>> it. Either way, it would be helpful to document this more clearly in the
>>> LogisticRegression docs (I can make a PR).
>>> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>>>
>>> Jesse
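
To make the two observations in the thread concrete, here is a minimal Python
sketch; the dataset, the number of tiles, and the hyperparameter values are
arbitrary choices for illustration, not something taken from the thread. It
shows that tiling the training data leaves the Lasso coefficients unchanged
(the squared-error term is averaged over n_samples), while the L1-penalized
LogisticRegression coefficients change (the log loss is summed, so the penalty
becomes relatively weaker as n_samples grows).

import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y_reg = X @ w_true + 0.1 * rng.randn(100)
y_clf = (y_reg > 0).astype(int)

# Tile the dataset 10 times: identical information, 10x the samples.
X_big = np.tile(X, (10, 1))
y_reg_big = np.tile(y_reg, 10)
y_clf_big = np.tile(y_clf, 10)

# Lasso minimizes (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1,
# so tiling the data leaves the solution unchanged.
lasso = Lasso(alpha=0.1).fit(X, y_reg)
lasso_big = Lasso(alpha=0.1).fit(X_big, y_reg_big)
print(np.allclose(lasso.coef_, lasso_big.coef_))  # True (up to solver tolerance)

# LogisticRegression minimizes C * sum(log loss) + ||w||_1 (no 1 / n_samples),
# so tiling effectively weakens the penalty and the coefficients change.
lr = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y_clf)
lr_big = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X_big, y_clf_big)
print(np.allclose(lr.coef_, lr_big.coef_))  # typically False

A second sketch, for the intercept point Stuart raises: with solver="liblinear"
the intercept is appended as an extra, penalized feature (scaled by
intercept_scaling), whereas solver="lbfgs" leaves the intercept unpenalized.
With strong regularization on imbalanced labels the difference is easy to see.
Again, the data and the value of C below are made up for illustration.

X = rng.randn(1000, 3)                               # pure noise features
y = np.r_[np.ones(900), np.zeros(100)].astype(int)   # imbalanced labels

for solver in ("liblinear", "lbfgs"):
    clf = LogisticRegression(penalty="l2", C=1e-3, solver=solver).fit(X, y)
    print(solver, clf.intercept_)
# lbfgs keeps an intercept near log(0.9 / 0.1) ~= 2.2 (the class prior);
# liblinear shrinks it substantially because the intercept is penalized.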