see https://github.com/scikit-learn/scikit-learn/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aclosed+scale_C+ for a historical perspective on this issue.
Alex

On Wed, May 29, 2019 at 11:32 PM Stuart Reynolds <stu...@stuartreynolds.net> wrote:
>
> I looked into this a while ago. There were differences in which algorithms
> regularize the intercept and which ones do not (I believe liblinear does,
> lbfgs does not). All of the algorithms disagreed with logistic regression
> in scipy.
>
> - Stuart
>
> On Wed, May 29, 2019 at 10:50 AM Andreas Mueller <t3k...@gmail.com> wrote:
>>
>> That is indeed not ideal.
>> I think we just went with what liblinear did, and when saga was introduced
>> we kept that behavior.
>> It should probably be scaled as in Lasso, I would imagine?
>>
>>
>> On 5/29/19 1:42 PM, Michael Eickenberg wrote:
>>
>> Hi Jesse,
>>
>> I think there was an effort to compare normalization methods for the data
>> attachment term between Lasso and Ridge regression back in 2012/13, but
>> this may not have been finished or extended to logistic regression.
>>
>> If it is not documented well, it could definitely benefit from a
>> documentation update.
>>
>> As for changing it to a more consistent state, that would require adding a
>> keyword argument for this behavior and, after discussion, possibly changing
>> the default value after some deprecation cycles (though this seems like a
>> dangerous one to change at all, imho).
>>
>> Michael
>>
>>
>> On Wed, May 29, 2019 at 10:38 AM Jesse Livezey <jesse.live...@gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> I noticed recently that in the Lasso implementation (and docs), the MSE
>>> term is normalized by the number of samples
>>> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
>>>
>>> but for LogisticRegression + L1, the log loss does not seem to be
>>> normalized by the number of samples. One consequence is that the strength
>>> of the regularization depends explicitly on the number of samples. For
>>> instance, with Lasso, if you tile a dataset N times, you will learn the
>>> same coef, but with LogisticRegression you will learn a different coef.
>>>
>>> Is this the intended behavior of LogisticRegression? I was surprised by
>>> it. Either way, it would be helpful to document this more clearly in the
>>> LogisticRegression docs (I can make a PR).
>>> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>>>
>>> Jesse
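
To make the two observations in the thread concrete, here is a minimal Python
sketch; the dataset, the number of tiles, and the hyperparameter values are
arbitrary choices for illustration, not something taken from the thread. It
shows that tiling the training data leaves the Lasso coefficients unchanged
(the squared-error term is averaged over n_samples), while the L1-penalized
LogisticRegression coefficients change (the log loss is summed, so the penalty
becomes relatively weaker as n_samples grows).

import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y_reg = X @ w_true + 0.1 * rng.randn(100)
y_clf = (y_reg > 0).astype(int)

# Tile the dataset 10 times: identical information, 10x the samples.
X_big = np.tile(X, (10, 1))
y_reg_big = np.tile(y_reg, 10)
y_clf_big = np.tile(y_clf, 10)

# Lasso minimizes (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1,
# so tiling the data leaves the solution unchanged.
lasso = Lasso(alpha=0.1).fit(X, y_reg)
lasso_big = Lasso(alpha=0.1).fit(X_big, y_reg_big)
print(np.allclose(lasso.coef_, lasso_big.coef_))  # True (up to solver tolerance)

# LogisticRegression minimizes C * sum(log loss) + ||w||_1 (no 1 / n_samples),
# so tiling effectively weakens the penalty and the coefficients change.
lr = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y_clf)
lr_big = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X_big, y_clf_big)
print(np.allclose(lr.coef_, lr_big.coef_))  # typically False

A second sketch, for the intercept point Stuart raises: with solver="liblinear"
the intercept is appended as an extra, penalized feature (scaled by
intercept_scaling), whereas solver="lbfgs" leaves the intercept unpenalized.
With strong regularization on imbalanced labels the difference is easy to see.
Again, the data and the value of C below are made up for illustration.

X = rng.randn(1000, 3)                               # pure noise features
y = np.r_[np.ones(900), np.zeros(100)].astype(int)   # imbalanced labels

for solver in ("liblinear", "lbfgs"):
    clf = LogisticRegression(penalty="l2", C=1e-3, solver=solver).fit(X, y)
    print(solver, clf.intercept_)
# lbfgs keeps an intercept near log(0.9 / 0.1) ~= 2.2 (the class prior);
# liblinear shrinks it substantially because the intercept is penalized.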