Thanks Tom. That's clear now. 

Sent from my phone - apologies for brevity and potential misspellings.

From: tom.duprelat...@orange.fr
Sent: 8 February 2019 02:52
To: scikit-learn@python.org
Reply to: scikit-learn@python.org
Subject: Re: [scikit-learn] Probabilities for LogisticRegression and LDA

The set of independent regressions described in Wikipedia is *not* an OvR model. It is just a (weird) way to understand the multinomial logistic regression model.
OvR logistic regression and multinomial logistic regression are two different models.

In multinomial logistic regression formulated as a set of independent binary regressions (as described in Wikipedia), you have K - 1 binary regressions between class k (for k from 1 to K - 1) and the pivot class K.
Whereas in OvR logistic regression you have K binary regressions between class k (for k from 1 to K) and its complement "not class k".
The normalization is therefore different.

Indeed, in multinomial logistic regression as a set of independent binary regressions, you have (from the beginning) the property sum_k p(y = k) = 1. The normalization factor 1 / (1 + sum_{k=1}^{K-1} exp(beta_k . x)) comes from computing p(y = K) last using this property.
Whereas in OvR logistic regression, each binary classifier only guarantees 1 = p_k(y = k) + p_k(y != k). Therefore the probabilities p_k(y = k) do not sum to one, and you need to normalize them by sum_{k=1}^{K} p_k(y = k) to obtain valid probabilities for the OvR model. This is done in the same way in OneVsRestClassifier (https://github.com/scikit-learn/scikit-learn/blob/1a850eb5b601f3bf0f88a43090f83c51b3d8c593/sklearn/multiclass.py#L350-L351).
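To make the difference concrete, here is a small NumPy sketch (not scikit-learn internals; the coefficients are random placeholders) showing why the K independent OvR sigmoids need an explicit renormalization:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 3                        # 4 classes, 3 features
betas = rng.normal(size=(K, d))    # one binary classifier per class (OvR)
x = rng.normal(size=d)

# Each OvR classifier k only guarantees p_k(y=k) + p_k(y!=k) = 1,
# so the per-class probabilities do not sum to one across classes.
p = 1.0 / (1.0 + np.exp(-(betas @ x)))   # one sigmoid per class
print(p.sum())                            # generally != 1

# Normalize by the sum, as OneVsRestClassifier does, to get a
# valid probability vector for the OvR model.
p_ovr = p / p.sum()
assert np.isclose(p_ovr.sum(), 1.0)
```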

But I agree that this description of the multinomial model is quite confusing, compared to the log-linear/softmax description.

Tom

Le jeu. 7 févr. 2019 à 08:31, Guillaume Lemaître <g.lemaitr...@gmail.com> a écrit :
I was earlier looking at the code of predict_proba of LDA and LogisticRegression. While we certainly have some bugs, I was a bit confused, and I thought an email would be better than opening an issue since this might not be one.

In the case of multiclass classification, the probabilities can be computed under two different assumptions - either as a set of independent binary regressions or as a log-linear model (https://en.wikipedia.org/wiki/Multinomial_logistic_regression).

Then, we can compute the probabilities either by using one class as a pivot and computing exp(beta_c . x) / (1 + sum_k exp(beta_k . x)), or by using all classes and computing a softmax.
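For what it's worth, these two computations describe the same model: pinning the pivot class's coefficients to zero and taking a softmax reproduces the pivot formula exactly. A quick NumPy check (random placeholder coefficients, nothing from scikit-learn):

```python
import numpy as np

rng = np.random.default_rng(42)
K, d = 4, 3
betas = rng.normal(size=(K - 1, d))  # pivot class K has an implicit beta = 0
x = rng.normal(size=d)

# Pivot form: p_k = exp(beta_k.x) / (1 + sum_j exp(beta_j.x)), p_K = 1 / (...)
scores = betas @ x
denom = 1.0 + np.exp(scores).sum()
p_pivot = np.append(np.exp(scores), 1.0) / denom

# Softmax form: append a zero score for the pivot class, then softmax.
z = np.append(scores, 0.0)
p_softmax = np.exp(z) / np.exp(z).sum()

assert np.allclose(p_pivot, p_softmax)  # same probabilities, same model
```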

My question is related to the LogisticRegression in the OvR scheme. Naively, I thought that it corresponded to the former case (a set of independent binary regressions). However, we use another normalization there, which was first implemented in liblinear. I searched liblinear's issue tracker and found: https://github.com/cjlin1/liblinear/pull/20

It is related to the following paper: https://www.csie.ntu.edu.tw/~cjlin/papers/generalBT.pdf

My math skills are limited and I am not sure I grasp what is going on. Could anybody shed some light on this OvR normalization and why it differs from the set of independent regressions described in Wikipedia?

Cheers,
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
