The set of independent regressions described in Wikipedia is *not* an OvR model. It is just a (somewhat unusual) way to understand the multinomial logistic regression model. OvR logistic regression and multinomial logistic regression are two different models.
In multinomial logistic regression viewed as a set of independent binary regressions (as described in Wikipedia), you have K - 1 binary regressions between class k (k from 1 to K - 1) and class K, whereas in OvR logistic regression you have K binary regressions between class k (k from 1 to K) and class "not class k". The normalization is therefore different.

Indeed, in multinomial logistic regression as a set of independent binary regressions, you have from the beginning the property sum_k p(y = k) = 1. The normalization factor 1 / (1 + sum_{k=1}^{K-1} exp(beta_k . x)) comes from computing p(y = K) last using this property.

Whereas in OvR logistic regression, each binary problem only gives you 1 = p_k(y = k) + p_k(y != k). Therefore the probabilities p_k(y = k) do not sum to one, and you need to normalize them by sum_{k=1}^{K} p_k(y = k) to obtain valid probabilities for the OvR model. This is done in the same way in OneVsRestClassifier ( https://github.com/scikit-learn/scikit-learn/blob/1a850eb5b601f3bf0f88a43090f83c51b3d8c593/sklearn/multiclass.py#L350-L351 ).

But I agree that this description of the multinomial model is quite confusing compared to the log-linear/softmax description.

Tom

On Thu, Feb 7, 2019 at 08:31, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:

> I was earlier looking at the code of predict_proba of LDA and
> LogisticRegression. While there are certainly some bugs, I was a bit
> confused, and I thought an email would be better than opening an issue,
> since this might not be one.
>
> In the case of multiclass classification, the probabilities can be
> computed under two different assumptions: either as a set of independent
> binary regressions or as a log-linear model
> ( https://en.wikipedia.org/wiki/Multinomial_logistic_regression ).
>
> Then, we can compute the probabilities either by using a class as a pivot
> and computing exp(beta_c . x) / (1 + sum_k exp(beta_k . x)), or by using
> all classes and computing a softmax.
>
> My question is related to LogisticRegression in the OvR scheme. Naively,
> I thought that it corresponded to the former case (the set of independent
> binary regressions). However, we are using another normalization there,
> which was first implemented in liblinear. I searched liblinear's issue
> tracker and found: https://github.com/cjlin1/liblinear/pull/20
>
> It is related to the following paper:
> https://www.csie.ntu.edu.tw/~cjlin/papers/generalBT.pdf
>
> My skills in math are limited and I am not sure I grasp what is going on.
> Could anybody shed some light on this OvR normalization and why it is
> different from the set of independent regressions described in Wikipedia?
>
> Cheers,
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
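The contrast above can be sketched numerically. This is a toy illustration, not scikit-learn's or liblinear's actual code: the `scores` array stands in for the per-class linear terms beta_k . x.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
scores = rng.normal(size=K)  # stand-ins for beta_k . x, k = 1..K

# Multinomial model in the "pivot" form from Wikipedia (K - 1 binary
# regressions against class K): the K - 1 effective coefficients are
# beta_k - beta_K, and the probabilities sum to 1 by construction,
# p_k = exp(s_k - s_K) / (1 + sum_j exp(s_j - s_K)) for k < K,
# p_K = 1 / (1 + sum_j exp(s_j - s_K)).
pivot = np.exp(scores[:-1] - scores[-1])
p_multinomial = np.append(pivot, 1.0) / (1.0 + pivot.sum())

# The same model in the log-linear/softmax form over all K scores.
softmax = np.exp(scores - scores.max())
softmax /= softmax.sum()
assert np.allclose(p_multinomial, softmax)

# OvR: K independent binary problems "class k vs not class k".
# Each sigmoid is a valid probability on its own, but the K of them
# do not sum to one, hence the a-posteriori normalization, as in
# OneVsRestClassifier.predict_proba.
p_binary = 1.0 / (1.0 + np.exp(-scores))  # one sigmoid per class
p_ovr = p_binary / p_binary.sum()         # post-hoc normalization

# Both vectors sum to 1 (up to floating point), but they differ:
# normalized sigmoids are not a softmax of the same scores.
print(p_multinomial)
print(p_ovr)
```

The point of the sketch is that the pivot form and the softmax agree exactly (they are the same model), while the renormalized OvR sigmoids generally give a different probability vector for the same scores.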