I was looking earlier at the code of predict_proba in LDA and LogisticRegression. While we certainly have some bugs there, I was a bit confused, and I thought an email would be better than opening an issue since this might not actually be one.
In the case of multiclass classification, the probabilities can be computed under two different assumptions: either as a set of independent binary regressions, or as a log-linear model (https://en.wikipedia.org/wiki/Multinomial_logistic_regression). We can then compute the probabilities either by using one class as a pivot and computing exp(beta_c X) / (1 + sum_k exp(beta_k X)), or by using all classes and computing a softmax.

My question is related to LogisticRegression in the OvR scheme. Naively, I thought it corresponded to the former case (a set of independent binary regressions). However, we use a different normalization there, which was first implemented in liblinear. I searched liblinear's issue tracker and found: https://github.com/cjlin1/liblinear/pull/20 It is related to the following paper: https://www.csie.ntu.edu.tw/~cjlin/papers/generalBT.pdf

My skills in math are limited and I am not sure I grasp what is going on. Could anybody shed some light on this OvR normalization, and on why it differs from the set of independent regressions described on Wikipedia?

Cheers,
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
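To make the normalizations I have in mind concrete, here is a small NumPy sketch of the three constructions (all names are mine, not scikit-learn internals, and the OvR part reflects my understanding of what predict_proba does — squash each score with a sigmoid, then renormalize — not the liblinear scheme from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=5)  # decision scores beta_k @ x for 5 classes

# 1) Softmax (multinomial / log-linear model): normalize over all classes.
softmax_probs = np.exp(scores) / np.exp(scores).sum()

# 2) Pivot formulation from the Wikipedia article: fix one class's
#    coefficients to zero, i.e. p_k = exp(beta_k x) / (1 + sum_j exp(beta_j x)).
#    This is equivalent to a softmax with beta_pivot = 0.
pivot_scores = scores.copy()
pivot_scores[0] = 0.0  # class 0 acts as the pivot
pivot_probs = np.exp(pivot_scores) / np.exp(pivot_scores).sum()

# 3) OvR-style: pass each score independently through a sigmoid
#    (one binary problem per class), then renormalize so the
#    probabilities sum to one. Liblinear's generalBT scheme is yet
#    another construction, which is the part I do not follow.
sigmoids = 1.0 / (1.0 + np.exp(-scores))
ovr_probs = sigmoids / sigmoids.sum()

print(softmax_probs, pivot_probs, ovr_probs, sep="\n")
```

All three vectors sum to one, but in general they disagree, which is exactly why I am wondering which assumption the OvR normalization is meant to encode.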