On 2012-03-20, at 9:12 PM, Rami Al-Rfou' wrote:
> Hi All,
>
> I trained a Logistic regression classifier.
>
> To simplify the deployment of my work, I want to reduce its library
> dependencies. As I doubt there is a need to retrain the classifier, I want to
> remove the sklearn dependency.
>
> Therefore, I want to write the predict function myself with the help of the
> trained classifier coefficients. Can someone point me to the decision
> function used to calculate the score for each class?
The actual function is buried in the liblinear C code, but it's pretty easy to
reproduce if all you want is a single prediction. It will be the class i with
the highest value of dot(coef_[i], test_example) + intercept_[i].
For example:
In [45]: lr = LogisticRegression().fit(randn(5000, 300), random_integers(0, 3, size=5000))
In [46]: test_data = randn(50, 300)
In [47]: lr.predict(test_data)
Out[47]:
array([2, 3, 0, 1, 1, 1, 2, 3, 1, 1, 3, 3, 1, 3, 2, 3, 3, 0, 1, 0, 3, 3, 1,
       1, 2, 2, 0, 3, 1, 1, 2, 0, 3, 3, 0, 2, 2, 0, 3, 3, 1, 2, 0, 2, 1, 2,
       1, 0, 1, 3], dtype=int32)
Now, to do the same without the predict() method:
In [48]: np.argmax(np.dot(test_data, lr.coef_.T) + lr.intercept_, axis=1)
Out[48]:
array([2, 3, 0, 1, 1, 1, 2, 3, 1, 1, 3, 3, 1, 3, 2, 3, 3, 0, 1, 0, 3, 3, 1,
       1, 2, 2, 0, 3, 1, 1, 2, 0, 3, 3, 0, 2, 2, 0, 3, 3, 1, 2, 0, 2, 1, 2,
       1, 0, 1, 3])
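
If the goal is to drop sklearn at deployment time, the one-liner above can be
wrapped in a small NumPy-only function. This is only a sketch: the name
predict_linear is made up for illustration, and it assumes you have already
exported lr.coef_, lr.intercept_ and lr.classes_ as plain arrays (for example
with np.save). It maps the argmax index back through the class labels, which
matters when the labels are not 0..n-1, and it treats the binary case
separately, since a single row of coefficients gives argmax nothing to compare.

```python
import numpy as np

def predict_linear(X, coef, intercept, classes=None):
    """Predict labels from linear scores X . coef.T + intercept.

    coef has shape (n_rows, n_features) and intercept has shape (n_rows,).
    classes is an optional array of labels; if omitted, indices are returned.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    coef = np.atleast_2d(np.asarray(coef, dtype=float))
    intercept = np.asarray(intercept, dtype=float)

    # One score per (sample, coefficient row).
    scores = X @ coef.T + intercept

    if scores.shape[1] == 1:
        # Binary case with a single coefficient row: a positive score
        # means the second class, otherwise the first.
        idx = (scores[:, 0] > 0).astype(int)
    else:
        # Multiclass case: pick the row with the highest score.
        idx = np.argmax(scores, axis=1)

    if classes is None:
        return idx
    return np.asarray(classes)[idx]
```

With the arrays from the session above, predict_linear(test_data, lr.coef_,
lr.intercept_, lr.classes_) should agree with lr.predict(test_data).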
If you want the output of predict_proba(), I'm not entirely sure how that's
calculated for multiclass in the case of liblinear. I'm pretty sure it's not
the way I'm familiar with (i.e. a softmax).
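
To make the distinction concrete, here is a sketch of two ways to turn the
multiclass scores into probabilities. The first is the softmax. The second is
a one-vs-rest style normalization (a logistic sigmoid per class, then
rescaling each row to sum to 1). Which of these matches predict_proba() for a
given model is an assumption to verify against your installed version, not
something this snippet establishes. Both function names are hypothetical.

```python
import numpy as np

def softmax_proba(scores):
    """Softmax over each row of an (n_samples, n_classes) score matrix."""
    scores = np.asarray(scores, dtype=float)
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def ovr_proba(scores):
    """Per-class sigmoid, then renormalize each row to sum to 1.

    Intended for the multiclass case (more than one score column).
    """
    scores = np.asarray(scores, dtype=float)
    per_class = 1.0 / (1.0 + np.exp(-scores))
    return per_class / per_class.sum(axis=1, keepdims=True)
```

Both leave the argmax unchanged, so the predicted class is the same either
way. Only the probability values differ, which is easy to check by comparing
each against lr.predict_proba(test_data) with np.allclose.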
David
P.S. devs: why does this yield a 4x300 in coef_ when the documentation says
n_classes - 1?
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general