I wrote the following multi-class implementation of ROC AUC metrics. I
looked into creating a pull request, but since I am not yet familiar with
the internals of scikit-learn or with PRs on GitHub, I decided to share it
here instead, in case anyone finds it useful.
Note that this is a multi-class implementation, not a multi-label one
(which was also discussed on the list recently).
One thing I noticed while browsing the source code on GitHub is that many
of the built-in metrics are not defined using make_scorer. I presume this
is deliberate (i.e. make_scorer is reserved for the definition of
user-defined metrics).
Please feel free to comment, make suggestions, etc.
Thanks,
Josh
====
import numpy as np
from sklearn import datasets
from sklearn import cross_validation
from sklearn import svm
from sklearn.metrics import make_scorer, roc_auc_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

def multi_label_macro_auc(y_gt, y_pred):
    # one-vs-rest AUC per class, then the unweighted mean
    n_labels = y_pred.shape[1]
    auc_scores = [None] * n_labels
    for label in xrange(n_labels):
        auc_scores[label] = roc_auc_score((y_gt == label) * 1,
                                          y_pred[:, label])
    return np.mean(auc_scores)
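For reference, the per-class score that the macro average builds on can be reproduced without scikit-learn: binary ROC AUC equals the Mann-Whitney U statistic computed from the ranks of the scores (ties counted as half). A minimal numpy-only sketch — the `rank_auc` helper and the toy data are my own, not part of scikit-learn:

```python
import numpy as np

def rank_auc(y_true, scores):
    # Binary ROC AUC via the Mann-Whitney U statistic; ties get average ranks.
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind='mergesort')
    sorted_scores = scores[order]
    r = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        r[i:j + 1] = (i + j) / 2.0 + 1.0  # average rank for the tied group
        i = j + 1
    ranks = np.empty(len(scores))
    ranks[order] = r
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# toy 3-class problem: class 1 is imperfectly ranked, classes 0 and 2 are perfect
y_gt = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.2, 0.5],
                   [0.1, 0.2, 0.7],
                   [0.1, 0.3, 0.6]])
per_class = [rank_auc(y_gt == k, y_pred[:, k]) for k in range(3)]
macro = np.mean(per_class)  # the same quantity multi_label_macro_auc computes
```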
def multi_label_weighted_macro_auc(y_gt, y_pred):
    # one-vs-rest AUC per class, weighted by class prevalence
    n_total = len(y_gt)
    n_labels = y_pred.shape[1]
    auc_scores = [None] * n_labels
    for label in xrange(n_labels):
        tmp = y_gt == label
        n_this_label = float(np.sum(tmp))
        auc_scores[label] = n_this_label / n_total * roc_auc_score(tmp,
                                                                   y_pred[:, label])
    return np.sum(auc_scores)
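Summing prevalence-weighted per-class AUCs is the same thing as a weighted mean, so the function above can also be written with np.average. A quick check of that identity — the toy data (deliberately unbalanced, class counts 3/2/1) is my own:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# toy unbalanced 3-class problem (my own values, class counts 3, 2, 1)
y_gt = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.4, 0.5, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.3, 0.5, 0.2],
                   [0.1, 0.2, 0.7]])
counts = np.array([np.sum(y_gt == k) for k in range(3)], dtype=float)
per_class = np.array([roc_auc_score((y_gt == k) * 1, y_pred[:, k])
                      for k in range(3)])
# sum of (n_k / n_total) * AUC_k, exactly as in the function above ...
weighted = np.sum(counts / len(y_gt) * per_class)
# ... equals the count-weighted mean of the per-class AUCs
assert np.isclose(weighted, np.average(per_class, weights=counts))
```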
def multi_label_micro_auc(y_gt, y_pred):
    # stack all one-vs-rest problems into one big binary AUC
    pred_v = y_pred.flatten('F')
    labels = range(y_pred.shape[1])
    gt_v = [y_gt == l for l in labels]
    gt_v = np.hstack(gt_v) * 1
    return roc_auc_score(gt_v, pred_v)
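The micro variant hinges on flatten('F') being column-major: the first n entries of pred_v are the class-0 scores, the next n the class-1 scores, and so on — exactly the order in which np.hstack concatenates the binarized target vectors. A quick check on a tiny matrix (toy values of my own):

```python
import numpy as np

# hypothetical 3-sample, 2-class score matrix
y_pred = np.array([[0.9, 0.1],
                   [0.4, 0.6],
                   [0.2, 0.8]])
y_gt = np.array([0, 1, 1])

pred_v = y_pred.flatten('F')  # column-major: all class-0 scores, then class-1
gt_v = np.hstack([(y_gt == l) * 1 for l in range(y_pred.shape[1])])

print(pred_v)  # [0.9 0.4 0.2 0.1 0.6 0.8]
print(gt_v)    # [1 0 0 0 1 1]
```

So element i of pred_v and element i of gt_v always refer to the same (sample, class) pair, which is what makes the single roc_auc_score call valid.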
ml_macro_auc_s = make_scorer(multi_label_macro_auc,
                             greater_is_better=True, needs_threshold=False,
                             needs_proba=True)
ml_w_macro_auc_s = make_scorer(multi_label_weighted_macro_auc,
                               greater_is_better=True, needs_threshold=False,
                               needs_proba=True)
ml_micro_auc_s = make_scorer(multi_label_micro_auc,
                             greater_is_better=True, needs_threshold=False,
                             needs_proba=True)

clf = svm.SVC(kernel='linear', probability=True)
print cross_validation.cross_val_score(clf, X, y, cv=3,
                                       scoring=ml_macro_auc_s)
print cross_validation.cross_val_score(clf, X, y, cv=3,
                                       scoring=ml_w_macro_auc_s)
print cross_validation.cross_val_score(clf, X, y, cv=3,
                                       scoring=ml_micro_auc_s)
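An update for anyone finding this in the archives: scikit-learn's roc_auc_score has since gained a multi_class parameter that covers the one-vs-rest macro and weighted averages directly (micro does not appear to be offered there), so the first two functions above now have a built-in equivalent. A sketch, assuming a scikit-learn version with that parameter and using toy probabilities of my own (in the multi-class case, each row must sum to 1):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_gt = np.array([0, 0, 1, 1, 2, 2])
# toy probability estimates; each row sums to 1 as roc_auc_score requires here
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.2, 0.5],
                   [0.1, 0.2, 0.7],
                   [0.1, 0.3, 0.6]])

# one-vs-rest multi-class AUC, unweighted and prevalence-weighted means
macro = roc_auc_score(y_gt, y_prob, multi_class='ovr', average='macro')
weighted = roc_auc_score(y_gt, y_prob, multi_class='ovr', average='weighted')
```

With balanced classes, as here, the macro and weighted averages coincide.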
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general