The following call runs into an error:

clf = GridSearchCV(SVC(C=1), tuned_parameters,
                   score_func=sklearn.metrics.auc_score, verbose=2, n_jobs=1, cv=loo)
clf.fit(X, y)

with:
/opt/python/virtualenvs/work/lib/python2.7/site-packages/sklearn/metrics/metrics.pyc in auc(x, y, reorder)
     64     # XXX: Consider using  ``scipy.integrate`` instead, or moving to
     65     # ``utils.extmath``
---> 66     x, y = check_arrays(x, y)
     67     if x.shape[0] < 2:
     68         raise ValueError('At least 2 points are needed to compute'
even though X and y hold more than 100 examples with 20+ positives.

It looks like sklearn cannot compute AUC scores with LOO, since AUC requires
at least two points (and probably a mix of positives and negatives), while
each LOO fold contains only one point.

However, one way to circumvent this limitation would be to concatenate the
predictions of all the LOO folds, and only then measure AUC.
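A minimal sketch of that workaround, assuming a binary problem and using the
classifier's decision_function as the continuous score (import paths and the
roc_auc_score name follow current sklearn and may differ in the older version
shown in the traceback; the toy data is made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Toy data standing in for the real X, y
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Collect one held-out decision score per sample across all LOO folds
scores = np.empty(len(y))
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(C=1).fit(X[train_idx], y[train_idx])
    scores[test_idx] = clf.decision_function(X[test_idx])

# AUC is computed once, on the concatenated out-of-fold predictions
auc = roc_auc_score(y, scores)
```

Each fold contributes a single score, so no per-fold AUC is ever needed.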

In fact, this is a whole different way of evaluating the performance of a
model with cross validation. Rather than averaging the scores across folds,
one could always concatenate the prediction results and measure the
performance. This way score functions can always be measured directly on the
predictions for the full dataset.

This also raises an interesting ML question, since mean(scores)
!= score(concatenation).
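The gap is easy to see with ordinary k-fold, where both quantities are
defined: average the per-fold AUCs, then compare with the AUC of the pooled
out-of-fold scores (a sketch on synthetic data; the two numbers generally
come out close but not identical):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_aucs = []
pooled_scores = np.empty(len(y))
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    s = clf.decision_function(X[test_idx])
    pooled_scores[test_idx] = s
    fold_aucs.append(roc_auc_score(y[test_idx], s))  # score per fold

mean_auc = np.mean(fold_aucs)                  # mean(scores)
pooled_auc = roc_auc_score(y, pooled_scores)   # score(concatenation)
```

AUC is not decomposable over folds (it compares positive/negative pairs,
and pooling adds cross-fold pairs), which is why the two disagree.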

Is there anything wrong with this approach?

Thanks,

Josh
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general