Re: [Scikit-learn-general] mean(scores) vs score(concatenation). E.g. AUC with LOO validation

Joel Nothman Mon, 08 Jul 2013 18:33:09 -0700

I don't know about the theory of applying the metric across all CV folds,
but it certainly fits poorly with the current API, which assumes each fold is
scored with scorer(estimator, X, y_true) -> float and that these scores are
averaged across folds. Such functionality could be made possible with:
* scorers allowed to return a structured value (in this case the output of
`predict_proba` or `decision_function`) rather than a scalar, and
* pluggable aggregation of scores across folds: calc_objective(fold_scores,
cv, y_true) -> float.
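
To make the second point concrete, here is a minimal sketch of what a pluggable aggregator with that signature might look like. It assumes that each fold's "score" is the structured value from the first point (the decision values for that fold's test samples), and that cv is a list of (train_indices, test_indices) pairs. calc_objective is hypothetical and not part of scikit-learn. roc_auc is a hand-written Mann-Whitney AUC helper used only for illustration, not sklearn's own metric.

```python
from typing import List, Optional, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Mann-Whitney form of ROC AUC: fraction of (positive, negative) pairs
    # in which the positive gets the higher score; ties count as half.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y != 1]
    if not pos or not neg:
        # A single LOO test fold always ends up here (only one class present).
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calc_objective(
    fold_scores: Sequence[Sequence[float]],
    cv: Sequence[Tuple[Sequence[int], Sequence[int]]],
    y_true: Sequence[int],
) -> float:
    # Hypothetical aggregator: put each fold's decision values back at the
    # positions of that fold's test samples, then compute one AUC over
    # the whole dataset instead of averaging per-fold scores.
    pooled: List[Optional[float]] = [None] * len(y_true)
    for values, (_train_idx, test_idx) in zip(fold_scores, cv):
        for i, v in zip(test_idx, values):
            pooled[i] = v
    if any(v is None for v in pooled):
        raise ValueError("every sample must appear in some test fold")
    return roc_auc(y_true, [float(v) for v in pooled])


# Toy leave-one-out run on 4 samples; decision values are made up.
y = [0, 0, 1, 1]
cv = [([1, 2, 3], [0]), ([0, 2, 3], [1]), ([0, 1, 3], [2]), ([0, 1, 2], [3])]
fold_scores = [[-1.2], [0.3], [0.8], [-0.1]]
print(calc_objective(fold_scores, cv, y))  # 0.75
```

Per-fold AUC is undefined for every one of these LOO folds, but the pooled objective is well defined because it sees both classes at once.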


- Joel


On Tue, Jul 9, 2013 at 4:50 AM, Josh Wasserstein <[email protected]> wrote:

> The following call runs into an error
>
> clf = GridSearchCV(SVC(C=1), tuned_parameters,
>                    score_func=sklearn.metrics.auc_score, verbose=2, n_jobs=1, cv=loo)
> clf.fit(X, y)
>
> with:
> /opt/python/virtualenvs/work/lib/python2.7/site-packages/sklearn/metrics/metrics.pyc in auc(x, y, reorder)
>      64     # XXX: Consider using  ``scipy.integrate`` instead, or moving to
>      65     # ``utils.extmath``
> ---> 66     x, y = check_arrays(x, y)
>      67     if x.shape[0] < 2:
>      68         raise ValueError('At least 2 points are needed to compute'
>
> even though X and y hold more than 100 examples with 20+ positives.
>
> It looks like sklearn cannot compute AUC scores with LOO, since AUC requires at
> least two points (and in fact both a positive and a negative example), while
> in LOO each test fold has only one point.
>
> However, one way to circumvent this limitation could be to concatenate the
> predictions from all LOO folds, and only then measure AUC.
>
> In fact, this is a whole different way of evaluating the performance of a
> model with cross validation. Rather than averaging the scores across folds,
> one could always concatenate the prediction results and measure the
> performance. This way, score functions can always be computed directly on
> the predictions for the full dataset.
>
> This also raises an interesting ML question, since
> mean(scores) != score(concatenation).
>
> Is there anything wrong with this approach?
>
> Thanks,
>
> Josh
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
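
The question of why mean(scores) != score(concatenation) can be seen with a tiny example. The sketch below uses two CV folds with made-up decision values and a hand-written AUC helper (not sklearn's). Within each fold the ranking is perfect, so each per-fold AUC is 1.0, but the second fold's values are shifted upward. Pooling them therefore mixes two different score scales, and the pooled AUC drops. This assumes decision values from different fold models are comparable, which is exactly the assumption the concatenation approach relies on.

```python
def roc_auc(y_true, scores):
    # Mann-Whitney form of ROC AUC; ties between a positive and a
    # negative count as half a correctly ranked pair.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up (labels, decision values) for the test part of two folds.
folds = [
    ([0, 1, 0, 1], [0.1, 0.2, 0.0, 0.3]),
    ([0, 1, 0, 1], [0.6, 0.7, 0.5, 0.8]),  # same ranking, shifted scale
]

# Averaging: score each fold separately, then take the mean.
fold_aucs = [roc_auc(y, s) for y, s in folds]
mean_of_scores = sum(fold_aucs) / len(fold_aucs)

# Concatenation: pool all test predictions, then score once.
y_all = [y for ys, _ in folds for y in ys]
s_all = [s for _, ss in folds for s in ss]
score_of_concatenation = roc_auc(y_all, s_all)

print(mean_of_scores)          # 1.0
print(score_of_concatenation)  # 0.75
```

The pooled score penalizes inconsistency between the fold models, while the per-fold mean ignores it; with LOO only the pooled version is defined at all.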
