On 4/3/19 7:59 AM, Joel Nothman wrote:
The equations in Murphy and Hastie very clearly assume a metric that is
decomposable over samples (a loss function). Several popular metrics
are not decomposable.

For a metric like MSE, averaging per-fold scores and computing the
score over the pooled predictions will give almost identical results,
assuming the test sets have almost the same size. For something like
recall (sensitivity), the two will be almost identical assuming similar
test set sizes *and* stratification. For something like precision,
whose denominator (the number of predicted positives) is determined by
the biases of the learnt classifier on each test set, you can't say the
same. For something like ROC AUC, which relies on a decision function
that may not be equivalently calibrated across splits, evaluating over
pooled predictions is almost meaningless.

In theory. Not sure how it holds up in practice.
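
One way to check in practice (just a minimal sketch on synthetic data;
the dataset, logistic regression, and 5-fold stratified CV are arbitrary
choices for illustration, nothing prescribed in this thread): compare
the per-fold mean of each metric against the same metric computed on
predictions pooled across all folds.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=500, weights=[0.8, 0.2],
                               random_state=0)
    clf = LogisticRegression(max_iter=1000)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    fold_prec, fold_auc = [], []
    pooled_pred = np.empty_like(y)
    pooled_score = np.empty(len(y), dtype=float)

    for train, test in cv.split(X, y):
        clf.fit(X[train], y[train])
        pred = clf.predict(X[test])
        score = clf.decision_function(X[test])
        fold_prec.append(precision_score(y[test], pred))
        fold_auc.append(roc_auc_score(y[test], score))
        pooled_pred[test] = pred
        pooled_score[test] = score

    # Per-fold mean vs. metric on pooled predictions: for precision the
    # denominator (number of predicted positives) varies per fold, and
    # for ROC AUC the decision scores are not calibrated across folds,
    # so the two numbers need not agree.
    print("precision  per-fold mean:", np.mean(fold_prec),
          "pooled:", precision_score(y, pooled_pred))
    print("roc_auc    per-fold mean:", np.mean(fold_auc),
          "pooled:", roc_auc_score(y, pooled_score))
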

I didn't get the point about precision.

But yes, we should add to the docs that this is a weird thing to do, in particular for metrics that don't decompose.

If the loss decomposes, the results might differ because of different test set sizes, but I'm not sure that makes them "worse" in some way?
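
To make that concrete (again only a sketch with arbitrary choices:
Ridge on synthetic regression data, 5 folds of unequal size): because
squared error decomposes over samples, the fold-size-weighted mean of
per-fold MSEs exactly equals the MSE over the pooled predictions, and
the plain unweighted mean only drifts when fold sizes differ.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold

    # 103 samples so the 5 folds cannot all have the same size
    X, y = make_regression(n_samples=103, noise=10.0, random_state=0)
    est = Ridge()
    cv = KFold(n_splits=5, shuffle=True, random_state=0)

    fold_mse, fold_size = [], []
    pooled = np.empty_like(y)
    for train, test in cv.split(X):
        est.fit(X[train], y[train])
        pred = est.predict(X[test])
        fold_mse.append(mean_squared_error(y[test], pred))
        fold_size.append(len(test))
        pooled[test] = pred

    # Weighting each fold's MSE by its size recovers the pooled MSE
    # exactly; the unweighted mean differs only via unequal fold sizes.
    print("unweighted mean:", np.mean(fold_mse))
    print("weighted mean:  ", np.average(fold_mse, weights=fold_size))
    print("pooled MSE:     ", mean_squared_error(y, pooled))
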
