The equations in Murphy and Hastie very clearly assume a metric that is decomposable over samples, i.e. an average of a per-sample loss function. Several popular metrics are not decomposable in this way.
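To make that concrete, here is a minimal sketch (the synthetic regression problem, the Ridge estimator and the fold count are arbitrary choices for illustration, not anything from the original post): for a sample-decomposable metric such as MSE, the mean of per-fold scores from cross_val_score and the score computed on the pooled out-of-fold predictions from cross_val_predict coincide, up to floating point, when the folds have equal size.

# Sketch: for a sample-decomposable metric (MSE), averaging per-fold scores
# and scoring pooled out-of-fold predictions give (essentially) the same number.
# Dataset, estimator and CV settings below are arbitrary illustrations.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
model = Ridge()
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Mean of per-fold MSEs (what cross_val_score reports, sign-flipped).
per_fold = -cross_val_score(model, X, y, cv=cv,
                            scoring="neg_mean_squared_error")

# MSE of the pooled out-of-fold predictions (the Murphy / Hastie formula).
pooled = np.mean((cross_val_predict(model, X, y, cv=cv) - y) ** 2)

print(per_fold.mean(), pooled)  # equal-sized folds: the two coincide

With unequal fold sizes the two differ only by the fold-size weighting, which is why the agreement is "almost" rather than exact in general.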
For a metric like MSE, the pooled estimate will be almost identical to the average of per-fold scores, assuming the test sets have almost the same size. For something like recall (sensitivity), it will be almost identical assuming similar test set sizes *and* stratification. For something like precision, whose denominator is determined by the biases of the learnt classifier on the test dataset, you can't say the same. For something like ROC AUC, which relies on a decision function that may not be equivalently calibrated across splits, evaluating in this way is almost meaningless. (A second sketch after the quoted message below illustrates the precision and ROC AUC cases.)

On Wed, 3 Apr 2019 at 22:01, Boris Hollas <hol...@informatik.htw-dresden.de> wrote:
>
> I use
>
> sum((cross_val_predict(model, X, y) - y)**2) / len(y)    (*)
>
> to evaluate the performance of a model. This conforms with Murphy: Machine
> Learning, section 6.5.3, and Hastie et al.: The Elements of Statistical
> Learning, eq. 7.48. However, according to the documentation of
> cross_val_predict, "it is not appropriate to pass these predictions into an
> evaluation metric". While it is obvious that cross_val_predict is different
> from cross_val_score, I don't see what should be wrong with (*).
>
> Also, the explanation that "cross_val_predict simply returns the labels (or
> probabilities)" is unclear, if not wrong. As I understand it, this function
> returns estimates and no labels or probabilities.
>
> Regards, Boris
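And the companion sketch for the non-decomposable case (again with an arbitrary synthetic dataset, classifier and CV settings, chosen only for illustration): for precision, the pooled value and the mean of per-fold values need not agree, because the denominator, the number of predicted positives, is recomputed over the pooled predictions; for ROC AUC, the pooled ranking mixes decision functions fitted on different training folds, which is why the pooled number is hard to interpret.

# Sketch: for non-decomposable metrics (precision, ROC AUC), the mean of
# per-fold scores and the score of pooled out-of-fold predictions can differ.
# Dataset, classifier and CV settings are arbitrary illustrations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, roc_auc_score
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     cross_val_score)

X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Per-fold precision, averaged (what cross_val_score does), vs. precision of
# the pooled out-of-fold label predictions.
per_fold_prec = cross_val_score(clf, X, y, cv=cv, scoring="precision")
pooled_prec = precision_score(y, cross_val_predict(clf, X, y, cv=cv))

# Same comparison for ROC AUC on pooled decision scores: the pooled ranking
# mixes decision functions fitted on different folds.
per_fold_auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
pooled_auc = roc_auc_score(
    y, cross_val_predict(clf, X, y, cv=cv, method="decision_function"))

print(per_fold_prec.mean(), pooled_prec)  # can differ
print(per_fold_auc.mean(), pooled_auc)    # pooled value is hard to interpret

How large the gap is depends entirely on the data and the estimator; the point is only that nothing guarantees the two computations agree, which is what the cross_val_predict documentation is warning about.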