The equations in Murphy and Hastie very clearly assume a metric that
decomposes over samples (i.e. a per-sample loss function). Several
popular metrics do not.
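
Concretely (a toy example of my own, not from Murphy or Hastie):

    import numpy as np

    y_true = np.array([1, 0, 1, 1, 0])
    y_pred = np.array([1, 1, 0, 1, 0])

    # Decomposable: squared error is a mean of per-sample losses, so averaging
    # equal-sized folds gives (almost) the same number as pooling predictions.
    mse = np.mean((y_pred - y_true) ** 2)

    # Not decomposable: precision's denominator (the number of predicted
    # positives) depends on the whole prediction vector at once.
    precision = (y_true & y_pred).sum() / y_pred.sum()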

For a metric like MSE, the pooled estimate will be almost identical to
the average of per-fold scores, assuming the test sets have almost the
same size. For something like recall (sensitivity) it will be almost
identical assuming similar test set sizes *and* stratification. For
something like precision, whose denominator is determined by the biases
of the learnt classifier on the test dataset, you can't say the same.
For something like ROC AUC score, which relies on a decision function
that may not be calibrated consistently across splits, evaluating in
this way is almost meaningless.
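
To make that concrete, here is a rough sketch (my own illustration; the
dataset, estimator and CV setup are arbitrary choices) contrasting the
average of per-fold ROC AUC with ROC AUC computed on pooled
cross_val_predict output:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    model = LogisticRegression(max_iter=1000)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)

    # Average of per-fold ROC AUC: each fold is ranked only against scores
    # from the model fitted for that fold.
    per_fold_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

    # Pooled ROC AUC: decision scores from five differently fitted (and
    # possibly differently calibrated) models are ranked against each other.
    pooled_scores = cross_val_predict(model, X, y, cv=cv,
                                      method="decision_function")
    pooled_auc = roc_auc_score(y, pooled_scores)

    print(per_fold_auc, pooled_auc)  # the two numbers need not agree

For MSE with equal-sized folds the two approaches give nearly the same
number; for ranking metrics like ROC AUC they can diverge.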

On Wed, 3 Apr 2019 at 22:01, Boris Hollas
<hol...@informatik.htw-dresden.de> wrote:
>
> I use
>
> sum((cross_val_predict(model, X, y) - y)**2) / len(y)        (*)
>
> to evaluate the performance of a model. This conforms with Murphy: Machine 
> Learning, section 6.5.3, and Hastie et al: The Elements of Statistical 
> Learning,  eq. 7.48. However, according to the documentation of 
> cross_val_predict, "it is not appropriate to pass these predictions into an 
> evaluation metric". While it is obvious that cross_val_predict is different 
> from cross_val_score, I don't see what should be wrong with (*).
>
> Also, the explanation that "cross_val_predict simply returns the labels (or 
> probabilities)" is unclear, if not wrong. As I understand it, this function 
> returns estimates and no labels or probabilities.
>
> Regards, Boris
>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
