Re: [scikit-learn] Why is cross_val_predict discouraged?

Boris Hollas Wed, 03 Apr 2019 09:52:36 -0700

On 03.04.19 at 13:59, Joel Nothman wrote:

The equations in Murphy and Hastie very clearly assume a metric
decomposable over samples (a loss function). Several popular metrics
are not.


For a metric like MSE it will be almost identical assuming the test
sets have almost the same size.

What will be almost identical to what? I suppose you mean that (*) is consistent with the scores of the models in the folds (i.e., the result of cross_val_score) if the loss function is (x-y)².
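One way to make "almost identical" precise, using notation introduced only for this sketch (folds F_1 to F_K partitioning the n samples, n_k = |F_k|, and \hat y_i the out-of-fold prediction for sample i): the pooled estimate (*) is a size-weighted mean of the per-fold MSEs, so it equals the plain mean of the fold scores exactly when all folds have the same size.

```latex
\frac{1}{n}\sum_{i=1}^{n} (\hat y_i - y_i)^2
  = \sum_{k=1}^{K} \frac{n_k}{n}\,\mathrm{MSE}_k,
\qquad
\mathrm{MSE}_k = \frac{1}{n_k}\sum_{i \in F_k} (\hat y_i - y_i)^2 .

% If n_k = n/K for every k, the weights n_k/n all equal 1/K, so
\frac{1}{n}\sum_{i=1}^{n} (\hat y_i - y_i)^2
  = \frac{1}{K}\sum_{k=1}^{K} \mathrm{MSE}_k .
```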

For something like Recall
(sensitivity) it will be almost identical assuming similar test set
sizes *and* stratification. For something like precision whose
denominator is determined by the biases of the learnt classifier on
the test dataset, you can't say the same.
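A small illustration of the recall/precision point, with toy labels invented for this sketch (not from the thread): two stratified folds with the same number of positives give the same pooled and mean-of-folds recall, while precision differs because its denominator depends on how many positives each fold's classifier happens to predict.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Two hypothetical test folds, each with two positives and two negatives
folds = [
    (np.array([1, 1, 0, 0]), np.array([1, 1, 1, 1])),  # over-predicts positives
    (np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0])),  # under-predicts positives
]

# Mean of per-fold scores (what averaging cross_val_score results would give)
mean_precision = np.mean([precision_score(t, p) for t, p in folds])
mean_recall = np.mean([recall_score(t, p) for t, p in folds])

# Scores on the pooled out-of-fold predictions (the cross_val_predict approach)
y_true = np.concatenate([t for t, _ in folds])
y_pred = np.concatenate([p for _, p in folds])
pooled_precision = precision_score(y_true, y_pred)
pooled_recall = recall_score(y_true, y_pred)

print(mean_recall, pooled_recall)        # 0.75 0.75
print(mean_precision, pooled_precision)  # 0.75 0.6
```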

I can't follow here. If the loss function is L(x,y) = 1_{x = y}, then (*) gives the accuracy.
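Since accuracy is itself a per-sample average, this is easy to check: putting the indicator 1_{x = y} in place of the squared error in (*) reproduces accuracy_score, and with equal fold sizes it also matches the mean of the per-fold accuracies. The arrays below are made up for illustration and stand in for pooled out-of-fold predictions from two folds of four samples each.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical pooled out-of-fold predictions: two folds of four samples
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 1, 1, 0, 0, 0])

# (*) with the per-sample indicator 1_{x = y} instead of (x - y)^2
pooled = np.sum(y_pred == y_true) / len(y_true)

# Mean of per-fold accuracies (equal fold sizes, so no weighting needed)
per_fold = [accuracy_score(y_true[:4], y_pred[:4]),
            accuracy_score(y_true[4:], y_pred[4:])]

print(pooled, accuracy_score(y_true, y_pred), np.mean(per_fold))  # 0.625 0.625 0.625
```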

For something like ROC AUC
score, relying on some decision function that may not be equivalently
calibrated across splits, evaluating in this way is almost
meaningless.
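A toy case (scores invented for illustration) of the calibration problem: each fold's model ranks its own test samples perfectly, but the second model's decision values sit on a lower scale, so concatenating them mixes rankings that were never meant to be compared and the pooled ROC AUC drops.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Two hypothetical folds; each model separates its own test fold perfectly,
# but the second model's decision scores are shifted downwards
y1, s1 = np.array([0, 1]), np.array([0.6, 0.9])
y2, s2 = np.array([0, 1]), np.array([0.1, 0.4])

per_fold = [roc_auc_score(y1, s1), roc_auc_score(y2, s2)]
pooled = roc_auc_score(np.concatenate([y1, y2]), np.concatenate([s1, s2]))

print(np.mean(per_fold), pooled)  # 1.0 0.75
```

In the pooled set, the positive scored 0.4 falls below the negative scored 0.6 from the other fold, which is one of the four positive/negative pairs misordered.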

In any case, I still don't see what may be wrong with (*). If nothing is, the warning in the documentation about the use of cross_val_predict should be removed or revised.

On the other hand, an example in the documentation uses cross_val_score(...).mean(). This is debatable, since it computes a mean of means.
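The mean-of-means concern only matters when the folds differ in size. A minimal numeric sketch with made-up per-sample losses: the plain mean of fold means differs from the pooled mean that (*) computes, while weighting each fold mean by its number of samples recovers the pooled value.

```python
import numpy as np

# Hypothetical squared errors from two folds of unequal size
fold_losses = [np.array([1.0, 1.0, 1.0, 1.0]), np.array([4.0, 4.0])]

fold_means = [f.mean() for f in fold_losses]   # 1.0 and 4.0
mean_of_means = np.mean(fold_means)            # 2.5
pooled = np.concatenate(fold_losses).mean()    # 2.0, as in (*)

# Weighting each fold mean by its size gives back the pooled mean
sizes = np.array([len(f) for f in fold_losses])
weighted = np.average(fold_means, weights=sizes)  # 2.0

print(mean_of_means, pooled, weighted)
```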


On Wed, 3 Apr 2019 at 22:01, Boris Hollas <[email protected]> wrote:

I use

sum((cross_val_predict(model, X, y) - y)**2) / len(y)        (*)

to evaluate the performance of a model. This conforms with Murphy: Machine Learning, 
section 6.5.3, and Hastie et al: The Elements of Statistical Learning,  eq. 7.48. 
However, according to the documentation of cross_val_predict, "it is not appropriate 
to pass these predictions into an evaluation metric". While it is obvious that 
cross_val_predict is different from cross_val_score, I don't see what should be wrong 
with (*).
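For reference, a runnable version of (*) on synthetic data. The LinearRegression model, the make_regression data and the explicit KFold split are assumptions for this sketch, not details from the thread. With 100 samples in five folds of 20, the pooled value should match the negated mean of cross_val_score with scoring="neg_mean_squared_error" up to floating-point error.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()
cv = KFold(n_splits=5)  # no shuffling: five equal folds of 20 samples

# (*): pooled squared error over all out-of-fold predictions
y_hat = cross_val_predict(model, X, y, cv=cv)
pooled_mse = np.sum((y_hat - y) ** 2) / len(y)
check = mean_squared_error(y, y_hat)  # same quantity via sklearn.metrics

# Mean of the per-fold MSEs (cross_val_score returns negated MSE)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
mean_fold_mse = -scores.mean()

print(pooled_mse, check, mean_fold_mse)
```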

Also, the explanation that "cross_val_predict simply returns the labels (or 
probabilities)" is unclear, if not wrong. As I understand it, this function returns 
estimates and no labels or probabilities.
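To check what cross_val_predict returns for a classifier, a sketch on synthetic data (LogisticRegression and make_classification are arbitrary choices here): with the default method="predict" the output is one class label per sample, while method="predict_proba" returns a row of class probabilities per sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression()

# Default method="predict": one out-of-fold class label per sample
labels = cross_val_predict(clf, X, y, cv=5)

# method="predict_proba": one row of class probabilities per sample
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")

print(np.unique(labels))  # the distinct predicted class labels
print(proba.shape)        # (200, 2)
```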

Regards, Boris

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
