Re: [Scikit-learn-general] score_func discussion

Mathieu Blondel Tue, 23 Oct 2012 00:21:40 -0700

On Tue, Oct 23, 2012 at 6:43 AM, Gael Varoquaux <[email protected]> wrote:


> a. having the score method of an object accept a score_func argument,
>    which would then be used inside the score method:
>    https://github.com/scikit-learn/scikit-learn/pull/1198/files#L0R265
>
> b. having score_func's signature be 'estimator, X, y' (discussed
>    offline).
>

I think there are 2 different problems here:

1) Some metrics require more complicated input than just y_true and y_pred.

2) Some people have expressed the need to call estimator.score(X, y) with
metrics other than the default one.

Solving 2) doesn't solve the function signature problem of 1).

The question is how a solution to these problems fits into our grid search
framework.


> The drawback that I see to option a is that the requirements of score
> funcs are not homogeneous, and that not all of them apply to every
> estimator. We are already seeing in the PRs that we need to define a
> 'requires_threshold' decoration. For some of my personal use cases, I can
>

To be more general, I would rather call it `requires_prediction_score` (by
prediction score, I mean the output of predict_proba or decision_function).
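
To make the decorator idea concrete, here is a minimal sketch in plain
Python. Everything in it is hypothetical and for illustration only: the
`requires_prediction_score` attribute, the `evaluate` helper, the toy
`ThresholdClassifier` and the two metric functions are assumptions, not
existing scikit-learn API. The decorator only tags the metric; the caller
then decides whether to pass it the output of `predict` or of
`decision_function`.

```python
def requires_prediction_score(metric):
    """Hypothetical decorator: flag a metric that needs continuous scores."""
    metric.requires_prediction_score = True
    return metric


def accuracy(y_true, y_pred):
    """Plain (y_true, y_pred) metric computed on hard labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


@requires_prediction_score
def pairwise_ranking_score(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class ThresholdClassifier:
    """Toy 1-D estimator, used only to illustrate the dispatch."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def decision_function(self, X):
        # Signed distance to the threshold: an unthresholded prediction score.
        return [x - self.threshold for x in X]

    def predict(self, X):
        return [1 if s > 0 else 0 for s in self.decision_function(X)]


def evaluate(estimator, X, y, metric):
    """Feed the metric whichever estimator output it declares it needs."""
    if getattr(metric, "requires_prediction_score", False):
        return metric(y, estimator.decision_function(X))
    return metric(y, estimator.predict(X))


clf = ThresholdClassifier(threshold=0.5)
X, y = [0.1, 0.4, 0.45, 0.8], [0, 0, 1, 1]
acc = evaluate(clf, X, y, accuracy)
rank = evaluate(clf, X, y, pairwise_ranking_score)
```

On this toy data `acc` is 0.75 (one positive falls below the threshold)
while `rank` is 1.0, since the ranking metric sees the unthresholded
scores and every positive is ranked above every negative.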


> already see other signatures of score functions required. I really don't
> like this, because it embeds custom code in parts of the scikit that are
> general purpose. This pattern, in my experience, tends to create tight
> coupling and to eventually lead to code that is harder to extend.
>
> I must say that defining a meta-language defining capabilities of score
> functions really raises warning signs as far as I am concerned. I find
> that contracts based on imperative code are much easier to maintain and
> extend than contracts based on declarative interfaces.
>



> Option b seems fairly reasonable from the design point of view. I think
> that it is very versatile. The main drawback that I see is that it does
> not make it easy for the user to use the various score functions existing
> in the metrics module, as their signature is 'y_true, y_pred'.
>

Can you elaborate on option b and how it would solve the function signature
problem?


>
> However, option b really has the look and smell of a method to me.
> Combined with the fact that some score functions need estimator-specific
> information (i.e.: how to retrieve unthresholded decisions), it led me
> to think that my favorite option would be to put as much as possible in
> the estimator. The option that I am championing would be to add an
> argument to estimators to be able to switch the score function. This
> argument could either be a string, say 'auc', or a score_func with a
> given signature (estimator specific, but we would try to have as few
> of these as possible).
>

I would rather avoid adding too many parameters to the constructor that
don't affect `fit`.
If I fit a model, I would expect to be able to evaluate it with different
metrics without creating a new estimator object.
Being able to just do estimator.score(X, y, "auc") would be very convenient.
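
A minimal sketch of what I have in mind, in plain Python. The `METRICS`
lookup table, the optional third argument of `score` and the toy
`MajorityClassifier` are all hypothetical names chosen for illustration,
not existing scikit-learn API. I use label-based metrics to keep it short;
a metric like "auc" would additionally need access to decision_function.

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of incorrectly predicted labels."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical lookup table from metric name to metric function.
METRICS = {"accuracy": accuracy, "error_rate": error_rate}


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

    def score(self, X, y, metric="accuracy"):
        # Accept either a metric name or a callable taking (y_true, y_pred).
        func = METRICS[metric] if isinstance(metric, str) else metric
        return func(y, self.predict(X))


# Fit once, then evaluate the same fitted object with several metrics.
clf = MajorityClassifier().fit([[0], [1], [2]], [1, 1, 0])
X_test, y_test = [[0], [1], [2], [3]], [1, 1, 1, 0]
default_score = clf.score(X_test, y_test)
named_score = clf.score(X_test, y_test, "error_rate")
custom_score = clf.score(
    X_test, y_test, lambda y_true, y_pred: sum(t == p for t, p in zip(y_true, y_pred))
)
```

The metric is chosen at evaluation time, so the constructor stays free of
parameters that have no effect on `fit`.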


>  The goal of this email is to try to have a sane discussion of what are
> the best choices in terms of simplicity of code, simplicity for the user,
> and versatility for the scoring API. As I am writing this email, I get to
> defend my point of view, but I hope that Andy will correct any false or
> incomplete vision that I gave of his point of view.
>

I personally like the decorator approach, as it allows us to declare what a
metric expects, although I understand your aversion to adding too much
framework code.
It would also let us declare whether a metric is a score (bigger is
better) or a loss (smaller is better), and thus get rid of loss_func in
GridSearchCV.
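
Roughly, and only as a sketch: the `metric` decorator, the
`greater_is_better` attribute and the `select_best` helper below are
hypothetical names, and `select_best` is a stand-in for the model
selection step, not GridSearchCV itself. Once each metric declares its
direction, a single selection routine can handle scores and losses alike.

```python
def metric(greater_is_better):
    """Hypothetical decorator factory: record whether bigger values are better."""
    def decorate(func):
        func.greater_is_better = greater_is_better
        return func
    return decorate


@metric(greater_is_better=True)
def accuracy(y_true, y_pred):
    """Score: fraction of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


@metric(greater_is_better=False)
def mean_squared_error(y_true, y_pred):
    """Loss: average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best(candidates, y_true, metric_func):
    """Return the key of the candidate whose predictions are best under metric_func.

    `candidates` maps a parameter-setting label to that setting's predictions.
    Losses are negated so that a single max() covers both kinds of metric.
    """
    sign = 1 if metric_func.greater_is_better else -1
    return max(candidates, key=lambda name: sign * metric_func(y_true, candidates[name]))


y_true = [1, 0, 1, 1]
candidates = {"C=1": [1, 0, 0, 0], "C=10": [1, 0, 1, 0]}
best_by_score = select_best(candidates, y_true, accuracy)
best_by_loss = select_best(candidates, y_true, mean_squared_error)
```

Both calls pick "C=10" here: it has the higher accuracy (0.75 against 0.5)
and the lower squared error (0.25 against 0.5), without a separate
loss_func argument.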

From a practical point of view, most users don't need custom metrics (just
the ones provided by scikit-learn), so most people would not even notice
this underlying declarative framework.
We should keep in mind the 90% use case first.

Mathieu

# A third option would be to introduce metric objects and a way to inspect
their capabilities but that's even more framework-ish than the decorator
solution :)
