I think we could keep the existing simple score / loss functions for
day-to-day manual validation of analysis output (in an interactive
session for instance), while introducing a richer object-oriented API
for use with model selection tools such as cross validation and
grid search.

For instance we could have:

```
import numpy as np
from sklearn.metrics import auc_score, fbeta_score


class ROCAreaUnderCurveScore(object):

    higher_is_better = True

    def from_estimator(self, clf, X, y_expected):
        if hasattr(clf, 'decision_function'):
            y_predicted_thresholds = clf.decision_function(X)
        elif hasattr(clf, 'predict_proba'):
            # keep only the probability estimates of the positive class
            y_predicted_thresholds = clf.predict_proba(X)[:, 1]
        else:
            raise TypeError("%r does not support thresholded predictions" % clf)
        # TODO: check binary classification shape or raise ValueError
        return self.from_decision_thresholds(y_expected, y_predicted_thresholds)

    def from_decision_thresholds(self, expected, predicted_thresholds):
        return auc_score(expected, predicted_thresholds)


class FScore(object):

    higher_is_better = True

    def __init__(self, beta):
        self.beta = beta

    def from_estimator(self, clf, X, y_expected):
        # TODO: check input to provide a meaningful ValueError or
        # TypeError to the caller
        return fbeta_score(y_expected, clf.predict(X), beta=self.beta)

    def from_multiclass_prediction(self, y_expected, y_predicted):
        return fbeta_score(y_expected, y_predicted, beta=self.beta)


class RMSELoss(object):

    higher_is_better = False

    def from_estimator(self, clf, X, y_expected):
        return self.from_regression_prediction(y_expected, clf.predict(X))

    def from_regression_prediction(self, y_expected, y_predicted):
        # root mean squared error of the predictions
        return np.sqrt(np.mean((np.asarray(y_expected)
                                - np.asarray(y_predicted)) ** 2))

# Then later to address common use cases in a flat manner
# (PRAreaUnderCurveScore would be defined along the same lines):

COMMON_SCORES = {
  'roc_auc': ROCAreaUnderCurveScore(),
  'f1': FScore(1.0),
  'pr_auc': PRAreaUnderCurveScore(beta=1.0),
  'rmse': RMSELoss(),
}
```

Then in GridSearchCV we can have a flat and convenient API for common
use cases such as:

>>> GridSearchCV(clf, score='roc_auc').fit(X, y)

while preserving a flexible yet homogeneous API to handle custom use cases:

>>> class MyCustomScore(object):
...     def __init__(self, some_param=1.0):
...         self.some_param = some_param
...     def from_decision_thresholds(self, expected, predicted_threshold):
...         # do something with self.some_param, expected and
...         # predicted_threshold
...         return score_value
...
>>> the_forty_two_custom_score = MyCustomScore(some_param=42.)
>>> GridSearchCV(clf, score=the_forty_two_custom_score).fit(X, y)

This way we still have a flat API for 99% of the common use cases
while making it possible to express richer semantics when needed (for
instance plugging in a domain-specific evaluation metric for your own
research, to reproduce a domain-specific benchmark, or to compete in a
Kaggle challenge).

We further use plain vanilla duck typing instead of framework-specific
helpers (decorators / DSL) to handle the complex cases.
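
For instance, the score resolution logic inside GridSearchCV could
boil down to something like the following sketch (resolve_score is a
hypothetical name, not an existing scikit-learn function; it reuses
the COMMON_SCORES registry sketched above):

```
def resolve_score(score):
    # flat API: map a common score name to a prebuilt score object
    if isinstance(score, str):
        if score not in COMMON_SCORES:
            raise ValueError("unknown score name %r" % score)
        return COMMON_SCORES[score]
    # duck typing: accept any object implementing part of the scoring API
    expected_methods = ('from_estimator', 'from_decision_thresholds',
                        'from_multiclass_prediction',
                        'from_regression_prediction')
    if any(hasattr(score, m) for m in expected_methods):
        return score
    raise TypeError("%r does not implement the scoring API" % score)
```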

We can also provide a set of scorer mixin types / ABCs to factor out
redundant code.
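
For instance (ThresholdedScoreMixin is a hypothetical name; auc_score
comes from sklearn.metrics as above), the decision_function /
predict_proba fallback could be shared between all threshold-based
scores:

```
class ThresholdedScoreMixin(object):
    # factor out the extraction of thresholded predictions so that
    # concrete scores only implement from_decision_thresholds

    def from_estimator(self, clf, X, y_expected):
        if hasattr(clf, 'decision_function'):
            y_thresholds = clf.decision_function(X)
        elif hasattr(clf, 'predict_proba'):
            y_thresholds = clf.predict_proba(X)[:, 1]
        else:
            raise TypeError("%r does not support thresholded predictions"
                            % clf)
        return self.from_decision_thresholds(y_expected, y_thresholds)


class ROCAreaUnderCurveScore(ThresholdedScoreMixin):

    higher_is_better = True

    def from_decision_thresholds(self, expected, predicted_thresholds):
        return auc_score(expected, predicted_thresholds)
```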

Finally we might later want to extend such a Scoring API to wrap
a validation set / OOB samples to do early stopping on a configurable
score for various scikit-learn models such as SGDClassifier, GBRT... I
have not really thought about that part yet, but having score objects
rather than simple funcs / callables should probably make that much
easier.
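
A very rough sketch of that last idea, assuming a model exposing
partial_fit such as SGDClassifier (the fit_with_early_stopping helper
and its arguments are hypothetical):

```
import numpy as np

def fit_with_early_stopping(clf, X_train, y_train, X_val, y_val,
                            score, patience=5, max_epochs=100):
    # normalize so that "larger is better" whatever the score direction
    sign = 1 if score.higher_is_better else -1
    best_value, best_epoch = None, 0
    classes = np.unique(y_train)
    for epoch in range(max_epochs):
        clf.partial_fit(X_train, y_train, classes=classes)
        value = sign * score.from_estimator(clf, X_val, y_val)
        if best_value is None or value > best_value:
            best_value, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            # no improvement on the validation score for `patience`
            # epochs: stop early
            break
    return clf
```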

WDYT?

-- 
Olivier
