Hi Joel,

The last time I encountered this problem, I wrote a custom
cross-validation routine, as you suggest. But I'm now solving the same
problem again, so I've started thinking that a general solution might be
beneficial.

IIRC, there was a discussion here about scoring a while ago, but I
didn't follow it. Is there a reason why a scorer must return a single
number?
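
For reference, this is the contract I'm running into: a scorer is a
callable scorer(estimator, X, y) that must return a single number
(module paths below are from current scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, accuracy_score

# The current contract: a scorer is a callable
# scorer(estimator, X, y) that returns a single number.
scorer = make_scorer(accuracy_score)

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
value = scorer(clf, X, y)
```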

My first idea on how to approach this was to allow the scorer to return
a tuple and only require the first item in the tuple to be a number.
That number would then be used for comparing models, while the remaining
items could carry arbitrary diagnostic information (fitted classifiers,
extra metrics, and so on).
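
To make that concrete, here is a sketch of the kind of scorer I have in
mind (hypothetical, of course; the current API rejects it). Only the
first tuple element would take part in model comparison:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical scorer: the first tuple element is the number used for
# comparing models; the rest is an arbitrary diagnostic payload.
def scoring_with_extras(estimator, X, y):
    score = estimator.score(X, y)
    return score, estimator  # (comparable number, full classifier)

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
score, fitted = scoring_with_extras(clf, X, y)
```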

Alternatively, I thought about only requiring the return type of score
to implement comparison operators (for deciding which model is better).
A numeric type would work well for most use cases, and anything more
sophisticated would still be possible.
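
Roughly, a score object like this would satisfy that requirement (the
class and attribute names are made up for illustration):

```python
import functools

# Sketch of a score object that compares like a number but carries
# extra diagnostics along with it.
@functools.total_ordering
class RichScore:
    def __init__(self, value, extras=None):
        self.value = value
        self.extras = extras or {}

    def __eq__(self, other):
        return self.value == float(other)

    def __lt__(self, other):
        return self.value < float(other)

    def __float__(self):
        return self.value

a = RichScore(0.8, extras={"model": "clf_a"})
b = RichScore(0.9)
assert b > a           # comparison still drives model selection
assert max(a, b) is b  # ...and the extras stay attached
```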

Either way, complexity increases. But on the other hand, I think that
looking at classifiers in more detail is a common use case. For example,
one might be interested in how stable some model parameter is across the
whole cross-validation.
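
For example, with a hand-rolled loop (module paths again from current
scikit-learn) one can keep each fold's classifier and look at the spread
of its coefficients:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)

models, coefs = [], []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    models.append(clf)               # keep the full classifier
    coefs.append(clf.coef_.ravel())  # and the parameter of interest

# How stable is each coefficient across the five folds?
spread = np.std(coefs, axis=0)
```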

The joblib-based solution that you suggested also crossed my mind.
Perhaps it's the most practical one.
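
For completeness, this is the sort of thing I'd do with it (imported
here as the standalone joblib package; the output directory and file
names are just placeholders):

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)
outdir = tempfile.mkdtemp()  # placeholder output directory

# Dump the classifier fitted on each fold for later inspection.
for i, (train_idx, _) in enumerate(KFold(n_splits=3).split(X)):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    joblib.dump(clf, os.path.join(outdir, "fold%d.joblib" % i))

restored = joblib.load(os.path.join(outdir, "fold0.joblib"))
```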

Do others have any comments on this?

Cheers,
Michal

On 01/04/14 00:45, [email protected] wrote:
> Hi Michal,
> 
> One way is to roll your own cross validation routine; it's not very
> complicated when specialised to a particular task.
> 
> I have also previously proposed that cross_val_score and
> Randomized/GridSearchCV provide an arbitrary callback parameter that could
> return the model or other diagnostic information. The right interface for
> this sort of thing is uncertain.
> 
> Finally, you could consider my "remember" branch:
> https://github.com/jnothman/scikit-learn/tree/remember. It provides
> sklearn.memo.remember_model, which can wrap your base estimator, and will
> save a joblib dump of each model (in the directory specified by the memory
> parameter). However, to recover these models, the easiest way is to call
> fit() again on the remembered model, with the right portion of training
> data (and parameters if using grid search). [I am sorry this requires a
> patch/branch rather than a gist, but this functionality necessitates a
> polymorphic implementation of sklearn.base.clone.]
> 
> Cheers,
> 
> - Joel
> 
> 
> On 1 April 2014 06:23, Michal Romaniuk
> <[email protected]> wrote:
> 
>> Hi,
>>
>> I am working on a problem where, in addition to the cross-validation
>> scores, I would like to be able to also record the full classifiers for
>> further analysis (visualisation etc.) Is there a way to do this?
>>
>> I tried to build a custom scoring function that returns a tuple of
>> different metrics (including the classifier itself) but it didn't work
>> as the scoring function seems to be required to return a number.
>>
>> Thanks,
>> Michal
>>


------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
