Hi Joel,

The last time I encountered this problem, I wrote a custom cross-validation routine, as you suggest. But now I'm solving the same problem again, so I started thinking that a general solution would perhaps be beneficial.
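For reference, such a routine can be quite short. A minimal sketch (assuming the current scikit-learn model_selection API; the LogisticRegression and synthetic data here are only illustrative):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Illustrative synthetic data; substitute your own X, y.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] + 0.1 * rng.randn(100) > 0).astype(int)

models, scores = [], []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    est = clone(LogisticRegression())   # fresh, unfitted copy per fold
    est.fit(X[train_idx], y[train_idx])
    models.append(est)                  # keep the fitted classifier
    scores.append(est.score(X[test_idx], y[test_idx]))

# Every fitted classifier is retained, so per-fold attributes can be
# inspected later, e.g. how stable the coefficients are across folds:
coef_spread = np.std([m.coef_ for m in models], axis=0)
```

Keeping the fitted estimators alongside the scores is exactly the part that cross_val_score does not expose.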
IIRC, there was a discussion here about scoring a while ago, but I didn't follow it. Is there a reason why a scorer must return a single number? My first idea was to allow returning a tuple and only require the first item in the tuple to be a number; this number would then be used for comparing models, and the rest of the tuple could carry anything else. Alternatively, I thought about only requiring the return type of score to implement comparison operators (for deciding which model is better). A numeric type would then work well for most use cases, and if something more sophisticated were required, that would also be possible.

Either way, complexity increases. On the other hand, I think that looking at classifiers in more detail is a common use case. For example, one might be interested in how stable some model parameter is across the whole cross-validation.

The solution using joblib that you suggested also crossed my mind. Perhaps it's the most practical one.

Do others have any comments on this?

Cheers,
Michal

On 01/04/14 00:45, [email protected] wrote:
> Hi Michal,
>
> One way is to roll your own cross-validation routine; it's not very
> complicated when specialised to a particular task.
>
> I have also previously proposed that cross_val_score and
> Randomized/GridSearchCV provide an arbitrary callback parameter that could
> return the model or other diagnostic information. The right interface for
> this sort of thing is uncertain.
>
> Finally, you could consider my "remember" branch:
> https://github.com/jnothman/scikit-learn/tree/remember. It provides
> sklearn.memo.remember_model, which can wrap your base estimator and will
> save a joblib dump of each model (in the directory specified by the memory
> parameter). However, to recover these models, the easiest way is to call
> fit() again on the remembered model, with the right portion of training
> data (and parameters if using grid search).
> [I am sorry this requires a patch/branch rather than a gist, but this
> functionality necessitates a polymorphic implementation of
> sklearn.base.clone.]
>
> Cheers,
>
> - Joel
>
>
> On 1 April 2014 06:23, Michal Romaniuk <[email protected]> wrote:
>
>> Hi,
>>
>> I am working on a problem where, in addition to the cross-validation
>> scores, I would like to be able to also record the full classifiers for
>> further analysis (visualisation etc.). Is there a way to do this?
>>
>> I tried to build a custom scoring function that returns a tuple of
>> different metrics (including the classifier itself), but it didn't work,
>> as the scoring function seems to be required to return a number.
>>
>> Thanks,
>> Michal

------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
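As a postscript on the comparison-operator idea raised in the thread: a score object that compares like a number but carries extra payload is easy to prototype in plain Python. This is a hypothetical sketch only; RichScore and its attributes are illustrative names, not scikit-learn API:

```python
from functools import total_ordering

@total_ordering
class RichScore:
    """A score that orders by a single number but can carry anything else
    (a fitted model, per-class metrics, ...) in `payload`."""

    def __init__(self, value, payload=None):
        self.value = value      # the number used to compare models
        self.payload = payload  # extra diagnostics, ignored by comparisons

    @staticmethod
    def _val(other):
        # Allow comparison against plain numbers as well as RichScore.
        return other.value if isinstance(other, RichScore) else other

    def __eq__(self, other):
        return self.value == self._val(other)

    def __lt__(self, other):
        return self.value < self._val(other)

a = RichScore(0.91, payload={"n_support": 42})
b = RichScore(0.87)
best = max(a, b)   # model selection sees only .value
```

Because only comparison operators are required, any selection logic written as `max(...)` or `score_a > score_b` keeps working, while the payload rides along for later inspection.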
