2013/8/20 Olivier Grisel <[email protected]>:
> Wouldn't it be possible to implement a StratifiedKFolds that preserves
> the dependency relationship as much as possible?

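As a rough illustration of what the quoted question could mean in practice, here is a minimal sketch that keeps every group of dependent samples inside a single fold while trying to keep class counts balanced across folds. The function name `grouped_stratified_folds`, its parameters, and the greedy heuristic are assumptions made for this example; none of this is an existing scikit-learn API or something the thread settled on.

```python
from collections import Counter, defaultdict


def grouped_stratified_folds(labels, groups, n_folds=3):
    """Return one fold index per sample, never splitting a group.

    Hypothetical helper (not part of scikit-learn). Greedy heuristic:
    visit groups from largest to smallest and put each one in the fold
    whose current class counts overlap least with the group's classes.
    """
    # Count how many samples of each class every group contains.
    group_counts = defaultdict(Counter)
    for y, g in zip(labels, groups):
        group_counts[g][y] += 1

    fold_counts = [Counter() for _ in range(n_folds)]
    fold_of_group = {}

    # Largest groups first; ties broken by group name for determinism.
    ordered = sorted(
        group_counts,
        key=lambda g: (-sum(group_counts[g].values()), str(g)),
    )
    for g in ordered:
        def cost(f):
            # Overlap between the fold's class counts and the group's.
            overlap = sum(fold_counts[f][c] * n
                          for c, n in group_counts[g].items())
            size = sum(fold_counts[f].values())
            return (overlap, size, f)

        best = min(range(n_folds), key=cost)
        fold_of_group[g] = best
        fold_counts[best].update(group_counts[g])

    return [fold_of_group[g] for g in groups]


# Toy data: groups could be, for example, the writer of each digit.
labels = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "d", "d"]
folds = grouped_stratified_folds(labels, groups, n_folds=2)
```

Each value in `folds` says which fold holds that sample as test data. Because whole groups move together, class balance is only approximate when groups are large or dominated by one class.
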
Here is a notebook to illustrate the issue:

http://nbviewer.ipython.org/urls/raw.github.com/ogrisel/notebooks/master/Non%2520IID%2520cross-validation.ipynb

On the digits dataset, the sample dependency structure can be responsible for a 7% test score discrepancy on non-optimal models.

I think it is a bug that StratifiedKFold is the default CV scheme for classification and that it hides issues related to dependent samples.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
