2013/8/20 Olivier Grisel <[email protected]>:
> Wouldn't it be possible to implement a StratifiedKFolds that preserves
> the dependency relationship as much as possible?

Here is a notebook to illustrate the issue:

http://nbviewer.ipython.org/urls/raw.github.com/ogrisel/notebooks/master/Non%2520IID%2520cross-validation.ipynb

The sample dependency structure can be responsible for a 7% test score
discrepancy for non-optimal models on the digits dataset. I think that
the fact that StratifiedKFold is the default CV scheme for
classification, and that it hides issues caused by dependent samples,
is a bug.
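
To make the idea quoted above more concrete, here is a minimal,
standard-library-only sketch of a fold assignment that never splits a
group of dependent samples across folds while keeping the class counts
roughly balanced. It is only an illustration under stated assumptions:
`group_preserving_folds`, the `groups` argument and the toy data are
hypothetical names, not part of scikit-learn, and the greedy balancing
rule is one possible heuristic rather than a reference algorithm.

```python
from collections import Counter, defaultdict


def group_preserving_folds(labels, groups, n_folds=3):
    """Assign each sample to a fold so that no group is split across folds.

    Hypothetical helper, not a scikit-learn API: groups are placed greedily,
    largest first, into the fold where they least unbalance the class counts.
    """
    # Collect the sample indices belonging to each group.
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    fold_class_counts = [Counter() for _ in range(n_folds)]
    fold_of_sample = [None] * len(labels)

    # Largest groups first, so that small groups can even out the folds later.
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        group_counts = Counter(labels[i] for i in members[g])

        def cost(f):
            # Sum of squared per-fold class counts if this group joined fold f;
            # a lower value means classes are spread more evenly over folds.
            total = 0
            for k in range(n_folds):
                counts = fold_class_counts[k]
                for c in set(counts) | set(group_counts):
                    n = counts[c] + (group_counts[c] if k == f else 0)
                    total += n * n
            return total

        best = min(range(n_folds), key=cost)
        fold_class_counts[best].update(group_counts)
        for i in members[g]:
            fold_of_sample[i] = best
    return fold_of_sample


# Toy data (made up): 6 groups of 2 dependent samples each, two classes.
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
groups = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "f", "f"]
folds = group_preserving_folds(labels, groups, n_folds=3)
```

Each entry of `folds` is the test fold of the corresponding sample, so
all samples sharing a group label end up in the same test fold. Exact
stratification is not always achievable under that constraint, hence
"as much as possible".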

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general