Dear Arteaga,

Thank you for spotting the bug in the nested_cv.py example. I have fixed it the way you suggested, and to disambiguate it even further I have used the dataset_ argument name within select_best_clf to avoid any possible confusion ;) I also tuned it up a bit to make the example more lightweight and faster.
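The scoping problem behind this bug can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual nested_cv.py code: the "classifiers" are hypothetical functions that map a dataset to an error value, and the datasets are plain lists. Only the names select_best_clf, dataset, dataset_, and dstrain come from the discussion.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical stand-ins for classifiers: each maps a dataset to an error.
clfs = {
    "clf_a": lambda data: abs(mean(data) - 2.0),
    "clf_b": lambda data: abs(mean(data) - 20.0),
}

# Module-level name, like the training split left behind by an outer CV loop.
dstrain = [1.0, 2.0, 3.0]


def select_best_clf_buggy(dataset, clfs):
    """Buggy variant: ignores its argument and reads the global dstrain."""
    best_name, best_error = None, None
    for name, clf in clfs.items():
        error = clf(dstrain)  # bug: should use the function argument
        if best_error is None or error < best_error:
            best_name, best_error = name, error
    return best_name, best_error


def select_best_clf(dataset_, clfs):
    """Fixed variant: the argument dataset_ is the only data it touches."""
    best_name, best_error = None, None
    for name, clf in clfs.items():
        error = clf(dataset_)
        if best_error is None or error < best_error:
            best_name, best_error = name, error
    return best_name, best_error


full_dataset = [10.0, 20.0, 30.0]
buggy_result = select_best_clf_buggy(full_dataset, clfs)
fixed_result = select_best_clf(full_dataset, clfs)
print(buggy_result, fixed_result)
```

Because the buggy variant silently falls back to the global, passing the full dataset changes nothing: it still selects on dstrain. The fixed variant selects on whatever it is given, so the "cheating" call on the full dataset can give a different answer than the per-fold calls.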
The fix was pushed into the maint/0.5 and master (main development toward 0.6) branches of our main repository:

http://github.com/PyMVPA/PyMVPA

Cheers,
Yarik

On Mon, 18 Oct 2010, Arteaga, Dan (NIH/NINDS) [F] wrote:

> Hi Yaroslav,

> In the function: def select_best_clf(dataset, clfs), it is my
> understanding that only the training splits of each CV fold are used
> to calculate the average NCV transfer error for each classifier across
> the CNV folds via:

> try:
>     error = np.mean(cv(dstrain))

> However, to get the cheating classifier results, the entire dataset is
> sent to the select_best_clf function via the line:

> cheating_clf, cheating_error = select_best_clf(dataset, clfswh['!gnpp'])

> Yet, it still uses the error = np.mean(cv(dstrain)) to calculate the
> cheating CV transfer errors as well. Replacing it with error =
> np.mean(cv(dataset)) gives different results for the cheating errors,
> but not for the original NCV errors.

> In addition, the select_best_clf only introduces the dataset variable,
> whereas dstrain is only used in the for loop?

> If I am misunderstanding a fundamental programming concept I apologize
> and you can ignore this if you want. And I am extremely appreciative
> of this new example.

> Best,
> Dan

-- 
Yaroslav O. Halchenko
Postdoctoral Fellow, Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834    Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik

