Hi, I have a couple of questions about the nperlabel parameter of the Splitter class (NFoldSplitter, actually). I have unequal numbers of samples per class, both within each scan and across scans, so I have been balancing them manually: before classification, I throw out randomly chosen trials from the over-represented class in each chunk. I'd like to use the nperlabel='equal' option on my splitter to do this for me, but I could not figure out from the documentation how this affects the error rate (sorry if I missed something obvious):

- Suppose I am using NFoldSplitter to leave one chunk out, and I have 11 examples of C1 and 13 examples of C2 in chunk 1, but only 8 C1 and 10 C2 for chunk 2. Will the NFoldSplitter with nperlabel='equal' force the number of examples of each category from each chunk down to 8? Or will it use 11 of each class for chunk 1, and 8 of each class for chunk 2?

- If it is the latter (balanced separately within chunks), how is the error rate determined with the CrossValidatedTransferError class? Does the error rate reflect the simple average error across folds, (error run 1 + error run 2)/2, or is the average weighted by the number of exemplars from each fold (equivalent to the total error / total number of tests)? If it is averaging fold performance, is there a way to force it to report the overall test case performance instead? The simple average over fold performance would seem to be skewed by better or worse performance on chunk 2 in the example above, since it has fewer test cases.

Thanks for your help!

-Tim
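
P.S. In case it helps to make the two questions concrete, here is a small sketch in plain Python 3. It makes no PyMVPA calls and says nothing about what the library actually does. The misclassification counts are made up for illustration, and I am assuming the held-out chunk is balanced the same way as the training data.

```python
# Class counts per chunk, taken from the example in this message.
counts = {1: {"C1": 11, "C2": 13}, 2: {"C1": 8, "C2": 10}}

# Interpretation A: balance within each chunk (use that chunk's smallest class).
per_chunk = {chunk: min(c.values()) for chunk, c in counts.items()}

# Interpretation B: balance across all chunks (use the global minimum).
global_min = min(min(c.values()) for c in counts.values())

# Hypothetical number of misclassified test samples per fold (invented values).
n_errors = {1: 2, 2: 4}

# Test-set size per fold under interpretation A: two classes, equal counts.
# Assumption: the test chunk is balanced as well.
n_tests = {chunk: 2 * n for chunk, n in per_chunk.items()}

# Simple average of per-fold error rates: every fold counts equally.
fold_errors = [n_errors[c] / n_tests[c] for c in sorted(counts)]
simple_mean = sum(fold_errors) / len(fold_errors)

# Pooled error: total errors / total tests, i.e. folds weighted by size.
pooled = sum(n_errors.values()) / sum(n_tests.values())

print("samples per class, per chunk:", per_chunk)
print("samples per class, global minimum:", global_min)
print("simple mean of fold errors:", simple_mean)
print("pooled error:", pooled)
```

With these invented counts the two numbers differ because the smaller fold (chunk 2) has the higher error rate, which is exactly the skew I am worried about.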

