If you are doing some sort of cross-modal or cross-day analysis (e.g.
training on data collected under one set of conditions and testing on
data collected under another set of conditions) I agree that only
permuting the training data can be quite sensible.
But when we only have a single dataset (such as asking whether
conditions a and b differ in this set of trials, partitioning on the
runs) I don't fully agree with this characterization:
> For all these classifiers trained on permuted data we want to know how
> well they can discriminate our empirical data (aka testing data) -- more
> precisely the pristine testing data. Because, in general, we do _not_
> want to know how well a classifier trained on no signal can discriminate
> any other dataset with no signal (aka permuted testing data).
Perhaps the difference is that I tend to think of each set of
cross-validations as the unit that should be permuted. For example,
suppose I have four runs and I'm partitioning on the runs. The
true-labeled accuracy is thus computed by averaging the four accuracies
(test on the first run, test on the second run, ...).
My instinct is that the permutations should follow that pattern: ONE
entry in the permutation distribution should come from permuting the
labels, doing the cross-validation, and averaging over the folds (the
four runs, in this case). In other words, to preserve the dependence
between the cross-validation folds in the permutation test (e.g.
training on runs 1-3 will likely be somewhat similar to training on
runs 1, 2, and 4) we need to permute the *entire* dataset at once (all
four runs), not just the training data (three runs at a time).
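To make the scheme concrete, here is a minimal sketch of what I mean,
using synthetic data and a simple nearest-class-mean classifier (all
names and numbers here are illustrative, not from any real analysis):
labels are shuffled across all four runs at once, and each shuffled
labeling yields a single null accuracy by running the same
leave-one-run-out cross-validation and averaging the folds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: 4 runs, 10 trials each, 2 conditions.
n_runs, n_trials = 4, 10
X = rng.normal(size=(n_runs * n_trials, 5))
y = np.tile(np.repeat([0, 1], n_trials // 2), n_runs)
runs = np.repeat(np.arange(n_runs), n_trials)

def loro_accuracy(X, y, runs):
    """Leave-one-run-out CV: average accuracy over the folds."""
    accs = []
    for test_run in np.unique(runs):
        train, test = runs != test_run, runs == test_run
        # Nearest-class-mean classifier keeps the sketch dependency-free.
        means = np.array([X[train & (y == c)].mean(axis=0) for c in (0, 1)])
        d = ((X[test, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        accs.append(np.mean(d.argmin(axis=1) == y[test]))
    return np.mean(accs)

true_acc = loro_accuracy(X, y, runs)

# ONE null value per relabeling: shuffle labels over the *entire*
# dataset, then run the same cross-validation and average the folds.
null_accs = []
for _ in range(100):
    y_perm = rng.permutation(y)  # all four runs permuted at once
    null_accs.append(loro_accuracy(X, y_perm, runs))

p = (np.sum(np.array(null_accs) >= true_acc) + 1) / (len(null_accs) + 1)
```

The point of the sketch is only the structure of the loop: training and
testing labels come from the same whole-dataset permutation, so each
null accuracy has the same fold linkage as the true-labeled accuracy.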
I hope this argument is somewhat clear; I'd like to make a picture and
an example, but I have to get something else done first. Unfortunately
I haven't been able to dig into the code you've sent yet either;
hopefully tomorrow.
Jo
On 2/5/2013 7:33 AM, Francisco Pereira wrote:
I'm catching up with this long thread and all I can say is I fully
concur with Michael, in particular:
On Tue, Feb 5, 2013 at 3:11 AM, Michael Hanke <[email protected]> wrote:
Why are we doing permutation analysis? Because we want to know how
likely it is to observe a specific prediction performance on a
particular dataset under the H0 hypothesis, i.e. how good can a
classifier get at predicting our empirical data when the training
did not contain the signal of interest -- aka chance performance.
Permuting the test set might make sense, perhaps, if you wanted to
make a statement about the result variability over all possible test
sets of that size if H0 was true.
Francisco
_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
--
Joset A. Etzel, Ph.D.
Research Analyst
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
http://mvpa.blogspot.com/