Hello,

I wonder if anyone could help me think through the issue of testing classifier results for significance, and how it relates to cross-validation.

We are running a design with 8 chunks, each containing 27 trials divided into 3 classes. Let's say we do an eight-way (leave-one-chunk-out) cross-validation. This results in an accuracy value for each set of 27 tests: 8 x 27 for a total of 216 trials that were predicted correctly or incorrectly.

Is it wrong to use a binomial test for significance on the total number of correct predictions out of 216? Or would that be inappropriate given that the 8 cross-validation steps are not really independent of each other, so that we must test each cross-validation step separately as a binomial with n=27? This latter option raises the issue of how to combine results across the 8 tests.

Alternatively, if we use a Monte Carlo simulation to produce a null distribution, we have the same issue -- we are generating this null distribution for each cross-validation step -- and therefore not taking into account the overall success of the cross-validation routine across all 216 trials. Would it make sense to generate a null distribution by scrambling the regressors and re-running the entire cross-validation procedure on the scrambled regressors? If so, does pymvpa have a routine for doing this?
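I don't know offhand whether pymvpa exposes this directly, but the "scramble labels, re-run the whole cross-validation" idea can be illustrated with scikit-learn's permutation_test_score, which permutes labels (within chunks, via the groups argument) and repeats the full leave-one-chunk-out procedure for each permutation. The data here are random stand-ins with the 8-chunk / 27-trial / 3-class structure described above:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, permutation_test_score
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Stand-in dataset: 216 trials x 50 features, matching the design
# (8 chunks x 27 trials, 9 trials per class per chunk).
X = rng.randn(216, 50)
y = np.tile(np.repeat([0, 1, 2], 9), 8)      # 3 classes within each chunk
chunks = np.repeat(np.arange(8), 27)         # chunk label for each trial

clf = LinearSVC(max_iter=5000)

# Each permutation shuffles y within chunks and re-runs the full
# leave-one-chunk-out cross-validation, building a null distribution
# of whole-procedure accuracies.
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y,
    groups=chunks,
    cv=GroupKFold(n_splits=8),
    n_permutations=99,
    random_state=0,
)
print(score, pvalue)
```

Since the features here are pure noise, the observed accuracy should sit near chance and the p-value should be large; with real data the same call gives a null distribution over complete cross-validation runs, which is exactly the quantity the pooled binomial test tries to approximate.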

Thanks for any input or corrections to my thinking,


Jonas



P.S. we are using pymvpa for several active projects with much pleasure and will happily send you posters/papers when our work is more complete.


----
Jonas Kaplan, Ph.D.
Research Assistant Professor
Brain & Creativity Institute
University of Southern California







_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa