Hi all,

I recently asked a question on dealing with unbalanced datasets and here's
a follow-up question.
Suppose I have empty runs, i.e. runs with zero samples for one of the
conditions. This causes problems when such a run happens to be the test run
in a leave-one-run-out cross-validation procedure.

My workaround was this: if I had one such run with an empty condition, I
would set NFoldPartitioner(cvtype=2) together with Balancer(), so that any
combination of two runs would have at least one sample per condition. If I
had two such runs, I would set cvtype=3, and so on. However, this leaves
less data for the training set on each classification fold.

Is there any other possible solution for this? In particular, is it
possible to do leave-n-samples-out classification, i.e. on each fold
randomly select n samples per condition to test on, and use the remaining
samples (after balancing) for training, disregarding the chunks structure?
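To make the scheme concrete, here is a rough sketch of what I mean in plain NumPy (not the PyMVPA API; the function name and parameters are made up for illustration): on each fold, n samples per condition are drawn at random for testing and the remainder kept for training, with chunks ignored entirely.

```python
import numpy as np

def leave_n_per_condition_out(targets, n_test=2, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a leave-n-samples-out scheme.

    `targets` is a 1-D array of condition labels; on each fold, `n_test`
    samples per condition are randomly held out for testing and all
    remaining samples are used for training, disregarding chunks.
    """
    rng = np.random.RandomState(seed)
    targets = np.asarray(targets)
    for _ in range(n_folds):
        test = []
        for cond in np.unique(targets):
            idx = np.flatnonzero(targets == cond)
            # randomly pick n_test samples of this condition for testing
            test.extend(rng.choice(idx, size=n_test, replace=False))
        test = np.array(sorted(test))
        # everything not held out goes into the training set
        train = np.setdiff1d(np.arange(len(targets)), test)
        yield train, test

# unbalanced example: 7 samples of 'a', 5 of 'b', no chunk structure used
targets = np.repeat(['a', 'b'], [7, 5])
for train, test in leave_n_per_condition_out(targets, n_test=2, n_folds=3):
    print(len(train), len(test))  # each fold holds out 2 samples per condition
```

The training set could then still be passed through a balancing step before fitting the classifier, as with the Balancer() approach above.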

Thanks!
-Edmund
_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa