Dear Jo,
Thank you very much for your input! I'll play around with different
chunk combinations and see whether I can find a solution that is better
balanced.
Jan
On 07.08.2013 14:00, [email protected] wrote:
It doesn't look like anyone's replied to this yet, so here's my two cents.
I think of this sort of situation as a case of imbalance - there aren't
equal numbers of examples of each class in each training/testing set
(aka chunk). This happens in all sorts of situations, such as when trial
inclusion depends on participant behavior (e.g. keeping only
correctly-performed trials).
There isn't a universally appropriate strategy to regain balance, but
either the chunks or the examples will need to be changed.
For example, in one dataset we wanted to do leave-one-run-out
cross-validation, but the imbalance was too great (e.g. some runs with
very few examples), so we combined runs for leave-three-runs-out
cross-validation. We combined temporally adjacent runs (e.g. 1-3, 4-6,
7-9) to make sure we didn't somehow inflate the accuracy. Depending on
the design, you could potentially partition on something other than the
runs to give more flexibility. If the imbalance is not too great (e.g.
10 of one class and 12 of the other), my usual practice is to subset the
larger class at random, repeating the whole procedure a few times
(leaving out different examples each time) and averaging the results.
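To make that concrete, here is a rough sketch of both ideas in plain
numpy (the array names and toy labels are invented for illustration;
this isn't PyMVPA-specific code):

import numpy as np

rng = np.random.default_rng(0)

# Toy per-trial attributes: 9 runs of 10 trials each, two classes
# with unequal base rates (all values here are made up).
chunks = np.repeat(np.arange(1, 10), 10)            # run label per trial
targets = rng.choice(['A', 'B'], size=90, p=[0.4, 0.6])

# Strategy 1: merge temporally adjacent runs into superchunks
# (runs 1-3 -> 0, 4-6 -> 1, 7-9 -> 2) for leave-one-superchunk-out CV.
superchunks = (chunks - 1) // 3

# Strategy 2: randomly subset the larger class within each superchunk
# so that both classes contribute the same number of trials.
def balanced_indices(targets, groups, rng):
    keep = []
    for g in np.unique(groups):
        in_g = np.flatnonzero(groups == g)
        per_class = [in_g[targets[in_g] == c] for c in np.unique(targets)]
        n = min(len(idx) for idx in per_class)
        if n == 0:  # a class is entirely missing from this group
            continue
        for idx in per_class:
            keep.extend(rng.choice(idx, size=n, replace=False))
    return np.sort(keep)

# Repeat the subsetting a few times so different examples are left
# out each time; in a real analysis you'd run the cross-validated
# classifier on each selection and average the accuracies.
for rep in range(5):
    sel = balanced_indices(targets, superchunks, rng)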
By changing the examples I mean strategies like averaging across
examples within a run (or fitting parameter estimate images), so that
instead of classifying with individual trials you have a fixed number of
summary images (e.g. beta weights, averages) per person. In my
experience this can really help performance, even though the number of
samples is greatly reduced.
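Again as a rough numpy sketch (toy data and invented names, just to
show the shape of the idea):

import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: a trials-x-voxels data matrix plus per-trial
# run and class labels (values are made up for illustration).
data = rng.standard_normal((90, 500))
chunks = np.repeat(np.arange(1, 10), 10)
targets = rng.choice(['A', 'B'], size=90)

# Average all trials of each class within each run, yielding one
# summary pattern per class per run - a fixed, balanced sample count.
rows, mean_targets, mean_chunks = [], [], []
for run in np.unique(chunks):
    for cls in np.unique(targets):
        mask = (chunks == run) & (targets == cls)
        if mask.any():
            rows.append(data[mask].mean(axis=0))
            mean_targets.append(cls)
            mean_chunks.append(run)
mean_data = np.vstack(rows)  # up to 18 summary images instead of 90 trials

If I remember right, PyMVPA's mean_group_sample mapper does this sort
of within-group averaging for you, though you should double-check the
name against the documentation.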
good luck,
Jo