The curves look reasonable to me; sometimes the tails of the permutation distribution can be quite long.

The longer tails on the "averaged" analysis could be just from the smaller number of data points. If possible (allowed by your fMRI design/experimental questions/etc.), using a different cross-validation scheme might help reduce variability. As a plug, I wrote about some of these partitioning considerations in "The impact of certain methodological choices on multivariate analysis of fMRI data with support vector machines" http://dx.doi.org/10.1016/j.neuroimage.2010.08.050.
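
For illustration, a minimal sketch of leave-one-run-out partitioning in plain NumPy (data, labels, runs, and fit_and_score are hypothetical placeholders here, not anything from a particular toolbox):

    import numpy as np

    def leave_one_run_out(runs):
        # Yield (train, test) index arrays, holding out one run at a time,
        # so training and test samples never share a run.
        for held_out in np.unique(runs):
            yield np.where(runs != held_out)[0], np.where(runs == held_out)[0]

    # e.g.: accs = [fit_and_score(data[tr], labels[tr], data[te], labels[te])
    #               for tr, te in leave_one_run_out(runs)]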

Randomizing the real data labels is often the best strategy, because you want to make sure the permuted data sets have the same structure (as much as possible) as the real data. For example, if you're partitioning on the runs, you should permute the data labels within each run. Similarly, if you need to omit some examples for balance (e.g. because you have more examples of one label than another), you want to permute the labels after removing those examples (and repeat this for different removed subsets, of course).
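
As a rough sketch, permuting within each run could look something like this in plain NumPy (the labels and runs arrays are hypothetical per-sample vectors, not from any particular pipeline):

    import numpy as np

    def permute_within_runs(labels, runs, rng=None):
        # Shuffle labels only among samples from the same run, so the
        # permuted data keep the same run structure as the real data.
        rng = np.random.default_rng() if rng is None else rng
        permuted = np.asarray(labels).copy()
        for run in np.unique(runs):
            idx = np.where(runs == run)[0]
            permuted[idx] = rng.permutation(permuted[idx])
        return permuted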

Something to look at when trying to figure out the difference between your averaged and non-averaged results might be the block structure. Since fMRI data always has time dependencies, acquisition order and timing effects (how much time passed between the events being classified) can have a big influence. You have to be very, very careful when classifying individual events within a short block (your mention of trials within a block caught my eye).

Jo


On 5/18/2011 1:57 PM, Vadim Axel wrote:
Hi,

Thank you both for the answers!


1. The mean chance is a perfect 0.5. The 0.6 is in the tail.

2. I have 6 trials per block and 25 blocks for each condition in total.
So, in one scenario I average the trials within each block and run the
classification on 25 data points per condition; there the permuted
prediction in the tail is 0.6. In the other case I do not average and
run the classification on 25x6=150 data points per condition; there the
permuted prediction in the tail is ~0.55. I attach the histograms for
the averaged and the non-averaged permutations. The distribution for
the raw data definitely looks more normal.

3. I did a manual permutation by reshuffling the labels. In particular,
I have a matrix of data values [trials x voxels] and a vector of correct
labels [trials x 1]. For each permutation test I randomize the order of
the correct-labels vector. Does that make sense? As far as I understand,
a Monte-Carlo simulation works on artificially generated data values,
but I am using my original data and only reshuffling the labels.
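
A bare-bones NumPy version of such a reshuffling loop might look like the sketch below (fit_and_score is a hypothetical placeholder for whatever classifier and cross-validation are actually run):

    import numpy as np

    def permutation_null(data, labels, fit_and_score, n_perm=1000, rng=None):
        # Build a null distribution of accuracies by reshuffling the label
        # vector while leaving the [trials x voxels] data matrix untouched.
        rng = np.random.default_rng() if rng is None else rng
        null_acc = np.empty(n_perm)
        for i in range(n_perm):
            null_acc[i] = fit_and_score(data, rng.permutation(labels))
        return null_acc

    # p_value = (np.sum(null_acc >= observed_acc) + 1) / (n_perm + 1)
    # tail_95 = np.percentile(null_acc, 95)   # e.g. the ~0.6 threshold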

BTW, I did not use PyMVPA for this analysis, so you have no reason to
worry about a potential bug in your code :)


Thanks again,
Vadim



On Mon, May 16, 2011 at 7:53 PM, Yaroslav Halchenko
<[email protected]> wrote:

    d'oh -- just now recalled that I have this email in draft:

    eh, a picture (histogram) would have been useful:

     > I wanted to ask your opinion about some weird result that I get.
     > To establish the significance I randomly permute my labels and I
    get a
     > prediction rate of 0.6 and even above it (p-value=0.05). In other
    words 5%
     > of permuted samples result in 0.6+ prediction rate. The
    training/test
     > samples are independent and ROI size is small (no overfitting).

    just to make sure:  0.6 is not a mean-chance performance across the
    permutations.  You just worry that the distribution of chance
    performances is so wide that the right 5% tail is above 0.6 accuracy.

    if that is the case, it is indeed a good example case ;)

     > Interestingly, the described result I get when I average trials
    within block
     > (use one data-point per block; ~25 blocks in total). When I run the

    so it is 25 blocks for 2 conditions? which one has more? ;)

     > classification on raw trials, my permutation threshold becomes
    ~0.55. In
     > both cases for non-permuted labels the prediction is around
    significance
     > level.
     > How should I treat such a result? What might have gone wrong?

    I guess nothing went wrong and everything is logical.  The width of
    the random chance performances distribution depends on many factors,
    such as independence of samples, presence of order effects, how the
    permutation is done (disregarding dependence of samples or not), etc.

    So, to troubleshoot we could start with:

    * histogram
    * what kind of permutation testing have you done? (i.e. what was
      permuted exactly? were the testing set labels permuted?)
      have you seen the recently improved
      http://www.pymvpa.org/examples/permutation_test.html ? ;)
