The curves look reasonable to me; sometimes the tails of the permutation distribution can be quite long.

The longer tails on the "averaged" analysis could be just from the smaller number of data points. If possible (allowed by your fMRI design/experimental questions/etc.), using a different cross-validation scheme might help reduce variability. As a plug, I wrote about some of these partitioning considerations in "The impact of certain methodological choices on multivariate analysis of fMRI data with support vector machines" http://dx.doi.org/10.1016/j.neuroimage.2010.08.050.
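
For illustration, a minimal sketch of leave-one-run-out partitioning in plain NumPy (data, labels, runs, and fit_and_score are hypothetical placeholders here, not anything from a particular toolbox):

    import numpy as np

    def leave_one_run_out(runs):
        # Yield (train, test) index arrays, holding out one run at a time,
        # so training and test samples never share a run.
        for held_out in np.unique(runs):
            yield np.where(runs != held_out)[0], np.where(runs == held_out)[0]

    # e.g.: accs = [fit_and_score(data[tr], labels[tr], data[te], labels[te])
    #               for tr, te in leave_one_run_out(runs)]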

Randomizing the real data labels is often the best strategy, because you want to make sure the permuted data sets have the same structure (as much as possible) as the real data. For example, if you're partitioning on the runs, you should permute the data labels within each run. Similarly, if you need to omit some examples for balance (e.g. because you have more examples of one label than another), you want to permute the labels after removing those examples (and repeat this for different removed subsets, of course).
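
As a rough sketch, permuting within each run could look something like this in plain NumPy (the labels and runs arrays are hypothetical per-sample vectors, not from any particular pipeline):

    import numpy as np

    def permute_within_runs(labels, runs, rng=None):
        # Shuffle labels only among samples from the same run, so the
        # permuted data keep the same run structure as the real data.
        rng = np.random.default_rng() if rng is None else rng
        permuted = np.asarray(labels).copy()
        for run in np.unique(runs):
            idx = np.where(runs == run)[0]
            permuted[idx] = rng.permutation(permuted[idx])
        return permuted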

Something to look at when trying to figure out the difference between your averaged and non-averaged results might be the block structure. Since fMRI data always has time dependencies, acquisition order and timing effects (how much time passed between the events being classified) can have a big influence. You have to be very, very careful when classifying individual events within a short block (your mention of trials within a block caught my eye).

Jo


On 5/18/2011 1:57 PM, Vadim Axel wrote:
Hi,

Thank you both for the answers!


1. The mean chance is a perfect 0.5. The 0.6 is in the tail.

2. I have 6 trials per block and 25 blocks for each condition in total.
So, in one scenario I average the trials within each block and run the
classification on 25 data points per condition; there the permuted
prediction in the tail is 0.6. In the other case I do not average and
run the classification on 25x6=150 data points per condition; there the
permuted prediction in the tail is ~0.55. I attach the histograms for
the averaged and the non-averaged permutations. The distribution for
the raw data definitely looks more normal.

3. I did a manual permutation by reshuffling the labels. In particular,
I have a matrix of data values [trials x voxels] and a vector of correct
labels [trials x 1]. For each permutation test I randomize the order of
the correct-labels vector. Does that make sense? As far as I understand,
a Monte-Carlo simulation works on artificially generated data values,
but I am using my original data and only reshuffling the labels.
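
A bare-bones NumPy version of such a reshuffling loop might look like the sketch below (fit_and_score is a hypothetical placeholder for whatever classifier and cross-validation are actually run):

    import numpy as np

    def permutation_null(data, labels, fit_and_score, n_perm=1000, rng=None):
        # Build a null distribution of accuracies by reshuffling the label
        # vector while leaving the [trials x voxels] data matrix untouched.
        rng = np.random.default_rng() if rng is None else rng
        null_acc = np.empty(n_perm)
        for i in range(n_perm):
            null_acc[i] = fit_and_score(data, rng.permutation(labels))
        return null_acc

    # p_value = (np.sum(null_acc >= observed_acc) + 1) / (n_perm + 1)
    # tail_95 = np.percentile(null_acc, 95)   # e.g. the ~0.6 threshold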

BTW, I did not use PyMVPA for this analysis, so you have no reason to
worry about a potential bug in your code :)


Thanks again,
Vadim



On Mon, May 16, 2011 at 7:53 PM, Yaroslav Halchenko
<[email protected]> wrote:

    d'oh -- just now recalled that I have this email in draft:

    eh, a picture (histogram) would have been useful:

     > I wanted to ask your opinion about some weird result that I get.
     > To establish the significance I randomly permute my labels and I
    get a
     > prediction rate of 0.6 and even above it (p-value=0.05). In other
    words 5%
     > of permuted samples result in 0.6+ prediction rate. The
    training/test
     > samples are independent and ROI size is small (no overfitting).

    just to make sure:  0.6 is not a mean-chance performance across the
    permutations.  You just worry that the distribution of chance
    performances is so wide that the right 5% tail is above 0.6 accuracy.

    if that is the case, it is indeed a good example case ;)

     > Interestingly, the described result I get when I average trials
    within block
     > (use one data-point per block; ~25 blocks in total). When I run the

    so it is 25 blocks for 2 conditions? which one has more? ;)

     > classification on raw trials, my permutation threshold becomes
    ~0.55. In
     > both cases for non-permuted labels the prediction is around
    significance
     > level.
     > How should I treat such a result? What might have gone wrong?

    I guess nothing went wrong and everything is logical.  The width of
    the random chance performances distribution depends on many factors,
    such as independence of samples, presence of order effects, how the
    permutation is done (disregarding dependence of samples or not), etc.

    So, to troubleshoot we could start with:

    * histogram
    * what kind of permutation testing have you done? (i.e. what was
      permuted exactly? were the testing set labels permuted?)
      have you seen the recently improved
      http://www.pymvpa.org/examples/permutation_test.html ? ;)
