On 5/19/2011 12:36 PM, Jonas Kaplan wrote:
This is an issue I have been thinking about quite a bit recently, as we
have used t-tests across subjects in the past (after checking for
violations of normality and also performing an arcsine transformation).
However, I'm no longer convinced it's a great idea, and the main reason
for me comes down to interpretation. The interesting hypothetical case
to my mind is one where a t-test is significant across subjects, but no
single subject has significant performance according to a
within-subject permutation test. How would we interpret such a result?

A related issue is: what does it mean to have prediction performance
that is consistently above chance in all subjects, but so small that
prediction is still, practically speaking, pretty bad? What conclusions
does that case allow us to draw about the underlying neural
representations? Yes, they contain more information about the stimuli
than pure noise would... but is that meaningful? The problem is that
I'm not sure what an alternative criterion would be. The example quoted
above appeals to some sense of this: clearly we want the performance
numbers to be higher, but what objective standard do we have other than
statistical significance?

Just a bit of rambling...

-Jonas

Another version of this dilemma is: Should you consider two results equally important/"good" if they have the same (properly calculated) p-value but one accuracy is 0.8 and the other is 0.56?

I haven't heard a fully convincing answer. It seems clear that higher accuracies are "better" than lower, but what should the thresholds be when we're dealing with something as noisy as fMRI data? Very small differences are considered important in mass-univariate analyses; does that apply to MVPA as well?
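
To make that dilemma concrete, here's a toy calculation (made-up per-subject accuracies, nothing real): a noisy set of subjects centered near 0.8 and a very consistent set centered at 0.56 can give essentially the same one-sample t-test result against chance.

import numpy as np
from scipy import stats

# made-up per-subject accuracies, 10 "subjects" each, chance = 0.5
acc_high = np.array([0.50, 1.00, 0.90, 0.60, 1.00, 0.50, 0.95, 0.55, 1.00, 0.75])  # mean ~0.78, noisy
acc_low  = np.array([0.52, 0.60, 0.56, 0.61, 0.51, 0.58, 0.54, 0.64, 0.49, 0.55])  # mean 0.56, consistent

for name, acc in [("high/noisy", acc_high), ("low/consistent", acc_low)]:
    t, p = stats.ttest_1samp(acc, 0.5)   # one-sample t-test against chance
    print("%s: mean=%.2f t=%.2f p=%.4f" % (name, acc.mean(), t, p))
# both sets come out around t ~ 4, p ~ .003, even though the
# accuracies are telling very different stories

The t-test only sees the ratio of the mean-above-chance to its standard error, so on its own it can't tell those two situations apart.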

Sometimes people (e.g. Rajeev Raizada) have been able to tie classification accuracy to behavior, but that's not possible with many experiments.

So this is a bit more rambling! :) My general strategy right now is to lean heavily on permutation test results, combined with looking at the variability and setting up control tests whenever possible (e.g. a classification or region that should definitely work or definitely not work). I tend to put more weight on results that are consistent across cross-validation folds and replications (e.g. if I drop trials, does the accuracy vary a lot?), as well as across subjects (e.g. is one really-high-accuracy subject pulling up the average?). This is all rather subjective, of course.
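
For what it's worth, the within-subject permutation logic I lean on is nothing fancy; roughly something like this (a generic sklearn-style sketch, not PyMVPA-specific; the classifier, CV scheme, and variable names are just placeholders):

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

def permutation_test(X, y, n_perm=1000, seed=0):
    """X: (n_trials, n_voxels) patterns for one subject; y: condition labels."""
    rng = np.random.RandomState(seed)
    clf, cv = LinearSVC(), StratifiedKFold(n_splits=5)
    observed = cross_val_score(clf, X, y, cv=cv).mean()   # true-label accuracy
    null = np.empty(n_perm)
    for i in range(n_perm):
        # shuffle labels to break the label/pattern relationship; in a real
        # analysis you'd permute within runs/chunks so the null respects the
        # temporal structure of the data
        null[i] = cross_val_score(clf, X, rng.permutation(y), cv=cv).mean()
    p = (np.sum(null >= observed) + 1.0) / (n_perm + 1)   # one-sided p-value
    return observed, null, p

Looking at the spread of that null distribution, and at the fold-by-fold scores, is what I mean by checking the variability.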

Relatedly, I'd probably not believe a significant t-test if the single-subject permutation tests were all non-significant. This suggests to me that the variance is so high within each person that the results shouldn't be trusted, even though the means come out a bit above chance. So much of the variance structure is lost in a t-test ...
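
Again with made-up numbers, the situation I wouldn't trust looks something like this: every subject hovers a hair above chance, none of them clears their own permutation test, yet the group t-test comes out highly significant.

import numpy as np
from scipy import stats

# hypothetical per-subject accuracies and permutation p-values
# (e.g. from a test like the one sketched above) -- not real data
accuracies = np.array([0.54, 0.53, 0.55, 0.52, 0.54, 0.53, 0.56, 0.52])
perm_pvals = np.array([0.21, 0.34, 0.18, 0.45, 0.26, 0.31, 0.12, 0.48])

t, p_group = stats.ttest_1samp(accuracies, 0.5)
print("group t-test vs. chance: t=%.2f, p=%.5f" % (t, p_group))    # t ~ 7, p < .001
print("subjects individually significant (perm p < .05):",
      np.sum(perm_pvals < 0.05), "of", len(perm_pvals))            # 0 of 8

The group test only asks whether the tiny across-subject mean is consistent, which isn't the same question as whether any individual brain actually carries decodable information.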

Jo

