btw -- a few hints. If you can make some assumptions about the chance distribution (e.g. that you indeed have independent samples in testing, etc.), then you could switch to parametric testing. E.g., if I think it should be close to a binomial distribution, then with a sufficient number of trials (as in your case) it is relatively well approximated by a normal. In that case, instead of using the default Nonparametric distribution estimator in MCNullDist, you can use something like
    null_dist = MCNullDist(scipy.stats.norm, permutations=100, tail='left')

That would fit a normal distribution to the data from 100 permutations and assess the p-value from it. NB the normal approximates the binomial quite well for a reasonable number of trials. The above doesn't yet apply a continuity correction (http://en.wikipedia.org/wiki/Continuity_correction), but that is negligible for a reasonable sample size.

Moreover, let's say I know that by chance the mean performance should be 0.5; then I can help the fit out by fixing the mean at that value (unfortunately for that you would need to use maint/0.4 or yoh/0.4 or yoh/master with the fix I submitted yesterday):

    null_dist = MCNullDist(rv_semifrozen(scipy.stats.norm, loc=0.5),
                           permutations=100, tail='left')

The advantage of those parametric tests is exactly in your situation -- very low p-values, where you simply don't have enough power from non-parametric testing: to get a p-value as low as 10^(-x) you would need to do 10^x permutations, so in your case you simply can't get precision better than 0.001 since you are doing 1000 permutations. On the other hand, parametric testing approximates non-parametric results well even when the tested value (the error) lies in the heavy part of the distribution.

I hope this is of some value ;)

On Wed, 27 Jan 2010, Yaroslav Halchenko wrote:

> could you also enable storing all estimates from MC... i.e.
>     cv = CrossValidatedTransferError(
>              TransferError(clf),
>              splitter,
>              null_dist=MCNullDist(permutations=no_permutations,
>                                   tail='left',
>                                   enable_states=['dist_samples']),
>              enable_states=['confusion'])
> weird enough -- either I do not think straight or smth is strange -- the chance
> distribution after permutation on our testdata is indeed quite biased into high
> values (which are errors), although I would expect its mean to be at chance
> (i.e. 0.5 since I did binary classification).
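P.S. To make the parametric vs. non-parametric point above concrete, here is a minimal standalone sketch using only numpy/scipy (no PyMVPA). The permutation errors are simulated as binomial draws, and `n_trials`, `observed_error`, and all the numbers are made up for illustration; `stats.norm.fit(..., floc=0.5)` merely stands in for the `rv_semifrozen(scipy.stats.norm, loc=0.5)` idea:

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)

# Simulated "null" error rates, as MCNullDist would collect from label
# permutations: binary classification with n_trials test samples and
# chance level 0.5, so each permutation's error ~ Binomial(n, 0.5) / n.
n_trials = 80
n_permutations = 100
perm_errors = rng.binomial(n_trials, 0.5, size=n_permutations) / float(n_trials)

observed_error = 0.30  # hypothetical error from the unpermuted labels

# Non-parametric p-value: its resolution is limited to 1/n_permutations,
# so with 100 permutations it can never go below 0.01 (only hit 0).
p_nonparam = np.mean(perm_errors <= observed_error)

# Parametric alternative: fit a normal to the permutation errors and
# take the left-tail probability -- this can go far below 1/n_permutations.
loc, scale = stats.norm.fit(perm_errors)
p_param = stats.norm.cdf(observed_error, loc=loc, scale=scale)

# Same fit, but with the chance mean frozen at 0.5:
loc_fixed, scale_fixed = stats.norm.fit(perm_errors, floc=0.5)
p_fixed = stats.norm.cdf(observed_error, loc=loc_fixed, scale=scale_fixed)

print(p_nonparam, p_param, p_fixed)
```

Note how `p_param` comes out as a small but non-zero tail probability, while the non-parametric estimate is quantized to multiples of 1/100 -- which is exactly why the parametric route helps for very low p-values.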
--
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                  Linux User    ^^-^^    [175555]

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa

