Re: [pymvpa] Permutation testing and Nipype
Hi,

Thanks for the response!

> please see my response to Roni a few minutes ago, so just collect up to 50
> permutations per subject and then use GroupClusterThreshold to do
> bootstrapping across subjects' permutation results.

I've read over that thread and I like the idea, but I've got a couple of quick questions.

One, we're doing leave-one-subject-out cross-validation, combining each subject's four runs, instead of leave-one-run-out (due to balance issues). Would this change anything in your recommendations? That is, can we still use GroupClusterThreshold the way you recommended to Roni?

Two, we're doing regression (using EpsilonSVR) in addition to classification; that shouldn't change anything either, right?

Finally, you recommend permuting all labels, not just the training-set ones. It's unclear to me why that works. Don't you need to train with permuted labels and test with the actual labels to get a null distribution? Or is it okay because Roni's data has one beta per run (whereas we have one value per trial that we're regressing on, so ours isn't)?

Thanks,
Bill
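P.S. Just so I'm sure I follow the suggestion, here's a minimal sketch of how I currently picture the GroupClusterThreshold step. The names perm_maps_per_subject and mean_acc_map are placeholders I made up, and the parameter values and exact call pattern are my reading of the docs, not something from your message:

    from mvpa2.suite import vstack
    from mvpa2.algorithms.group_clusterthr import GroupClusterThreshold

    # perm_maps_per_subject: placeholder list of datasets, one per subject,
    # each holding up to ~50 permutation accuracy maps from that subject's
    # searchlight, with sa.chunks set to the subject id so the bootstrap
    # resamples across subjects
    perm_maps = vstack(perm_maps_per_subject)

    clthr = GroupClusterThreshold(n_bootstrap=100000,        # bootstrap samples across subjects
                                  feature_thresh_prob=0.001, # per-voxel cluster-forming threshold
                                  chunk_attr='chunks')       # sample attribute coding the subject
    clthr.train(perm_maps)

    # mean_acc_map: placeholder dataset holding the across-subject mean of the
    # un-permuted searchlight accuracy maps; the result carries the cluster
    # statistics in its attributes (see the GroupClusterThreshold docs)
    res = clthr(mean_acc_map)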
Re: [pymvpa] Permutation testing and Nipype
On Mon, Aug 10, 2015 at 5:33 PM, Yaroslav Halchenko deb...@onerussian.com wrote:

> it would help to know what/at what level you are permuting etc, and what is
> that timing issue (does nipype kill tasks if they run too long? unlikely)?

I'm running my analysis with leave-one-subject-out cross-validation (so combining all runs for each subject), permuting the labels in the training set across two categories 100 times. I originally ran the whole brain in one job, but that took too long (it didn't get killed by nipype or our SGE cluster, it was just taking too long to be feasible), so I'm using sphere_searchlight's center_ids option to split permutation testing into a bunch of smaller jobs, each covering about 5 searchlights.

Here's what my function looks like:

    clf = LinearCSVMC()
    repeater = Repeater(count=100)
    permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)
    nf = NFoldPartitioner(attr='subject')
    null_cv = CrossValidation(clf, ChainNode([nf, permutator], space=nf.get_space()),
                              errorfx=mean_mismatch_error)
    distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
                           enable_ca=['dist_samples'])
    cv = CrossValidation(clf, nf, null_dist=distr_est,
                         pass_attr=[('ca.null_prob', 'fa', 1)],
                         errorfx=mean_mismatch_error)
    sl = sphere_searchlight(cv, radius=3,
                            center_ids=range(sl_range[0], sl_range[1]),
                            enable_ca='roi_sizes',
                            pass_attr=[('ca.roi_sizes', 'fa')])
    sl_res = sl(ds)
    null_dist = cv.null_dist.ca.dist_samples

where sl_range is a tuple, passed to the function, that defines which searchlight centers to run (a sketch of how these tuples could be generated is at the end of this message).

In my current setup, the above function is a Nipype MapNode iterating over sl_range, so that when the workflow reaches this function it spawns many versions of this job (currently about 5000), each running permutation testing on a different set of searchlights. These are all submitted in parallel to the SGE cluster, which allows users to submit as many jobs as they want but limits them to running on 200-some nodes at a time.

When I split this into about 5000 jobs, I ran into an issue with Nipype where each job would finish (in about 1.5 hours) but the Nipype master job that spawned them would take a very long time to realize they were done (it would notice roughly one per hour), so it never finished and moved on. If I split this into fewer jobs, it doesn't hit this issue, but each job takes much longer. So either I need to figure out what's going on with Nipype, or the permutations need to not take as long.

Is that clear? Has anyone run into similar issues or found a way to run the permutations faster?

Thanks,
Bill
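P.S. For concreteness, here is a minimal sketch of how the sl_range tuples the MapNode iterates over could be generated. The helper name make_ranges and the chunk size are placeholders for illustration, not my actual code:

    def make_ranges(n_centers, chunk_size=5):
        """Split the searchlight center indices into (start, stop) tuples.

        Each tuple becomes one sl_range input to the MapNode, so a single
        SGE job handles roughly chunk_size searchlights.
        """
        return [(start, min(start + chunk_size, n_centers))
                for start in range(0, n_centers, chunk_size)]

    # with ~29462 in-mask voxels and ~5 centers per job, this yields a few
    # thousand (start, stop) tuples, i.e. a few thousand SGE jobs
    sl_ranges = make_ranges(29462, chunk_size=5)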
Re: [pymvpa] Permutation testing and Nipype
Okay, so I did a little more investigating of this and I cannot replicate my original problem. It now looks like it's taking a long time simply because the permutation testing itself takes a long time.

At the bottom of this message is the script I used for testing the timing. Using Python 2.7.6 and PyMVPA 2.4.0, I time the script as follows:

    python2.7 -O -m timeit -n 1 -r 1 'import test' 'test.main()'

The dataset I'm loading has 3504 trials that we're using and 29462 voxels. I get the following times:

    perm_num=1, ids=(0,1) : 161 sec
    perm_num=1, ids=(0,2) : 316 sec
    perm_num=1, ids=(0,3) : 531 sec
    perm_num=1, ids=(0,4) : 687 sec
    perm_num=5, ids=(0,1) : 435 sec

which makes me realize that there's no way I can get 100 permutations and 5 searchlights (which is about what I was looking at earlier) done in 1.5 hours (a rough back-of-envelope extrapolation follows after the script below). I don't know what changed -- going back through my commits I haven't touched any of the relevant code since then; it's possible I made a mistake and accidentally ran 10 permutations or something like that.

Regardless, this is still taking far too long. Does anyone have any idea how to speed it up? It seems like a good idea to have each job run a bunch of permutations on a small set of searchlights, which is what I'm doing at the moment, but I still need something else to speed it up.

Thanks,
Bill

test.py script:

    def main(perm_num=5, ids=(0, 1)):
        from mvpa2.suite import (h5load, LinearCSVMC, Repeater, AttributePermutator,
                                 NFoldPartitioner, CrossValidation, ChainNode,
                                 MCNullDist, sphere_searchlight)
        ds = h5load('dataset.hdf5')
        clf = LinearCSVMC()
        repeater = Repeater(count=perm_num)
        permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)
        nf = NFoldPartitioner(attr='subject', cvtype=1, count=None,
                              selection_strategy='random')
        null_cv = CrossValidation(clf, ChainNode([nf, permutator], space=nf.get_space()))
        distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
                               enable_ca=['dist_samples'])
        cv = CrossValidation(clf, nf, null_dist=distr_est,
                             pass_attr=[('ca.null_prob', 'fa', 1)])
        print 'running...'
        sl = sphere_searchlight(cv, radius=3, center_ids=range(ids[0], ids[1]),
                                enable_ca='roi_sizes',
                                pass_attr=[('ca.roi_sizes', 'fa')])
        res = sl(ds)
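A very rough back-of-envelope check (my own extrapolation, assuming the cost scales about linearly in both the number of searchlight centers and the number of permutations, which the timings above roughly suggest):

    # at perm_num=1 each extra searchlight adds ~170 s, i.e. ~85 s per CV run
    # (one permuted + one un-permuted cross-validation per searchlight center)
    secs_per_cv_per_searchlight = 85.0   # assumed from the timings above
    n_perms, n_searchlights = 100, 5
    est_hours = (n_perms + 1) * n_searchlights * secs_per_cv_per_searchlight / 3600.
    print est_hours   # ~12 hours for a single 5-searchlight job, nowhere near 1.5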
Re: [pymvpa] Permutation testing and Nipype
On Mon, 10 Aug 2015, Bill Broderick wrote:

> I was wondering if anyone on this list has used PyMVPA with Nipype for
> permutation testing. I'm attempting to do so now, but am running into
> timing issues (which I'm asking the Nipype folks about here). Has anyone
> had any luck getting results in a reasonable time combining the two? If
> so, how?

it would help to know what/at what level you are permuting etc, and what is that timing issue (does nipype kill tasks if they run too long? unlikely)?

--
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834  Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik