Re: [pymvpa] Permutation testing and Nipype

2015-08-12 Thread Bill Broderick
Hi,

Thanks for the response!

> please see my response to Roni a few minutes ago: just collect up to 50
> permutations per subject and then use GroupClusterThreshold to do
> bootstrapping across subjects' permutation results.

I've read over that thread and I like the idea, but I have a couple of
quick questions.

One, we're doing leave-one-subject-out cross-validation, combining the
four runs each subject has, instead of leave-one-run-out (due to
balance issues). Would this change anything in your recommendations?
I.e., can we still use GroupClusterThreshold the way you recommended
for Roni?

Two, we're doing regression in addition to classification (using
EpsilonSVR); that shouldn't change anything either, right?

Finally, you recommend permuting all labels, not just the training-set
ones. It's unclear to me why that works. Don't you need to train with
permuted labels and test with the actual ones to get a null
distribution? Or is it okay because Roni's data has one beta per run
(whereas we have one value per trial that we're regressing to, so it
isn't)?

Thanks,
Bill

___
Pkg-ExpPsy-PyMVPA mailing list
Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa


Re: [pymvpa] Permutation testing and Nipype

2015-08-11 Thread Bill Broderick
On Mon, Aug 10, 2015 at 5:33 PM, Yaroslav Halchenko
deb...@onerussian.com wrote:
> it would help to know what/at what level you are permuting etc,
> and what is that timing issue (does nipype kill tasks if they run too
> long, unlikely)?

I'm running my analysis with leave-one-subject-out cross-validation
(combining all runs for each subject), permuting the labels in the
training set across two categories 100 times. I originally ran the
whole brain in one job, but that took too long (it didn't get killed
by Nipype or our SGE cluster, it was just infeasibly slow), so I'm
using sphere_searchlight's center_ids option to split permutation
testing into a bunch of smaller jobs, each with about 5 searchlights.
Here's what my function looks like:

clf = LinearCSVMC()
repeater = Repeater(count=100)
# permute targets within the training partition only
permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)
nf = NFoldPartitioner(attr='subject')
null_cv = CrossValidation(clf,
                          ChainNode([nf, permutator], space=nf.get_space()),
                          errorfx=mean_mismatch_error)
distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
                       enable_ca=['dist_samples'])
cv = CrossValidation(clf, nf, null_dist=distr_est,
                     pass_attr=[('ca.null_prob', 'fa', 1)],
                     errorfx=mean_mismatch_error)
sl = sphere_searchlight(cv, radius=3,
                        center_ids=range(sl_range[0], sl_range[1]),
                        enable_ca='roi_sizes',
                        pass_attr=[('ca.roi_sizes', 'fa')])
sl_res = sl(ds)
null_dist = cv.null_dist.ca.dist_samples

where sl_range is a tuple, passed to the function, defining which
searchlights to run. In my current setup, the above function is a
Nipype MapNode, iterating on sl_range, so that at this point the
workflow creates many versions of this job (currently about 5000),
each running permutation testing on different searchlights. These are
all submitted in parallel to the SGE cluster, which allows users to
submit as many jobs as they want but limits them to running on
200-some nodes at a time.
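The chunking itself is simple; a pure-Python sketch (the voxel count here
is illustrative, not pulled from our pipeline code):

```python
# Carve the searchlight centers into (start, stop) ranges of ~5 centers
# each; every range becomes one cluster job via the MapNode.
n_centers = 29462   # one candidate center per brain voxel (illustrative)
chunk = 5           # searchlights per job
ranges = [(i, min(i + chunk, n_centers))
          for i in range(0, n_centers, chunk)]
# len(ranges) is ~5900 here -- the "about 5000" jobs mentioned above
```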

When I split this into about 5000 jobs, I ran into an issue with
Nipype where each job would finish running (in about 1.5 hours) but
the Nipype master job that spawned them would take a very long time to
realize they were done (on the order of one an hour), so it never
finished and moved on. If I split this into fewer jobs, I don't hit
that issue, but each job takes much longer. So I either need to figure
out what's going on with Nipype, or make the permutations themselves
take less time.

Is that clear? Has anyone run into similar issues, or found a way to
run the permutations faster?

Thanks,
Bill



Re: [pymvpa] Permutation testing and Nipype

2015-08-11 Thread Bill Broderick
Okay, so I did a little more investigating and I cannot replicate my
original problem. Now it looks like things are slow simply because the
permutation testing itself takes a long time.

At the bottom of this message is the script I used for testing the
timing. Using python 2.7.6 and PyMVPA version 2.4.0, I time the script
as follows:

 python2.7 -O -m timeit -n 1 -r 1 'import test' 'test.main()'

The dataset I'm loading in has 3504 trials that we're using and 29462 voxels.

I get the following times:
 perm_num=1, ids=(0,1)  : 161sec
 perm_num=1, ids=(0,2)  : 316sec
 perm_num=1, ids=(0,3)  : 531sec
 perm_num=1, ids=(0,4)  : 687sec
 perm_num=5, ids=(0,1)  : 435sec

That makes me realize there's no way I could have been getting 100
permutations over 5 searchlights (which is about what I was running
earlier) in 1.5 hours. I don't know what changed -- going back through
my commits, I haven't touched any of the relevant code since then;
it's possible I made a mistake and accidentally ran 10 permutations or
something like that.

Regardless, this is still taking way too long. Does anyone have any
idea how to speed it up? Running many permutations within one job
while splitting up the searchlights (which is what I'm doing at the
moment) seems like the right structure, but I still need something
else to make it faster.

Thanks,
Bill


test.py script:

def main(perm_num=5, ids=(0, 1)):
    from mvpa2.suite import (h5load, LinearCSVMC, Repeater,
                             AttributePermutator, NFoldPartitioner,
                             CrossValidation, ChainNode, MCNullDist,
                             sphere_searchlight)

    ds = h5load('dataset.hdf5')
    clf = LinearCSVMC()
    repeater = Repeater(count=perm_num)
    permutator = AttributePermutator('targets', limit={'partitions': 1},
                                     count=1)
    nf = NFoldPartitioner(attr='subject', cvtype=1, count=None,
                          selection_strategy='random')
    null_cv = CrossValidation(clf, ChainNode([nf, permutator],
                                             space=nf.get_space()))
    distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
                           enable_ca=['dist_samples'])
    cv = CrossValidation(clf, nf, null_dist=distr_est,
                         pass_attr=[('ca.null_prob', 'fa', 1)])
    print 'running...'
    sl = sphere_searchlight(cv, radius=3, center_ids=range(ids[0], ids[1]),
                            enable_ca='roi_sizes',
                            pass_attr=[('ca.roi_sizes', 'fa')])
    res = sl(ds)


Re: [pymvpa] Permutation testing and Nipype

2015-08-10 Thread Yaroslav Halchenko

On Mon, 10 Aug 2015, Bill Broderick wrote:
> I was wondering if anyone on this list has used PyMVPA with Nipype for
> permutation testing. I'm attempting to do so now, but am running into
> timing issues (which I'm asking the Nipype folks about here).
>
> Has anyone had any luck getting results in a reasonable time combining the
> two? If so, how?

it would help to know what/at what level you are permuting etc,
and what is that timing issue (does nipype kill tasks if they run too
long, unlikely)?

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik
