Re: [pymvpa] correlation between 2 matrices

2016-10-28 Thread Christopher J Markiewicz
Sounds like you just want the correlation coefficient of the two
matrices, unraveled?

import numpy as np

def similarity(matA, matB):
    coeffs = np.corrcoef(matA.reshape(-1), matB.reshape(-1))
    return coeffs[0, 1]


Or did you want a coefficient for each time point?

Assuming rows are time points:

def similarity(matA, matB):
    nrows = matA.shape[0]
    coeffs = np.corrcoef(matA, matB)
    topright = coeffs[:nrows, nrows:]
    return topright.diagonal()


If you're starting from PyMVPA datasets, you can use:

similarity(dsA.samples, dsB.samples)
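
For a quick sanity check with made-up data (shapes assumed from your
question, 12 time points by 300 voxels), the per-time-point version above
returns one coefficient per row:

import numpy as np

rng = np.random.RandomState(0)
matA = rng.randn(12, 300)               # 12 time points x 300 voxels (assumed)
matB = matA + 0.5 * rng.randn(12, 300)  # noisy copy, so coefficients come out positive

print(similarity(matA, matB))           # array of 12 per-time-point correlations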


On 10/28/2016 03:44 PM, David Soto wrote:
> hi, 
> 
> I wonder whether PyMVPA can compute a similarity index (correlation)
> between two 2D matrices, each comprised of 12 seconds & 300 voxels?
> 
> thanks!
> david
> 
> 


-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University



[pymvpa] Never-ending searchlight

2015-01-29 Thread Christopher J Markiewicz
At commit: 573883e099a598c127fc5076123ebe3906f2be82

Running into a weird issue where under one (of 120) combination of
parameters, the searchlight hangs and never ends. On a 16 core machine,
16 cores will run for about 1.5-2 hours, and then one core will be at
100% until terminated (longest run was ~3 weeks, having forgotten about
it over break).

I post the searchlight sequence below for the sake of context, and in
the hope that this is where the problem is, but it should be noted that
the difference is entirely in which dataset is loaded. The dataset loads
without error and contains no NaNs. Foolishly, I haven't saved a copy of
the stack trace, but a SIGTERM will result in a numpy shape coercion
error. (Will rerun and send along the stack trace when I can.)

Given an fmri_dataset ds and a QueryEngine qe, we use the following
algorithm:

cn = ChainNode([NFoldPartitioner(),
Balancer(attr='targets', count=2, limit='partitions',
 apply_selection=True)], space='partitions')

svm = LinearCSVMC()
cvte = CrossValidation(svm, cn)
sl = Searchlight(cvte, qe, postproc=mean_sample())

errors = sl(ds)
nifti = map2nifti(ds, 1 - errors.samples)
nifti.to_filename(fname)

-

Thanks in advance for any help.

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] Never-ending searchlight

2015-01-29 Thread Christopher J Markiewicz
On 01/29/2015 04:35 PM, Yaroslav Halchenko wrote:
> 
> 
> On Thu, 29 Jan 2015, Christopher J Markiewicz wrote:
> 
>> At commit: 573883e099a598c127fc5076123ebe3906f2be82
> 
> $> git describe 573883e099a598c127fc5076123ebe3906f2be82
> upstream/2.3.1
> 
> 
>> Running into a weird issue where under one (of 120) combination of
>> parameters, the searchlight hangs and never ends. On a 16 core machine,
>> 16 cores will run for about 1.5-2 hours, and then one core will be at
>> 100% until terminated (longest run was ~3 weeks, having forgotten about
>> it over break).
> 
> might well be that SVM got stuck somehow?
> 
> what if you set nproc=1 (no parallelization), and 
> 
> debug.active += ["SLC", "SLC_"]
> 
> so you see what is going on atm -- where would it get stuck?

Thanks for the tip. That's going to take some time to get to the point
where it's stuck (estimating another 6 hours).

Weirdly, though, one of my attempts to run this finally worked (probably
~20 attempts by now). I'll let the single-threaded run go to see if it
sticks, and maybe figure out why.
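
For the record, the rerun will look roughly like this (a sketch; variable
names follow my original snippet, and nproc on the Searchlight constructor
is an assumption about the installed version):

from mvpa2.base import debug

debug.active += ["SLC", "SLC_"]

# single-process rerun so the per-searchlight debug output comes out in order
sl = Searchlight(cvte, qe, postproc=mean_sample(), nproc=1)
errors = sl(ds)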

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] Never-ending searchlight

2015-01-30 Thread Christopher J Markiewicz
On 01/30/2015 10:45 AM, Yaroslav Halchenko wrote:
> 
> On Fri, 30 Jan 2015, Thomas Nickson wrote:
> 
>>Hello All,
> 
>>I have also had this problem, I think. I gave up after about a month. I
>>thought that it was just computationally prohibitive with 133 subjects, but
>>is this not the case?
> 
>>I can run it and contribute debug data if you want as I'm still interested
>>in the analysis.
> 
> Thank you Thomas,
> 
> Let's first see if Christopher's run might provide sufficient
> information

Naturally, the single-threaded attempt also succeeded, so I have no
useful debugging information.

My guess would have to be that there's a pathological case lurking in
the balancing, where, for some subset of balanced training sets, on a
given searchlight (probably high voxel count), the SVM fails to
converge. Since we drew two balanced partitions per attempt, a successful
run required both partitions to avoid the pathological case for that
searchlight.

(This is not to say that the balancer is failing but that the balancer
is why it sometimes fails and sometimes succeeds.)
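
If anyone wants to poke at the balancing angle, here is a rough sketch for
eyeballing what the generator produces (variable names follow the snippet
in my first message):

import numpy as np

# Inspect the balanced training sets produced by cn (the NFoldPartitioner +
# Balancer chain); unusual or uneven target counts across repetitions would
# point at the balancer as the source of the run-to-run variability.
for i, part in enumerate(cn.generate(ds)):
    train = part[part.sa.partitions == 1]
    counts = dict((t, int(np.sum(train.sa.targets == t)))
                  for t in set(train.sa.targets))
    print(i, counts)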

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] MacOSX Installation

2015-02-25 Thread Christopher J Markiewicz
On 02/25/2015 11:31 AM, Matthias Zunhammer wrote:
> Hi,
> 
> I succeeded in installing PyMVPA on Mac OS X, but after starting python and
> typing
> 
>>> import mvpa2
> 
> I get the following error:
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ImportError: No module named mvpa2

Likely it's not in your PYTHONPATH.

Find your installation:
sudo updatedb && locate mvpa2/__init__.py

Mine is: /usr/lib/python2.7/dist-packages/mvpa2/__init__.py, so I want
to make sure /usr/lib/python2.7/dist-packages is in my python path.

Examine your path:
python -c 'import sys; print sys.path'

If your directory isn't in there, you can add it with a command like this:

echo 'export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/dist-packages' >> ~/.bashrc

The single quotes are important. Obviously use the directory you found
above, and if you're not using bash, you may need to append the line to
a different RC file than ~/.bashrc.
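
As a quick check from inside Python (the directory below is just my
example; substitute whatever locate reported on your system):

import sys

pkg_dir = '/usr/lib/python2.7/dist-packages'  # your directory from locate

if pkg_dir not in sys.path:
    sys.path.append(pkg_dir)                  # per-session fix, no shell config needed

import mvpa2
print(mvpa2.__file__)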

Hope this helps.

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] group level test on decoding accuracy

2015-05-10 Thread Christopher J Markiewicz
On 05/09/2015 12:49 PM, Jingwen Jin wrote:
> 
> Hi MVPA experts,
> 
> I have a general question about conducting group-level analysis on the
> subjects' classification accuracy maps. Let's say I am doing a
> one-sample t-test to find the voxels that have high classification
> accuracy across subjects. Essentially, I am doing a t-test on percentage
> numbers (SVM classification accuracy measured as percentage correct).
> Since percentages are highly affected by the number of testing examples,
> they would probably not, in general, meet the normal distribution
> assumption of the t-test.
> 
> So my question is if people adjust for testing trial numbers or any sort
> of transformation? For example, I converted each voxel's percentage
> number to a z score at the individual subject's classification map
> level, and then do group-level t-test on these z score maps. I wonder if
> this is valid?

What you describe sounds like the strategy used by Lee et al (2012)
<https://dx.doi.org/10.1523/JNEUROSCI.3814-11.2012> so there's
precedent. On the other hand, it's not clear to me how to do cluster
thresholding for multiple comparisons correction properly, using this
method. They use SPM8's random effects analysis, but if I recall
correctly that requires smoothness assumptions, while MVPA analyses
typically use unsmoothed volumes.

There's also a much more intensive non-parametric test used by Stelzer
et al. (2012) <https://dx.doi.org/10.1016/j.neuroimage.2012.09.063>, but
it requires a lot of computing time and a fair amount of (at least
temporary) storage space.
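
For what it's worth, the per-voxel conversion you describe might look
something like this (a sketch under a binomial/normal approximation; the
names are hypothetical):

import numpy as np

def accuracy_to_z(acc, chance, n_test):
    # acc: per-voxel accuracy map (fraction correct); chance: e.g. 0.5 for
    # two classes; n_test: total number of test trials across folds
    se = np.sqrt(chance * (1.0 - chance) / n_test)  # binomial standard error
    return (acc - chance) / se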

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] searchlight

2015-05-15 Thread Christopher J Markiewicz
On 05/15/2015 09:25 AM, basile pinsard wrote:
> Hi MVPA experts,
> 
> I have a theoretical question that arose from a recent analysis using
> searchlight (either surface or voxel based):
> 
> What is the most sensible feature selection strategy between:
> - a fixed radius with a variable number of features included, which means
> the different classifiers are trained on different numbers of dimensions;
> - a fixed number of closest voxels/surface_nodes that would represent
> different surface/volume/spatial_extent depending on the localization.

From my reading, the more common choice is the former. This is probably
because, without evidence that your results particularly correlate with
searchlight size, the more interpretable figure is one in which each
voxel represents a statistic taken over a fixed spatial extent.

A fixed number of voxels (I agree with Jo that one should always use
voxels; even if you are using surface nodes to define a neighborhood,
these should be mapped back to voxels to avoid smoothing and resampling)
is beneficial if you are using an error metric that is sensitive to
dimensionality, such as mean squared error.

With a surface searchlight of radius 9mm, I get a distribution of
searchlight sizes (in one subject) that's approximately normal(66, 8). I
have not found that the cross-validation training error of classifiers
(linear SVM, mostly) is particularly sensitive to searchlight size. On
the other hand, attempting to use the same searchlights with regression
problems produces results that correlate strongly (positively or
negatively, depending on regression algorithm) with number of voxels.

> I had the examples with surfaces, for which I used a spherical template
> (similar to the 32k surfaces in the HCP dataset) transformed into subject space.
> I computed the number of neighbors for each node with a fixed radius and
> noted a differential sampling resolution across the brain, which somewhat
> overlaps with my network of interest (motor), hence my concern.

Do your preliminary results correlate with searchlight size across
several regions? That would be my primary indication that this is a concern.
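
One quick way to check, at least in the volumetric case (a sketch; cvte
stands in for whatever cross-validated measure you are running):

import numpy as np
from mvpa2.measures.searchlight import sphere_searchlight

# expose per-center searchlight sizes and correlate them with the result map;
# a large |r| across the brain would be the warning sign
sl = sphere_searchlight(cvte, radius=3, enable_ca='roi_sizes')
res = sl(ds)
sizes = np.asarray(sl.ca.roi_sizes)
acc_map = res.samples.mean(axis=0)  # one value per searchlight center
print(np.corrcoef(acc_map, sizes)[0, 1])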

> With a voxel-based searchlight, depending on the masking, voxels on the
> borders of the mask will have fewer neighbors within a fixed-radius sphere.

Could you smear the mask with your searchlight, i.e. extend it in all
directions? You'll still be including (presumably) uninformative voxels,
but at least it won't be dimensionality itself that gets you.

> PyMVPA has only this strategy for now, but I have read many papers that use
> a fixed number of features in the searchlight.
> 
> What do you think?
> 
> I did an ugly modification to have a temporary fixed feature number
> (closest) on surface but it should be optimized:
> https://github.com/bpinsard/PyMVPA/commit/1af58ea8a57882ed57059491c19d83bed43e0bce

If I'm reading this right (I haven't dug into the PyMVPA surface
searchlight implementation), this is selecting a maximum number of
surface nodes, and then mapping that to voxels, and will still end up
with variable numbers of voxels, depending on the density of surface nodes.

What about sorting nodes based on distance, mapping to voxels, and then
taking the first max_features voxels?
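
Roughly, in sketch form (the two mappings stand in for whatever bookkeeping
the query engine already maintains; all names are made up):

def first_n_voxels(node_distances, node2voxels, max_features):
    # node_distances: {node_id: distance from the searchlight center}
    # node2voxels: {node_id: voxel ids intersecting that node}
    selected, seen = [], set()
    for node in sorted(node_distances, key=node_distances.get):
        for vox in node2voxels.get(node, []):
            if vox not in seen:
                seen.add(vox)
                selected.append(vox)
                if len(selected) == max_features:
                    return selected
    return selected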

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] Preprocessing/Analyzing physiological data with python/pyMVPA

2015-05-29 Thread Christopher J Markiewicz
On 05/29/2015 04:09 PM, jdemanet wrote:
> Hello all,
> 
> I am looking for an elegant way to preprocess physiological data in
> python/pyMVPA. Does anyone know a good way to preprocess and analyse heart
> rate, pupil size, and skin conductance in python, possibly with pyMVPA?
> I really want to avoid using MATLAB for that.

What are you looking to do with them? And what formats are they coming
in? NumPy (http://www.numpy.org/) will get you pretty far with loading
and saving text files and doing linear algebra. With a little more work
you can handle binary files.
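
For example, if the recordings export to plain-text columns (a guess at the
format), getting them into arrays is a one-liner:

import numpy as np

# hypothetical export: whitespace-delimited columns (heart rate, pupil size,
# skin conductance), one row per sample
data = np.loadtxt('physio_run1.txt')
heart_rate, pupil, scr = data[:, 0], data[:, 1], data[:, 2]

# e.g. z-score the skin conductance channel before further analysis
scr_z = (scr - scr.mean()) / scr.std()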

Just as a warning: if you're used to MATLAB, numpy can be a bit
surprising at times, particularly with regard to matrix operations.

Without knowing more, I can't help further.

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] Searchlights and Permutation testing

2015-06-16 Thread Christopher J Markiewicz
Hi Bill,

Hopefully what I write below won't be entirely wrong. I've done
permutation testing, but with PyMVPA at the bottom of the loop rather than
managing the null distributions and statistics itself.

On 06/16/2015 02:50 PM, Bill Broderick wrote:
> Hi all,
> 
> I'm trying to implement permutation testing with searchlights and, after
> going through the manual and the mailing list archives, I'm still not
> sure how to do it.
> 
> According to this thread
> http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2012q1/002071.html,
> the fastest way to get searchlight permutations is to use the GNB
> searchlight; otherwise it takes so long as to be impractical. However,
> when trying to set up the GNB searchlight with a null_dist, as shown in
> here
> https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/tests/test_usecases.py#L168,
> I get a NotImplementedError: "GNBSearchlight does not yet support
> partitions altering the target (e.g. permutators)", as warned about on
> the documentation page for GNB searchlight and mentioned in this thread
> http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2012q4/002304.html.
> 
> However, if I can't use an attribute permutator with the GNB
> searchlight, how can I use it to run permutation tests and get a null
> probability? What am I missing?

I looked at GNB searchlight a while back, and I believe it works by
creating a distribution per-target, per-voxel, and then moving a
searchlight around these pre-calculated distributions. So it makes sense
that running a permutator inside the GNB won't work, since it negates
the pre-computation advantage.

An almost* equivalent problem is to permute the class labels and run a
GNB searchlight. (*There is the difference that you'll be sampling the
same subspace of permutations at each voxel, but that shouldn't make a
large difference if you perform enough permutations.)

So suppose you have something like this (I have not checked the docs to
make sure this is particularly sensible):

partitioner = ChainNode([NFoldPartitioner(), AttributePermutator()])
sl = GNBSearchlight(GNB(), partitioner)
err = sl(dataset)

You might instead do something like:

gnb = GNBSearchlight(GNB(), NFoldPartitioner())
sl = ChainNode([AttributePermutator(), gnb])
err = sl(dataset)

Again, not sure if this is really how the pieces fit together, but the
idea would be to permute the class labels and run an entirely new
GNBSearchlight on them. Assuming I did what I meant to do, err should
actually be your null distribution (a matrix of voxels-by-permutations
errors).

> Additionally, and this is a side note, is there any way to pass the null
> distribution from the searchlight's null_dist attribute to the results
> dataset? Or should I just give up on that, because trying to save the
> distribution for each searchlight would result in a huge file?

Assuming 50k voxels (post-masking) and a desired resolution of 0.001
from your nonparametric test, with 32-bit float representations, a null
distribution would only require 200MB. To save that as an uncompressed
nifti with dimensions 128x128x40 would require something closer to 5GB.
Large, but within the bounds of memory on normal systems and easily
within normal disk storage constraints.

If you want higher resolution (e.g. the Stelzer et al. test suggests
10^5 permutations), you'll need to multiply those figures by 100 and now
you're out of memory by a long shot and filling half a terabyte per
file. Our strategy for these kinds of problems is to create
memory-mapped arrays and sort a few rows at a time. This is outside the
scope of PyMVPA, but it's doable with numpy.
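
A rough sketch of that memory-mapped approach (sizes, filenames, and the
direction of the comparison are all assumptions; flip the inequality if
your maps are errors rather than accuracies):

import numpy as np

n_voxels, n_perms = 50000, 100000      # made-up sizes (~20GB of float32 on disk)
null = np.memmap('null_dist.dat', dtype='float32', mode='w+',
                 shape=(n_voxels, n_perms))
# ... fill null[:, i] with the flattened map from permutation i as it finishes ...

observed = np.load('observed_map.npy')  # (n_voxels,) observed map, hypothetical file
pvals = np.empty(n_voxels, dtype='float32')
block = 1000                            # process a block of voxels at a time
for start in range(0, n_voxels, block):
    sel = slice(start, start + block)
    pvals[sel] = (null[sel] >= observed[sel, None]).mean(axis=1)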

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] Searchlights and Permutation testing

2015-06-16 Thread Christopher J Markiewicz
On 06/16/2015 05:32 PM, Bill Broderick wrote:
> Hi Christopher,
> 
> I think your method works, if I do the following
> 
> res=[]
> gnb=GNB()
> nf=NFoldPartitioner()
> permutator=AttributePermutator('targets',number_of_permutations)
> sl_gnb=sphere_gnbsearchlight(gnb,nf,reuse_neighbors=True,radius=3)
> for i in permutator.generate(ds):
>  res.append(sl_gnb(ds))

I think this should be res.append(sl_gnb(i))?

> And then I combine all the resulting results in res. I couldn't make it
> work with a ChainNode containing a permutator and the sl_gnb; it would
> only run once, so I had to use the permutator explicitly as a generator.

Fair enough. I think there should be a way to chain together Measures,
even if ChainNode isn't it, but I don't know it off the top of my head.

> If I do things this way, there's no way to just permute the labels
> within the training data, right? Isn't it better to do that?

I'm not sure what you're asking. permutator.generate(ds) should be
permuting the labels of both training and testing data prior to exposing
them to sl_gnb.

> I also don't understand, if the gnbsearchlight is not seeing the
> permutator, how its implementation is saving time. It has to do that
> pre-calculation on each permuted dataset, right? Does it just take that
> much less time to run a gnbsearchlight through the brain on each dataset?

Yes, I believe that's correct. It is not saving time by preserving
computations across permutations, but within each searchlight analysis
it is faster than other types of searchlight.

> I had also gotten permutation testing working with a linear SVM, using
> the following:
> 
> repeater = Repeater(count=number_of_permutations)
> permutator =
> AttributePermutator('targets',limit={'partitions':1},count=1)
> nf = NFoldPartitioner()
> clf = LinearCSVMC()
> null_cv =
> CrossValidation(clf,ChainNode([nf,permutator],space=nf.get_space()))
> distr_est =
> MCNullDist(repeater,tail='left',measure=null_cv,enable_ca=['dist_samples'])
> cv = CrossValidation(clf,nf,null_dist=distr_est)
> sl =
> sphere_searchlight(cv,radius=3,center_ids=range(0,10),enable_ca='roi_sizes',pass_attr=[('ca.roi_sizes','fa')])
> res = sl(ds)
> 
> Is there anything wrong with doing it this way? Other than it just
> taking a very long time.

That looks reasonable to me, but I defer to anybody who might gainsay
me. I did permutation testing by saving a new result file for every
permutation, so I have no experience with MCNullDist and the like.

> I would still need to make sure to pass on the null_prob from cv to sl
> to res, and I can't figure out how to pass the null distribution, if
> that's something I decide to do. I'm leaning towards not saving the null
> distribution -- it's good to know it's feasible but I'm not sure it's
> worth it. I didn't have any specific plans to use it, I just wanted to
> have it around in case it turned out to be useful.



-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] Searchlights and Permutation testing

2015-06-17 Thread Christopher J Markiewicz
On 06/17/2015 10:00 AM, Bill Broderick wrote:
> On Tue, Jun 16, 2015 at 5:42 PM, Christopher J Markiewicz
>  wrote:
>>
>> On 06/16/2015 05:32 PM, Bill Broderick wrote:
>>> Hi Christopher,
>>>
>>> I think your method works, if I do the following
>>>
>>> res=[]
>>> gnb=GNB()
>>> nf=NFoldPartitioner()
>>> permutator=AttributePermutator('targets',number_of_permutations)
>>> sl_gnb=sphere_gnbsearchlight(gnb,nf,reuse_neighbors=True,radius=3)
>>> for i in permutator.generate(ds):
>>>  res.append(sl_gnb(ds))
>>
>> I think this should be res.append(sl_gnb(i))?
> 
> Yes, my bad. That was a typo. It's res.append(sl_gnb(i))
> 
>>> And then I combine all the resulting results in res. I couldn't make it
>>> work with a ChainNode containing a permutator and the sl_gnb; it would
>>> only run once, so I had to use the permutator explicitly as a generator.
>>
>> Fair enough. I think there should be a way to chain together Measures,
>> even if ChainNode isn't it, but I don't know it off the top of my head.
> 
> Poking around in the documentation, I've also found CombinedNode and
> ChainLearner, neither of which seem to work either. I'll stick with
> the explicit generation for now, but if anyone knows how to chain them
> together, I'd really appreciate it!
> 
>>> If I do things this way, there's no way to just permute the labels
>>> within the training data, right? Isn't it better to do that?
>>
>> I'm not sure what you're asking. permutator.generate(ds) should be
>> permuting the labels of both training and testing data prior to exposing
>> them to sl_gnb.
> 
> Yup, that's what it's doing, but isn't that bad? According to the
> permutation testing tutorial
> http://www.pymvpa.org/tutorial_significance.html#avoiding-the-trap-or-advanced-magic-101,
> don't I want to permute *just* the training labels and then test on
> unpermuted labels?

Hmm. Possibly. I'll need to read more, and potentially add notes in our
methods/discussion.

>>> I also don't understand, if the gnbsearchlight is not seeing the
>>> permutator, how its implementation is saving time. It has to do that
>>> pre-calculation on each permuted dataset, right? Does it just take that
>>> much less time to run a gnbsearchlight through the brain on each dataset?
>>
>> Yes, I believe that's correct. It is not saving time by preserving
>> computations across permutations, but within each searchlight analysis
>> it is faster than other types of searchlight.
>>
>>> I had also gotten permutation testing working with a linear SVM, using
>>> the following:
>>>
>>> repeater = Repeater(count=number_of_permutations)
>>> permutator =
>>> AttributePermutator('targets',limit={'partitions':1},count=1)
>>> nf = NFoldPartitioner()
>>> clf = LinearCSVMC()
>>> null_cv =
>>> CrossValidation(clf,ChainNode([nf,permutator],space=nf.get_space()))
>>> distr_est =
>>> MCNullDist(repeater,tail='left',measure=null_cv,enable_ca=['dist_samples'])
>>> cv = CrossValidation(clf,nf,null_dist=distr_est)
>>> sl =
>>> sphere_searchlight(cv,radius=3,center_ids=range(0,10),enable_ca='roi_sizes',pass_attr=[('ca.roi_sizes','fa')])
>>> res = sl(ds)
>>>
>>> Is there anything wrong with doing it this way? Other than it just
>>> taking a very long time.
>>
>> That looks reasonable to me, but I defer to anybody who might gainsay
>> me. I did permutation testing by saving a new result file for every
>> permutation, so I have no experience with MCNullDist and the like.
> 
> Okay, excellent. I think I'll play around with this first to get a
> sense for how long it takes, since I'd prefer to use the Linear SVM.
> 
> Thanks,
> Bill
> 


-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] Surface searchlight taking 6 to 8 hours

2015-07-23 Thread Christopher J Markiewicz
On 07/23/2015 06:45 AM, Nick Oosterhof wrote:
> 
>> On 22 Jul 2015, at 20:11, John Baublitz  wrote:
>>
>> I have been battling with a surface searchlight that has been taking 6 to 8 
>> hours for a small dataset. It outputs a usable analysis but the time it 
>> takes is concerning given that our lab is looking to use even higher 
>> resolution fMRI datasets in the future. I profiled the searchlight call and 
>> it looks like approximately 90% of those hours is spent mapping in the 
>> function from feature IDs to linear voxel IDs (the function 
>> feature_id2linear_voxel_ids).
> 
> From mvpa2.misc.surfing.queryengine, you are using the 
> SurfaceVoxelsQueryEngine, not the SurfaceVerticesQueryEngine? Only the former 
> should be using the feature_id2linear_voxel_ids function. 
> 
> (When instantiating a query engine through disc_surface_queryengine, the 
> Vertices variant is the default; the Voxels variant is used when
> output_modality='volume').
> 
> For the typical surface-based analysis, the output is a surface-based 
> dataset, and the SurfaceVerticesQueryEngine is used for that. When using the 
> SurfaceVoxelsQueryEngine, the output is a volumetric dataset.
> 
>> I looked into the source code and it appears that it is using the in keyword 
>> on a list which has to search through every element of the list for each 
>> iteration of the list comprehension and then calls that function for each 
>> feature. This might account for the slowdown. I'm wondering if there is a 
>> way to work around this or speed it up.
> 
> When using the SurfaceVoxelsQueryEngine, the euclidean distance between each 
> node (on the surface) and each voxel (in the volume) is computed. My guess is 
> that this is responsible for the slow-down. This could probably be made 
> faster by dividing the 3D space into blocks and assigning nodes and voxels 
> to each block, and then compute distances between nodes and voxels only 
> within each block and across neighbouring ones. (a somewhat similar approach 
> is taken in mvpa2.support.nibabel.Surface.map_to_high_resolution_surf). But 
> that would take some time to implement and test. How important is this 
> feature for you?  Is there a particular reason why you would want the output 
> to be a volumetric, not surface-based, dataset?  

Nick,

To clarify, are you saying that using SurfaceVerticesQueryEngine runs
the classifiers (or other measure) on sets of vertices, not sets of
voxels? I'm not familiar enough with AFNI surfaces, but the ratio of
vertices to intersecting voxels in FreeSurfer is about 6:1. If a
searchlight is a set of vertices, how is the implicit resampling
accounted for?

Sorry if this is explained in the documentation. I have my own
FreeSurfer-based implementation that uses the surface only to generate
sets of voxels, so I haven't been keeping close tabs on how PyMVPA's
AFNI-based one works.

Also, if mapping vertices to voxel IDs is a serious bottleneck, you can
have a look at my query engine
(https://github.com/effigies/PyMVPA/blob/qnl_surf_searchlight/mvpa2/misc/neighborhood.py#L383).
It uses FreeSurfer vertex map volumes (see: mri_surf2vol --vtxvol),
where each voxel contains the ID of the vertex nearest its center. Maybe
AFNI has something similar?
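
For reference, the gist of how I use such a vertex-map volume (a sketch
with nibabel; the filename is made up):

import numpy as np
import nibabel as nib

# volume in which each voxel stores the ID of its nearest surface vertex,
# e.g. produced by mri_surf2vol --vtxvol
vtxvol = nib.load('vertex_map.nii.gz').get_data().astype(int)

def vertices_to_voxels(vertex_ids):
    # flat indices of voxels whose nearest vertex is in vertex_ids
    return np.flatnonzero(np.in1d(vtxvol.ravel(), list(vertex_ids)))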

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University




Re: [pymvpa] BOLD delay

2016-02-11 Thread Christopher J Markiewicz
On 02/11/2016 08:28 AM, Salim Al-wasity wrote:
> Dears
> When we do the classification, do we need to take into account the
> delay of the BOLD signal and shift our samples by 5 sec.?
> 
> Sincerely
> Salim Al-Wasity
> PhD student
> University of Glasgow

Generally speaking, you should model the events of interest with
hemodynamic response functions and use the beta estimates as your
samples for classification. If you've done the HRF modeling, then
there's no need to do an additional shift.
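
In PyMVPA terms, the event-related helper is the usual route for this (a
sketch; argument names are from memory, so check the fit_event_hrf_model
docs, and my_event_table is a made-up placeholder):

from mvpa2.datasets.eventrelated import fit_event_hrf_model

# one dict per event with onset, duration, and a condition label
events = [dict(onset=o, duration=d, targets=t) for o, d, t in my_event_table]

evds = fit_event_hrf_model(ds, events,
                           time_attr='time_coords',  # per-volume acquisition times
                           condition_attr='targets')
# evds.samples then holds one beta pattern per modelled condition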

If you've used a sparse-acquisition paradigm, and HRF modeling isn't
feasible (though I would consider the following before ruling it out:
https://www.ncbi.nlm.nih.gov/pubmed/23616742), then, yes: You want to
label your volumes based on the peak BOLD response.

Hope this helps.

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University



Re: [pymvpa] glm for MVPA

2016-02-15 Thread Christopher J Markiewicz
On 02/15/2016 10:20 AM, basile pinsard wrote:
> Hi pymvpa users and developers.
> 
> I have multiple questions regarding the use of GLM to model events for
> MVPA analysis which are not limited to PyMVPA.
> 
> First: in the NiPy GLM mapper the measures extracted are beta weights from the
> GLM fit
> <https://github.com/bpinsard/PyMVPA/blob/master/mvpa2/mappers/glm/nipy_glm.py#L50>,
> is that common for MVPA? StatsModelGLM mappers return t-p-z... values
> from the model. Is using the t-statistic more relevant?
> The following paper Decoding information in the human hippocampus
> <http://www.sciencedirect.com/science/article/pii/S0028393212002953>: A
> user's guide  says: "In summary, the pre-processing method of choice at
> present appears to be the use of the GLM to produce /t/-values as the
> input to MVPA analyses. However, it is important to note that this does
> not invalidate the use of other approaches such as raw BOLD or betas,
> rather the evidence suggests that these approaches may be sub-optimal,
> reducing the power of the analysis, making it more difficult to observe
> significant results."
> What do you think?

I don't remember reading the referenced article, but I've done a little
playing in this area. While I haven't done per-event t-scores, I have
done one t-statistic per condition per run, and found a very similar
spatial pattern of results to that from per-event betas, but with much
higher classification accuracy. For whatever that's worth.

With per-event betas (z-scored using the ZScoreMapper built on control
conditions), I get distributions of CV accuracies around chance, which
is fine for our analysis strategy. Personally, I don't quite know what
to make of ~50% minimum classification on a problem where chance is 33%.
That said, we use non-parametric significance testing, while there may
be parametric methods that work better on t-score-based MVPA.

Not sure if that's relevant to you, but that's what I think. :-)

> Second: I have developed another custom method to use
> Least-square-separate (LS-S) model that uses 1 model for each
> event/block/regressor of interest as shown to provide more stable
> estimates of the patterns and improved classification in Mumford et al
> 2012. However for each block I want to model 2 regressors, 1 for
> instruction and 1 for execution phases which are consecutive. So the
> procedure I use is 1 regressor for the block/phase that I want to model
> + 1 regressor for each phase, is that correct?

I've used a similar strategy, also inspired by the Mumford et al. paper.
I'm not sure if I'm understanding you correctly, so I'll just describe
my approach:

Each trial has an A event and a B event, and is described by a single
stimulus type. If I have 8 trial types, then I have 16 "conditions"
[(cond1, A), (cond1, B), ...]. For each condition, I perform one
least-squares estimate: one column for each condition of non-interest,
and the condition of interest expanded into one column per event.
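
In design-matrix terms, a stripped-down sketch of that scheme (everything
here is illustrative; hrf_regressor stands in for whatever convolution you
already use):

import numpy as np

def lss_betas(data, events, target_cond, hrf_regressor, n_scans):
    # data: (n_scans, n_voxels); events: (condition, onset, duration) tuples;
    # hrf_regressor(onset, duration, n_scans) -> one (n_scans,) design column.
    # Returns one beta pattern per event of target_cond.
    columns = []
    for cond, onset, dur in events:      # condition of interest: one column
        if cond == target_cond:          # per individual event
            columns.append(hrf_regressor(onset, dur, n_scans))
    n_target = len(columns)
    for other in sorted(set(c for c, _, _ in events) - set([target_cond])):
        col = np.zeros(n_scans)          # one summary column per condition
        for cond, onset, dur in events:  # of non-interest
            if cond == other:
                col += hrf_regressor(onset, dur, n_scans)
        columns.append(col)
    X = np.column_stack(columns + [np.ones(n_scans)])  # plus an intercept
    betas = np.linalg.lstsq(X, data)[0]
    return betas[:n_target]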

It sounds like you might be doing something slightly different?

> I would be interested to include these in PyMVPA in the future, as the
> LS-A (stands for All) is not adequate in rapid event design with
> correlated regressors.

Since it turns out I'm not the only one who's needed to develop this,
this seems like a good idea to me. If you're interested in my input, you
can ping me on Github (@effigies) when you start to add it.

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University



Re: [pymvpa] Cannot build LIBSVM files on Mac OS

2016-02-18 Thread Christopher J Markiewicz
On 02/14/2016 10:15 AM, Michael Bannert wrote:
> Dear PyMVPA community,
> 
> I have a problem installing PyMVPA on Mac OS X 10.10.5 (Yosemite). Since
> my Homebrew does not get along well with MacPorts, I figured it would be
> best to try to build it from source.
>
> "python setup.py build_ext" fails with the following error (just the
> error bit):
> 
> /Users/MYUSERNAME/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2:
> warning: "Using deprecated NumPy API, disable it by "
> "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
> #warning "Using deprecated NumPy API, disable it by " \
>  ^
> 1 warning generated.
> creating build/lib.macosx-10.5-x86_64-2.7/mvpa2/clfs/libsvmc
> g++ -bundle -undefined dynamic_lookup -L/Users/MYUSERNAME/anaconda/lib
> -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.5.sdk -arch x86_64
> build/temp.macosx-10.5-x86_64-2.7/3rd/libsvm/svm.o
> build/temp.macosx-10.5-x86_64-2.7/build/src.macosx-10.5-x86_64-2.7/mvpa2/clfs/libsvmc/svmc_wrap.o
> -L/Users/MYUSERNAME/anaconda/lib -o
> build/lib.macosx-10.5-x86_64-2.7/mvpa2/clfs/libsvmc/_svmc.so -bundle
> ld: library not found for -lc++
> clang: error: linker command failed with exit code 1 (use -v to see
> invocation)
> ld: library not found for -lc++
> clang: error: linker command failed with exit code 1 (use -v to see
> invocation)
> error: Command "g++ -bundle -undefined dynamic_lookup
> -L/Users/MYUSERNAME/anaconda/lib -arch x86_64 -isysroot
> /Developer/SDKs/MacOSX10.5.sdk -arch x86_64
> build/temp.macosx-10.5-x86_64-2.7/3rd/libsvm/svm.o
> build/temp.macosx-10.5-x86_64-2.7/build/src.macosx-10.5-x86_64-2.7/mvpa2/clfs/libsvmc/svmc_wrap.o
> -L/Users/MYUSERNAME/anaconda/lib -o
> build/lib.macosx-10.5-x86_64-2.7/mvpa2/clfs/libsvmc/_svmc.so -bundle"
> failed with exit status 1
> 
> It seems that there is a problem with building the libsvm functions.
> (After all, the installation works fine with the "--no-libsvm" option.)
> I discovered a previous email reporting problems with that. The
> suggested solution was to use an earlier version of swig (3.0.0 instead
> of 3.0.8). This didn't work for me. (
> http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2015q3/003176.html
> )
> 
> However, I did find the library "ld" on my system. How can I tell
> PyMVPA's setup.py to use it? I played around with the config settings in
> setup.py but to no avail so far ...

ld isn't a library, it's the linker. It was saying that it couldn't find
libc++. These might be useful to you:

https://stackoverflow.com/questions/15981424/where-is-libc-on-os-x
https://stackoverflow.com/questions/17302349/what-is-lc

I have very limited experience with Macs, so I can't really help much
otherwise.

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University



Re: [pymvpa] Searchlight and permutation tests.

2016-04-26 Thread Christopher J Markiewicz
On 04/26/2016 09:41 AM, Roberto Guidotti wrote:
> Hi all,
> 
> I'm writing the second part of my story, begun with the thread
> "Balancing with searchlight and statistical issue"!
> 
> I have a problem with searchlight results, I ran some whole brain
> searchlight to decode betas of an fMRI dataset regarding a memory task
> (this time the dataset is balanced).
> I divided the dataset into 5 parts (11 betas per condition in each fold)
> and ran a searchlight using a linear SVM (C=1) with leave-one-out
> cross-validation.
> The across-subject average map has some suspicious results,

For group results, I'd consider using something like the more
conservative approach in Lee, et al. 2012
(https://www.ncbi.nlm.nih.gov/pubmed/22423114). They subtracted the
spatial mean from each subject, treating the center of the histogram as
an empirically discovered chance accuracy. They then use t tests to find
clusters that are consistently informative across subjects.

This can be pretty easily extended to a permutation test to get a null
distribution of cluster sizes.
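
The demeaning step itself is trivial (a sketch; accuracy_maps is a
hypothetical subjects-by-voxels array of searchlight accuracies):

import numpy as np
from scipy import stats

# remove each subject's spatial mean, treating it as that subject's
# empirically discovered chance level
demeaned = accuracy_maps - accuracy_maps.mean(axis=1, keepdims=True)

# one-sample t-test across subjects at every voxel, against zero
t, p = stats.ttest_1samp(demeaned, 0.0, axis=0)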

> the accuracy
> histogram is not peaked at chance level (0.5) but is peaked at
> 0.55-0.56, so the majority of voxels have values in that range. Do you
> think that is reasonable? Or does it depend on the cross-validation scheme,
> a beta issue, or who knows what?

I've used a similar analysis strategy as yours. While most of my
analyses tend to clump around chance, some are shifted a little right (I
don't think I've seen any shifted left). On a two-class problem,
0.55-0.56 doesn't seem terribly unreasonable to me, though it might not
be a bad idea to check your GLM to make sure that you're not
accidentally introducing trial information into your betas.

> To validate that result I ran an exploratory permutation test (n=100) on
> a single subject to look at accuracy distribution, in that subject, the
> histogram after permutation test is correctly peaked at chance level
> (the map with correct labeling is peaked at 0.57). I don't know if I'm
> correct but this should validate the hypothesis that the map histogram
> peaked at 0.55-0.56 is reasonable!

It would be even more surprising if permutations were peaked at anything
other than chance. Even if you introduced trial information into the
betas, shuffling the trial labels would destroy the usefulness of that
information. So I wouldn't consider this a validation. Sorry.

-- 
Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University
