Re: [pymvpa] AttributePermutator: Permute within chunks (& subjects) but only training labels

Michael Bannert Thu, 25 May 2017 04:38:40 -0700

Hi,

I have decided to go with a quick and dirty fix. I added a new
permutation strategy to permutation.py called 'simple_test_intact'. It
checks if the sample ids belong to the test set and just skips the
permutation if they do. So the partitions attribute has to be defined,
i.e. use only in combination with a partitioner. Otherwise it is
identical to 'simple'.
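
For anyone who wants the idea without patching PyMVPA, here is a minimal plain-Python sketch of the same logic. It is only an illustration: `permute_train_only` and its arguments are made-up names, not part of PyMVPA, and it assumes the usual partitioner coding in which test samples carry the value 2 in 'partitions'.

```python
import random


def permute_train_only(labels, groups, partitions, test_value=2, seed=None):
    """Shuffle labels within each group, leaving test-set groups untouched.

    Hypothetical helper (not a PyMVPA function). All three sequences must
    have the same length; `groups` plays the role of the limit attribute.
    """
    rng = random.Random(seed)
    out = list(labels)

    # collect the sample indices that belong to each group value
    by_group = {}
    for idx, group in enumerate(groups):
        by_group.setdefault(group, []).append(idx)

    for idx_list in by_group.values():
        parts = set(partitions[i] for i in idx_list)
        if len(parts) != 1:
            raise ValueError("group mixes training and test samples")
        if parts == {test_value}:
            continue  # test set: keep the labels as they are
        shuffled = idx_list[:]
        rng.shuffle(shuffled)
        for dst, src in zip(idx_list, shuffled):
            out[dst] = labels[src]
    return out


labels = ['a', 'b', 'c', 'd', 'e', 'f']
groups = ['run0', 'run0', 'run1', 'run1', 'run2', 'run2']
partitions = [1, 1, 1, 1, 2, 2]  # run2 is the test run in this fold
result = permute_train_only(labels, groups, partitions, seed=0)
```

The last two entries of `result` stay 'e' and 'f', while the first two runs are shuffled only among themselves.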


I think this will have to do until I figure out how it is supposed to be
done in PyMVPA.

Cheers,
Michael


On 24/05/17 16:27, Michael Bannert wrote:
> Hey Richard,
> 
> Thanks for your time. I believe what I want is really basic and I am
> somewhat surprised that it isn't covered in the tutorials (correct me if
> I'm wrong).
> 
> I'm trying to impose two constraints on the way the labels are shuffled:
> 
> 1. Leave label assignments intact in the test set.
> 2. Only permute labels within each subject's individual runs.
> 
> I know how to define AttributePermutator to implement each constraint
> individually, for example
> 
> ad 1.:
> permutator = AttributePermutator(attr='targets', limit={'partitions':
> 1}, count=n_perm)
> 
> This restricts the permutations to the training set only.
> 
> ad 2.:
> permutator = AttributePermutator(attr='targets', limit=['subject',
> 'chunks'], count=n_perm)
> 
> Labels are now shuffled only within each pair of subject/chunks values.
> 
> It seems that the limit argument has two different functions depending
> on whether it is a list or a dictionary. If it's a dictionary (case 1),
> then each key/value pair determines which labels should be included in
> the permutation in the first place. So here it has a selection function.
> If it's a list (case 2), it defines subsets of labels within which to
> perform permutation. In this case it has a "chunking" function.
> 
> I do not see how I can do both - include selection AND chunking.
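
To make the two roles concrete, here is a small plain-Python sketch that keeps them apart and then combines them. None of these helpers exist in PyMVPA; the names are invented for illustration. The sketch also assumes that a sample is selected only if it matches every key of the dictionary (logical AND), which may not be how the library itself combines several keys.

```python
def selection_mask(attrs, limit):
    """Dict-style limit: True for samples matching every key (assumed AND)."""
    n_samples = len(next(iter(attrs.values())))
    mask = []
    for i in range(n_samples):
        keep = True
        for name, allowed in limit.items():
            # wrap single values so that scalars and sequences look alike
            if isinstance(allowed, str) or not hasattr(allowed, '__iter__'):
                allowed = [allowed]
            if attrs[name][i] not in allowed:
                keep = False
        mask.append(keep)
    return mask


def group_labels(attrs, names):
    """List-style limit: one tuple per sample built from the named attributes."""
    return list(zip(*[attrs[name] for name in names]))


def groups_to_permute(attrs, select, group_by):
    """Selection AND chunking: index lists per group, selected samples only."""
    mask = selection_mask(attrs, select)
    labels = group_labels(attrs, group_by)
    groups = {}
    for idx, (keep, label) in enumerate(zip(mask, labels)):
        if keep:
            groups.setdefault(label, []).append(idx)
    return groups


attrs = {
    'subject':    [0, 0, 0, 0, 1, 1, 1, 1],
    'chunks':     ['run1', 'run1', 'run2', 'run2',
                   'run1', 'run1', 'run2', 'run2'],
    'partitions': [1, 1, 2, 2, 1, 1, 2, 2],
}
groups = groups_to_permute(attrs, {'partitions': 1}, ['subject', 'chunks'])
```

Here `groups` contains one index list per subject/run pair in the training set, and shuffling within each list would satisfy both constraints at once.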
> 
> Since in my code that I sent before, limit is a dictionary, it will only
> have the selection function - it defines which labels to include in the
> permutation, namely in my example ALL subjects and ALL runs. Silly me -
> this basically just tells the permutator to include all samples in the
> dataset :)
> 
> So to answer your question: no. When I restrict the selection to the
> training set and subjects, which I assume you would imagine to look like this:
> 
> limit={'partitions': 1, 'subject': range(n_subj)}
> 
> ... I get the same result as if I had used limit={'partitions': 1}.
> 
> Best,
> Michael
> 
> 
> 
> On 24/05/17 12:35, Richard Dinga wrote:
>> Hi Michael,
>> I am sorry, I misunderstood your problem. Your snippet seems good to me.
>> What do you mean by it doesn't work? Does it work if you limit only on
>> training set and subjects for example?
>>
>> On Mon, May 22, 2017 at 6:28 PM, Michael Bannert
>> <mbann...@tuebingen.mpg.de> wrote:
>>
>>     Hi Richard,
>>
>>     Thanks for your email. Unfortunately, this is not exactly the answer
>>     that I'm looking for:
>>
>>     In the section you refer to, the only limiting condition is that the
>>     test set labels should remain unpermuted. This solves only 50 % of my
>>     problem. It has already been explained very well in the worked examples
>>     - so no problem there.
>>
>>     However, I also want to permute only within runs (and subjects) and do
>>     not see how this can be achieved with AttributePermutator (or any other
>>     method).
>>
>>     Best,
>>     Michael
>>
>>
>>     On 22/05/17 17:32, Richard Dinga wrote:
>>     > Hi,
>>     > Does this answer your question?
>>     > 
>>     > http://www.pymvpa.org/tutorial_significance.html#avoiding-the-trap-or-advanced-magic-101
>>     >
>>     >
>>     > On Fri, May 19, 2017 at 8:18 PM, Michael Bannert
>>     > <mbann...@tuebingen.mpg.de> wrote:
>>     >
>>     >     Dear all,
>>     >
>>     >     I would like to use permutation testing for spatially aligned
>>     >     across-subject decoding. I have one vector of beta estimates per
>>     >     run (aka chunks) and per subject. Hence I figured it would be wise
>>     >     to permute within subjects and runs.
>>     >
>>     >     I can achieve this (I think) if I use AttributePermutator in this
>>     >     way:
>>     >
>>     >     permutator = AttributePermutator(attr='targets', limit=['subject',
>>     >     'chunks'], count=n_perm)
>>     >
>>     >     According to the debugging information provided when setting the
>>     >     'APERM' option, the permutations that are produced look reasonable.
>>     >
>>     >     However, I would also like to permute only the training data. How
>>     >     can I accomplish this?
>>     >
>>     >     I tried something like this:
>>     >
>>     >     permutator = AttributePermutator(attr='targets', limit={
>>     >              'partitions': 1, 'subject': range(n_subj), 'chunks':
>>     >     ['run%02.f' % j for j in range(1, n_runs + 1)]},
>>     >               count=1)
>>     >
>>     >     ... but it doesn't work.
>>     >
>>     >     I guess I am not very clear on what the documentation of
>>     >     AttributePermutator has to say about the limit argument.
>>     >
>>     >     Could anyone help?
>>     >
>>     >     Thanks & best,
>>     >     Michael
>>     >
>>     >
>>     >
>>     >     _______________________________________________
>>     >     Pkg-ExpPsy-PyMVPA mailing list
>>     >     Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org
>>     >     http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>>
> 
>

# emacs: -*- mode: python; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
#   See COPYING file distributed along with the PyMVPA package for the
#   copyright and license terms.
#
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
"""Generator nodes to permute datasets.
"""

__docformat__ = 'restructuredtext'

import numpy as np

from mvpa2.base import warning
from mvpa2.base.dochelpers import _repr_attrs

from mvpa2.base.node import Node
from mvpa2.base.dochelpers import _str, _repr
from mvpa2.misc.support import get_limit_filter
from mvpa2.misc.support import get_rng

from mvpa2.support.utils import deprecated
from mvpa2.mappers.fx import _product

if __debug__:
    from mvpa2.base import debug

class AttributePermutator(Node):
    """Node to permute one or more attributes in a dataset.

    This node can permute arbitrary sample or feature attributes in a dataset.
    Moreover, it supports limiting the permutation to a subset of samples or
    features (see ``limit`` argument). The node can simply be called with a
    dataset for a one time permutation, or used as a generator to produce
    multiple permutations.

    This node only permutes dataset attributes, dataset samples are not affected.
    The permuted output dataset shares the samples container with the input
    dataset.
    """
    def __init__(self, attr, count=1, limit=None, assure=False,
                 strategy='simple', chunk_attr=None, rng=None, **kwargs):
        """
        Parameters
        ----------
        attr : str or list(str)
          Name of the to-be-permuted attribute. This can also be a list of
          attribute names, in which case the *identical* shuffling is applied to
          all listed attributes.
        count : int
          Number of permutations to be yielded by .generate()
        limit : None or str or list or dict
          If ``None`` all attribute values will be permuted. If a single
          attribute name is given, its unique values will be used to define
          chunks of data that are permuted individually (i.e. no attribute
          values will be replaced across chunks). If a list is given, the
          combination of those attributes for each sample is used to define
          the chunks. Finally, if a dictionary is provided, its keys define
          attribute names and its values (single value or sequence thereof)
          attribute values, where all key-value combinations across all given
          items define a "selection" of to-be-permuted samples or features.
        strategy : 'simple', 'uattrs', 'chunks', 'simple_test_intact'
          'simple' strategy is the straightforward permutation of attributes (given
          the limit).  In some sense it assumes independence of those samples.
          'uattrs' strategy looks at unique values of attr (or their unique
          combinations in case of `attr` being a list), and "permutes" those
          unique combinations values thus breaking their assignment to the samples
          but preserving any dependencies between samples within the same unique
          combination. The 'chunks' strategy swaps attribute values of entire chunks.
          Naturally, this will only work if there is the same number of samples in
          all chunks. 'simple_test_intact' is identical to 'simple' except that
          samples from the test set are not permuted. This strategy requires the
          sample attribute 'partitions', so it is meant to be used in combination
          with a partitioner.
        chunk_attr : None or str
          Name of the sample attribute that defines the chunks. Required by the
          'chunks' strategy.
        assure : bool
          If set, by-chance non-permutations will be prevented, i.e. it is
          checked that at least two items change their position. Since this
          check adds a runtime penalty it is off by default.
        rng : int or RandomState, optional
          Integer to seed a new RandomState upon each call, or instance of the
          numpy.random.RandomState to be reused across calls. If None, the
          numpy.random singleton would be used


        """
        Node.__init__(self, **kwargs)
        self._pattr = attr

        self.count = count
        self._limit = limit

        self._assure_permute = assure
        self.strategy = strategy
        self.rng = rng
        self.chunk_attr = chunk_attr

    def _get_call_kwargs(self, ds):
        # determine to be permuted attribute to find the collection
        pattr = self._pattr
        if isinstance(pattr, str):
            pattr, collection = ds.get_attr(pattr)
        else:
            # must be sequence of attrs, take first since we only need the shape
            pattr, collection = ds.get_attr(pattr[0])

        # _call might need to operate on the dedicated instantiated rng
        # e.g. if seed int is provided
        return {
            'limit_filter': get_limit_filter(self._limit, collection),
            'rng': get_rng(self.rng)
        }

    def _call(self, ds, limit_filter=None, rng=None):
        # local binding
        pattr = self._pattr
        assure_permute = self._assure_permute

        if isinstance(pattr, str):
            # wrap single attr name into tuple to simplify the code
            pattr = (pattr,)

        # get actual attributes
        in_pattrs = [ds.get_attr(pa)[0] for pa in pattr]

        # Method to use for permutations
        try:
            permute_fx = getattr(self, "_permute_%s" % self.strategy)
            permute_kwargs = {'rng': rng}
        except AttributeError:
            raise ValueError("Unknown permutation strategy %r" % self.strategy)

        if self.chunk_attr is not None:
            permute_kwargs['chunks'] = ds.sa[self.chunk_attr].value

        for i in xrange(10):  # for the case of assure_permute
            # shallow copy of the dataset for output
            out = ds.copy(deep=False)

            out_pattrs = [out.get_attr(pa)[0] for pa in pattr]
            # replace .values with copies in out_pattrs so we do
            # not override original values
            for pa in out_pattrs:
                pa.value = pa.value.copy()

            for limit_value in np.unique(limit_filter):
                if limit_filter.dtype == np.bool:
                    # simple boolean filter -> do nothing on False
                    if not limit_value:
                        continue
                    # otherwise get indices of "selected ones"
                    limit_idx = limit_filter.nonzero()[0]
                else:
                    # non-boolean limiter -> determine "chunk" and permute within
                    limit_idx = (limit_filter == limit_value).nonzero()[0]

                # need list to index properly
                limit_idx = list(limit_idx)

                # 'simple_test_intact' also needs the output dataset so that it
                # can look up the 'partitions' attribute; all other strategies
                # use the plain call signature. Compare strings with ==, since
                # identity checks via 'is' are not reliable for str objects.
                if self.strategy == 'simple_test_intact':
                    permute_fx(limit_idx, in_pattrs, out_pattrs, out, **permute_kwargs)
                else:
                    permute_fx(limit_idx, in_pattrs, out_pattrs, **permute_kwargs)

            if not assure_permute:
                break

            # otherwise check if we differ from original, and if so -- break
            differ = False
            for in_pattr, out_pattr in zip(in_pattrs, out_pattrs):
                differ = differ or np.any(in_pattr.value != out_pattr.value)
                if differ:
                    break                 # leave check loop if differ
            if differ:
                break                     # leave 10 loop, otherwise go to the next round

        if assure_permute and not differ:
            raise RuntimeError(
                "Cannot assure permutation of %s with limit %r for "
                "some reason (dataset %s). Should not happen"
                % (pattr, self._limit, ds))

        return out


    def _permute_simple(self, limit_idx, in_pattrs, out_pattrs, rng=None):
        """The simplest permutation
        """
        perm_idx = rng.permutation(limit_idx)

        if __debug__:
            debug('APERM', "Obtained permutation %s", (perm_idx, ))

        # for all to be permuted attrs
        for in_pattr, out_pattr in zip(in_pattrs, out_pattrs):
            # replace all values in current limit with permutations
            # of the original ds's attributes
            out_pattr.value[limit_idx] = in_pattr.value[perm_idx]

    def _permute_simple_test_intact(self, limit_idx, in_pattrs, out_pattrs, ds, rng=None):
        """Same as '_permute_simple' except that labels in the test set are not shuffled
        """

        # Check that the partitions attribute is defined at all
        if hasattr(ds.sa, 'partitions'):
            partitions = ds.sa.partitions[limit_idx]
        else:
            raise RuntimeError(
                "Need to use 'simple_test_intact' in combination with a partitioner")

        # All samples permuted here must belong to the same set:
        # either the training or the test set.
        if len(np.unique(partitions)) != 1:
            raise RuntimeError(
                "Can't tell if these samples belong to training or test set")

        # If all IDs refer to samples in the test set (partitions == 2),
        # do not permute anything
        if not np.any(partitions != 2):
            perm_idx = np.array(limit_idx)
        else:
            perm_idx = rng.permutation(limit_idx)

        if __debug__:
            debug('APERM', "Obtained permutation %s", (perm_idx, ))

        # for all to be permuted attrs
        for in_pattr, out_pattr in zip(in_pattrs, out_pattrs):
            # replace all values in current limit with permutations
            # of the original ds's attributes
            out_pattr.value[limit_idx] = in_pattr.value[perm_idx]


    def _permute_uattrs(self, limit_idx, in_pattrs, out_pattrs, rng=None):
        """Provide a permutation given a specified strategy
        """
        # Select given limit_idx
        pattrs_lim = [p.value[limit_idx] for p in in_pattrs]
        # convert to list of tuples
        pattrs_lim_zip = zip(*pattrs_lim)
        # find unique groups
        unique_groups = list(set(pattrs_lim_zip))
        # now we need to permute the groups to generate remapping
        # get permutation indexes first
        perm_idx = rng.permutation(np.arange(len(unique_groups)))
        # generate remapping
        remapping = dict([(t, unique_groups[i])
                          for t, i in zip(unique_groups, perm_idx)])
        if __debug__:
            debug('APERM', "Using remapping %s", (remapping,))

        for i, in_group in zip(limit_idx, pattrs_lim_zip):
            out_group = remapping[in_group]
            # now we need to assign them to out_pattrs
            for pa, out_v in zip(out_pattrs, out_group):
                pa.value[i] = out_v

    @staticmethod
    def _permute_chunks_sanity_check(in_pattrs, chunks, uniques):
        #  Verify that we are not dealing with some degenerate scenario

        for in_pattr in in_pattrs:
            sample_targets = in_pattr.value[np.where(chunks == uniques[0])]

            for orig in uniques[1:]:
                chunk_targets = in_pattr.value[np.where(chunks == orig)]
                # must be of the same length
                if np.any(chunk_targets != sample_targets):
                    # Escape as early as possible
                    return

        warning("Permutation via strategy='chunks' makes no sense --"
                " all chunks have the same order of targets: %s"
                % (sample_targets,))

    def _permute_chunks(self, limit_idx, in_pattrs, out_pattrs, chunks=None, rng=None):
        # limit_idx is doing nothing

        if chunks is None:
            raise ValueError("Missing 'chunk_attr' for strategy='chunks'")

        uniques = np.unique(chunks)

        if __debug__ and len(uniques):
            # Somewhat a duplication, since could be checked within the loop,
            # but IMHO makes it cleaner and shouldn't be that big of an impact
            self._permute_chunks_sanity_check(in_pattrs, chunks, uniques)

        for in_pattr, out_pattr in zip(in_pattrs, out_pattrs):
            shuffled = uniques.copy()
            rng.shuffle(shuffled)

            for orig, new in zip(uniques, shuffled):
                out_pattr.value[np.where(chunks == orig)] = \
                    in_pattr.value[np.where(chunks == new)]

    def generate(self, ds):
        """Generate the desired number of permuted datasets."""
        # figure out permutation setup once for all runs
        # permute as often as requested
        for i in xrange(self.count):
            kwargs = self._get_call_kwargs(ds)
            ## if __debug__:
            ##     debug('APERM', "%s generating %i-th permutation", (self, i))
            yield self(ds, _call_kwargs=kwargs)

    def __str__(self):
        return _str(self, self._pattr, n=self.count, limit=self._limit,
                    assure=self._assure_permute)

    def __repr__(self, prefixes=None):
        if prefixes is None:
            prefixes = []
        return super(AttributePermutator, self).__repr__(
            prefixes=prefixes
            + _repr_attrs(self, ['attr'])
            + _repr_attrs(self, ['count'], default=1)
            + _repr_attrs(self, ['limit'])
            + _repr_attrs(self, ['assure'], default=False)
            + _repr_attrs(self, ['strategy'], default='simple')
            + _repr_attrs(self, ['rng'], default=None)
            )

    attr = property(fget=lambda self: self._pattr)
    limit = property(fget=lambda self: self._limit)
    assure = property(fget=lambda self: self._assure_permute)

# Understand permutation objects

from mvpa2.suite import *
import numpy as np

# counts are integers so that they can be used as array shapes and in range()
n_samples = 10
n_features = 3
n_conds = 2
n_runs = 5
n_perm = 4


# every feature column holds the sample index, so rows are easy to tell apart
ds = dataset_wizard(np.tile(np.arange(n_samples), (n_features, 1)).T)
ds.sa['targets'] = ['cond%i' % (x % n_conds) for x in range(n_samples)]
ds.sa['chunks'] = ['run%i' % (x // (n_samples // n_runs)) for x in range(n_samples)]
ds.sa['sample_id'] = ['id%02i' % x for x in range(n_samples)]

partitioner = NFoldPartitioner(cvtype=1, attr='chunks')

permutator = AttributePermutator('sample_id', limit='chunks', count=n_perm, strategy='simple_test_intact')

permutator_chain = ChainNode([partitioner, permutator], space=partitioner.get_space())

perm_ds = list(permutator_chain.generate(ds))

# one block per fold/permutation; test samples have partitions == 2
for sd in perm_ds:
    print(sd.sa['sample_id'].value)
    print(sd.sa['chunks'].value)
    print(sd.sa['partitions'].value)
    print('---')
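
Rather than reading the printed arrays by eye, the two constraints can also be checked programmatically. The helpers below are a hedged plain-Python sketch. `is_test_set_intact` and `stays_within_groups` are hypothetical names, not PyMVPA functions, and they again assume that test samples are coded as 2 in 'partitions'.

```python
def is_test_set_intact(original, permuted, partitions, test_value=2):
    """True if every test-set sample kept its original attribute value."""
    return all(o == p
               for o, p, part in zip(original, permuted, partitions)
               if part == test_value)


def stays_within_groups(original, permuted, groups):
    """True if each group holds the same values before and after permuting."""
    before, after = {}, {}
    for o, p, g in zip(original, permuted, groups):
        before.setdefault(g, []).append(o)
        after.setdefault(g, []).append(p)
    return all(sorted(before[g]) == sorted(after[g]) for g in before)


# toy data: run0 is the training run, run1 the test run
original = ['id00', 'id01', 'id02', 'id03']
chunks = ['run0', 'run0', 'run1', 'run1']
partitions = [1, 1, 2, 2]

good = ['id01', 'id00', 'id02', 'id03']  # swap inside the training run only
bad = ['id02', 'id01', 'id00', 'id03']   # swap across runs, touches test set

ok_good = is_test_set_intact(original, good, partitions) and \
    stays_within_groups(original, good, chunks)
ok_bad = is_test_set_intact(original, bad, partitions) and \
    stays_within_groups(original, bad, chunks)
```

With the script above, the same checks could presumably be applied to each `sd` by passing `ds.sa['sample_id'].value`, `sd.sa['sample_id'].value`, `sd.sa['partitions'].value` and `sd.sa['chunks'].value`.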

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
