On Thu, 25 Oct 2012, Jacob Itzhacki wrote:

> "e.g. some super-ordinate category (e.g. animate-vs-inanimate) you
> would like to cross-validate not across functional runs BUT across
> sub-ordinate stimuli categories (e.g. train on
> humans/reptiles/shoes/scissors to discriminate animacy and
> cross-validate into bugs/houses, then continue with another pair to take
> out)."
>
> BTW, this is exactly what I would like to do, but I still can't figure out
> how to leave the test trials out of the training trials, so they don't get
> classified into themselves.
ok then -- the point is to craft such an interesting partitioner. There are
actually 2 approaches to this. Let's first look at

https://github.com/PyMVPA/PyMVPA/blob/HEAD/mvpa2/tests/test_usecases.py#L50

which I am citing here with some additional comments and omitting import
statement(s). It is a bit more cumbersome since in it we have 6 subordinate
categories and 3 superordinate (not 2, which would make the explanation
easier):

    # Let's simulate the beast -- 6 categories total, grouped into 3
    # super-ordinate ones, and actually without any 'superordinate' effect
    # since the subordinate categories are independent.
    # In your case I hope you would have a true superordinate effect, as in
    # the example study I am referring to below
    ds = normal_feature_dataset(nlabels=6, snr=100,  # pure signal! ;)
                                perlabel=30, nfeatures=6,
                                nonbogus_features=range(6),
                                nchunks=5)
    ds.sa['subord'] = ds.sa.targets.copy()
    # Create a new 'superord' category as the remainder of division by 3
    # of the original 6 categories (in 'subord')
    ds.sa['superord'] = ['super%d' % (int(i[1]) % 3,)
                         for i in ds.targets]   # 3 superord categories
    # Override the original targets just to be sure we aren't relying on them
    ds.targets[:] = 0

    npart = ChainNode([
        ## split based on superord:
        # this NFoldPartitioner selects 3 subord categories at a time
        # (possibly even multiple from the same superord category)
        NFoldPartitioner(len(ds.sa['superord'].unique), attr='subord'),
        ## keep only those splits where we took 1 subord category from
        ## each of the superord categories, leaving things in balance
        Sifter([('partitions', 2),
                ('superord',
                 {'uvalues': ds.sa['superord'].unique,
                  'balanced': True})]),
        ], space='partitions')
    # With that NFold + Sifter combination we achieve the desired effect:
    # we get only those splits where testing receives 3 different subord
    # categories, one per superord category

    # ... and then proceed as usual, with the clf's space='superord'
    clf = LinearCSVMC(space='superord')
    cvte_regular = CrossValidation(clf, NFoldPartitioner(),
                                   errorfx=lambda p, t: np.mean(p == t))
    # and with our NFold + Sifter partitioner instead of a simple NFold
    # on chunks
    cvte_super = CrossValidation(clf, npart,
                                 errorfx=lambda p, t: np.mean(p == t))

    # apply as usual ;)
    accs_regular = cvte_regular(ds)
    accs_super = cvte_super(ds)

If you are interested in how that would affect the results -- I would invite
you to look at my recent poster at SfN 2012:

http://haxbylab.dartmouth.edu/publications/HGG+12_sfn12_famfaces.png

2nd column, scatter plot "Why across identities?": on the x-axis you have
z-scores of CV stats while cross-validating searchlights on classification of
personal familiarity of faces across functional runs, and on the y-axis --
across pairs of individuals. The two results are in high agreement EXCEPT in
the "blue areas" -- early visual cortex -- where, if we cross-validate across
functional runs, the classifier might just learn identity information. Since
the identity of a face (subordinate category) here has a clear association
with familiarity (superordinate), it would yield significant classification
results wherever there is strong identity information in the stimuli (in our
case in early visual cortex, since the faces were actually different ;) )
but possibly no (strong) superord effects (let's forget for now about
possible attention/engagement etc. effects). By cross-validating across
identities (subord), we can easily get rid of those subord-specific effects
and capture the superord category effects more cleanly.

An alternative, even stricter cross-validation scheme would cross-validate
across runs BUT then also bootstrap additional folds for each such split by
generating all the splits across identities.
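Purely for illustration -- this is NOT PyMVPA code; the helper name
`subord_splits` and the toy labels are made up -- the leave-balanced-
subord-combinations-out logic that the NFold + Sifter pair implements
could be sketched with plain NumPy like this:

```python
import itertools
import numpy as np

def subord_splits(subord, superord):
    """Yield (train_idx, test_idx) pairs where the test set contains
    exactly one subordinate category from each superordinate one."""
    subord = np.asarray(subord)
    superord = np.asarray(superord)
    # subordinate members of each superordinate category
    groups = [np.unique(subord[superord == s])
              for s in np.unique(superord)]
    # every balanced combination: one subord category per superord group
    for combo in itertools.product(*groups):
        in_test = np.isin(subord, combo)
        yield np.flatnonzero(~in_test), np.flatnonzero(in_test)

# toy design: 6 subord categories grouped into 3 superord ones,
# mirroring the 'remainder of division by 3' trick above
subord = np.repeat(['s0', 's1', 's2', 's3', 's4', 's5'], 4)
superord = np.array(['super%d' % (int(s[1]) % 3) for s in subord])
splits = list(subord_splits(subord, superord))
# 2 subord choices per superord group -> 2**3 = 8 balanced splits
```

With 2 subordinate categories per superordinate one there are 2*2*2 = 8
balanced combinations, so 8 folds, and in every fold train and test share
no subordinate category -- which is exactly why the classifier cannot get
away with learning subord-specific information.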
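That stricter runs+subord scheme can also be sketched conceptually with
plain NumPy (again a toy illustration with made-up names, not PyMVPA's
actual API): for every left-out run we additionally leave out a balanced
combination of subordinate categories, so train and test share neither a
run nor a subordinate category.

```python
import itertools
import numpy as np

def runs_and_subord_splits(chunks, subord, superord):
    """For every left-out run, additionally leave out one balanced
    combination of subordinate categories: test samples come from the
    left-out run AND the left-out subord categories; train samples from
    the remaining runs AND the remaining subord categories."""
    chunks = np.asarray(chunks)
    subord = np.asarray(subord)
    superord = np.asarray(superord)
    groups = [np.unique(subord[superord == s])
              for s in np.unique(superord)]
    for run in np.unique(chunks):
        for combo in itertools.product(*groups):
            in_combo = np.isin(subord, combo)
            test = (chunks == run) & in_combo
            train = (chunks != run) & ~in_combo
            yield np.flatnonzero(train), np.flatnonzero(test)

# toy design: 2 runs x 4 subord categories in 2 superord groups
chunks = np.repeat([0, 1], 8)
subord = np.tile(np.repeat(['s0', 's1', 's2', 's3'], 2), 2)
superord = np.array(['super%d' % (int(s[1]) % 2) for s in subord])
folds = list(runs_and_subord_splits(chunks, subord, superord))
# 2 runs x 2**2 balanced subord combinations = 8 folds
```

Note this discards the samples that are in the left-out run but not in
the left-out subord combination (and vice versa), which is the price of
being strict about both factors at once.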
For that runs+subord scheme we have ExcludeTargetsCombinationsPartitioner,
documented at

http://www.pymvpa.org/generated/mvpa2.generators.partition.ExcludeTargetsCombinationsPartitioner.html?highlight=excludetargetscombinationspartitioner

with a unittest at

https://github.com/PyMVPA/PyMVPA/blob/HEAD/mvpa2/tests/test_generators.py#L266

This one was used in the original hyperalignment paper
(http://haxbylab.dartmouth.edu/publications/HGC+11.pdf) to avoid falling
into the trap of run-order effects...

I would be glad to see people report back comparing these 3 schemes of
cross-validation (just across runs, across subord, across runs+subord) on
their data with hierarchical category designs. Thanks in advance for
sharing -- it would be great if we get a dialog going instead of my
one-way blurbing... doh -- sharing! ;)

Cheers,

> On Wed, Oct 24, 2012 at 5:15 PM, Jacob Itzhacki <[email protected]> wrote:
> Please do! and thank you for all the responses :D
> Don't want to come across as lazy, but I'm not a master coder at all, so
> sometimes figuring out what one line of code does can be quite the
> ordeal, in my case.
> J
>
> On Wed, Oct 24, 2012 at 3:54 PM, Yaroslav Halchenko
> <[email protected]> wrote:
> On Wed, 24 Oct 2012, MS Al-Rawi wrote:
> > Cross-validation is fine even in this case, you'll just need to
> > rearrange your data in a way to leave-a-set-of-stimuli out, instead
> > of leave-one-run-out. Perhaps PyMVPA has some functionality to do this.
> now it is getting interesting -- I think you got close to what I thought
> the question was about: to investigate the conceptual/true effect of
> e.g. some super-ordinate category (e.g. animate-vs-inanimate) you
> would like to cross-validate not across functional runs BUT across
> sub-ordinate stimuli categories (e.g. train on
> humans/reptiles/shoes/scissors to discriminate animacy and
> cross-validate into bugs/houses, then continue with another pair to
> take out).
> And that is what I thought for a moment the question was about ;)
> This all can be (was) done with PyMVPA, although ATM it would require
> 3-4 lines of code instead of 1 to accomplish. If anyone is interested
> I could provide an example ;)... ?

--
Yaroslav O. Halchenko
Postdoctoral Fellow, Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa

