[pymvpa] RFE problem w/ Multi-Class SVM classifier

Mark Lescroart Fri, 20 Nov 2009 18:11:45 -0800

Hello,

I'd like to perform some feature selection (using Recursive Feature Elimination) on a data set I'm analyzing, but I haven't been able to make it work.

I could not find any full example of how to use (rather than just create and/or train) a FeatureSelectionClassifier; I think a full example would be useful. The one example in the documentation showing how to train a FeatureSelectionClassifier did it by calling


clf.train(dataset)

... and then calling dataset.selectFeatures(clf.feature_ids)

This didn't work for me (see the code and errors below). I was working with a different classifier (a linear multi-class SVM instead of kNN), and with a slightly different data set (a masked data set loaded from a Matlab matrix), but it seems that the same principles should apply. What am I doing wrong?

I suspect my problem may have something to do with the possible bug that I wrote to you about previously (http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2009q4/000806.html).

To review: the function clf.getSensitivityAnalyzer(), rather than combining feature sensitivities across comparisons of the data (this is a multi-class classifier), was combining across features. Thus I got 3 sensitivity values (for the comparisons 1 vs 2, 1 vs 3, and 2 vs 3) rather than 649 values (one per feature, i.e. voxel, in my data set). I was able to read out the feature sensitivities by calling


clf.getSensitivityAnalyzer(transformer=None,combiner=None),

but now it seems like the RFE algorithm needs a correct combiner to work. I could not find any documentation on other arguments to provide besides "None" (combiner=??).
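To make the shape mismatch concrete: a one-vs-one multi-class linear SVM yields one weight vector per pair of classes, so with 3 classes the raw sensitivities form a 3 x 649 array (pairs x features). The plain-Python sketch below is not PyMVPA's API; the function names are hypothetical, and averaging absolute weights across pairs is an assumed choice of combiner, not necessarily PyMVPA's default. It contrasts collapsing across features (3 values, the behaviour described above) with collapsing across pairs (one value per feature, which is what RFE needs to rank voxels).

```python
# Plain-Python illustration only; NOT PyMVPA's combiner API.
# `sens` is assumed to be a list of rows: one row per class pair,
# one column per feature.


def combine_across_features(sens):
    """Collapse each row (one class pair) to one value: one number per pair."""
    return [sum(abs(w) for w in row) / len(row) for row in sens]


def combine_across_pairs(sens):
    """Collapse each column (one feature) across all class pairs.

    Returns one value per feature: the mean absolute weight over pairs
    (an assumed combining rule chosen for illustration).
    """
    n_pairs = len(sens)
    n_features = len(sens[0])
    return [sum(abs(sens[p][f]) for p in range(n_pairs)) / n_pairs
            for f in range(n_features)]


# Toy sensitivities: 3 class pairs (1vs2, 1vs3, 2vs3) x 5 features.
sens = [
    [0.5, -1.0, 0.0, 2.0, -0.5],
    [1.5, 0.0, -1.0, 0.0, 0.5],
    [-1.0, 2.0, 1.0, 1.0, 0.0],
]

print(len(combine_across_features(sens)))  # 3: one value per pair
print(len(combine_across_pairs(sens)))     # 5: one value per feature
```

Only the second form produces a vector whose length matches the number of features, so only it can be used to decide which voxels to discard.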


Help? Any idea what's going on?

The code I'm using and the error messages I get are provided below.

Thanks (again) for your time,

Mark


~~~~~~~~~~~~~


from scipy.io import loadmat
from mvpa.suite import *

DatFile = 'WholeBrainMatFile.mat'  # 4-D .mat file of 2x2x2 voxels (440x80x60x69)
MaskFile = 'ROI_Mask.mat'  # contains a mask for 649 voxels in the Lateral Occipital area

AttrFile = 'ConditionLabels.txt'

D = loadmat(DatFile)
Data = D['Data']
M = loadmat(MaskFile)
MaskMat = M['Mask']
attr = SampleAttributes(AttrFile)

# create masked data set

PyDat = MaskedDataset(samples=Data, labels=attr.labels,
                      chunks=attr.chunks, mask=MaskMat)

zscore(PyDat,perchunk=True,targetdtype='float32')
# PyDat is: <Dataset / float32 440 x 649 uniq: 8 chunks 3 labels>

# Now: feature selection:

splitter = NFoldSplitter(cvtype=1)
rfesvm_split = SplitClassifier(LinearCSVMC(),splitter)
FtSelClf = FeatureSelectionClassifier(
        # use a linear SVM classifier:
        clf = LinearCSVMC(),
        # on features selected via RFE
        feature_selection = RFE(
                # based on sensitivity of a clf which does splitting internally
                sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),  # transformer=None
                transfer_error=ConfusionBasedError(
                        rfesvm_split,
                        confusion_state="confusion"),
                # and whose internal error we use
                feature_selector=FractionTailSelector(
                        0.2, mode='discard', tail='lower'),
                # remove 20% of features at each step
                enable_states=['feature_ids'],
                # update sensitivity at each step
                update_sensitivity=True),
        descr='LinSVM+RFE(splits_avg)')

# Option 1: simple training and check on feature IDs
print FtSelClf.trained # prints "False"
FtSelClf.train(PyDat)
print FtSelClf.trained # prints "True"
print FtSelClf.feature_ids
# (Generates error - see below)

# Option 2: Run cross-validated transfer error
terr = TransferError(FtSelClf)
splitter = NFoldSplitter(cvtype=1)
cvterr = CrossValidatedTransferError(
        terr,
        splitter)
Err = cvterr(PyDat)
print Err
# (Also generates error - having NOT run option 1)

To be clear: I only used EITHER Option 1 or Option 2 (one or the other was always commented out when I ran the code).
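For reference, the procedure configured above (score features, discard the lowest-scoring 20%, recompute sensitivities, repeat, keep the best-performing subset) can be sketched generically in plain Python. This is an illustrative sketch of RFE in general, not PyMVPA's implementation: `sensitivity_fn` stands in for the sensitivity analyzer and `error_fn` for the transfer error, both hypothetical callables, and the rounding rule (always drop at least one feature per step) is an assumption. The length check marks the invariant that the multi-class sensitivities above appear to violate.

```python
import math

# Illustrative sketch of generic RFE; NOT PyMVPA's RFE implementation.


def rfe(feature_ids, sensitivity_fn, error_fn, fraction=0.2, min_features=1):
    """Recursive feature elimination with a fractional 'discard lower tail' step.

    sensitivity_fn(ids) must return one score per id in `ids` (same order).
    error_fn(ids) returns an error estimate for a classifier using only `ids`.
    Returns (best_ids, best_error).
    """
    ids = list(feature_ids)
    best_ids, best_error = list(ids), error_fn(ids)
    while len(ids) > min_features:
        scores = sensitivity_fn(ids)  # recomputed at every step
        if len(scores) != len(ids):
            raise ValueError("expected %d sensitivities, got %d"
                             % (len(ids), len(scores)))
        # Number to discard: at least one (assumed rule), never below min_features.
        n_discard = max(1, int(math.floor(fraction * len(ids))))
        n_discard = min(n_discard, len(ids) - min_features)
        # Rank positions by score and drop the lowest-scoring tail.
        ranked = sorted(range(len(ids)), key=lambda i: scores[i])
        dropped = set(ranked[:n_discard])
        ids = [fid for i, fid in enumerate(ids) if i not in dropped]
        err = error_fn(ids)
        if err < best_error:
            best_ids, best_error = list(ids), err
    return best_ids, best_error


# Toy problem (made up for illustration): 10 features whose importance equals
# their id, with features 7, 8 and 9 carrying all the information.
def toy_sensitivity(ids):
    return [float(fid) for fid in ids]


def toy_error(ids):
    # Large penalty per missing informative feature, small cost per feature kept.
    missing = len({7, 8, 9} - set(ids))
    return 100 * missing + len(ids)


print(rfe(range(10), toy_sensitivity, toy_error))  # ([7, 8, 9], 3)
```

A sensitivity function that returns one value per class pair (3 values) instead of one per remaining feature trips the `ValueError` on the first step.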


Option 1 gives the error:

Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 77, in <module>
    print FtSelClf.feature_ids
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 1099, in __getattribute__
    return collections[known_attribs[index]].getvalue(index)
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 353, in getvalue
    return self._items[index].value
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 66, in _getVirtual
    return self._get()
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 227, in _get
    raise UnknownStateError("Unknown yet value of %s" % (self.name))
mvpa.misc.exceptions.UnknownStateError: Exception: Unknown yet value of feature_ids



And Option 2 gives the error:

Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 81, in <module>
    Err = cvterr(PyDat)
  File "/opt/local/lib/python2.5/site-packages/mvpa/measures/base.py", line 105, in __call__
    result = self._call(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/algorithms/cvtranserror.py", line 173, in _call
    result = transerror(split[1], split[0])
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1283, in __call__
    self._precall(testdataset, trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1239, in _precall
    self.__clf.train(trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/base.py", line 354, in train
    result = self._train(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/meta.py", line 1058, in _train
    self.__testdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/featsel/rfe.py", line 268, in __call__
    wdataset = wdataset.selectFeatures(selected_ids)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/mapped.py", line 130, in selectFeatures
    sdata = Dataset.selectFeatures(self, ids=ids, sort=sort)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/base.py", line 1018, in selectFeatures
    new_data['samples'] = self._data['samples'][:, ids]
IndexError: index (2) out of range (0<=index<1) in dimension 1
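One plausible reading of this last error, assuming the sensitivity analyzer still returns one value per class pair rather than one per feature: RFE ranks the 3 pairwise values as if they were per-feature scores, so the ids it keeps index into a 3-element vector instead of into the remaining voxels. Once the working dataset has shrunk to a single column, keeping id 2 is out of range. The sketch below reproduces that failure mode with a hypothetical `select_columns` helper and toy data; it mimics the symptom and is not PyMVPA code.

```python
# Hypothetical helper that mimics the failure mode; NOT PyMVPA code.


def select_columns(rows, ids):
    """Return a copy of `rows` keeping only the columns listed in `ids`."""
    n_cols = len(rows[0])
    for i in ids:
        if not 0 <= i < n_cols:
            raise IndexError("index (%d) out of range (0<=index<%d)" % (i, n_cols))
    return [[row[i] for i in ids] for row in rows]


# Toy data that has already been reduced to a single feature (column).
samples = [[0.1], [0.4], [0.3]]

# Scores for the three class pairs (1vs2, 1vs3, 2vs3), mistaken for per-feature scores.
pair_scores = [0.2, 0.9, 0.5]
ranked_ids = sorted(range(len(pair_scores)), key=lambda i: pair_scores[i])

try:
    select_columns(samples, ranked_ids[1:])  # keep the two "best" ids: [2, 1]
except IndexError as err:
    print(err)  # index (2) out of range (0<=index<1)
```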



~~~~~~~~~~~~~~~~~~~~~~~~~~

Mark Lescroart
(say it LESS-qua)

University of Southern California
Neuroscience Graduate Program
Image Understanding Lab
Email: [email protected]
Cell: (213) 447-0752

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
