Hello,

I'd like to perform some feature selection (using Recursive Feature Elimination) on a data set I'm analyzing, but I haven't been able to make it work.

I could not find any full example of how to use (rather than just create and/or train) a FeatureSelectionClassifier; I think a full example would be useful. The one example in the documentation showing how to train a FeatureSelectionClassifier did it by calling

clf.train(dataset)

... and then calling dataset.selectFeatures(clf.feature_ids)

This didn't work for me (see the code and errors below). I was working with a different classifier (linear SVM multi-class instead of kNN), and I was working with a slightly different data set (masked data set loaded from a Matlab matrix), but it seems that the same principles should apply. What am I doing wrong?

I suspect my problem may have something to do with the possible bug that I wrote to you about previously (http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2009q4/000806.html).

To review, the function clf.getSensitivityAnalyzer(), rather than combining feature sensitivities across comparisons of the data (this is a multi-class classifier), was combining across features. Thus I got 3 sensitivity values (for the comparisons of 1vs2, 1vs3, and 2vs3) rather than 649 values (1 per feature (voxel) in my data set). I was able to read out the feature sensitivities by calling

clf.getSensitivityAnalyzer(transformer=None,combiner=None),

but now it seems like the RFE algorithm needs a correct combiner to work. I could not find any documentation on other arguments to provide besides "None" (combiner=??).
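To make the shape issue concrete, here is a minimal plain-Python sketch (not PyMVPA code). It assumes the uncombined sensitivities form a 3 x 649 table, one row per pairwise comparison and one column per voxel, as described above. The values are random placeholders, and the L2 norm is only a stand-in for whatever combiner would be appropriate.

```python
import random

random.seed(0)

# Shapes taken from the description above: 3 pairwise comparisons
# (1vs2, 1vs3, 2vs3) and 649 features (voxels).
n_pairs, n_features = 3, 649

# Hypothetical sensitivities (random placeholders), one row per comparison.
sens = [[random.gauss(0.0, 1.0) for _ in range(n_features)]
        for _ in range(n_pairs)]


def l2_norm(values):
    """Euclidean norm of a sequence of numbers (illustrative combiner)."""
    return sum(v * v for v in values) ** 0.5


# Combining across comparisons gives one value per feature,
# which is what a feature-ranking step needs.
per_feature = [l2_norm(column) for column in zip(*sens)]

# Combining across features gives one value per comparison,
# which matches the 3 values reported above.
per_pair = [l2_norm(row) for row in sens]

print(len(per_feature))  # 649
print(len(per_pair))     # 3
```

The only difference between the two results is which axis of the table is collapsed.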

Help? Any idea what's going on?

The code I'm using and the error messages I get are provided below.

Thanks (again) for your time,

Mark


~~~~~~~~~~~~~


from scipy.io import loadmat
from mvpa.suite import *

DatFile = 'WholeBrainMatFile.mat' # 4-D .mat file of 2x2x2 voxels - 440x80x60x69
MaskFile = 'ROI_Mask.mat' # Contains a mask for 649 voxels in the Lateral Occipital area
AttrFile = 'ConditionLabels.txt'

D = loadmat(DatFile)
Data = D['Data']
M = loadmat(MaskFile)
MaskMat = M['Mask']
attr = SampleAttributes(AttrFile)

# create masked data set
PyDat = MaskedDataset(samples=Data, labels=attr.labels, chunks=attr.chunks, mask=MaskMat)
zscore(PyDat,perchunk=True,targetdtype='float32')
# PyDat is: <Dataset / float32 440 x 649 uniq: 8 chunks 3 labels>

# Now: feature selection:

splitter = NFoldSplitter(cvtype=1)
rfesvm_split = SplitClassifier(LinearCSVMC(),splitter)
FtSelClf = FeatureSelectionClassifier(
        # use a linear SVM classifier:
        clf = LinearCSVMC(),
        # on features selected via RFE
        feature_selection = RFE(
                # based on sensitivity of a clf which does splitting internally
                sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(), #transformer=None
                # and whose internal error we use
                transfer_error=ConfusionBasedError(
                        rfesvm_split,
                        confusion_state="confusion"),
                # remove 20% of features at each step
                feature_selector=FractionTailSelector(
                        0.2, mode='discard', tail='lower'),
                enable_states=['feature_ids'],
                # update sensitivity at each step
                update_sensitivity=True),
        descr='LinSVM+RFE(splits_avg)')

# Option 1: simple training and check on feature IDs
print FtSelClf.trained # prints "False"
FtSelClf.train(PyDat)
print FtSelClf.trained # prints "True"
print FtSelClf.feature_ids
# (Generates error - see below)

# Option 2: Run cross-validated transfer error
terr = TransferError(FtSelClf)
splitter = NFoldSplitter(cvtype=1)
cvterr = CrossValidatedTransferError(
        terr,
        splitter)
Err = cvterr(PyDat)
print Err
# (Also generates error - having NOT run option 1)

To be clear - I only used EITHER Option 1 or Option 2 (one or the other was always commented out when I ran the code).

Option 1 gives the error:

Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 77, in <module>
    print FtSelClf.feature_ids
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 1099, in __getattribute__
    return collections[known_attribs[index]].getvalue(index)
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 353, in getvalue
    return self._items[index].value
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 66, in _getVirtual
    return self._get()
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 227, in _get
    raise UnknownStateError("Unknown yet value of %s" % (self.name))
mvpa.misc.exceptions.UnknownStateError: Exception: Unknown yet value of feature_ids


And Option 2 gives the error:

Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 81, in <module>
    Err = cvterr(PyDat)
  File "/opt/local/lib/python2.5/site-packages/mvpa/measures/base.py", line 105, in __call__
    result = self._call(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/algorithms/cvtranserror.py", line 173, in _call
    result = transerror(split[1], split[0])
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1283, in __call__
    self._precall(testdataset, trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1239, in _precall
    self.__clf.train(trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/base.py", line 354, in train
    result = self._train(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/meta.py", line 1058, in _train
    self.__testdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/featsel/rfe.py", line 268, in __call__
    wdataset = wdataset.selectFeatures(selected_ids)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/mapped.py", line 130, in selectFeatures
    sdata = Dataset.selectFeatures(self, ids=ids, sort=sort)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/base.py", line 1018, in selectFeatures
    new_data['samples'] = self._data['samples'][:, ids]
IndexError: index (2) out of range (0<=index<1) in dimension 1
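
To illustrate why the length of the sensitivity vector matters for the elimination step, here is a minimal plain-Python sketch of a "discard the lowest 20%" selection, loosely modeled on the FractionTailSelector(0.2, mode='discard', tail='lower') setting in the script. It is not PyMVPA code: the helper name discard_lower_tail, the toy scores, and the choice to rank by absolute value are all assumptions made for illustration, and I do not know whether this is the actual cause of the IndexError.

```python
def discard_lower_tail(sensitivities, fraction=0.2):
    """Return sorted indices of the features kept after discarding the
    given fraction with the smallest absolute sensitivity.

    Hypothetical helper; ranking by absolute value is an assumption.
    """
    n = len(sensitivities)
    n_discard = int(n * fraction)
    # Feature indices ordered from least to most sensitive.
    order = sorted(range(n), key=lambda i: abs(sensitivities[i]))
    return sorted(order[n_discard:])


# Toy example: 10 features, so the 2 least sensitive ones are dropped.
scores = [0.9, -0.1, 0.5, 0.05, -0.7, 0.3, 0.2, -0.8, 0.4, 0.6]
kept = discard_lower_tail(scores)
print(kept)  # [0, 2, 4, 5, 6, 7, 8, 9]

# With only 3 scores (one per pairwise comparison), the returned ids can
# only ever refer to positions 0..2, never to the 649 voxels.
kept_pairs = discard_lower_tail([0.3, 0.1, 0.2])
print(kept_pairs)  # [0, 1, 2]
```

The returned ids are only meaningful as feature indices if the score vector has exactly one entry per feature in the data set being reduced.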



~~~~~~~~~~~~~~~~~~~~~~~~~~

Mark Lescroart
(say it LESS-qua)

University of Southern California
Neuroscience Graduate Program
Image Understanding Lab
Email: [email protected]
Cell: (213) 447-0752

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
