Hi,

I just want to check whether a classifier trained on two classes A & B can also discriminate between two different classes C & D.

e.g. "is a classifier trained on face vs. house samples also able to make correct predictions for cat vs. scissors?"
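(For reference, the general idea on toy data, using a simple nearest-class-mean classifier in plain numpy rather than the actual pipeline below -- the class offsets and sample counts here are made up:)

```python
import numpy as np

np.random.seed(0)

# toy data: 20 samples x 10 features per class, each with a class-specific offset
def make_class(offset, n=20, nf=10):
    return np.random.randn(n, nf) + offset

face, house = make_class(1.0), make_class(-1.0)    # "training" classes A/B
cat, scissors = make_class(0.8), make_class(-0.8)  # "transfer" classes C/D

# "train": store the mean pattern of each training class
means = np.vstack([face.mean(axis=0), house.mean(axis=0)])

def predict(samples):
    # assign each sample to the nearest training-class mean (0=face, 1=house)
    d = ((samples[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# transfer test: does the face/house boundary also separate cat/scissors?
pred = predict(np.vstack([cat, scissors]))
true = np.r_[np.zeros(len(cat)), np.ones(len(scissors))]  # cat~face, scissors~house
acc = (pred == true).mean()
```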

Here is my code:

from mvpa.suite import *

attr = SampleAttributes(os.path.join(pymvpa_dataroot, 'attributes.txt'))
ds = NiftiDataset(samples=os.path.join(pymvpa_dataroot, 'bold.nii.gz'),
                      labels=attr.labels,
                      chunks=attr.chunks,
                      mask=os.path.join(pymvpa_dataroot, 'mask.nii.gz'))
detrend(ds, perchunk=True, model='regress', polyord=2)
zscore(ds, perchunk=True, targetdtype='float64', baselinelabels=[0])
# train ds: face vs house
ds1 = ds.select(labels=[1,2])
# predict ds: cat vs scissors
ds2 = ds.select(labels=[4,5])


# no. of samples in each ds
s_ds1 = len(ds1.samples)
s_ds2 = len(ds2.samples)

# check for equal no. of samples; getRandomSamples() expects the
# number of samples *per label*, hence the division by two
if s_ds1 < s_ds2:
    ds2 = ds2.getRandomSamples(s_ds1/2)
elif s_ds1 > s_ds2:
    ds1 = ds1.getRandomSamples(s_ds2/2)
else:
    print ' > equal no. of samples'
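(The balancing step in generic numpy terms, i.e. drawing an equal number of random samples per class -- just a sketch with made-up labels, not the PyMVPA call itself:)

```python
import numpy as np

rng = np.random.RandomState(42)
labels = np.array([4, 4, 4, 4, 5, 5, 5, 5, 5, 5])  # made-up, unbalanced labels

# the smallest class size determines how many samples to keep per label
n_per_label = min((labels == l).sum() for l in np.unique(labels))

# draw that many random sample indices from each class
keep = np.concatenate([
    rng.permutation(np.where(labels == l)[0])[:n_per_label]
    for l in np.unique(labels)
])
balanced = labels[keep]
```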

# just check the mean of the two classes in each ds
for i in ds1.uniquelabels:
   ds_tmp = ds1.select(labels=[i])
   print ' > mean of ds1 with label', i, N.mean(ds_tmp.samples)
for i in ds2.uniquelabels:
   ds_tmp = ds2.select(labels=[i])
   print ' > mean of ds2 with label', i, N.mean(ds_tmp.samples)


tmp_labels = ds2.labels.copy()
# fake labels: pretend cat (4) is house (2) and scissors (5) is face (1)
new_labels = tmp_labels.copy()
for l in xrange(tmp_labels.shape[0]):
    if tmp_labels[l] == 4.0:
        new_labels[l] = 2.0
    elif tmp_labels[l] == 5.0:
        new_labels[l] = 1.0

# assign new labels to predict ds
ds2.labels = new_labels
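(The same relabelling can also be written without the explicit loop, e.g. via a small mapping dict in plain numpy -- a sketch using the label values from above:)

```python
import numpy as np

labels = np.array([4., 5., 4., 5., 4.])  # example transfer labels
mapping = {4.0: 2.0, 5.0: 1.0}           # cat -> house, scissors -> face
new_labels = np.array([mapping[l] for l in labels])
```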

# setup clf
clf = LinearNuSVMC(nu=0.5, probability=0)

# setup validation procedure
terr = TransferError(clf)
terr.states.enable('confusion')
# sanity check: train and test on the same dataset
terr(ds1, ds1)

# train on ds1, predict ds2
error = terr(ds1, ds2)
print terr.confusion


This yields an accuracy of about 40%. However, changing the assignment of the fake labels to:

for l in xrange(tmp_labels.shape[0]):
    if tmp_labels[l] == 4.0:
        new_labels[l] = 1.0  # instead of 2.0
    elif tmp_labels[l] == 5.0:
        new_labels[l] = 2.0  # instead of 1.0


leads to an accuracy of about 60%. I am not sure whether this makes sense, or whether I missed something here. To make a statement like "a clf trained on labels A/B is also able to discriminate between classes C/D", would you suggest running both analyses above and averaging the two prediction errors, or how is this "usually" done?
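(As a side note, a quick sanity check in plain numpy suggests the two numbers are not independent: with only two classes and a fixed set of predictions, swapping the fake-label assignment turns every hit into a miss and vice versa, so the two accuracies should sum to 100% -- consistent with the ~40%/~60% above. Made-up predictions below:)

```python
import numpy as np

# made-up predictions and true labels (classes coded 1 and 2)
pred = np.array([1, 2, 2, 1, 1, 2, 1, 2, 2, 1])
true = np.array([1, 1, 2, 2, 1, 2, 2, 1, 2, 1])

acc_mapping1 = (pred == true).mean()
# swapping the two labels complements the accuracy
acc_mapping2 = (pred == np.where(true == 1, 2, 1)).mean()
```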

Best regards,
Matthias



_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
