Hi,

I just want to check whether a classifier trained on two classes A & B can also discriminate between two different classes C & D.

e.g. "is a classifier trained on face vs. house samples also able to make correct predictions for cat vs. scissors?"
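(For reference, the general idea on toy data, using a simple nearest-class-mean classifier in plain numpy rather than the actual pipeline below -- the class offsets and sample counts here are made up:)

```python
import numpy as np

np.random.seed(0)

# toy data: 20 samples x 10 features per class, each with a class-specific offset
def make_class(offset, n=20, nf=10):
    return np.random.randn(n, nf) + offset

face, house = make_class(1.0), make_class(-1.0)    # "training" classes A/B
cat, scissors = make_class(0.8), make_class(-0.8)  # "transfer" classes C/D

# "train": store the mean pattern of each training class
means = np.vstack([face.mean(axis=0), house.mean(axis=0)])

def predict(samples):
    # assign each sample to the nearest training-class mean (0=face, 1=house)
    d = ((samples[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# transfer test: does the face/house boundary also separate cat/scissors?
pred = predict(np.vstack([cat, scissors]))
true = np.r_[np.zeros(len(cat)), np.ones(len(scissors))]  # cat~face, scissors~house
acc = (pred == true).mean()
```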

Here is my code:

from mvpa.suite import *

attr = SampleAttributes(os.path.join(pymvpa_dataroot, 'attributes.txt'))
ds = NiftiDataset(samples=os.path.join(pymvpa_dataroot, 'bold.nii.gz'),
                      labels=attr.labels,
                      chunks=attr.chunks,
                      mask=os.path.join(pymvpa_dataroot, 'mask.nii.gz'))
detrend(ds, perchunk=True, model='regress', polyord=2)
zscore(ds, perchunk=True, targetdtype='float64', baselinelabels=[0])
# train ds: face vs house
ds1 = ds.select(labels=[1,2])
# predict ds: cat vs scissors
ds2 = ds.select(labels=[4,5])


# no. of samples in each ds
s_ds1 = len(ds1.samples)
s_ds2 = len(ds2.samples)

# check for equal no. of samples; getRandomSamples() expects the
# number of samples *per label*, hence the division by two
if s_ds1 < s_ds2:
    ds2 = ds2.getRandomSamples(s_ds1/2)
elif s_ds1 > s_ds2:
    ds1 = ds1.getRandomSamples(s_ds2/2)
else:
    print ' > equal no. of samples'
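(The balancing step in generic numpy terms, i.e. drawing an equal number of random samples per class -- just a sketch with made-up labels, not the PyMVPA call itself:)

```python
import numpy as np

rng = np.random.RandomState(42)
labels = np.array([4, 4, 4, 4, 5, 5, 5, 5, 5, 5])  # made-up, unbalanced labels

# the smallest class size determines how many samples to keep per label
n_per_label = min((labels == l).sum() for l in np.unique(labels))

# draw that many random sample indices from each class
keep = np.concatenate([
    rng.permutation(np.where(labels == l)[0])[:n_per_label]
    for l in np.unique(labels)
])
balanced = labels[keep]
```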

# just check the mean of the two classes in each ds
for i in ds1.uniquelabels:
   ds_tmp = ds1.select(labels=[i])
   print ' > mean of ds1 with label', i, N.mean(ds_tmp.samples)
for i in ds2.uniquelabels:
   ds_tmp = ds2.select(labels=[i])
   print ' > mean of ds2 with label', i, N.mean(ds_tmp.samples)


tmp_labels = ds2.labels.copy()
# fake labels: pretend cat (4) is house (2) and scissors (5) is face (1)
new_labels = tmp_labels.copy()
for l in xrange(tmp_labels.shape[0]):
    if tmp_labels[l] == 4.0:
        new_labels[l] = 2.0
    elif tmp_labels[l] == 5.0:
        new_labels[l] = 1.0

# assign new labels to predict ds
ds2.labels = new_labels
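(The same relabelling can also be written without the explicit loop, e.g. via a small mapping dict in plain numpy -- a sketch using the label values from above:)

```python
import numpy as np

labels = np.array([4., 5., 4., 5., 4.])  # example transfer labels
mapping = {4.0: 2.0, 5.0: 1.0}           # cat -> house, scissors -> face
new_labels = np.array([mapping[l] for l in labels])
```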

# setup clf
clf = LinearNuSVMC(nu=0.5, probability=0)

# setup validation procedure
terr = TransferError(clf)
terr.states.enable('confusion')
# sanity check: train and test on the same dataset
terr(ds1, ds1)

# train on ds1, predict ds2
error = terr(ds1, ds2)
print terr.confusion


This yields an accuracy of about 40%. However, changing the assignment of the fake labels to:

for l in xrange(tmp_labels.shape[0]):
    if tmp_labels[l] == 4.0:
        new_labels[l] = 1.0  # instead of 2.0
    elif tmp_labels[l] == 5.0:
        new_labels[l] = 2.0  # instead of 1.0


leads to an accuracy of about 60%. I am not sure whether this makes sense, or whether I missed something here. To make a statement like "a clf trained on labels A/B is also able to discriminate between classes C/D", would you suggest running both analyses above and averaging the two prediction errors, or how is this "usually" done?
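(As a side note, a quick sanity check in plain numpy suggests the two numbers are not independent: with only two classes and a fixed set of predictions, swapping the fake-label assignment turns every hit into a miss and vice versa, so the two accuracies should sum to 100% -- consistent with the ~40%/~60% above. Made-up predictions below:)

```python
import numpy as np

# made-up predictions and true labels (classes coded 1 and 2)
pred = np.array([1, 2, 2, 1, 1, 2, 1, 2, 2, 1])
true = np.array([1, 1, 2, 2, 1, 2, 2, 1, 2, 1])

acc_mapping1 = (pred == true).mean()
# swapping the two labels complements the accuracy
acc_mapping2 = (pred == np.where(true == 1, 2, 1)).mean()
```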

Best regards,
Matthias



_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
