[pymvpa] postproc=mean_sample(), is there a bug?

marco tettamanti Mon, 17 Aug 2015 07:50:17 -0700

Dear all,

I have encountered a strange behaviour in running a Monte Carlo permutationanalysis, though this may just be my lack of understanding.

Running the analysis without "postproc=mean_sample()" to collect p-values foreach fold seems to produce sensible results:


-----------------
repeats=100
repeater = Repeater(count=repeats)
permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)

null_cv = CrossValidation(clf, ChainNode([partitioner, permutator],space=partitioner.get_space()))distr_est = MCNullDist(repeater, tail='left', measure=null_cv,enable_ca=['dist_samples'])cv_mc = CrossValidation(clf, partitioner, null_dist=distr_est,enable_ca=['confusion', 'stats'])

mcerr = cv_mc(fds)

In [72]: print cv_mc.ca.stats.stats['ACC']
0.537037037037

In [73]: p_mc = cv_mc.ca.null_prob

In [74]: print np.ravel(p_mc)
[ 0.66  0.34  0.59  0.42  0.28  0.14  0.3   0.06  0.64  0.03  0.34 0.26
  0.59  0.12  0.34  0.09  0.27  0.1 ]

In [75]: print np.mean(np.ravel(p_mc))
0.309444444444
-----------------

However, the same snippet, but with the addition of "postproc=mean_sample()",yields a quite strange p-value, if compared to the mean p-value across folds.What got me suspicious, is that this p-value always remains constant,independently of whether I repeat the analysis n-times (as if no randompermutation was actually occurring), and independently of whether I use adifferent permutator (e.g. AttributePermutator('targets', count=1, limit='chunks')):


-----------------
repeats=100
repeater = Repeater(count=repeats)
permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)

null_cv = CrossValidation(clf, ChainNode([partitioner, permutator],space=partitioner.get_space()), postproc=mean_sample())distr_est = MCNullDist(repeater, tail='left', measure=null_cv,enable_ca=['dist_samples'])cv_mc = CrossValidation(clf, partitioner, postproc=mean_sample(),null_dist=distr_est, enable_ca=['confusion', 'stats'])

mcerr = cv_mc(fds)

In [50]: print cv_mc.ca.stats.stats['ACC']
0.537037037037

In [51]: p_mc = cv_mc.ca.null_prob

In [52]: print np.ravel(p_mc)
[ 0.00980392]

In [53]: print np.mean(np.ravel(p_mc))
0.00980392156863
-----------------


Can anybody reproduce this behaviour or explain what I am doing wrong?
Below you find some info on my (toy-)dataset and on my installation.

Thank you and all the best!
Marco

--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: [email protected]
Skype: mtettamanti
http://scholar.google.it/citations?user=x4qQl4AAAAAJ






In [98]: print fds.summary()

Dataset: 108x100@float32, <sa: chunks,targets,time_coords,time_indices>, <fa:voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>

stats: mean=-0.0109682 std=0.992155 var=0.984372 min=-3.62952 max=4.32006

Counts of targets in each chunk:
  chunks\targets Whatd Whend Whetd
                  ---   ---   ---
       0.0         2     2     2
       1.0         2     2     2
       2.0         2     2     2
       3.0         2     2     2
       4.0         2     2     2
       5.0         2     2     2
       6.0         2     2     2
       7.0         2     2     2
       8.0         2     2     2
       9.0         2     2     2
      10.0         2     2     2
      11.0         2     2     2
      12.0         2     2     2
      13.0         2     2     2
      14.0         2     2     2
      15.0         2     2     2
      16.0         2     2     2
      17.0         2     2     2

Summary for targets across chunks
  targets mean std min max #chunks
  Whatd     2   0   2   2     18
  Whend     2   0   2   2     18
  Whetd     2   0   2   2     18

Summary for chunks across targets
  chunks mean std min max #targets
    0      2   0   2   2      3
    1      2   0   2   2      3
    2      2   0   2   2      3
    3      2   0   2   2      3
    4      2   0   2   2      3
    5      2   0   2   2      3
    6      2   0   2   2      3
    7      2   0   2   2      3
    8      2   0   2   2      3
    9      2   0   2   2      3
   10      2   0   2   2      3
   11      2   0   2   2      3
   12      2   0   2   2      3
   13      2   0   2   2      3
   14      2   0   2   2      3
   15      2   0   2   2      3
   16      2   0   2   2      3
   17      2   0   2   2      3
Sequence statistics for 108 entries from set ['Whatd', 'Whend', 'Whetd']
Counter-balance table for orders up to 2:
Targets/Order O1        |  O2        |
    Whatd:    35  1  0  |  34  2  0  |
    Whend:     0 35  1  |   0 34  2  |
    Whetd:     0  0 35  |   0  0 34  |
Correlations: min=-0.5 max=0.96 mean=-0.0093 sum(abs)=47





In [99]: mvpa2.wtf()
Out[99]:
Current date:   2015-08-17 16:32
PyMVPA:
 Version:       2.3.1
 Hash:          d1da5a749dc9cc606bd7f425d93d25464bf43454
 Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 Version control (GIT):

GIT information could not be obtained due"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"

SYSTEM:
 OS:            posix Linux 4.1.0-1-amd64 #1 SMP Debian 4.1.3-1 (2015-08-03)
 Distribution:  debian/stretch/sid
EXTERNALS:

Present: atlas_fsl, cPickle, ctypes, goodscipy.stats.rv_continuous._reduce_func(floc,fscale), goodscipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib,liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge2.4, mock, nibabel, nipy, nose, numpy, numpy_correct_unique, pprocess, pylab,pylab plottable, pywt, pywt wp reconstruct, reportlab, running ipython env,scipy, sg ge 0.6.4, sg ge 0.6.5, sg_fixedcachesize, shogun, shogun.mpd,shogun.svmocas, skl, statsmodels, weaveAbsent: atlas_pymvpa, cran-energy, elasticnet, glmnet, goodscipy.stats.rdist, hcluster, lars, mass, nipy.neurospin, numpydoc, openopt, pywtwp reconstruct fixed, rpy2, shogun.krr, shogun.lightsvm, shogun.svrlight

 Versions of critical externals:
  ctypes      : 1.1.0
  h5py        : 2.5.0
  hdf5        : 1.8.13
  ipython     : 2.3.0
  lxml        : 3.4.4
  matplotlib  : 1.4.2
  mdp         : 3.4
  mock        : 1.3.0
  nibabel     : 2.0.1
  nipy        : 0.4.0.dev
  numpy       : 1.8.2
  pprocess    : 0.5
  reportlab   : 3.2.0
  scipy       : 0.14.1
  shogun      : 3.2.0
  shogun:full : 3.2.0_2014-2-17_18:46
  shogun:rev  : 197120
  skl         : 0.16.1
 Matplotlib backend: TkAgg
RUNTIME:
 PyMVPA Environment Variables:

PYTHONPATH :":/usr/lib/python2.7/lib-old:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages/wx-3.0-gtk2:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-dynload:/usr/bin:.:/home/marco/.ipython:/usr/lib/python2.7/dist-packages:/usr/lib/pymodules/python2.7:/usr/lib/python2.7/dist-packages/IPython/extensions:/usr/lib/python2.7:/usr/lib/python2.7/dist-packages/PILcompat:/home/marco/.cache/scipy/python27_compiled:/usr/lib/python2.7/dist-packages/gtk-2.0:/home/marco/data/bicocca/MVPA_IntentionToMove/mvpa_itm"

 PyMVPA Runtime Configuration:
  [general]
  verbose = 1

  [externals]
  have running ipython env = yes
  have ipython = yes
  have numpy = yes
  have scipy = yes
  have matplotlib = yes
  have h5py = yes
  have reportlab = yes
  have weave = yes
  have good scipy.stats.rdist = no
  have good scipy.stats.rv_discrete.ppf = yes
  have good scipy.stats.rv_continuous._reduce_func(floc,fscale) = yes
  have pylab = yes
  have lars = no
  have elasticnet = no
  have glmnet = no
  have skl = yes
  have ctypes = yes
  have libsvm = yes
  have shogun = yes
  have sg ge 0.6.5 = yes
  have shogun.mpd = yes
  have shogun.lightsvm = no
  have shogun.svrlight = no
  have shogun.krr = no
  have shogun.svmocas = yes
  have sg_fixedcachesize = yes
  have openopt = no
  have nibabel = yes
  have mdp = yes
  have mdp ge 2.4 = yes
  have nipy = yes
  have statsmodels = yes
  have pywt = yes
  have cpickle = yes
  have gzip = yes
  have cran-energy = no
  have griddata = yes
  have nipy.neurospin = no
  have lxml = yes
  have atlas_fsl = yes
  have atlas_pymvpa = no
  have hcluster = no
  have hdf5 = yes
  have joblib = yes
  have liblapack.so = yes
  have libsvm verbosity control = yes
  have mass = no
  have mock = yes
  have nose = yes
  have numpy_correct_unique = yes
  have numpydoc = no
  have pprocess = yes
  have pylab plottable = yes
  have pywt wp reconstruct = yes
  have pywt wp reconstruct fixed = no
  have rpy2 = no
  have sg ge 0.6.4 = yes
 Process Information:
  Name: ipython
  State:        R (running)
  Tgid: 8970
  Ngid: 0
  Pid:  8970
  PPid: 2053
  TracerPid:    0
  Uid:  1000    1000    1000    1000
  Gid:  1000    1000    1000    1000
  FDSize:       256

Groups: 6 7 20 24 25 27 29 30 44 46 100 104 113 114 116 121 124 132 139999 1000

  NStgid:       8970
  NSpid:        8970
  NSpgid:       8970
  NSsid:        2053
  VmPeak:        1748880 kB
  VmSize:        1099484 kB
  VmLck:               0 kB
  VmPin:               0 kB
  VmHWM:          883520 kB
  VmRSS:          225784 kB
  VmData:         376948 kB
  VmStk:             136 kB
  VmExe:            3220 kB
  VmLib:          119892 kB
  VmPTE:            1732 kB
  VmPMD:              16 kB
  VmSwap:              0 kB
  Threads:      11
  SigQ: 0/126979
  SigPnd:       0000000000000000
  ShdPnd:       0000000000000000
  SigBlk:       0000000000000000
  SigIgn:       0000000001001000
  SigCgt:       0000000180000002
  CapInh:       0000000000000000
  CapPrm:       0000000000000000
  CapEff:       0000000000000000
  CapBnd:       0000003fffffffff
  Seccomp:      0
  Cpus_allowed: ff
  Cpus_allowed_list:    0-7
  Mems_allowed: 00000000,00000001
  Mems_allowed_list:    0
  voluntary_ctxt_switches:      90582
  nonvoluntary_ctxt_switches:   5294



_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa

[pymvpa] postproc=mean_sample(), is there a bug?

Reply via email to